Data QA: Identifying Non-Consecutive Duplicate Vertices with FME

Data QA Identifying Sliver Overlaps and Gaps in Polygon Coverage

Data QA: Identifying Bad Topology in Linear Networks

Data QA: Identifying Consecutive Duplicate Vertices with FME

Data QA: Identifying Duplicate Attribute Values

Data QA: Identifying Duplicate Features with FME

Data QA: Identifying Features Closer Than a Minimum Distance

Data QA: Identifying Invalid Geometry Types

Data QA: Identifying Invalid Spatial Relationships

Data QA: Identifying Self-Intersections with FME

Data QA: Identifying Short Line Features

Data QA: Identifying Small Polygon Features

Data QA: Identifying Spikes and Outliers with FME

Data QA: Invalid Spatial Schemas

Article by mark2atsafe · Sep 12, 2017 at 07:00 PM · edited Oct 12, 2017 at 07:27 PM

Article created with FME Desktop 2017.0

Duplicate Vertices

A duplicate vertex (duplicate point) occurs when a geometry has one or more vertices that occur multiple times within the feature. Duplicate vertices are those with identical X, Y, and Z coordinate values, to as many decimal places as exist in the data.

Duplicate vertices are not only a sign of lower quality data, they can also be a data format problem. Some formats permit duplicate vertices (for example, MicroStation DGN allows zero-length lines) while other formats prohibit duplicate vertices (for example Oracle Spatial).

The duplicate vertex might occur sequentially in the geometry (for example, A,B,C,C,D,E) or it might occur out of sequence (A,B,C,D,C,E). It might just be duplicated once (A,B,C,C,D), or it might be duplicated multiple times (A,B,C,C,C,D,C,E,C).

Of course, sometimes a duplicate vertex is valid; for example a polygon start and end point should be identical if it is to close properly (A,B,C,D,E,A) and sometimes a linear feature should loop around and rejoin mid-point (A,B,C,D,E,C); so it is not always easy to identify invalid features on this basis alone.
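The patterns described above can be illustrated outside FME with a few lines of Python. This is only a sketch of the idea, not how any FME transformer is implemented: the function name `classify_duplicates` and the sample coordinates are made up for illustration, and the closing vertex of a closed line is assumed to be legitimate.

```python
def classify_duplicates(coords):
    """Split repeated vertices into consecutive and non-consecutive pairs.

    coords is a list of (x, y, z) tuples. Each reported pair is
    (index_of_previous_occurrence, index_of_repeat).
    """
    # A closed line legitimately repeats its first vertex at the end,
    # so the closing vertex is left out of the comparison.
    closed = len(coords) > 2 and coords[0] == coords[-1]
    body = coords[:-1] if closed else coords

    last_seen = {}
    consecutive, non_consecutive = [], []
    for i, point in enumerate(body):
        if point in last_seen:
            pair = (last_seen[point], i)
            if i - last_seen[point] == 1:
                consecutive.append(pair)
            else:
                non_consecutive.append(pair)
        last_seen[point] = i
    return consecutive, non_consecutive


# Hypothetical coordinates standing in for the vertices A to E in the text.
A, B, C, D, E = (0, 0, 0), (4, 0, 0), (4, 3, 0), (2, 5, 0), (0, 3, 0)

print(classify_duplicates([A, B, C, C, D, E]))  # sequential duplicate
print(classify_duplicates([A, B, C, D, C, E]))  # out-of-sequence duplicate
print(classify_duplicates([A, B, C, D, E, A]))  # valid closed outline
```

Note that a loop-back line such as A,B,C,D,E,C is also reported as a non-consecutive duplicate here, even though it may be valid. The output is a list of candidates to review, not a list of confirmed errors.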

There are various FME transformers that can be used to identify duplicate vertices, but some transformers - or combinations of transformers - will be much more efficient than others.

GeometryValidator: This transformer identifies and fixes duplicate vertices that occur consecutively within a single geometry.

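ClosedCurveFilter: This transformer identifies features that form a closed loop and can, therefore, be used to detect (or eliminate from suspicion) features with duplicate end points.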

CoordinateExtractor: This transformer extracts a list of coordinates from a feature, which can then be analyzed to look for duplicates.

In general, because consecutive duplicate vertices are the more obvious problem, the GeometryValidator is more commonly used.

However, the CoordinateExtractor is better for detecting duplicate vertices that occur out of sequence, so that further investigation can take place.

This example uses a combination of ClosedCurveFilter and CoordinateExtractor to identify duplicate points that are unsequenced. A second example uses the GeometryValidator transformer to identify sequential duplicate points.

Downloads

Source Dataset

Duplicate Non-Consecutive Vertices: Workspace as a Template

Source Data

The source data is a MicroStation Design file containing line features that represent building outlines:

The scenario is that we wish to validate and clean the data before it is put into production use.

Locating Non-Sequential Duplicate Vertices: Step-by-Step Instructions

Locating non-sequential duplicate vertices is not as straightforward as sequential duplicates; however, it can be done. Follow these steps to discover one method to locate non-sequential duplicate vertices.

1. Start FME Workbench and begin with an empty canvas. Select Reader > Add Reader from the menubar. Set the data format to Bentley MicroStation Design (V8). Select the attached MicroStation dataset as the source. If you click the parameters button you'll find there is an advanced parameter to remove duplicate points:

Ensure this parameter is turned off as we want to identify where and how many duplicate vertices there are. So simply click OK to add the reader. If/when prompted, select the BuildingFootprints level as the data to be read.

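2. Click the reader feature type on the canvas. Choose the Inspect option from the pop-up menu to view the data in the FME Data Inspector. Inspect the data. It looks correct at a glance, and it is difficult to tell that there might be duplicate vertices.

3. Back in FME Workbench, add a ClosedCurveFilter transformer after the reader feature type. Add Inspector transformers to the output ports and run the translation. Because display colors are generated randomly, you may need to recolor the ClosedCurveFilter_Open features to tell the results apart. It will identify one open feature, like this: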

This is a feature with a duplicate vertex, but it doesn't close like a polygon would. It may, or may not, be considered a problem feature, but since this is meant to be a building we can probably assume it's incorrect.

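4. To find non-consecutive duplicate points, we'll extract the list of coordinates and check it for duplicates. Of course, it's important not to confuse points on different features, and not to include the start/end point of polygons.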

Here the data does not have a unique ID for each feature, so we should create one by adding a Counter transformer. That way identical points on different features will not be confused:

The default parameters - which will create an attribute called _count - are fine for our purposes.

5. Now add a CoordinateExtractor transformer after the Counter. The parameters should be set to extract All Coordinates to a list called _indices:

If you wish, connect an Inspector transformer and run the workspace. Query a feature and you'll find that it now has a list containing its vertices.

6. OK, we want to analyze the list of coordinates, but we can't do it as a list object. There is no specific list transformer that will find duplicate values among multiple values (the ListDuplicateRemover will find duplicate X values, or duplicate Y values, but not a combination of duplicate X and Y). So, we'll explode the list into one feature per list element using the ListExploder transformer:

If you wish, attach an Inspector transformer and run the workspace. You'll see there is now one feature per vertex. Each vertex has its position in the list recorded as _element_index:

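From the above, building 55 has 5 vertices, numbered 0 to 4. The first and last vertices match, which means it is a closed line (which is good).

7. Now we can start removing vertices that are not (or do not count as) duplicates.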

Place a Tester transformer after the ListExploder. Set up the parameters to test for _element_index = 0 (i.e. this is the first coordinate of the line).

These are the features we want to drop, because otherwise the first and last points of a closed line would match and be flagged as an error. The features we want to keep are therefore the ones that emerge from the Failed port.

8. Now place a DuplicateFilter transformer, connected to the Tester:Failed port:

Set up the transformer to filter on duplicate values of _count, x, y, and (optionally) z. That is, on the same feature (where _count matches), flag any vertices with identical x, y, z values.

Connect an Inspector transformer to the DuplicateFilter:Duplicate port and run the workspace. The result will look like this:

There is one unclosed feature and six features flagged with duplicate vertices. In fact, there will be a feature for every vertex of a building that is a duplicate, so if a building has two duplicate vertices there will be two features to represent it. The x/y/z attributes of the feature identify where the duplicate vertex lies.
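For readers who find it easier to follow the logic as code, the steps above can be approximated in plain Python. This is a rough sketch of the workflow's logic only, not FME's implementation: the function name `find_duplicate_vertices` and the sample buildings are invented for illustration, and the feature ID is assumed to start at 0.

```python
def find_duplicate_vertices(features):
    """Flag repeated vertices per feature, ignoring each feature's first vertex.

    features is a list of coordinate lists, each a list of (x, y, z) tuples.
    Returns one record per duplicate vertex found.
    """
    seen = set()
    flagged = []
    # Counter: give every feature an ID so that identical points on
    # different features are never confused (assumed to start at 0).
    for count, coords in enumerate(features):
        # CoordinateExtractor + ListExploder: one record per vertex,
        # with its position in the list.
        for element_index, (x, y, z) in enumerate(coords):
            # Tester: drop the first vertex so the closing vertex of a
            # closed line is not reported as a duplicate.
            if element_index == 0:
                continue
            # DuplicateFilter: key on the feature ID plus x, y, z.
            key = (count, x, y, z)
            if key in seen:
                flagged.append({"_count": count, "_element_index": element_index,
                                "x": x, "y": y, "z": z})
            else:
                seen.add(key)
    return flagged


# Hypothetical buildings: a clean closed square, and one that revisits a corner.
clean = [(0, 0, 0), (10, 0, 0), (10, 10, 0), (0, 10, 0), (0, 0, 0)]
bad = [(0, 0, 0), (10, 0, 0), (10, 10, 0), (5, 5, 0), (10, 10, 0), (0, 10, 0), (0, 0, 0)]

for record in find_duplicate_vertices([clean, bad]):
    print(record)
```

As in the workspace, the result holds one record per duplicate vertex rather than one per building, and the two buildings sharing the same corner coordinates does not trigger a match because the feature ID is part of the key.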

Notes

If you're overthinking the problem (as I was) you might be wondering if there is any effect introduced by dropping the first point. For example, where we have A,A,B,C,D,E or A,B,C,A,D,E - would there be a problem because the first A feature is dropped and so won't match with any subsequent A's?

Well, no, for various reasons:

If it's A,A,B,C,D,E then the two A's are consecutive and you could find those with the GeometryValidator. But even if you didn't...
If it's a closed line then "E" is the same as "A" anyway, so subsequent A's will match with E.
If it's not a closed line, then the ClosedCurveFilter will already have flagged it as a possible problem feature.
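The reasoning in this list can be checked with a small Python sketch. It is an illustration only: the helper names `is_closed` and `duplicates_after_dropping_first` are hypothetical, and single letters stand in for coordinates.

```python
def is_closed(coords):
    """True when the first and last vertices match."""
    return len(coords) > 1 and coords[0] == coords[-1]


def duplicates_after_dropping_first(coords):
    """Indices of repeated vertices, ignoring the first vertex."""
    seen, duplicates = set(), []
    for i, point in enumerate(coords[1:], start=1):
        if point in seen:
            duplicates.append(i)
        else:
            seen.add(point)
    return duplicates


closed_line = list("ABCADA")  # A repeated mid-line, line closes on A
open_line = list("ABCADE")    # A repeated mid-line, line does not close

print(duplicates_after_dropping_first(closed_line), is_closed(closed_line))
print(duplicates_after_dropping_first(open_line), is_closed(open_line))
```

In the closed case the mid-line A still pairs with the closing A, so it is flagged. In the open case no duplicate is reported, but the line fails the closed check, which is the role the ClosedCurveFilter plays in the workspace.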

Counting the number of problem vertices is as simple as introducing a StatisticsCalculator (as in the prior example) to count the features.

Fixing the problem vertices is another matter. Technically we could use the CoordinateRemover to drop one of the bad vertices. But there is no guarantee that we would remove the correct one. For example, add a CoordinateRemover after the DuplicateFilter, set to remove vertex "_element_index" (which we know to be a duplicate):

The result works for some features, but not for others:

Therefore it's suggested that this technique should be used to identify non-consecutive duplicate coordinates, but not to fix them. The problem features should be passed on to a proper editing tool for fixing.
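A small Python illustration of why automatic removal is ambiguous, using plain list slicing rather than the CoordinateRemover itself (the helper `remove_vertex` is hypothetical, and letters stand in for coordinates):

```python
def remove_vertex(coords, index):
    """Return a copy of coords without the vertex at the given index."""
    return coords[:index] + coords[index + 1:]


line = ["A", "B", "C", "D", "C", "E"]  # C occurs at index 2 and index 4

drop_flagged = remove_vertex(line, 4)  # remove the flagged repeat
drop_earlier = remove_vertex(line, 2)  # remove the earlier occurrence

print(drop_flagged)
print(drop_earlier)
```

Both results are free of duplicates, but they visit the vertices in a different order and so describe different shapes. Nothing in the coordinate list alone says which one matches the real building outline.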

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia (data.vancouver.ca). It contains information licensed under the Open Government License - Vancouver.

buildings.dgn (55.8 kB)

duplicatenonsequentialpoints.fmwt (49.3 kB)
