Hi FME’ers,
Interacting with FME users I see various points of view on the merits of the Joiner transformer versus the FeatureMerger. Although both transformers carry out similar actions, it’s not clear to users when each should be used, particularly in relation to workspace performance.

So this post will indulge in some investigative journalism! I’ll compare and contrast these transformers, to see where you would want to use each of them, and throw other transformers – such as the updated SQLExecutor and the new FeatureReader – into the mix as potential alternatives.

Descriptions
First a description. In general these transformers are used to merge features together. For example, you have a Shape dataset of address points, a database of related records, and want to join the two sets of information into one. That’s when you would use one of these transformers.

To do the merge requires some common information; usually a common ID number.

FeatureMerger
FeatureMerger is for when both sets of data are being read in a workspace. It takes input into two ports – Requestor and Supplier – and merges information from each supplier onto the requestor with the same ID number.
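As a rough mental model only (this is not FME's implementation), the FeatureMerger behaves like an in-memory hash join. In the sketch below the function name `merge_features`, the dictionary-based features, and the sample attribute names are all illustrative assumptions:

```python
def merge_features(requestors, suppliers, key):
    """Merge supplier attributes onto requestors that share the same key value."""
    # Blocking step: every supplier is held in memory before any merge happens.
    suppliers_by_id = {s[key]: s for s in suppliers}
    merged, unmerged = [], []
    for r in requestors:
        match = suppliers_by_id.get(r[key])
        if match is None:
            unmerged.append(r)
        else:
            # In this sketch the requestor's own values win on any name clash.
            merged.append({**match, **r})
    return merged, unmerged


# Hypothetical sample data: address points (requestors) and records (suppliers).
points = [{"address_id": 1, "x": 10.0, "y": 20.0}, {"address_id": 3, "x": 5.0, "y": 7.0}]
records = [{"address_id": 1, "street": "Main St"}, {"address_id": 2, "street": "Oak Ave"}]
merged, unmerged = merge_features(points, records, "address_id")
```

The dictionary of suppliers is the part that costs memory: it has to be complete before the first requestor can be processed.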

Joiner
Joiner has only a single input port. The port is called ‘INPUT’ but you might think of it as the ‘requestor’. It requests information not from other FME features, but directly from records in a database; again using a common ID number to carry out the match.
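By contrast, a Joiner-style lookup can be pictured as one query per incoming feature. This is a minimal sketch using Python's built-in `sqlite3` module as a stand-in database; the table `address_info`, its columns, and the function name are assumptions made for illustration:

```python
import sqlite3


def join_from_database(features, conn, key):
    """Yield each feature with attributes from its matching database record."""
    conn.row_factory = sqlite3.Row
    for f in features:
        # One round trip to the database for every incoming feature.
        row = conn.execute(
            "SELECT street FROM address_info WHERE address_id = ?", (f[key],)
        ).fetchone()
        # Features are yielded one at a time, so nothing has to be held back.
        yield {**f, **dict(row)} if row is not None else f


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address_info (address_id INTEGER PRIMARY KEY, street TEXT)")
conn.execute("INSERT INTO address_info VALUES (1, 'Main St')")
joined = list(join_from_database([{"address_id": 1}, {"address_id": 9}], conn, "address_id"))
```

Because the function is a generator, each feature can move on as soon as its own record comes back, which is why memory use stays low while the number of database round trips goes up.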

FeatureReader + SpatialFilter
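FeatureReader is similar to the Joiner, but allows features to be matched by a spatial relationship rather than a common attribute value. The SpatialFilter would do a similar job where both sets of features are already read into the workspace, making it the spatial equivalent of the FeatureMerger.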

SQLExecutor
SQLExecutor carries out a SQL command on a database – that command could be a select statement with a where clause, effectively allowing joins to be carried out.

Performance Tests
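The test data originally consisted of 12,292 address records, but I cloned them 100 times to give a larger dataset (1,229,200 features) and split them into spatial (point features) and non-spatial (text records/attributes). A common ID (Address ID) defines the relationship between the two types of data.

For formats, the spatial data was written to Shape, and the non-spatial data to both CSV and SQL Server.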


Test 1
Test 1 is a simple test to join the CSV attributes back onto the Shape spatial features. The FeatureReader and SQLExecutor aren’t valid here because they only do a match with a true database format.

Transformer      Time          Max Memory
Joiner           5′ 41″        299,692 KB
FeatureMerger    12′ 21″       873,732 KB

So here the Joiner wins out: it’s faster and uses far less memory. Basically, it doesn’t need to read all the features into memory as the FeatureMerger does. Without further blocking (group-based) transformers in the workspace, the data can start to be written while joins are still going on!


Test 2
Test 2 is the same as Test 1, but uses a SQL Server database instead of a CSV file. Being a proper database, the FeatureReader and SQLExecutor are now valid options.

Transformer      Time            Max Memory
Joiner           21′ 40″         214,160 KB
FeatureMerger    13′ 30″         881,160 KB
FeatureReader    1hr 14′ 21″     45,892 KB
SQLExecutor      1hr 55′ 56″     45,860 KB

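This time the FeatureMerger wins out in terms of speed. I suspect that’s because the Joiner, FeatureReader, and SQLExecutor have to hit the database for every read, rather than just once as the FeatureMerger does. So they are low on memory use, but slow because of waiting for multiple database responses. To test that theory there is a setting in the Joiner to read the entire table contents into a cache at the start of the process.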

When that is set the result comes back as follows:

Transformer      Time          Max Memory
Joiner           11′ 09″       872,764 KB
FeatureMerger    13′ 30″       881,160 KB

So there too the Joiner is the better option. Notice that the memory usage has shot up to FeatureMerger levels – because the Joiner is now also reading all data at the start – but that it still runs faster.


Test 3
In this test the spatial features will be renumbered. Originally they were 1-1,229,200 but now I’ll group them into 100 batches where each batch’s features are numbered 1-12,292.

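This is a bit artificial, but it simply means that each supplier needs to be used multiple times. For example, you could think of it as addresses being matched to zip/postal codes (where many addresses have the same code).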

No pre-fetch was set for the Joiner; just a cache size of 10,000.

The results were as follows:

Transformer      Time          Max Memory
Joiner           8′ 37″        80,776 KB
FeatureMerger    11′ 32″       878,680 KB

The main reason for the performance benefit is that the Joiner doesn’t need to read every single record. Here only the first 12,292 of the 1,229,200 records could possibly be a match: the Joiner can read individual records but the FeatureMerger has to read the whole table, regardless of the fact that it will never use 99% of it.

But without a pre-fetch, wouldn’t the Joiner have to keep going to the database? Well, no. What happened is that the Joiner filled a cache with up to 10,000 records by itself as it retrieved data. So when the same record needed to be matched a second time, FME already had possession of it. The log proves that 80% of the time the Joiner didn’t need to go to the database for the required info:

@Relate: Database query statistics for table `Joiner:dbo.ADDRESSINFO’: 1229200 queries made of which 0 were sequential duplicates and 990099 hit the record cache of 10000 records (80% overall cache hit)

So, even though the FeatureMerger already had all the features, the Joiner managed to keep up by caching re-usable records.
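The 80% figure can be roughly reproduced with a small simulation. This sketch assumes a cache that fills up once and never evicts, and assumes features arrive batch by batch in ID order; both are my assumptions, not documented Joiner behaviour, and the function name is hypothetical:

```python
def simulate_cache_hits(batches, keys_per_batch, cache_size):
    """Count cache hits for a cache that fills up once and never evicts."""
    cache = set()
    hits = 0
    for _ in range(batches):
        for key in range(1, keys_per_batch + 1):
            if key in cache:
                hits += 1
            elif len(cache) < cache_size:
                # Miss: the record is fetched from the database and remembered.
                cache.add(key)
            # Otherwise: a miss with a full cache; fetched but not stored.
    return hits


# Test 3 numbers: 100 batches of 12,292 IDs and a cache of 10,000 records.
queries = 100 * 12292
hits = simulate_cache_hits(100, 12292, 10000)
print(f"{hits} of {queries} queries hit the cache ({hits / queries:.1%})")
```

Under those assumptions the simulation gives 990,000 hits out of 1,229,200 queries, which is close to (but not exactly) the 990,099 reported in the log, so the real caching policy evidently differs in some small detail.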

Conclusions
The use cases are fairly clear:

What might not be clear is how to tell when memory might become an issue. Well, the space used by the data table would be a big clue, especially when combined with the size of the spatial data files. If they are getting on to 4GB then I would think it’s pushing the limit for a single read and that the Joiner should be preferred.

Also, sometimes the use of a particular transformer is dictated by it being a special case.

Special Cases
Yes, sometimes you absolutely need to use one particular transformer rather than the others.

Spatial Relationship
If you want to merge features on the basis of a spatial relationship, then only the FeatureReader (or SpatialFilter) will do the job. The others work on ID only.
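To make the difference from an ID match concrete, here is a deliberately simplified sketch of a spatial merge. It tests points against rectangular bounding boxes only, whereas real spatial predicates work on true geometries; the function name and the `bbox`, `name`, and `zone` attributes are hypothetical:

```python
def spatial_merge(points, zones):
    """Attach the name of the first zone whose bounding box contains each point."""
    results = []
    for p in points:
        for z in zones:
            xmin, ymin, xmax, ymax = z["bbox"]
            # The match is decided by location, not by a shared ID attribute.
            if xmin <= p["x"] <= xmax and ymin <= p["y"] <= ymax:
                results.append({**p, "zone": z["name"]})
                break
        else:
            # No zone contained the point; pass it through unchanged.
            results.append(p)
    return results


zones = [{"name": "North", "bbox": (0, 50, 100, 100)}, {"name": "South", "bbox": (0, 0, 100, 49)}]
points = [{"x": 10, "y": 75}, {"x": 10, "y": 20}, {"x": 500, "y": 500}]
located = spatial_merge(points, zones)
```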


Unused References
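When you want to find out which supplier features went unused in the join process, then the FeatureMerger is the transformer to use. The others will tell you which requestors found a match, but not which suppliers took part.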
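Finding unused suppliers is only possible when both sides are in hand at once, which is why it suits the FeatureMerger model. A minimal sketch of the idea, with a hypothetical function name and sample data:

```python
def unused_suppliers(requestors, suppliers, key):
    """Return the suppliers whose key value was never asked for by any requestor."""
    requested = {r[key] for r in requestors}
    return [s for s in suppliers if s[key] not in requested]


points = [{"address_id": 1}, {"address_id": 2}]
records = [{"address_id": 1, "street": "Main St"}, {"address_id": 7, "street": "Elm Rd"}]
orphans = unused_suppliers(points, records, "address_id")
```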


Transformation Results
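It’s a bit hard to describe, but the FeatureMerger is useful when you want to transform some data, but then merge the results back onto the original geometry.

For example, here a user wanted to check for self-intersections, but not actually break the data at those points. The trick was to carry out the self-intersection and then merge the resulting _segments attribute back onto the original geometry with a FeatureMerger.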


Multiple Keys
When there are multiple ID matches that need to be made (i.e. there are multiple requestor fields that must match multiple supplier fields) then the Joiner transformer is the one to use. The FeatureMerger only works on a single ID field (unless you can concatenate multiple fields). The FeatureReader would work (using a WHERE clause) but wouldn’t really be worth doing unless there was a spatial join aspect as well.
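A multi-field match amounts to joining on a composite key. The sketch below builds that key as a tuple of field values; the function name and the sample fields (`city`, `road_id`, `owner`) are assumptions for illustration:

```python
def merge_on_keys(requestors, suppliers, keys):
    """Merge suppliers onto requestors when ALL of the listed key fields match."""

    def composite(feature):
        # A tuple keeps each field value separate, so values cannot run together.
        return tuple(feature[k] for k in keys)

    suppliers_by_key = {composite(s): s for s in suppliers}
    merged = []
    for r in requestors:
        match = suppliers_by_key.get(composite(r))
        if match is not None:
            merged.append({**match, **r})
    return merged


roads = [{"city": "Surrey", "road_id": 7}]
owners = [
    {"city": "Surrey", "road_id": 7, "owner": "City"},
    {"city": "Delta", "road_id": 7, "owner": "Province"},
]
merged = merge_on_keys(roads, owners, ["city", "road_id"])
```

If you go the concatenation route instead, put a delimiter between the fields that cannot occur in the data: without one, the values "1" + "23" and "12" + "3" both become "123" and would match each other by mistake.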


Multiple Tables
Another scenario involves multiple tables; basically a requestor and supplier but where the join is made through a lookup table between the two.

For example, I have a dataset of water pipes and want to identify who manufactured the pipe. Each pipe has a MaterialID field to identify the material type in a PipeMaterials table. In turn the PipeMaterials table has a ManufacturerID field to identify the manufacturer in a MaterialsManufacturer table.

To get the name of the pipe manufacturer would need two Joiner transformers at least, or readers to read three full tables for the FeatureMerger.

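But the SQLExecutor transformer is ideal here, because the join can be built into a single query, like:

SELECT * FROM MaterialsManufacturer WHERE MaterialsManufacturer.ManufacturerID = (SELECT ManufacturerID FROM PipeMaterials WHERE PipeMaterials.MaterialID = @Value(pipematid))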
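The lookup-table join described above can be tried out in SQLite from Python. This is only a sketch: the table and field names follow the pipe example, a `Name` column and the sample rows are invented for illustration, and a `?` placeholder stands in for the attribute value that FME would substitute per feature:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PipeMaterials (MaterialID INTEGER PRIMARY KEY, ManufacturerID INTEGER);
CREATE TABLE MaterialsManufacturer (ManufacturerID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO PipeMaterials VALUES (10, 100), (11, 101);
INSERT INTO MaterialsManufacturer VALUES (100, 'Acme Pipes'), (101, 'Globex');
""")


def manufacturer_for(material_id):
    """Resolve pipe material -> manufacturer through the lookup table in one query."""
    row = conn.execute(
        """
        SELECT Name FROM MaterialsManufacturer
        WHERE ManufacturerID = (
            SELECT ManufacturerID FROM PipeMaterials WHERE MaterialID = ?
        )
        """,
        (material_id,),
    ).fetchone()
    return row[0] if row else None


name = manufacturer_for(10)
```

The database resolves both hops itself, so each pipe feature needs only one query rather than two chained lookups.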


Relationship Integrity
When the integrity of the relationship needs to be tested, then the Joiner is again the transformer to use. It will let you test whether the requestor/supplier match is 1:1, 1:M, or various combinations – and throw an error if this relationship is breached. The FeatureMerger has fewer options and will not error if a problem exists.
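The kind of check involved can be sketched as follows for the 1:1 case. The exception class, function name, and error messages are hypothetical; this illustrates the idea rather than how the Joiner itself is implemented:

```python
from collections import Counter


class RelationshipError(ValueError):
    """Raised when the requestor/supplier relationship is not 1:1."""


def assert_one_to_one(requestors, suppliers, key):
    """Raise unless every requestor key is unique and matches exactly one supplier."""
    requestor_counts = Counter(r[key] for r in requestors)
    supplier_counts = Counter(s[key] for s in suppliers)
    for value, count in requestor_counts.items():
        if count != 1:
            raise RelationshipError(f"{count} requestors share {key}={value}")
        if supplier_counts[value] != 1:
            raise RelationshipError(
                f"{supplier_counts[value]} suppliers match {key}={value}"
            )


# Passes silently: one requestor, one matching supplier.
assert_one_to_one([{"id": 1}], [{"id": 1}], "id")
```

Failing loudly is the point: a duplicate supplier that is silently merged can go unnoticed until much later in the workflow.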

Caution!
Joiner will suffer badly if data is not indexed on the search key. For example, one test query took 30 minutes without an index, versus 2 minutes when the key field was indexed. You’ll notice in the log file when this happens because the time listed for “CPU” will be way less than the total time.
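The effect of an index on a key lookup can be seen in a database's query plan. The tests in this post used SQL Server; the sketch below uses SQLite (via Python's built-in `sqlite3`) purely as a stand-in, with a hypothetical table and index name:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail text.
    return " ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address_info (address_id INTEGER, street TEXT)")
lookup = "SELECT street FROM address_info WHERE address_id = ?"

plan_before = query_plan(conn, lookup, (1,))   # no index: a full table scan
conn.execute("CREATE INDEX idx_address_id ON address_info (address_id)")
plan_after = query_plan(conn, lookup, (1,))    # now a search using the index

print(plan_before)
print(plan_after)
```

Without the index every one of the per-feature queries has to scan the whole table, which is the difference between the 30-minute and 2-minute runs mentioned above.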

FeatureMerger doesn’t care about an index, because it has to read everything. So it might appear to be faster, but if so the Joiner/database may not be tuned correctly.

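Also, the FeatureReader isn’t as bad as it appears here. The FeatureReader is to the Joiner as the SpatialFilter is to the FeatureMerger, so when data needs to be joined on a spatial relationship then that transformer should be the one to use, and there is an absolute mountain of use cases that benefit from it.

Similarly, the SQLExecutor has many other uses besides joining data; uses that the Joiner and FeatureMerger just couldn’t achieve.

I hope this is a useful guide to using these two transformers.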


Mark Ireland

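Mark, aka iMark, is the FME Evangelist (est. 2004) and has a passion for FME training. He likes being able to help people understand and use technology in new and interesting ways. One of his other passions is football (aka soccer). He likes both technology and football so much that he wrote an article about the two together! Who would’ve thought? (Answer: iMark)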

Comments

One response to “FME 2011 Use Case: Joiner vs FeatureMerger”

  1. Carlos says:

    Best.Post.Ever.

    Seriously, this should be required reading for newbies like me, b/c 90% of analysis (using ArcMap or FME) involves merging tables/features together.

    Thanks for the clarification & tips!
