Parallel Processing

Article by dmitribagh · Oct 08, 2015 at 07:15 PM · mark2atsafe edited · Feb 26 at 07:26 PM

Article created with FME Desktop 2015.0

Introduction

Each FME translation usually runs as a single process on your computer. However, FME can be set up to take advantage of multi-core processors and run computations in parallel (doing multiple tasks at once). FME also makes use of hyper-threading, a technology that makes each physical core appear as two logical processors to the host operating system.

By using parallel processing, performance may be improved significantly over a single process.

Notes:

As of FME 2019, the Parallel Processing options have been removed from most transformers and exist only in the custom transformer infrastructure. A separate article exists to explain setting up parallel processing in a custom transformer.
For a more basic introduction to parallel processing, including a step-by-step tutorial exercise, see the article How to Use Parallel Processing in FME.

Setting Up Parallel Processing

As a brief introduction, note that each parallel process in FME uses its own set of data, and data cannot be passed between processes. Therefore you must divide the data into groups using a Group-By parameter, and set each group to be handled by a different process.

Here a user is calculating statistics about the number of visitors to parks in the city of Vancouver, using FME's StatisticsCalculator transformer. Each park has an attribute that defines which neighborhood it is in. That neighborhood attribute is used to group the data, and by setting a Parallel Processing Level, each group is handled by a separate process, potentially improving performance.
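As a rough illustration of the idea outside FME, the Python sketch below splits features by a group attribute and computes statistics for each group in its own worker. The park records, the attribute names (neighborhood, visitors) and the use of a thread pool are assumptions made for the example; FME itself launches separate FME processes rather than threads.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical park features; names and visitor counts are invented.
parks = [
    {"name": "Park A", "neighborhood": "Kitsilano", "visitors": 1200},
    {"name": "Park B", "neighborhood": "Kitsilano", "visitors": 800},
    {"name": "Park C", "neighborhood": "Downtown", "visitors": 3000},
    {"name": "Park D", "neighborhood": "Downtown", "visitors": 1000},
    {"name": "Park E", "neighborhood": "Fairview", "visitors": 500},
]


def group_by(features, attribute):
    """Split features into independent groups keyed by one attribute."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature[attribute]].append(feature)
    return dict(groups)


def visitor_stats(features):
    """Statistics for one group; it never sees features from other groups."""
    counts = [f["visitors"] for f in features]
    return {"count": len(counts), "total": sum(counts), "mean": mean(counts)}


def stats_per_group(features, attribute, workers=4):
    """Hand each group to its own worker and collect the results by group."""
    groups = group_by(features, attribute)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(visitor_stats, groups.values())
        return dict(zip(groups.keys(), results))


result = stats_per_group(parks, "neighborhood")
```

Because each worker only receives the features of its own group, the statistics for one neighborhood cannot be influenced by parks in another.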

Parallel Processing Levels

The processing level determines how many processes run in parallel. Minimal creates the fewest processes. Extreme creates the most. The exact number depends on the number of cores and processors on the computer being used.

However, there is a limit to the number of processes that FME will create. This limit is tied to the FME license level. FME Base Edition allows a maximum of 4 processes; Professional Edition allows 8; all other editions allow 16.
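The interaction between the processing level and the license limit can be sketched as follows. The license limits are the ones stated above; the per-level multipliers (half the cores for Minimal, up to twice the cores for Extreme) are an assumption for illustration and should be checked against the documentation for your FME version. The function name max_processes is hypothetical.

```python
# ASSUMPTION: multipliers per level are illustrative, not taken from this article.
LEVEL_MULTIPLIER = {
    "Minimal": 0.5,
    "Moderate": 1.0,
    "Aggressive": 1.5,
    "Extreme": 2.0,
}

# License limits as stated in the article.
LICENSE_LIMIT = {"Base": 4, "Professional": 8}
DEFAULT_LICENSE_LIMIT = 16  # all other editions


def max_processes(level, cores, edition):
    """Estimate how many parallel processes would be started."""
    if level == "No Parallelism":
        return 1
    requested = max(1, int(cores * LEVEL_MULTIPLIER[level]))
    limit = LICENSE_LIMIT.get(edition, DEFAULT_LICENSE_LIMIT)
    return min(requested, limit)


# On an 8-core machine, Extreme would request 16 processes under the
# assumed multipliers, but the edition caps the actual number.
base = max_processes("Extreme", 8, "Base")
professional = max_processes("Extreme", 8, "Professional")
other = max_processes("Extreme", 8, "Database")
```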

Parallel Processing Tips

Here are some general tips for parallel processing:

Data must be divisible into groups. If no group attribute is set in the transformer's Group By parameter, there will be no parallel processing.
Groups will be processed independently. In the above example, each neighborhood will get its own set of statistics. If features in one group depend on features in another, then processing them separately will produce incorrect results.
If you do not have an attribute that defines groups, then groups can be created using other transformers such as the ModuloCounter. This blog post explores different techniques for creating artificial groups.
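The last tip, creating artificial groups with a cycling counter, can be sketched in Python as below. This is only an analogy for what a ModuloCounter-style transformer does; the function name, the _group attribute and the sample features are hypothetical.

```python
from collections import Counter


def assign_modulo_groups(features, count_maximum, attribute="_group"):
    """Tag each feature with a counter that cycles 0 .. count_maximum - 1."""
    if count_maximum < 1:
        raise ValueError("count_maximum must be at least 1")
    return [
        {**feature, attribute: index % count_maximum}
        for index, feature in enumerate(features)
    ]


# Ten hypothetical features spread over four artificial groups.
features = [{"id": n} for n in range(10)]
grouped = assign_modulo_groups(features, 4)
sizes = Counter(f["_group"] for f in grouped)
```

The groups end up nearly equal in size, which is usually what you want when the groups have no meaning beyond spreading the work.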

Parallel Processing and Performance

In theory, parallel processing should produce results faster than a single process. However, there are instances where that might not be the case:

Parallel processing only makes sense when the data volumes are big enough. For smaller datasets, the overhead of running multiple FME processes can easily make the translation slower than a single process. For example, if there are only a handful of parks in the above example, then the benefits of parallel processing can be negated by the cost of starting multiple processes.
A greater number of parallel processes does not always correlate with better performance. For example, in "Aggressive" or "Extreme" mode, there might be so many processes that they are fighting each other (or the operating system) for system resources.

Parallel Processing and Custom Transformers

Rather than have a single transformer carry out parallel processing, it's possible to enable parallel processing on a whole group of transformers.

This is done by creating a Custom Transformer from that group. A custom transformer has its own parameters for parallel processing, and it can contain more than one transformer.

Using a Custom Transformer like this also means that the "group by" and "parallel process by" settings can be different (for example, I might group my parks together by neighborhood, but parallel process them on the basis of city).
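The difference between the two settings can be pictured with a small Python sketch: the outer split (by city) decides which worker receives a feature, and the inner split (by neighborhood) decides how features are grouped inside that worker. The data and function names are hypothetical, and the workers are run one after another here for simplicity.

```python
from collections import defaultdict


def partition(features, attribute):
    """Split features into lists keyed by the value of one attribute."""
    parts = defaultdict(list)
    for feature in features:
        parts[feature[attribute]].append(feature)
    return dict(parts)


def process_city(city_parks):
    """Work done by one worker: visitor totals per neighborhood in one city."""
    by_neighborhood = partition(city_parks, "neighborhood")
    return {
        name: sum(p["visitors"] for p in group)
        for name, group in by_neighborhood.items()
    }


# Hypothetical input; the visitor counts are invented.
parks = [
    {"city": "Vancouver", "neighborhood": "Kitsilano", "visitors": 1200},
    {"city": "Vancouver", "neighborhood": "Kitsilano", "visitors": 800},
    {"city": "Vancouver", "neighborhood": "Downtown", "visitors": 3000},
    {"city": "Burnaby", "neighborhood": "Metrotown", "visitors": 400},
    {"city": "Burnaby", "neighborhood": "Metrotown", "visitors": 600},
]

# "Parallel process by" city: each city could go to a separate process.
# "Group by" neighborhood: applied inside each of those processes.
results = {
    city: process_city(city_parks)
    for city, city_parks in partition(parks, "city").items()
}
```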

Parallel Processing Examples

This is a set of examples where Parallel Processing was of use. For a tutorial to carry out yourself, see this page.

All examples here were conducted on a quad-core (8 virtual processors) machine with 4 GB of RAM on a 64-bit Windows platform. Keep in mind that results may vary depending on hardware configuration and FME version.

RasterDEMGenerator

Workspace as a Template

Because surface modeling is such an intense process,using parallel processing can be very beneficial.

This example generates a DEM from a Point Cloud input:

RasterDEMGenerator Workspace

The RasterDEMGenerator group-by is set to fme_basename to process each point cloud as its own group.

No parallelism: 1m10s
Minimal parallelism: 44s
Moderate parallelism: 33s
Aggressive parallelism: 37s
Extreme parallelism: 37s

Moderate is the best result here, over twice as fast as no parallelization. Minimal parallelism is slower because it does not use the full processing power. Aggressive and Extreme modes are slower because their processes use the full processing power at each other's expense.
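The speedup factors behind that comparison can be checked with a few lines of Python. The timings are the ones listed above for the first test; the helper to_seconds is a hypothetical convenience for reading the "1m10s" notation used in this article.

```python
import re


def to_seconds(text):
    """Convert strings such as '1m10s', '1m 01s', '44s' or '2h20m' to seconds."""
    match = re.fullmatch(
        r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*", text
    )
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Timings from the first RasterDEMGenerator test in this article.
timings = {
    "No parallelism": "1m10s",
    "Minimal": "44s",
    "Moderate": "33s",
    "Aggressive": "37s",
    "Extreme": "37s",
}

baseline = to_seconds(timings["No parallelism"])
speedup = {
    level: round(baseline / to_seconds(duration), 2)
    for level, duration in timings.items()
}
```

With these numbers, Moderate works out to a speedup of about 2.12, while Aggressive and Extreme both come to about 1.89.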

In a second test, the point cloud files are grouped according to the first character of their name (this is what the SubstringExtractor transformer is for).

No parallelism: 1m52s
Minimal parallelism: 1m 01s
Moderate parallelism: 42s

Using a larger test dataset shows the results do scale with data size:

No parallelism: 2h20m
Minimal parallelism: 54m

TINGenerator

Workspace as a Template

TINGenerator is another subset of SurfaceModeller similar to the RasterDEMGenerator.

A single TINGenerator in this example takes five minutes to produce a surface. However, even before parallelism, we can use a trick where one TINGenerator makes small surfaces (one from each source LAS file) and a second TINGenerator combines these small surfaces into a single surface.

User-added image

Coupling this double surface generation with parallel processing gives excellent results:

Single TINGenerator: No parallelism: 5m01s
Two TINGenerators: No parallelism: 55s
Two TINGenerators: Minimal parallelism: 30s
Two TINGenerators: Moderate parallelism: 28s
Two TINGenerators: Aggressive parallelism: 26s
Two TINGenerators: Extreme parallelism: 28s

Based on the results of the tests above, we can conclude that parallel processing allows faster surface modelling and can be recommended for machines that support multi-threading.

Buffering

Workspace as a Template

This example uses a Shapefile dataset containing major US roads, where the intention is to buffer each road with a 25 m buffer. The process wraps the Bufferer transformer inside a custom transformer so that the Group-By parameter can use a different attribute from the Parallel Process By parameter.

This is useful because it lets us create the optimal number of groups, here between 8 and 16. The ModuloCounter transformer is used to do this, its "Count Maximum" parameter being the number of groups created:

User-added image

The numbers for buffering of 450,000 original road segments are the following:

No parallelism: 2m51s
Moderate parallelism (4 groups): 1m29s
Moderate parallelism (8 groups): 1m30s
Moderate parallelism (16 groups): 1m33s
Moderate parallelism (24 groups): 1m36s
Moderate parallelism (50 groups): 1m54s

As we can see, the smaller groups in the last test do not compensate for the multiprocessing overhead (firing up FME sessions and sending features between FME instances). This would be less of an issue were each group to have a much larger number of features.

Line Joining

Parallel processing can be used on any transformer in a workspace. In the above example, parallel processing the LineJoiner transformer (wrapped up in a custom transformer) gives the following:

No parallelism: 1m11s
Moderate parallelism: 1m14s

From this we could conclude that parallel processing has no advantage over normal line joining. However, when the dataset is approximately 5 times as large, the results are quite different:

No parallelism: 10m 23s
Minimal parallelism: 6m 06s
Moderate parallelism: 5m11s
Aggressive parallelism: 5m09s

Without parallel processing, a single process can hog resources, paralyzing the computer while FME optimizes memory use and caches data to disk.

Clipping

Workspace as a Template

Clipping is another operation where multiprocessing can be beneficial. In this example, we read US major roads that are already joined within states, and clip them to the county boundaries. Again, we use the second digit of the FIPS number to make processing groups:

User-added image

The results on a large dataset (~450,000 features) are as follows:

No parallelism: 1m 44s
Minimal parallelism: 1m 17s
Moderate parallelism: 1m 17s
Aggressive parallelism: 1m 18s

With an even larger dataset (~2,250,000 features), the results are even more marked:

No parallelism: 27m 01s
Moderate parallelism: 7m 33s
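The grouping used in this clipping example, taking the second digit of the FIPS number as the processing group, can be sketched in Python as follows. Which FIPS attribute the workspace actually reads is not shown in this article, so the codes below (five-digit county codes stored as text) and the function names are assumptions for illustration.

```python
from collections import defaultdict


def fips_group(fips_code):
    """Return the second digit of a FIPS code as the processing group key."""
    code = str(fips_code)
    if len(code) < 2 or not code[1].isdigit():
        raise ValueError(f"cannot derive a group from {fips_code!r}")
    return code[1]


def group_by_fips(features, attribute="fips"):
    """Collect features into at most ten groups, one per possible digit."""
    groups = defaultdict(list)
    for feature in features:
        groups[fips_group(feature[attribute])].append(feature)
    return dict(groups)


# Hypothetical features carrying county FIPS codes as strings.
counties = [
    {"fips": "06037"},
    {"fips": "36061"},
    {"fips": "17031"},
    {"fips": "48201"},
]
groups = group_by_fips(counties)
```

Using a single digit keeps the number of groups at ten or fewer, which is in the same range as the group counts that performed well in the buffering example.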

Point Cloud Manipulation: 3D Clipping

Workspace as a Template

Clipping point clouds in 3D can be useful for a simple surface filtering:

User-added image

The results on a relatively small point cloud look as follows:

No parallelism: 1m 45s
Aggressive: 1m 12s

More Information

For more information, please see the Parallel Processing documentation or the Parallel Processing section of the Desktop Advanced Training.
