This article is a simple guide to bulk database updates with FME.

Sometimes my blog articles are like a courtroom novel. The recent article on precision is an example of this. I reach a successful verdict, but getting there involves an opening statement, a journey through obscure aspects of theory, cross-examining our functionality, and making a final argument for various design decisions. Only after all that can I prove beyond doubt that our work is fit for purpose.

That sort of article is fun to write and it does illustrate Safe’s thought processes in designing FME functionality. But sometimes – like reading a courtroom novel – you want to skip to the final chapter. You just want the juicy part where our hero attorney has already got the result, and simply illuminates the solution in logical steps.

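Well, today, I am that hero attorney. I am the Perry Mason of the FME world, and this article is the final chapter of “Updating Databases with FME”. No theory or cross-examination required. Step by step, I will illuminate how FME makes it easy to push mass updates to a database.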

There is just one design issue I’ll touch on, and it is important. In law there are multiple jurisdictions, and a technique Perry Mason tries in a California court may not work for Atticus Finch in Alabama. Similarly, in our world of data there are multiple databases. But! FME techniques you use for Oracle will work for SQL Server, and FME techniques you use for Postgres will work for an Esri Geodatabase.

How can this happen? It’s because we’ve worked really hard to harmonize (or standardize) the interfaces inside FME. So whatever database applies in your jurisdiction – or even if your work involves several database types – this post covers you.

So let’s get on with it…

Setting up an FME Database Connection

When you want to read from or write to a database – including database updates – you need authorization. FME defines authorization parameters using a connection tool. It’s accessed through Tools > FME Options in FME Workbench. So start Workbench, select Tools > FME Options and you will get this dialog:

When I click on Database Connections, I get a list of available connections. If I want to create a new one I press the plus button and get this dialog:

Just enter your database details in there, test and then save. Now you have a connection defined, you can use it wherever you like in FME. So first let’s use it to insert some data into a database…

Inserting Data to a Database with FME

To carry out database updates, first you need data in the database! Let’s say you want to read a dataset – maybe an Excel spreadsheet – and write it to a database, creating a table at the same time. That’s simply done in FME by generating a new workspace and choosing a database format. The generate option exists on the start page of FME Workbench, or you can use the shortcut Ctrl+G. That opens the basic dialog for defining a translation:

Here I have filled in the fields to define a translation of data about parks in the city of Vancouver. The translation is from MapInfo TAB (a spatial data format) to PostGIS (a standard PostgreSQL database with a spatial extension). The spatial part is not necessary – the same setup works for data without a spatial component – but maps are cool so I’ll go with it.

The database connection I chose is the one I defined above, which saves me the effort of re-entering the authentication details.

Click OK and that dialog creates a workspace that looks like this:

Each object on the left is a table, layer, or class in the source data. Each object on the right is a table in the database.

Setting Database Parameters

When using databases the key settings for each table are accessed by clicking the cogwheel icon on those objects, like this:

Database Insert Parameters

The table name is the first parameter, so I can rename the table to something different; I can also choose which schema (Table Qualifier) to write to.

But the most important parameter (Feature Operation) tells us that we are INSERTing data, and I can also choose in what way the table is created:

So I can choose to create the table regardless of whether it exists (Drop and Create), create it only if it doesn’t already exist (Create if Needed), just add to the existing table (Use Existing), or empty it if it already exists (Truncate Existing).

Which one I use depends on the scenario I’m working through, but in this case – to create and fill a table – I’ll use Create if Needed. The advantage over Drop and Create is that if another user already has a table with that name (and I haven’t checked) then at least I won’t delete their content first.
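If you think in SQL, it may help to see roughly what each table handling choice amounts to. The sketch below is only an analogy: it uses Python’s built-in sqlite3 module rather than FME or PostGIS, the function name prepare_table and the parkname column are made up for illustration, and it does not show the SQL that FME actually issues.

```python
import sqlite3

# Hypothetical column list; only parkid is named in the article.
COLUMNS = "(parkid INTEGER PRIMARY KEY, parkname TEXT)"

def prepare_table(conn, name, handling):
    """Roughly mimic the four table handling choices with plain SQL."""
    if handling == "Drop and Create":
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} {COLUMNS}")
    elif handling == "Create if Needed":
        conn.execute(f"CREATE TABLE IF NOT EXISTS {name} {COLUMNS}")
    elif handling == "Truncate Existing":
        # SQLite has no TRUNCATE; this assumes the table already exists.
        conn.execute(f"DELETE FROM {name}")
    elif handling != "Use Existing":
        raise ValueError(f"unknown table handling: {handling}")
    # "Use Existing" leaves the table and its rows untouched.

conn = sqlite3.connect(":memory:")
prepare_table(conn, "parks", "Create if Needed")
conn.execute("INSERT INTO parks VALUES (13, 'Example Park')")  # made-up row

prepare_table(conn, "parks", "Create if Needed")  # existing rows survive
rows_kept = len(conn.execute("SELECT parkid FROM parks").fetchall())

prepare_table(conn, "parks", "Drop and Create")   # existing rows are lost
rows_after_drop = len(conn.execute("SELECT parkid FROM parks").fetchall())
```

The second call to Create if Needed leaves the sample row in place, while Drop and Create removes it, which is the reason for preferring the former when someone else might already own a table with that name.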

Anyway, I run the workspace and FME loads the data:

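Of course, at some point in the future I might find that the source of the data (parks.tab) has changed, and I need to update the database from the changed dataset…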

Updating Records in a Database with FME

Say I receive a dataset named ParksUpdates.tab. The simplest way to update my database is to do the same process as above, but to use Drop and Create as the table operation. That way I am just replacing everything:

But that relies on ParksUpdates.tab being the FULL replacement dataset. What if it only includes the records that need an update? Well in that scenario I simply change the operation to UPDATE (instead of INSERT) and choose to Use Existing table:

Database Update Parameters

Notice that when I pick UPDATE, then another parameter becomes available to me: Match Columns. I need this to define which feature updates which record. In this case I have an attribute called parkid in my source data and a field (column) named parkid in my database table; so that’s the attribute I select:

So if an incoming feature has the attribute parkid = 13, its contents are used to update the database record where parkid = 13.
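In SQL terms, a match column behaves much like the WHERE part of an UPDATE statement. Here is a minimal sketch of that idea, again using Python’s sqlite3 module as a stand-in database. The function update_by_match, the parkname column, and the sample rows are assumptions for illustration; this is not FME’s own implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (parkid INTEGER PRIMARY KEY, parkname TEXT)")
conn.executemany("INSERT INTO parks VALUES (?, ?)",
                 [(12, "Old Name A"), (13, "Old Name B")])  # made-up rows

# One incoming feature, represented as a dict of attributes.
feature = {"parkid": 13, "parkname": "New Name B"}
match_columns = ["parkid"]

def update_by_match(conn, table, feature, match_columns):
    """Update the record whose match columns equal the feature's values."""
    set_cols = [c for c in feature if c not in match_columns]
    set_sql = ", ".join(f"{c} = :{c}" for c in set_cols)
    where_sql = " AND ".join(f"{c} = :{c}" for c in match_columns)
    cur = conn.execute(f"UPDATE {table} SET {set_sql} WHERE {where_sql}", feature)
    return cur.rowcount  # number of records changed

updated = update_by_match(conn, "parks", feature, match_columns)
names = dict(conn.execute("SELECT parkid, parkname FROM parks"))
```

Only record 13 changes; record 12 is left alone because no incoming feature matches it.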

That’s simple enough, but to add a little complexity (not too much) there is also a WHERE clause I can use instead. This lets me define the match where the attribute and field names are not the same (for example ParkNumber and parkid) but also allows me to add extra conditions using field names:

Here, for example, I’m updating records where ParkNumber = parkid, but also only where the neighborhoodname field is “Downtown”. So records outside of the Downtown area aren’t updated, even if the park ID matches. I could do a similar test for a status field (active, inactive) among many other examples.
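The same logic can be sketched in plain SQL. The example below assumes a sqlite3 stand-in table with made-up rows and a hypothetical parkname column; the attribute ParkNumber and the fields parkid and neighborhoodname are the ones used above. It illustrates the matching rule only, not the exact syntax FME expects in its WHERE clause parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parks (parkid INTEGER PRIMARY KEY, "
    "neighborhoodname TEXT, parkname TEXT)"
)
conn.executemany("INSERT INTO parks VALUES (?, ?, ?)",
                 [(13, "Downtown", "old"), (14, "Elsewhere", "old")])  # made-up rows

# Incoming features use ParkNumber, not parkid, as their ID attribute.
features = [
    {"ParkNumber": 13, "parkname": "new"},
    {"ParkNumber": 14, "parkname": "new"},
]

for f in features:
    # Match on the ID, but only touch records in the Downtown neighborhood.
    conn.execute(
        "UPDATE parks SET parkname = :parkname "
        "WHERE parkid = :ParkNumber AND neighborhoodname = 'Downtown'",
        f,
    )

names = dict(conn.execute("SELECT parkid, parkname FROM parks"))
```

Park 14 has a matching ID but sits outside Downtown, so it keeps its old value.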

So we do updates here, and that’s simple enough; but sometimes we also want to delete records…

Deleting Records from a Database with FME

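Suppose my UpdatedParks dataset is a list of records to delete from the database table, rather than additions. To handle that, I simply change the operation from UPDATE to DELETE: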

Database Delete Parameters

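I get the same Match Columns parameter (or WHERE clause) to define which incoming features should delete which existing records, and again it is easy to define.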
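As a rough SQL analogy, each incoming feature only needs to carry the match column, and every match removes one record. This sketch uses sqlite3 with made-up rows and a hypothetical parkname column; it shows the idea rather than FME’s internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (parkid INTEGER PRIMARY KEY, parkname TEXT)")
conn.executemany("INSERT INTO parks VALUES (?, ?)",
                 [(12, "A"), (13, "B"), (14, "C")])  # made-up rows

# Features that represent deletions; parkid is the match column.
features = [{"parkid": 12}, {"parkid": 14}]

deleted = 0
for f in features:
    deleted += conn.execute(
        "DELETE FROM parks WHERE parkid = :parkid", f
    ).rowcount

remaining = [row[0] for row in conn.execute("SELECT parkid FROM parks ORDER BY parkid")]
```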

So deletes are no more complex than updates; the key question is what happens when I want to both delete and update records simultaneously?

Updating AND Deleting Database Records with FME

Suppose some of the incoming records are updates, while others are deletes:

Obviously I can’t set the operation parameter to both DELETE and UPDATE for the whole table. What I do instead is tag each feature with the operation it will carry out. I do this using an attribute called fme_db_operation:

Database Updates AND Deletes

You can see here that I have added an attribute to each stream of data, using an AttributeCreator transformer. The attribute name is fme_db_operation. For one set of data, I set the value to UPDATE. The other set of data has the value DELETE. That’s how I tag each feature with its own operation.

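I still have to set the operation type on the table itself. But this time, instead of choosing INSERT, UPDATE, or DELETE, I choose the option labelled fme_db_operation: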

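Now, when I run the workspace, features tagged UPDATE update database records, while features tagged DELETE delete database records. The Match Columns parameter (or WHERE clause) provides the match between features and records.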
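Conceptually, the writer looks at each feature’s tag and dispatches to the matching operation. The sketch below imitates that behaviour with sqlite3. The function apply_feature, the parkname column, and the sample rows are hypothetical, and only the two operations discussed here are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (parkid INTEGER PRIMARY KEY, parkname TEXT)")
conn.executemany("INSERT INTO parks VALUES (?, ?)",
                 [(12, "A"), (13, "B"), (14, "C")])  # made-up rows

# Each feature is tagged with the operation it should carry out.
features = [
    {"parkid": 13, "parkname": "B revised", "fme_db_operation": "UPDATE"},
    {"parkid": 14, "parkname": "C", "fme_db_operation": "DELETE"},
]

def apply_feature(conn, feature):
    """Run the operation named by the feature's tag; parkid is the match column."""
    op = feature["fme_db_operation"].upper()
    if op == "UPDATE":
        conn.execute(
            "UPDATE parks SET parkname = :parkname WHERE parkid = :parkid", feature
        )
    elif op == "DELETE":
        conn.execute("DELETE FROM parks WHERE parkid = :parkid", feature)
    else:
        raise ValueError(f"operation not handled in this sketch: {op}")

for f in features:
    apply_feature(conn, f)

result = dict(conn.execute("SELECT parkid, parkname FROM parks"))
```

Record 12 receives no feature at all, so it passes through untouched.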

The one assumption is that we already know which features are deletes and which are updates. In the above example, the source data is already divided into two. If we aren’t sure of that then we might need to do what is called Change Detection.

Change Detection and Database Updates with FME

Change detection is where we have a new dataset and want to compare it to existing records to find what has changed. Here is such a workspace:

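It looks quite simple, and it is. I added a reader (Readers > Add Reader) to read the existing contents of the database table, and an UpdateDetector transformer to compare those records with the updated parks dataset and determine where changes have occurred. I can detect changes to field values, to spatial content, or to both.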

Then it’s just a case of writing the results back to the database table. I don’t even need to create the fme_db_operation attribute; the UpdateDetector has done that for me. I must just check that the table is set with the correct operation (fme_db_operation) and that Match Columns is set.
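To make the comparison step concrete, here is a minimal sketch of ID-based change detection in plain Python. It compares attributes only (no geometry), the function detect_changes and the sample records are made up, and the INSERT/UPDATE/DELETE tag values are my assumption about how the results would be labelled rather than a description of the transformer’s internals.

```python
def detect_changes(original, revised, key):
    """Compare two lists of record dicts by ID and tag what needs writing."""
    orig_by_id = {rec[key]: rec for rec in original}
    rev_by_id = {rec[key]: rec for rec in revised}
    tagged = []
    for rec_id, rec in rev_by_id.items():
        if rec_id not in orig_by_id:
            tagged.append({**rec, "fme_db_operation": "INSERT"})
        elif rec != orig_by_id[rec_id]:
            tagged.append({**rec, "fme_db_operation": "UPDATE"})
        # Unchanged records need no database operation.
    for rec_id, rec in orig_by_id.items():
        if rec_id not in rev_by_id:
            tagged.append({**rec, "fme_db_operation": "DELETE"})
    return tagged

# Made-up records standing in for the database table and the new dataset.
original = [
    {"parkid": 1, "parkname": "A"},
    {"parkid": 2, "parkname": "B"},
    {"parkid": 3, "parkname": "C"},
]
revised = [
    {"parkid": 1, "parkname": "A"},
    {"parkid": 2, "parkname": "B revised"},
    {"parkid": 4, "parkname": "D"},
]

tagged = detect_changes(original, revised, key="parkid")
operations = {rec["parkid"]: rec["fme_db_operation"] for rec in tagged}
```

Record 1 is identical in both sets, so it produces no output; the other three each get exactly one tag.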

At this point you probably know more than enough to carry out database updates; but there is one more scenario I can perhaps mention. What if each feature has a different match column? In that case you can write your match in the form of a where clause, and store it as an attribute. Then use that attribute for the match in the table parameters:

…just like that!
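As a rough illustration of a per-feature match, the sketch below stores a different condition on each feature and uses it when updating. It relies on sqlite3 as a stand-in; the attribute name where_clause, the status and parkname columns, and the sample rows are all hypothetical. Because the clause text is pasted straight into the statement, this approach only makes sense when the features come from a trusted source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parks (parkid INTEGER PRIMARY KEY, parkname TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO parks VALUES (?, ?, ?)",
    [(12, "Alpha", "active"), (13, "Beta", "active"), (14, "Gamma", "active")],
)  # made-up rows

# Each feature carries its own match condition as an attribute.
features = [
    {"status": "inactive", "where_clause": "parkid = 12"},
    {"status": "inactive", "where_clause": "parkname = 'Beta'"},
]

for f in features:
    # The match comes from the feature itself rather than from a fixed column.
    conn.execute(f"UPDATE parks SET status = :status WHERE {f['where_clause']}", f)

statuses = dict(conn.execute("SELECT parkid, status FROM parks"))
```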

NOTE: In FME 2019 the ChangeDetector has undergone various improvements and should be your go-to transformer instead of the UpdateDetector.

Wrapping Up

I hope that was a good explanation of database updates, short on theory and long on practical examples. Speaking of which, we have an online tutorial on database updates, where you can carry out exercises, step by step, using examples similar to the ones I showed today. So if you want to try some of these techniques in a safe practice environment, click the above link to visit the tutorial.

As I mentioned, although I used Postgres here, most of our database formats use an identical interface because we’ve invested a lot of time standardizing them all. That work is still ongoing, so one of the tutorial articles includes a list of standardized formats, and what to do if your format is not yet updated.

And now, if you’ll excuse me, I’m going to do like Perry Mason and wrap up my case by going out to celebrate with steak and cocktails!

PS: Safe Software now has an Instagram page. I don’t know what Instagram is, but here’s a picture of me presenting at a prior user conference (oh, and riding an inflatable horse):

https://www.instagram.com/safesoftware/


Mark Ireland

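Mark, aka iMark, is the FME Evangelist (est. 2004) and has a passion for FME training. He likes being able to help people understand and use technology in new and interesting ways. One of his other passions is football (aka soccer). He likes both technology and football so much that he wrote an article about the two together! Who would’ve thought? (Answer: iMark)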

Comments

4 Responses to “A Beginner’s Guide to… Bulk Database Updates with FME”

  1. Marta says:

    Hi Mark,
    I am training my Database skills based on your article, but I have a problem with matching features in Update Detector, could you explain how to use Key Attribute, Attributes to Match? How the transformer knows which features we want to update, and which to delete?
    Thank you for your time,
    Marta

    • Mark Ireland says:

      Hi Marta,
      There are two versions of the UpdateDetector. I’m going to assume this is the custom/hub transformer in 2018.1, not the new one in 2019.
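      The 2018 UpdateDetector is like the ChangeDetector. It takes two sets of features and looks for features that are the same, new, or deleted. It does this by comparing geometry and/or attributes. However, it also has a parameter for choosing an ID number. This is used to match each original feature to its revised counterpart.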
      The transformer knows which is which by comparing features with the same ID. If an Original record has no matching ID in Revised, then the original record must have been deleted. If a Revised record has no match in Original, then it must have been added. If there is a match then the two records are compared to see if changes have occurred.
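      For example, say I have a database with 10 address records. They have ID numbers of 1 to 10. Later I am given a spreadsheet with address updates in. It has records with ID numbers of 1, 2, 3, 4, 6, 7, 8, 9, 11, 12.
      On the UpdateDetector, you connect the database to Original and the Excel data to Revised. You choose ID as the Key Attribute. The UpdateDetector will output two records as deleted (5, 10) and two records as added (11, 12). The other records will be updated or unchanged, depending on whether their contents have changed. Say records 3, 6, and 7 have been edited in some way. Then three records are updated (3, 6, 7) and five records are unchanged (1, 2, 4, 8, 9).
      Does that help? In short, it depends on your features/records having an ID number, preferably a unique ID. Each one can then be compared with the feature that has the matching ID to find changes.
      Hope this helps,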
      Mark

  2. Marta says:

    This is very useful, thank you Mark.

  3. Cloudi5 says:

    I really do not have any idea about what is FME , but when I read this article, I have a strong idea about it, and it is a step ahead of Beginner’s guide. Enriched with lot of resources about establishing connection !!
