Hola amigos! How’s it going? All tickety-boo? There’s a new bit of FME for 2019 that’s dead brilliant, and if you ‘old your ‘osses I’ll tell youse about it.

Well that’s a unique opening line for a blog post! My usual style is quite formal and (I hope) a lot clearer. But I wanted to start out with more natural speech because today I’m covering Natural Language Processing (NLP); new functionality coming up in FME2019.

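NLP is a machine learning technique for working with natural human language. Technically, a natural language is any language that has developed through ordinary use. It doesn’t have to be slang. But not everyone writes in a formal way, so you often have to deal with text that contains unusual phrases.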

NLP can even involve a computer generating human-like speech! But today I wanted to show one particular aspect of it: the ability to mine information and categorize it, based on prior examples.

Let’s take a look…

FME Natural Language Processing: Scenario

To test FME’s new functionality I needed a source of information (in NLP it’s called a corpus), and luckily I found one in the form of product reviews. Each review is matched to a label to define whether it is a positive or negative review:

__label__1 Very disappointed!: This is just AWFUL!
__label__2 Good book: Well written.

So label 1 means a negative review and label 2 means a positive review. I can use that to have FME learn what makes a review positive or negative, and then feed it unlabelled reviews for it to categorize for me. That’s often called sentiment analysis…
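To make the corpus format concrete, here is a minimal Python sketch that splits labelled lines like the two above into a sentiment and the review text. It is only an illustration of the file layout: the function name `parse_review` and the `SENTIMENT` mapping are my own, and nothing here is how FME itself reads the file.

```python
import re

# Matches a prefix such as "__label__2 " at the start of a line.
LABEL_RE = re.compile(r"^__label__(\d+)\s+(.*)$")

# Mapping taken from the post: 1 = negative, 2 = positive.
SENTIMENT = {"1": "negative", "2": "positive"}

def parse_review(line):
    """Split one labelled line into (sentiment, review text)."""
    match = LABEL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"no label found in: {line!r}")
    label, text = match.groups()
    return SENTIMENT[label], text

corpus = [
    "__label__1 Very disappointed!: This is just AWFUL!",
    "__label__2 Good book: Well written.",
]
parsed = [parse_review(line) for line in corpus]
for sentiment, text in parsed:
    print(sentiment, "|", text)
```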

FME Natural Language Processing: Transformers

FME2019 has two new transformers: the NLPTrainer and the NLPClassifier. The NLPTrainer is what I feed the labelled reviews, from which it builds a model. The NLPClassifier is fed new reviews, and compares them to that model to classify them as positive or negative.
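If the train-then-classify split is hard to picture, the following Python sketch mimics it with a tiny word-counting (naive Bayes) classifier. This is a stand-in I chose for illustration, not the algorithm FME uses; the function names `train` and `classify` and the four training reviews are all made up for the example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labelled_reviews):
    """Build a word-count model from (label, text) pairs (the 'trainer' step)."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    doc_counts = Counter()               # label -> number of reviews
    for label, text in labelled_reviews:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "docs": doc_counts, "vocab": vocabulary}

def classify(model, text):
    """Return the most probable label for text (the 'classifier' step)."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["docs"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training reviews, in the spirit of the corpus described above.
training = [
    ("negative", "Very disappointed! This is just awful"),
    ("negative", "Terrible, a waste of money"),
    ("positive", "Good book, well written"),
    ("positive", "Great read, really good"),
]
model = train(training)
print(classify(model, "What an awful waste"))
print(classify(model, "A good, well written story"))
```

The point is the shape of the workflow: one step turns labelled text into a model, and a separate step applies that model to text it has not seen.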

Because this is fairly new, beta functionality, and because written words on a blog can be more permanent than might be expected, I thought I’d demonstrate using a video:

I forgot to mention a few items. Firstly that the output from the transformer also includes a summary feature, with information about the accuracy and key words used.

Secondly, that NLP is (mostly) language agnostic. It assumes an English-like sentence structure, but could work just as well on data stored in other languages. I do imagine you must carry out the training in the same language you are going to test against!

Finally, you can’t add to the model. You can overwrite it with new training, but not add to it. So you’d probably keep the original corpus, add to that, and recreate the model when necessary.
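That keep-the-corpus-and-rebuild routine might look something like the Python sketch below. The "model" here is just a count of reviews per label, a placeholder for whatever the real training step produces, and the file name, function names, and the third review are invented for the example.

```python
import os
import tempfile
from collections import Counter

def build_model(corpus_path):
    """Rebuild a placeholder model from scratch by reading the whole corpus."""
    label_counts = Counter()
    with open(corpus_path, encoding="utf-8") as corpus:
        for line in corpus:
            if line.strip():
                # The label is the first whitespace-separated token.
                label_counts[line.split(maxsplit=1)[0]] += 1
    return label_counts

def add_reviews(corpus_path, new_lines):
    """Append new labelled lines to the corpus; the model is not touched."""
    with open(corpus_path, "a", encoding="utf-8") as corpus:
        for line in new_lines:
            corpus.write(line + "\n")

corpus_path = os.path.join(tempfile.mkdtemp(), "reviews.txt")
add_reviews(corpus_path, [
    "__label__1 Very disappointed!: This is just AWFUL!",
    "__label__2 Good book: Well written.",
])
model = build_model(corpus_path)

# Later: new labelled reviews arrive, so extend the corpus and retrain fully.
add_reviews(corpus_path, ["__label__2 Loved it: Would read again."])
model = build_model(corpus_path)  # overwrites the old model
print(dict(model))
```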

Anyway, hopefully the video helped you understand what I’m talking about (pun intended). But though NLP is interesting, what might FME users do with it?

FME Natural Language Processing: Examples

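I always like to give some examples of where new technology could be used. Sometimes my thoughts and ideas go nowhere, and I don’t mention them. Today I’m going to mention those too, to help steer you away from what I think are dead ends.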

Data Classification and QA

Classification? Well… obviously. This is what the above video already shows. I think this is the most likely use in FME.

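One idea is taking forecasts and classifying them. For example, I wonder if I could train a model on the forecasts for days when lightning occurred. Then I would run new forecasts through the NLPClassifier to see whether today’s conditions are favourable for lightning (at which point I could issue a warning). I see a lot of possibilities.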

It also made me wonder if NLP could support data QA. At first I thought of an address database. If I train NLP on the difference between a good and bad address, might it help to pick up future problems as they happen? It might; but addresses are very structured and – as I understand it – NLP is all about unstructured, human speech. So although I haven’t tried it, I believe it’s better to stick to the standard transformers (Tester, AttributeValidator) for QA’ing structured data, and use NLP when the input is written sentences.

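Classifying and QA’ing data with Natural Language Processing improves the relevance of the output by improving the quality of the input. But what if the NLP analysis is the output…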

Business Intelligence Products

Have you ever thought about creating BI products with FME? You wouldn’t be the first! In fact a prior blog post featured a partner (Setld) doing just that:

Setld, FME, and the 4 Vs of Big Data: Building Business Intelligence “Products”

One key sentence in that article says data is evaluated “against a word value lookup table (that setld maintains) in order to rank the top 100 news pieces”.

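I won’t claim to know their full method, but to me the lookup table they maintain is equivalent to the NLP model that FME can now build. Although it might not be a 1:1 replacement, these new transformers might be able to automate some of their lookup table maintenance.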

Basically making a product from NLP output is a real possibility. But it can also help internal processes…

Marketing

The Safe Software marketing team must have triggers to report on new FME-related content. But Google Alerts, as far as I can tell, are just keyword searches:

Yeah… sorry Google, but that’s not the right FME. Of course, that’s understandable since their alerts aren’t trained to our needs. But why shouldn’t our marketing team create an NLP model and run future alerts through the NLPClassifier, to filter out the ones that aren’t the FME we are interested in? If you work at a company with a marketing team, you could help them out by doing the same.

So far the NLP examples I’ve mentioned have all been non-spatial. So can we bring geography into NLP…

Spatial NLP

Let’s say you were mapping Twitter alerts about natural disasters. NLP could assess how relevant a tweet is, before adding its information to your map. For example, I guess a suitably trained model would be able to tell the difference between “Help! My house is on fire!” and “Yikes! My boss is going to fire me!” Basically you add a layer of filtering before the data gets onto your map, by teaching your computer to assess the context of the word “fire” in the tweet.

Interestingly, as this article mentions, you might also analyze language for hints about location. For example, given the tweet: “Tornado in Springfield! North of the Cottonwood River” NLP could be able to identify “Springfield” and “Cottonwood River” as being place names (I believe that’s called Named Entity Recognition).

Of course there are many Springfields in the US, but a well-trained model might even be able to tell which Springfield it is by reference to the Cottonwood River.
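To show the disambiguation idea (not Named Entity Recognition itself, which needs a trained model), here is a rough Python sketch that uses a plain lookup table. The gazetteer is hand-written for illustration and should not be treated as authoritative data, and the function names are my own.

```python
# Hand-written, illustrative gazetteer: place name -> candidate locations,
# each with features assumed to be nearby. Not authoritative data.
GAZETTEER = {
    "Springfield": [
        {"state": "Illinois", "nearby": {"Sangamon River"}},
        {"state": "Minnesota", "nearby": {"Cottonwood River"}},
        {"state": "Missouri", "nearby": {"James River"}},
    ],
}

def find_known_names(text, names):
    """Return the known names that appear verbatim in the text."""
    return [name for name in names if name in text]

def disambiguate(text):
    """Pick the candidate place whose nearby features are mentioned in the text."""
    results = {}
    for place in find_known_names(text, GAZETTEER):
        candidates = GAZETTEER[place]
        matches = [c for c in candidates
                   if find_known_names(text, c["nearby"])]
        # Only commit to an answer when exactly one candidate fits.
        results[place] = matches[0]["state"] if len(matches) == 1 else None
    return results

tweet = "Tornado in Springfield! North of the Cottonwood River"
print(disambiguate(tweet))
```

A lookup like this only works for names someone has already listed. The attraction of a trained model is that it could pick out place names, and the clues around them, that nobody thought to list.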

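But why NLP? Why not just have a person read the messages? Because we’re talking about automated systems. Yes, a human could interpret these messages, but not at speed and not automatically. But with NLP, FME Server can!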

What I really wondered is whether spatial data itself could be used as the input! For example if I train an NLP model using point features labelled with coordinate system, could I get the NLPClassifier to identify the coordinate system of unlabelled data?! Probably not. That again would be structured data, plus I think NLP only works with words, not numbers. But it’s fun to let the imagination run wild sometimes!

FME Natural Language Processing: Summary

So that was a rough guide to upcoming Natural Language Processing functionality in FME2019.

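In general, we can say that a lot of FME use involves taking raw data and getting useful information from it; whether by translating formats, restructuring data, or filtering content. When you look at it that way, it really is all about business intelligence. Even spatial data and mapping is about getting the right information to the right people, in order to make better business decisions.

NLP can help a lot with that.

So far I have barely scratched the surface of what an NLP model requires, or what some of the transformer parameters do; so you should take my suggestions as general ideas, not definitive rules.

What I hope I’ve given you is a basic understanding, after which you’ll find it easier to experiment.

By the way, if you watched the video all the way to the end, what did you think of FME package files? Pretty cool, eh? This will be a huge development in how FME is delivered and updated. I think it might actually have the biggest impact of all the updates planned for 2019.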

I don’t know if NLP has fully made it into the latest beta, because of how it’s packaged, but if you want to try it out, then get in touch. The same applies if you have any general questions. As we might say in the East Midlands: bungem ovva ear me duck!


Mark Ireland

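Mark, aka iMark, is the FME Evangelist (est. 2004) and has a passion for FME training. He likes being able to help people understand and use technology in new and interesting ways. One of his other passions is football (aka soccer). He likes both technology and football so much that he wrote an article about the two together! Who would’ve thought? (Answer: iMark)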

Comments

5 Responses to “FME 2019 Sneak Peek: Machine Learning and Natural Language Processing”

  1. Takashi Iijima says:

    Sounds it’s awesome. I’d like to know if the NLP in FME 2019 would be internationalized or support English only.

    • Mark Ireland says:

      I’m told it should work with any language – though I haven’t tested it. I think the biggest challenge will be identifying individual words. English is simple because it has a space between words, but I believe many languages don’t do that. Also for Japanese it might help to transliterate the content first. That’s just a guess but I’m not sure how well it will work with non-Latin characters.
      Good luck! If you try this please let us know how it works out.

  2. kim says:

    What’s coming in 2012? You must be showing your age!
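    I’ve been looking at NLP for parsing the addresses out of all the junk on address labels. It looks like it’s related to regular expressions. Maybe if I load the right address database, it will strip the addresses out from the (unmatched) delivery instructions and redundant council names?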

    • Mark Ireland says:

      Yes, I’m getting on a bit. But at least when I’m sitting in my chair dribbling, a computer somewhere will be able to use NLP to understand what I’m saying! As for addresses – I’m not sure. Yes, it might be able to get some info out of delivery instructions; but on the other hand addresses are usually structured and NLP is more meant for unstructured content. But don’t let me put you off. I’d be very interested to see what you can do with this (it would make a great World Tour presentation)!

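  3. [...] new functionality in FME that is at the real leading edge of artificial intelligence and machine learning. NLP uses computers to process and analyze large amounts of natural human language data, regardless of language or level of standardization. Read about the new functionality and its potential uses for FME users in Mark Ireland’s insightful blog post. [...]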
