I am a new FME user and am trying to extract text information from a PDF. Reading this forum, I came up with this:
Using an AttributeFilter, I chose the page number I need to extract that information from, and the inspector showed me this:
The yellow text is the information I need to extract. How do I extract that to an Excel or CSV file?
The PDF reader has a parameter (under Non-Spatial > Read Tagged Tables) which controls reading tagged tables as a feature type. If a tagged table is present in your PDF, features will be output from the pdf_table feature type.
You may want to confirm whether or not your input dataset contains tagged tables, as using this parameter would be the easiest way of extracting the information. You may also want to try decompressing your PDF file as suggested here before reading, as this allows the PDF reader to read tagged tables from certain datasets.
As an aside, if you want to see the PDF reader support reading non-tagged tables, please feel free to vote on this Idea.
If none of the above suggestions work for you, one workaround would be relating the insertion points of text features to a table cell and extracting the text strings which fall within the cell areas. I have attached an example workspace demonstrating this workflow here: getTextFromPDFTable.fmwt
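To make the idea behind that workaround concrete, here is a minimal Python sketch of the point-in-cell test. It is not taken from the attached workspace; the `Cell` class, the `assign_text_to_cells` function, and the coordinates are all hypothetical, and it assumes cells are axis-aligned rectangles and each text feature carries a single insertion point.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical table cell: row/column index plus its bounding box."""
    row: int
    col: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        # Point-in-rectangle test; boundaries count as inside.
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def assign_text_to_cells(cells, texts):
    """Map each (x, y, string) text insertion point to the first cell containing it.

    Returns a dict keyed by (row, col). Text that falls in no cell is ignored;
    several strings in the same cell are joined with a space.
    """
    table = {}
    for x, y, string in texts:
        for cell in cells:
            if cell.contains(x, y):
                key = (cell.row, cell.col)
                table[key] = f"{table[key]} {string}" if key in table else string
                break
    return table


# Made-up coordinates, purely for illustration.
cells = [
    Cell(0, 0, 0.0, 10.0, 50.0, 20.0),
    Cell(0, 1, 50.0, 10.0, 100.0, 20.0),
]
texts = [
    (5.0, 12.0, "X error (cm)"),
    (55.0, 12.0, "0.275764"),
    (500.0, 500.0, "page footer"),  # outside every cell, so it is dropped
]
result = assign_text_to_cells(cells, texts)
```

In a workspace, the same test would normally be done with a spatial overlay between the text points and the cell polygons rather than in code.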
One option that you could try would be to use the `Non-Spatial > Read Non-Spatial Text` mode.
This produces all of the text that can be found for each page, and it may be easier to extract the information you're looking for from that output.
In your case I would expect the feature text to contain lines like:
"X error (cm) Y error (cm) Z error (cm) XY error (cm) Total error (cm)"
"0.275764 0.699132 4.04833 0.751553 4.11799"
You could use an AttributeSplitter to split the lines, and another one to split each line by whitespace.
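If it helps to see the splitting logic outside of a workspace, here is a rough Python sketch of the same two-step split, which could be adapted for something like a PythonCaller. It assumes the page text looks exactly like the two lines quoted above; `page_text`, `parse_error_table`, and `to_csv` are hypothetical names. Note that the header labels themselves contain spaces, so the sketch splits the header after each "(cm)" and uses plain whitespace splitting only for the numeric line.

```python
import csv
import io
import re

# Assumed page text, built from the two example lines quoted above.
page_text = (
    "X error (cm) Y error (cm) Z error (cm) XY error (cm) Total error (cm)\n"
    "0.275764 0.699132 4.04833 0.751553 4.11799\n"
)


def parse_error_table(text):
    """Split the text into lines, then split each line into columns."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header_line, value_lines = lines[0], lines[1:]
    # Each label ends with "(cm)", so capture up to and including that unit.
    headers = re.findall(r"\S.*?\(cm\)", header_line)
    # The numeric values contain no spaces, so whitespace splitting is enough.
    rows = [line.split() for line in value_lines]
    return headers, rows


def to_csv(headers, rows):
    """Return the table as CSV text (write it to a file instead if needed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


headers, rows = parse_error_table(page_text)
csv_text = to_csv(headers, rows)
```

The values are kept as strings here; convert them with `float()` if you need numbers before writing them out.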
@danilo_fme, thank you very much. I am going to check on that. Since I am a new user, I'm still struggling with FME. :)
© 2019 Safe Software Inc