Tutorial: Getting Started with PDF Reading

Article by natalieatsafe · Apr 06, 2018 at 04:57 PM · edited · Apr 12, 2018 at 04:31 PM

Article created with FME Desktop 2018.0

Introduction

FME's Adobe Geospatial PDF Reader can extract a wide range of information from PDF documents: imagery, rasters, vector data, text, spatial information, and attributes.

However, extracting information from a PDF document can be complex. One complication is that PDF is a document format, so its contents vary greatly: a PDF may spread a lot of information over many pages, contain maps (often little more than an embedded picture), or hold a CAD drawing with many lines scattered across the page. As a result, it is hard to know how to read a PDF document until you have seen it and know what you need to extract from it. Sometimes you care about where information sits on the page; other times you simply want the content, and its location doesn't matter.

A PDF document in FME Data Inspector (left); the same PDF document in Adobe PDF Reader (right)

PDF Reader Options

The PDF Reader has many options for extracting data. Your PDF may contain:

Vector or Raster map data
Pages and pages of Text
Headers, footers, tables, and more

The main choice is whether to read the PDF as spatial or non-spatial (tabular) data. In other words, does the location of each feature on the page matter, or are you concerned only with the page as a whole? You can also select both the Spatial and the Non-Spatial (tabular) PDF Reader options at the same time.
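To make the spatial versus non-spatial choice concrete, here is a minimal Python sketch. It is not FME code: it assumes a hypothetical `TextItem` record (page number, x/y position in points, text) standing in for whatever the reader actually produces, and shows the same fragments either kept as located features or collapsed into one row of text per page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TextItem:
    """Hypothetical text fragment read from a PDF page (not an FME type)."""
    page: int
    x: float  # horizontal position in points; origin at the page's bottom-left
    y: float  # vertical position in points; y grows upward in PDF user space
    text: str


def as_spatial(items: List[TextItem]) -> List[Tuple[int, Tuple[float, float], str]]:
    """Spatial view: each fragment stays a feature with its page position."""
    return [(it.page, (it.x, it.y), it.text) for it in items]


def as_tabular(items: List[TextItem]) -> List[Dict[str, object]]:
    """Non-spatial view: position is dropped; one row of text per page."""
    rows: Dict[int, List[str]] = {}
    # Rough reading order: by page, top-to-bottom (descending y), then left-to-right.
    for it in sorted(items, key=lambda i: (i.page, -i.y, i.x)):
        rows.setdefault(it.page, []).append(it.text)
    return [{"page": page, "text": " ".join(words)} for page, words in rows.items()]


items = [
    TextItem(1, 300.0, 700.0, "Title"),
    TextItem(1, 72.0, 650.0, "Body"),
    TextItem(1, 72.0, 720.0, "Header"),
]
print(as_spatial(items))
print(as_tabular(items))
```

The tabular view keeps only the order of the text, which is all you need when the location doesn't matter; real PDF text ordering is more involved than this simple sort.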

Spatial Parameters

Detailed information about the Spatial parameter options can be found in the help documentation.

The Spatial section covers information in the PDF document that has a particular location on the page. If one or more coordinate systems are defined for the PDF document, that page location may translate to a specific location on the earth. A single PDF page can contain multiple coordinate systems.
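The idea of a page location "translating" to a location on the earth can be illustrated with a simple affine transform. Geospatial PDFs typically carry their own georeferencing information (such as control points and a coordinate system definition), and FME's reader interprets that for you; the Python sketch below is only an illustration of the underlying idea. It assumes three known page-to-ground control points, a purely affine relationship, and made-up coordinates, and `fit_affine` is a hypothetical helper, not part of FME.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]  # replace one column with the right-hand side
        solution.append(_det3(mc) / d)
    return solution


def fit_affine(page_pts: List[Point], world_pts: List[Point]) -> Callable[[float, float], Point]:
    """Fit X = a*x + b*y + c and Y = d*x + e*y + f from three control points."""
    m = [[x, y, 1.0] for x, y in page_pts]
    a, b, c = _solve3(m, [X for X, _ in world_pts])
    d, e, f = _solve3(m, [Y for _, Y in world_pts])

    def to_world(x: float, y: float) -> Point:
        return (a * x + b * y + c, d * x + e * y + f)

    return to_world


# Made-up control points: 1 page point = 10 ground units, page origin at (500000, 5000000).
to_world = fit_affine(
    [(0, 0), (612, 0), (0, 792)],
    [(500000, 5000000), (506120, 5000000), (500000, 5007920)],
)
print(to_world(306, 396))  # centre of a US Letter page
```

Without any coordinate system, only the page coordinates exist, which is why a background map cannot be lined up with the data.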

If you want to display PDF data in the Data Inspector with a background map, set Coordinate Units to Geospatial (where possible). The Data Inspector can display PDF data over a background map only if a coordinate system exists.

Non-Spatial Parameters

Detailed information about the Non-Spatial parameter options can be found in the help documentation.

If your PDF document contains tabular data, you can extract metadata and text, and even rasterize the entire PDF page. The Non-Spatial Metadata parameter is useful for extracting attributes or information about the document, such as its creation date.
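Before rasterizing a whole page, it helps to estimate how large the resulting image will be. PDF page sizes are measured in points, and by default one point is 1/72 inch, so the pixel size is the page size in inches times the resolution. The Python sketch below shows that arithmetic; the function names are hypothetical, it assumes the default 1/72-inch unit (a page can override this), and FME's own rasterization parameters may be expressed differently.

```python
import math

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch (a page's UserUnit can change this)


def page_raster_size(width_pt: float, height_pt: float, dpi: int) -> tuple:
    """Pixel dimensions of a full-page raster at the given resolution."""
    # Multiply before dividing so whole-number cases stay exact in floating point.
    width_px = math.ceil(width_pt * dpi / POINTS_PER_INCH)
    height_px = math.ceil(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def uncompressed_bytes(width_px: int, height_px: int, bands: int = 4) -> int:
    """Rough memory footprint for an 8-bit-per-band raster (e.g. RGBA = 4 bands)."""
    return width_px * height_px * bands


w, h = page_raster_size(612, 792, 150)  # US Letter page at 150 dpi
print(w, h, uncompressed_bytes(w, h))
```

For a US Letter page (612 × 792 points) at 150 dpi this gives 1275 × 1650 pixels, about 8.4 MB uncompressed as RGBA; doubling the resolution quadruples that figure.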

PDF Reading Articles

Reading Simple PDF

This article covers how to read a simple PDF which contains a title, a couple of maps, some text, and a table.

Reading PDF Map Frame Content

Learn how to inspect and extract the content of PDF map frames.

More PDF reading articles are in progress and coming soon!


ciarab · May 03, 2018 at 02:35 PM

@NatalieAtSafe How easy is it to rasterize the entire PDF? I was just trying out the PDF reader, and I can read in my PDF content, which is great, but I want to write this content to a Word document (a map with a frame template). Writing to Word would require the content to be a raster, and I am not having much success transforming my PDF into a raster format.

nathanatsafe (replying to ciarab) · May 08, 2018 at 10:04 PM

Hi @ciarab,
I haven't built this kind of workflow before, but I think you could take inspiration from our Word writing and MapnikRasterizer tutorials. Note that you may need to use transformers like the TextStroker to manipulate text or other features before rasterizing. This might be a good question for our forum, as other users may be able to fill in the gaps in your workflow design.
Hope this helps!
Nathan

dmitribagh · May 09, 2018 at 05:12 AM

Hi @ciarab,

I just tried reading a geospatial PDF, rasterizing the vector data with the MapnikRasterizer, passing the image through the MSWordStyler, and saving it as an MS Word document; it works well. The only part that requires careful setup is Mapnik: you need to configure every layer you want to show on the raster, and often you need to do it two or three times (for example, set a polygon with a polygon symbolizer, then the same polygon with a line symbolizer to show its boundary, and finally the same polygon with a text symbolizer to make a label).

If you have any questions about using MapnikRasterizer,feel free to ask me.
Dmitri
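The "two or three passes per layer" pattern described above can be seen in Mapnik's own XML stylesheet vocabulary, where one rule lists a fill, an outline, and a label for the same polygons. The Python sketch below generates such a style with the standard library. It is only an illustration: FME's MapnikRasterizer is configured through its transformer parameters rather than this XML, `polygon_style` is a hypothetical helper, the layer name, colours, and label field are made up, and supported attributes can vary between Mapnik versions.

```python
import xml.etree.ElementTree as ET


def polygon_style(name: str, fill: str, stroke: str, label_field: str) -> ET.Element:
    """Build a Mapnik-style <Style> that draws one polygon layer in three passes."""
    style = ET.Element("Style", name=name)
    rule = ET.SubElement(style, "Rule")
    # Pass 1: fill the polygon interior.
    ET.SubElement(rule, "PolygonSymbolizer", fill=fill)
    # Pass 2: draw the same polygon's boundary on top of the fill.
    ET.SubElement(rule, "LineSymbolizer", stroke=stroke, **{"stroke-width": "1"})
    # Pass 3: label the same polygon with an attribute value.
    label = ET.SubElement(
        rule, "TextSymbolizer",
        **{"face-name": "DejaVu Sans Book", "size": "10", "fill": "#000000"},
    )
    label.text = f"[{label_field}]"  # Mapnik expression referencing the attribute
    return style


style = polygon_style("parcels", "#c8e6c9", "#2e7d32", "name")
print(ET.tostring(style, encoding="unicode"))
```

Symbolizers inside a rule are drawn in the order listed, so the outline and label end up on top of the fill.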
