Read table data from PDF

Hi Experts,

I want to read plain-text table data from a large PDF file in which the columns are separated by spaces rather than table lines. I am able to read the PDF file line by line using iTextSharp, but I cannot recognize the data as a table.

Below is an example:

Column1 Column2 Column3 Column4
Item 1 data 1 data1 data1
Item 2 data 2 data2 data2

We need to export this data to XML format. Can anyone please help?

What I have tried:

I am able to read the PDF file line by line using iTextSharp, but I cannot recognize the data as a table.

Posted 29-Feb-16 19:29pm

Member 12075213

Comments

wizardzz 1-Mar-16 10:00am

Are you able to try regex to split the columns? Is the data space or tab delimited when you read it to a string?
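
To illustrate the regex idea, here is a minimal sketch, assuming the extracted lines keep at least two spaces between columns (the sample in the question shows single spaces, which may just be the forum collapsing whitespace). It is written in Python for brevity; the same `\s{2,}` pattern should also work with .NET's `Regex.Split`. The sample lines, the helper names `split_columns` and `rows_to_xml`, and the `table`/`row`/`cell` element names are all made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: lines as they might come back from the PDF text
# extraction, assuming two or more spaces survive between columns.
lines = [
    "Column1   Column2   Column3   Column4",
    "Item 1    data 1    data1     data1",
    "Item 2    data 2    data2     data2",
]

def split_columns(line):
    """Split one text line on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())

def rows_to_xml(lines):
    """Treat the first line as the header and emit one <row> per data line."""
    header = split_columns(lines[0])
    root = ET.Element("table")
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines between rows
        row = ET.SubElement(root, "row")
        # Store the column name as an attribute so that headers containing
        # spaces do not produce invalid XML element names.
        for name, value in zip(header, split_columns(line)):
            cell = ET.SubElement(row, "cell", name=name)
            cell.text = value
    return root

root = rows_to_xml(lines)
print(ET.tostring(root, encoding="unicode"))
```

If the extracted text really has only single spaces between columns, a split like this cannot tell "Item 1" from two separate cells, and you would probably need position information from the extractor instead.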

Doomplast 3-Mar-16 8:50am

You could take a look at this word processing library for C#. From what I can see, it can read a PDF file and extract its text. Also, this sample demonstrates exactly what you need (table extraction).

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
