Hi Experts,

I want to read plain, table-formatted data from a large PDF file in which the columns are separated by spaces rather than table lines. I am able to read the PDF file line by line using iTextSharp, but I cannot recover the table structure from those lines.

Below is an example:


Column1 Column2 Column3 Column4
Item 1 data 1 data1 data1
Item 2 data 2 data2 data2

We need to export this data to XML format. Any help would be appreciated.

What I have tried:

I am able to read the PDF file line by line using iTextSharp, but I cannot recover the table structure from those lines.
Comments
wizardzz 1-Mar-16 10:00am    
Are you able to try regex to split the columns? Is the data space or tab delimited when you read it to a string?
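To make the regex idea concrete, here is a minimal sketch in C#. It assumes the extracted lines keep a tab or at least two spaces between columns, while a cell value never contains more than one consecutive space; if the extraction collapses every gap to a single space, this approach cannot tell columns apart. The names `ColumnSplitter`, `SplitColumns`, `ToXml` and the `Table`/`Row`/`Cell` element names are made up for illustration, and the input lines are whatever your existing line-by-line extraction already produces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ColumnSplitter
{
    // Assumption: a column gap is a single tab, or any run of two or more
    // spaces/tabs. A single space is treated as part of the cell value.
    private static readonly Regex ColumnGap = new Regex(@"[ \t]{2,}|\t");

    // Splits one extracted text line into its cell values.
    public static string[] SplitColumns(string line)
    {
        return ColumnGap.Split(line.Trim());
    }

    // Treats the first line as the header row and turns each following
    // non-empty line into a <Row> of <Cell name="..."> elements.
    public static XElement ToXml(IEnumerable<string> lines)
    {
        List<string> all = lines.Where(l => !string.IsNullOrWhiteSpace(l)).ToList();
        var table = new XElement("Table");
        if (all.Count == 0)
        {
            return table;
        }

        string[] headers = SplitColumns(all[0]);
        foreach (string line in all.Skip(1))
        {
            string[] cells = SplitColumns(line);
            var row = new XElement("Row");
            // Extra cells beyond the header count are ignored in this sketch.
            for (int i = 0; i < headers.Length && i < cells.Length; i++)
            {
                row.Add(new XElement("Cell", new XAttribute("name", headers[i]), cells[i]));
            }
            table.Add(row);
        }
        return table;
    }
}
```

Calling `ColumnSplitter.ToXml(lines).Save("output.xml")` would write the result to a file. The header text is stored in a `name` attribute rather than used as the element name, because a header containing spaces would not be a valid XML element name.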
Doomplast 3-Mar-16 8:50am    
You could take a look at this word processing library for C#; from what I can see, it is able to read a PDF file and extract its text. Also, this sample demonstrates exactly what you need (table extraction).

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


