I want to extract specific text from a PDF file into Excel using C#.
Comments
Zoltán Zörgő 28-Feb-15 4:24am    
You can use OCR for that. In fact, OCR is the only approach that works in the general case.

There are libraries that can read a PDF's content as a document, but it is not as straightforward as it looks. You will not be able to read encrypted or obfuscated PDF files. Moreover, there is no guarantee that the content you are looking for is actually stored as text rather than as a raster image or vector graphics. And even if you do manage to extract the text, it will be hard to match it to your region of interest (ROI).
So, if you don't use OCR, you will end up with a solution that works only in specific situations, certainly not a general one.
There are some OCR engines you can use for free, such as Tesseract. But as it has no native PDF support, you will need some pre-processing.
So I suggest you look for a good but inexpensive commercial solution, like this one: http://www.abbyy.com/ocr_sdk_windows/ (Abbyy is really great).
Alternatively, you could try the Adobe PDF IFilter with C# (see "Using IFilter in C#").
Newer Windows versions include an OCR engine that could be used, but I have no further knowledge of its capabilities.
 
Comments
Manoj Sawant 5-Mar-15 2:34am    
I have used OCR, but I am not able to retrieve the text from the image inside the PDF file. This is the output I get:

Return To:
CT LIEN s (rest of the text is trimmed due to evaluationrestriction!)
Zoltán Zörgő 5-Mar-15 2:47am    
There you have it: "rest of the text is trimmed due to evaluation restriction!" I don't know exactly what you are using, but it is quite clear that it is not properly licensed. You are using an evaluation version, so you will need to buy a license to get the full output.
You will need to use a library such as iTextSharp to read the content.
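As a rough illustration of the step that comes after reading the content, here is a minimal sketch that picks labelled values out of already-extracted page text and formats them as CSV, which Excel opens directly. It assumes the PDF really contains text (not a scanned image) and that the text has already been obtained from a PDF library; with iTextSharp 5 that would typically be `PdfTextExtractor.GetTextFromPage(reader, pageNumber)`, which is deliberately not called here so the sketch needs no external package. The class name `PdfTextToCsv`, its methods, and the "Label: value" line layout are hypothetical choices made for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: names and the "Label: value" layout are assumptions.
public static class PdfTextToCsv
{
    // Returns whatever follows "label:" on the same line, for every line
    // that starts with that label. pageText is text already extracted
    // from the PDF by a library of your choice.
    public static List<string> FindValues(string pageText, string label)
    {
        // [ \t]* instead of \s* so a match never runs onto the next line.
        var pattern = new Regex(
            @"^[ \t]*" + Regex.Escape(label) + @"[ \t]*:[ \t]*(.*)$",
            RegexOptions.Multiline | RegexOptions.IgnoreCase);

        var values = new List<string>();
        foreach (Match m in pattern.Matches(pageText))
        {
            // Trim removes the trailing '\r' left over from "\r\n" line ends.
            values.Add(m.Groups[1].Value.Trim());
        }
        return values;
    }

    // Quotes a field when it contains a comma, quote, or line break,
    // doubling any embedded quotes (the usual CSV convention).
    public static string ToCsvField(string value)
    {
        if (value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Builds a two-column CSV document: a header row, then one row per value.
    public static string BuildCsv(string label, IEnumerable<string> values)
    {
        var sb = new StringBuilder();
        sb.Append("Label,Value\r\n");
        foreach (var v in values)
        {
            sb.Append(ToCsvField(label)).Append(',')
              .Append(ToCsvField(v)).Append("\r\n");
        }
        return sb.ToString();
    }
}
```

To use it, pass the extracted page text and the label you are looking for to `FindValues`, feed the result to `BuildCsv`, and save the string with `File.WriteAllText(path, csv, new UTF8Encoding(true))`; the UTF-8 byte order mark helps Excel detect the encoding. If a real .xlsx file is required rather than CSV, a spreadsheet library would be needed instead of the last step.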
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


