The .pdf files have boxes for firstname, lastname, etc.
If possible, please let me know the C# code that lets me read the text inside the boxes for sections such as firstname, lastname, etc.
I have looked into PDFsharp and can now read the size, author, etc., but I am not sure how to read the sections I described above.
Any thoughts please?
Thanks
Comments
ZurdoDev 28-Feb-12 12:21pm    
Use a 3rd party tool and then read its documentation.
Mauro Gagna 28-Feb-12 17:12pm    
You can use iTextSharp to do that. If you can upload an example PDF file, we can help you more easily.
arkiboys 29-Feb-12 4:53am    
Yes, I am using iTextSharp.
I am now trying to see how to read text from inside the .pdf file...
I do not know how to refer to the area where the firstname is, or how to pick up the firstname text.
Any thoughts please?
Thanks
Mauro Gagna 2-Mar-12 10:07am    
You can search for the text you want by its content (for example, if you know the row starts with "Name:") or by its position (if you know where the text starts or which area it is in).

If you can provide some information about your case, and especially if you can share an example PDF file, we can give you some code to help you do it.
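
The content-based search described above could be sketched roughly as follows. This is only an illustration: it assumes the page text has already been extracted into a string, and the names LabelSearch and FindValueAfterLabel are made up for this example, not part of iTextSharp.

```csharp
using System;

public static class LabelSearch
{
    // Hypothetical helper: returns the text that follows 'label' on the first
    // line starting with it, or null when no line starts with the label.
    public static string FindValueAfterLabel(string pageText, string label)
    {
        string[] lines = pageText.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            // Trim also removes a trailing '\r' left over from Windows line endings.
            string trimmed = line.Trim();
            if (trimmed.StartsWith(label, StringComparison.OrdinalIgnoreCase))
            {
                return trimmed.Substring(label.Length).Trim();
            }
        }
        return null;
    }
}
```

For example, FindValueAfterLabel("Name: Smith\nAge: 30", "Name:") returns "Smith". This only works when the label and the value end up on the same extracted line, which depends on how the PDF was produced.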
arkiboys 5-Mar-12 9:48am    
Hi, I would like to do the following for a .pdf file:
1- Open the .pdf file
2- Look in the second page and read the text inside txtLastname
3- Read the value of the radio option called rdoChecked.

Can you assist please?
Thanks

1 solution

This is an example of how to get all the text of a PDF file.

C#
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ReadPdfFile(string filename)
{
	PdfReader pdfReader = new PdfReader(filename);
	StringBuilder fullText = new StringBuilder();

	try
	{
		// Page numbers in iTextSharp start at 1.
		for (int nPage = 1; nPage <= pdfReader.NumberOfPages; nPage++)
		{
			// Use a fresh strategy for each page so text does not accumulate across pages.
			ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
			string s = PdfTextExtractor.GetTextFromPage(pdfReader, nPage, its);
			fullText.Append(s);
		}
	}
	finally
	{
		// Release the file even if extraction throws.
		pdfReader.Close();
	}

	return fullText.ToString();
}


To achieve what you want, you have to create your own extraction strategy. I once wrote one that gets the text by its position, based on LocationTextExtractionStrategy (it is in the iTextSharp source code).
You should create your own TextExtractionStrategy with your own conditions.
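
As a rough sketch of the position-based idea, the following does not write a custom strategy. Instead it wraps LocationTextExtractionStrategy in the region filter that ships with iTextSharp. It assumes a 5.x release that includes RegionTextRenderFilter and FilteredTextRenderListener. The class name PdfRegionReader, the method name and the parameter names are made up for illustration, and the rectangle coordinates (PDF points, origin at the bottom-left of the page) are something you would have to find for your own file.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfRegionReader
{
    // Hypothetical helper: returns the text drawn inside the given rectangle
    // (lower-left and upper-right corners) on one page.
    public static string ReadTextInRegion(string filename, int pageNumber,
                                          float llx, float lly, float urx, float ury)
    {
        PdfReader reader = new PdfReader(filename);
        try
        {
            Rectangle region = new Rectangle(llx, lly, urx, ury);

            // Only text whose position falls inside 'region' reaches the strategy.
            RenderFilter[] filters = { new RegionTextRenderFilter(region) };
            ITextExtractionStrategy strategy =
                new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filters);

            return PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Like the example above, this only sees text drawn as page content. Whether it also picks up values typed into form fields depends on how the file stores them.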

I hope this was useful.

Mauro.
 
 
Comments
arkiboys 6-Mar-12 6:06am    
Hi, thanks for the code, but it just reads the text printed in the .pdf form.
I am interested in getting the text inside the controls. For example, there is a text control called "txtLastname" on page 2 where the user has entered his name. I would like to retrieve the person's name. How is this done please? Thanks
Mauro Gagna 6-Mar-12 10:07am    
I don't have any PDF file with a textbox inside to try it. Can you upload an example file to any file sharing website and post the download link? If you can give me an example file, I can tell you how to read it.
arkiboys 7-Mar-12 3:16am    
I'm afraid the only one I have is the company one, which I cannot upload, and I do not know how to create a new one for you to test.
Sorry.
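
For the form-control case raised in these comments, one possible approach is sketched below. It assumes the controls are ordinary AcroForm fields, which could not be confirmed here because no sample file was shared. Such fields are looked up by name rather than by page, through the AcroFields property of PdfReader. The class name PdfFormReader and the method name ReadFormFields are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class PdfFormReader
{
    // Hypothetical helper: returns every form field name with its current value.
    public static Dictionary<string, string> ReadFormFields(string filename)
    {
        var values = new Dictionary<string, string>();
        PdfReader reader = new PdfReader(filename);
        try
        {
            AcroFields form = reader.AcroFields;
            foreach (string name in form.Fields.Keys)
            {
                // GetField returns the field's value as a string.
                values[name] = form.GetField(name);
            }
        }
        finally
        {
            reader.Close();
        }
        return values;
    }

    public static void Main(string[] args)
    {
        // args[0] is the path to the PDF file.
        foreach (KeyValuePair<string, string> pair in ReadFormFields(args[0]))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Printing every field first is a cheap way to find the exact names. Depending on the tool that created the form, txtLastname and rdoChecked may appear with a longer, fully qualified name. Once the exact name is known, form.GetField(name) reads that one value directly. For a radio group the value is typically the export name of the selected option. If no fields are listed at all, the file may not be an AcroForm and this approach would not apply.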

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


