Read firstname information in a PDF file

Question

The .pdf files have boxes for firstname, lastname, etc.
If possible, please let me know the C# code that allows me to read the text inside the boxes for sections such as firstname, lastname, etc.
I have looked into PDFsharp and can now read the size, author, etc., but I am not sure how to read the sections I described above.
Any thoughts, please?
Thanks

Posted 28-Feb-12 6:10am

arkiboys

Comments

ZurdoDev 28-Feb-12 12:21pm

Use a 3rd party tool and then read its documentation.

Mauro Gagna 28-Feb-12 17:12pm

You can use iTextSharp to do that. If you can upload an example PDF file, we can help you more easily.

arkiboys 29-Feb-12 4:53am

Yes, I am using iTextSharp.
I am now trying to see how to read text from inside the .pdf file...
I do not know how to refer to the area where the firstname is, and I am not sure how to pick up the firstname text.
Any thoughts, please?
Thanks

Mauro Gagna 2-Mar-12 10:07am

You can search for the text you want by its content (for example, if you know the row starts with "Name:") or by its position (if you know where the text starts or which area it is in).

If you can provide some information about your case, and especially if you can share an example PDF file, we can give you some code to help you do it.
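
The content-based search mentioned above can be sketched in plain C#. This is only an illustration, assuming the page text has already been extracted into a string and that the value sits on the same line as its label. `LabelSearch` and `FindValueAfterLabel` are hypothetical names, not part of iTextSharp.

```csharp
using System;

public static class LabelSearch
{
    // Hypothetical helper: returns the text that follows `label` on the first
    // line starting with it, or null when no line matches.
    public static string FindValueAfterLabel(string pageText, string label)
    {
        if (pageText == null || label == null)
        {
            return null;
        }

        // Split on Windows and Unix line endings.
        string[] lines = pageText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        foreach (string line in lines)
        {
            string trimmed = line.Trim();

            // Case-insensitive match on the start of the line.
            if (trimmed.StartsWith(label, StringComparison.OrdinalIgnoreCase))
            {
                return trimmed.Substring(label.Length).Trim();
            }
        }

        return null;
    }
}
```

For example, `LabelSearch.FindValueAfterLabel("Name: John Smith\nCity: Rome", "Name:")` returns `"John Smith"`. Extracted PDF text does not always keep a label and its value on one line, so this approach only works when the layout is that simple.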

arkiboys 5-Mar-12 9:48am

Hi, I would like to do the following for a .pdf file:
1- Open the .pdf file.
2- Look at the second page and read the text inside txtLastname.
3- Read the value of the radio option called rdoChecked.

Can you assist, please?
Thanks
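
If the boxes are real PDF form fields (an AcroForm), the three steps above might be sketched with iTextSharp's `AcroFields` as follows. That is an assumption about the file, which was never shared in this thread. `FormFieldReader` and `PrintFormValues` are hypothetical names, and `txtLastname` and `rdoChecked` are taken from the comment above; the names stored in the file may be longer, hierarchical names, so the sketch lists every field name first.

```csharp
using System;
using iTextSharp.text.pdf;

public static class FormFieldReader
{
    // Hypothetical helper: prints every form field, then the two fields
    // named in the comment above.
    public static void PrintFormValues(string filename)
    {
        // Step 1: open the PDF file.
        PdfReader reader = new PdfReader(filename);
        try
        {
            AcroFields form = reader.AcroFields;

            // List all field names to find out what the file actually calls them.
            foreach (string name in form.Fields.Keys)
            {
                Console.WriteLine(name + " = " + form.GetField(name));
            }

            // Step 2: fields are looked up by name, not by page number.
            // Assumed field name; check for null or empty if it is missing.
            string lastName = form.GetField("txtLastname");

            // Step 3: for a radio group this is typically the export value
            // of the selected option. Assumed field name.
            string selectedOption = form.GetField("rdoChecked");

            Console.WriteLine("Last name: " + lastName);
            Console.WriteLine("Radio value: " + selectedOption);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

If the listing prints no field names at all, the boxes are probably drawn as plain page content rather than form fields, and text extraction is needed instead.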

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Mauro Gagna · Answer 1 · 2012-03-05T06:04:00

This is an example of how to get all the text of the PDF file.

C#

using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ReadPdfFile(string filename)
{
	StringBuilder fullText = new StringBuilder();
	PdfReader pdfReader = new PdfReader(filename);
	try
	{
		// iTextSharp page numbers are 1-based.
		for (int nPage = 1; nPage <= pdfReader.NumberOfPages; nPage++)
		{
			// Use a fresh strategy per page so text does not carry over between pages.
			ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
			// GetTextFromPage already returns a .NET string, so no re-encoding is needed.
			fullText.Append(PdfTextExtractor.GetTextFromPage(pdfReader, nPage, its));
		}
	}
	finally
	{
		// Close the single reader once, even if extraction throws.
		pdfReader.Close();
	}

	return fullText.ToString();
}

To achieve what you want, you have to create your own extraction strategy. I once wrote mine to get the text by its position, based on LocationTextExtractionStrategy (it is in the iTextSharp source code).
You should create your own text extraction strategy (an ITextExtractionStrategy implementation) with your own conditions.
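
One way to approximate position-based extraction without writing a full strategy is to wrap LocationTextExtractionStrategy in iTextSharp's built-in region filter. This is a swapped-in technique, not the custom strategy described above, and it assumes an iTextSharp 5.x release that ships `RegionTextRenderFilter` and `FilteredTextRenderListener`. `RegionTextReader`, `ReadRegion`, and the rectangle parameters are hypothetical; the coordinates are in PDF points, usually measured from the bottom-left corner of the page.

```csharp
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class RegionTextReader
{
    // Hypothetical helper: returns only the text drawn inside the given
    // rectangle on the given page (1-based page number).
    public static string ReadRegion(string filename, int pageNumber,
                                    float x, float y, float width, float height)
    {
        PdfReader reader = new PdfReader(filename);
        try
        {
            // Area of interest on the page, in PDF points.
            System.util.RectangleJ area = new System.util.RectangleJ(x, y, width, height);

            // Keep only the text chunks that fall inside the area.
            RenderFilter filter = new RegionTextRenderFilter(area);

            // The location strategy orders the surviving chunks by position.
            ITextExtractionStrategy strategy =
                new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);

            return PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A call such as `RegionTextReader.ReadRegion("form.pdf", 2, 70f, 600f, 200f, 20f)` would read a strip on the second page; the file name and numbers are placeholders and have to be measured from the actual document.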

I hope this was useful.

Mauro.