C#

 
[General] Re: URGENT : Help with parsing the PDF generated by Crystal reports-V9
leckey, 28-Oct-08 11:07
[General] Re: URGENT : Help with parsing the PDF generated by Crystal reports-V9
Paul Conrad, 28-Oct-08 11:09
[General] Re: URGENT : Help with parsing the PDF generated by Crystal reports-V9
leckey, 28-Oct-08 11:10
[General] Re: URGENT : Help with parsing the PDF generated by Crystal reports-V9
Paul Conrad, 28-Oct-08 11:14
[General] Re: URGENT : Help with parsing the PDF generated by Crystal reports-V9
vinoo80, 28-Oct-08 11:17
[General] Re: URGENT : Help with parsing the PDF generated by Crystal reports-V9
Paul Conrad, 28-Oct-08 11:21
[General] Re: URGENT : Help with parsing the PDF generated by Crystal reports-V9
Furty, 29-Oct-08 0:48
[Answer] Re: URGENT : Help with parsing the PDF generated by Crystal reports-V9
Kythen, 29-Oct-08 10:40
I don't think the text encoding is your problem. From a quick Google search, it looks like GetPageContent doesn't do text extraction for you; it just returns the page's uncompressed content stream, i.e. the raw sequence of drawing operators. You will need to get cozy with the PDF file format and parse those operators to pull out the text they show. You will also need heuristics to put the text back together, because text operators don't necessarily appear in the PDF file in the same order in which they are displayed. Even then, it may not be possible to extract the text accurately.
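
Parsing those operators might start like the following Python sketch. It assumes the content stream bytes are already decompressed, handles only literal (...) strings without nested parentheses, ignores hex strings <...>, and decodes bytes as Latin-1, which is wrong for fonts with custom encodings. The names TOKEN, literal and text_runs are illustrative, not from any library.

```python
import re

# One match per token: a literal string (escapes allowed, nested parens NOT handled),
# an array bracket, a number, a /Name, or an operator such as Tj, TJ, T*, ' or ".
TOKEN = re.compile(
    rb"\((?:\\.|[^\\)])*\)"        # literal string (...)
    rb"|\[|\]"                      # array delimiters
    rb"|[-+]?(?:\d+\.?\d*|\.\d+)"   # number
    rb"|/[^\s/\[\]()<>]+"           # name, e.g. /F1
    rb"|[A-Za-z'\"*]+",             # operator
    re.S,
)


def literal(tok: bytes) -> str:
    """Strip the parentheses, undo \\( \\) \\\\ escapes, decode as Latin-1 (an assumption)."""
    return re.sub(rb"\\([()\\])", rb"\1", tok[1:-1]).decode("latin-1")


def text_runs(content: bytes) -> list[str]:
    """Return the strings shown by Tj, ', " and TJ, in stream order (not display order)."""
    runs: list[str] = []
    operands: list = []
    arrays: list[list] = []  # currently open [...] arrays, innermost last
    for tok in TOKEN.findall(content):
        target = arrays[-1] if arrays else operands
        if tok.startswith(b"("):
            target.append(literal(tok))
        elif tok == b"[":
            arrays.append([])
        elif tok == b"]":
            if arrays:
                done = arrays.pop()
                (arrays[-1] if arrays else operands).append(done)
        elif tok[:1] == b"/" or tok[:1] in b"+-.0123456789":
            target.append(tok)  # names and numbers are kept as raw bytes
        else:
            # An operator consumes every operand collected since the previous operator.
            op = tok.decode("latin-1")
            last = operands[-1] if operands else None
            if op in ("Tj", "'", '"') and isinstance(last, str):
                runs.append(last)
            elif op == "TJ" and isinstance(last, list):
                runs.append("".join(x for x in last if isinstance(x, str)))
            operands.clear()
    return runs


print(text_runs(b"BT /F1 12 Tf 72 700 Td (Hello) Tj [(Wor) -20 (ld)] TJ ET"))  # ['Hello', 'World']
```

Note that the result is still in stream order, so it does nothing about the ordering problem; it only gets the shown strings out of the operator syntax.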

Here's an example of how you'd miss the text with the method you're using now. Searching for "Test" would fail with the following operators, because each character is shown by its own Tj operator:
(T) Tj
(e) Tj
(s) Tj
(t) Tj
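
A minimal Python sketch of why the plain search fails, assuming the stream is already decompressed and the strings are simple literals with no escapes:

```python
import re

# The operators from the example above: one Tj per character.
stream = b"(T) Tj\n(e) Tj\n(s) Tj\n(t) Tj\n"

# Searching the raw operator bytes for the word fails...
naive_hit = b"Test" in stream

# ...while concatenating the Tj operands finds it. The regex only handles
# simple literal strings (no escapes, no nested parentheses).
pieces = re.findall(rb"\(([^()\\]*)\)\s*Tj", stream)
shown = b"".join(pieces).decode("latin-1")

print(naive_hit, shown)  # False Test
```

Blindly joining operands ignores positioning, though, so text from different lines or columns would run together.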


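And here's an example where you'd probably never find the text no matter what you do:
1 0 0 1 100 0 Tm
[(t) 778 (s) 1056 (e) 1167 (T)] TJ


These operators display "Test", but the text you'd likely extract is "tseT". A positive number in a TJ array moves the next glyph to the left, so with Helvetica's glyph widths (t=278, s=500, e=556, T=611, in thousandths of an em) each character lands just to the left of the one before it.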
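
One common heuristic is to compute where each glyph lands and sort by x. The Python sketch below assumes the font is Helvetica (a real extractor would read the /Widths array from the font dictionary), default character spacing, word spacing and horizontal scaling, and the TJ array [(t) 778 (s) 1056 (e) 1167 (T)]. The function name tj_positions is hypothetical.

```python
# Glyph widths in thousandths of an em, from the standard Helvetica metrics.
# Assumption: the font is Helvetica; a real extractor would read /Widths from the font.
HELVETICA = {"T": 611, "e": 556, "s": 500, "t": 278}


def tj_positions(array, widths):
    """Yield (x, char) for each glyph of a TJ array.

    x is relative to the starting point, in thousandths of text space units. With the
    unscaled Tm above, the translation and font size only shift/rescale x, not the order.
    Character spacing, word spacing and horizontal scaling are assumed to be defaults.
    """
    x = 0
    for item in array:
        if isinstance(item, str):
            for ch in item:
                yield x, ch
                x += widths[ch]  # advance by the glyph's width
        else:
            x -= item  # a positive TJ number moves the next glyph to the LEFT


# [(t) 778 (s) 1056 (e) 1167 (T)] TJ, written as a Python list
tj = ["t", 778, "s", 1056, "e", 1167, "T"]
stream_order = "".join(ch for _, ch in tj_positions(tj, HELVETICA))
display_order = "".join(ch for _, ch in sorted(tj_positions(tj, HELVETICA)))
print(stream_order, display_order)  # tseT Test
```

This only works when you know the glyph widths and the text runs left to right on one baseline; rotated text, multiple lines and overlapping glyphs need more heuristics.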

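And don't forget to parse form XObjects as well; some PDF creators like hiding text in them. By "forms" I don't mean forms that you fill out: a form XObject is a self-contained content stream listed under /XObject in the resources (with /Subtype /Form) and drawn with the Do operator. See the PDF spec for details on form XObjects.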
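
Descending into forms could look roughly like this Python sketch. The page model is hypothetical: resources are plain dicts here, whereas real code would get decoded objects from a PDF library. Only simple unescaped Tj strings are collected.

```python
import re

# Hypothetical, already-decoded page model: an XObject is a dict with "Subtype";
# Forms also carry "content" (decompressed bytes) and optionally "resources".
OP = re.compile(rb"\(([^()\\]*)\)\s*Tj|/([^\s/\[\]()<>]+)\s+Do")


def extract(content: bytes, resources: dict, depth: int = 0) -> list[str]:
    """Collect simple Tj strings, descending into form XObjects invoked with Do."""
    if depth > 10:  # guard against cyclic or absurdly deep form nesting
        return []
    out: list[str] = []
    for text, name in OP.findall(content):
        if text:
            out.append(text.decode("latin-1"))
        elif name:
            xobj = resources.get("XObject", {}).get(name.decode("latin-1"))
            if xobj and xobj.get("Subtype") == "Form":  # images also use Do; skip them
                # Fall back to the parent's resources when the form has none of its own.
                out += extract(xobj["content"], xobj.get("resources", resources), depth + 1)
    return out


page = {
    "XObject": {
        "Fm1": {"Subtype": "Form", "content": b"(hidden) Tj", "resources": {}},
        "Im1": {"Subtype": "Image"},
    }
}
print(extract(b"(Visible) Tj /Im1 Do /Fm1 Do", page))  # ['Visible', 'hidden']
```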

PS: In the future, don't bother saying your question is "Urgent". No one cares, and it makes your question more likely to be ignored. I replied because it was a reasonable question and you showed that you had made at least some effort to figure it out yourself.
[General] Re: URGENT : Help with parsing the PDF generated by Crystal reports-V9
vinoo80, 29-Oct-08 15:40
[Question] Problem saving changes to a database VS C# 2005
danielhasdibs, 28-Oct-08 8:18
[Answer] Re: Problem saving changes to a database VS C# 2005
Rajasekharan Vengalil, 28-Oct-08 8:28
[Answer] Re: Problem saving changes to a database VS C# 2005
Paul Conrad, 28-Oct-08 11:08
[General] Re: Problem saving changes to a database VS C# 2005
danielhasdibs, 29-Oct-08 4:03
[Question] Re: Problem saving changes to a database VS C# 2005
danielhasdibs, 29-Oct-08 3:41
[Answer] Re: Problem saving changes to a database VS C# 2005
danielhasdibs, 29-Oct-08 9:34
[Answer] Re: Problem saving changes to a database VS C# 2005
nelsonpaixao, 29-Oct-08 15:19
[Question] SQL Timeout bypass
BradAW, 28-Oct-08 7:47
[Answer] Re: SQL Timeout bypass [modified]
PIEBALDconsult, 28-Oct-08 8:12
[Question] How to access Variant object returned from COM method in C#
Larry K, 28-Oct-08 6:29
[Answer] Re: How to access Variant object returned from COM method in C#
Larry K, 28-Oct-08 8:37
[Question] Call default dial up connection
Hossein Afyuoni, 28-Oct-08 6:24
[Question] User privilege issue
George_George, 28-Oct-08 3:53
[Answer] Re: User privilege issue
Mark Salsbery, 28-Oct-08 4:49
[General] Re: User privilege issue
George_George, 28-Oct-08 4:54
[Question] Re: User privilege issue
Mark Salsbery, 28-Oct-08 5:41
