PDF Parser and FlateDecoder

23 Jul 2009, MPL
Demonstrates how to parse objects in a PDF and inflate FlateDecode sections.

Introduction

This article provides complete code, from start to finish, for extracting streams from a PDF file and inflating FlateDecode sections. The SharpZipLib source is also included, so everything will run right out of the box.

Using the code

The attached code contains a small project with a single file that shows how everything works. Below is the code from the OpenDocument event handler, so you can see how simple it is.

VB
Dim ofd As New OpenFileDialog()
Dim ow() As PDFParser.ObjectWrapper
Dim sb As New System.Text.StringBuilder()

ofd.Filter = "PDF|*.pdf"
' GetEnvironmentVariable expects the bare variable name, without % signs.
ofd.InitialDirectory = _
  System.Environment.GetEnvironmentVariable("USERPROFILE") & "\Desktop"

If ofd.ShowDialog() = Windows.Forms.DialogResult.OK Then
    ' Read the whole PDF into memory and split it into its objects.
    ow = PDFParser.Objects.GetAllObjectBlobs( _
            New System.IO.MemoryStream( _
            System.IO.File.ReadAllBytes(ofd.FileName)))
    For Each wrapper As PDFParser.ObjectWrapper In ow
        sb.Append("********************" & wrapper.header & _
                  "**************************" & vbCrLf)
        ' Inflate only plain FlateDecode streams; skip those with DecodeParms.
        If wrapper.header.Contains("FlateDecode") AndAlso Not _
               wrapper.header.Contains("DecodeParms") Then
            Try
                sb.Append(PDFParser.Inflator.FlateDecodeToASCII(New _
                          System.IO.MemoryStream(wrapper.bytes)))
            Catch ex As Exception
                sb.Append("EXCEPTION: " & ex.Message)
            End Try
        End If
        sb.Append(vbCrLf)
        sb.Append("*********************************" & _
                  "***************************************" & vbCrLf)
    Next
    ' Show the collected output once every object has been processed.
    txtInflatedContents.Text = sb.ToString()
End If
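
For readers curious about what the inflate step in the listing above involves, here is a minimal sketch that uses the .NET System.IO.Compression classes instead of the bundled SharpZipLib. It is not how PDFParser.Inflator is implemented. InflateToAscii is a hypothetical helper, and the sketch assumes the stream bytes are zlib-wrapped (a 2-byte header followed by deflate data), which is how FlateDecode data is normally stored.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module InflateSketch
    ' Hypothetical helper: skips the 2-byte zlib header and inflates
    ' the raw deflate data that follows, returning it as ASCII text.
    Function InflateToAscii(ByVal zlibBytes As Byte()) As String
        Using source As New MemoryStream(zlibBytes, 2, zlibBytes.Length - 2)
            Using inflater As New DeflateStream(source, CompressionMode.Decompress)
                Using reader As New StreamReader(inflater, Encoding.ASCII)
                    Return reader.ReadToEnd()
                End Using
            End Using
        End Using
    End Function

    Sub Main()
        ' Build sample data: a zlib-style header (78 9C) plus raw deflate bytes.
        ' The trailing Adler-32 checksum is left out because the reader ignores it.
        Dim raw As New MemoryStream()
        raw.WriteByte(&H78)
        raw.WriteByte(&H9C)
        Using packer As New DeflateStream(raw, CompressionMode.Compress, True)
            Dim payload As Byte() = Encoding.ASCII.GetBytes("BT /F1 12 Tf (Hello) Tj ET")
            packer.Write(payload, 0, payload.Length)
        End Using

        ' Prints the original text back.
        Console.WriteLine(InflateToAscii(raw.ToArray()))
    End Sub
End Module
```

Skipping the header this way performs no checksum validation, so it suits quick inspection better than strict verification.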

Detailed code use

  1. Call the static method "GetAllObjectBlobs" and pass in a stream containing the bytes of the PDF file.
     VB
     PDFParser.Objects.GetAllObjectBlobs()
  2. The method returns an array of ObjectWrappers. Each wrapper gives you the header of an object along with all of the bytes in its stream.
  3. You can then decide what to do with each stream. I implemented a simple decode method. I say simple because it does not fully follow Adobe's specification: filters can be nested, and a stream can be Flate-encoded several times.
  4. Once you have determined that a stream needs to be decoded, call "FlateDecodeToASCII".
     VB
     PDFParser.Inflator.FlateDecodeToASCII(New System.IO.MemoryStream(wrapper.bytes))
  5. That's it. These simple functions let you break out object streams and inflate them using FlateDecode.
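
The sample decides whether to inflate by looking for the text "FlateDecode" in the header. As a rough sketch of a stricter check, the helper below lists the names found under /Filter, so a single FlateDecode can be told apart from a chain of filters. GetFilterNames and the sample header strings are hypothetical and not part of PDFParser, and the sketch assumes the header holds the dictionary text of the object.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module FilterSketch
    ' Hypothetical helper: returns the names listed under /Filter, in order.
    ' Returns an empty list when the header has no /Filter entry.
    Function GetFilterNames(ByVal header As String) As List(Of String)
        Dim names As New List(Of String)()
        ' Matches "/Filter /Name" as well as "/Filter [ /Name1 /Name2 ]".
        Dim m As Match = Regex.Match(header, "/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
        If m.Success Then
            For Each n As Match In Regex.Matches(m.Groups(1).Value, "/([A-Za-z0-9]+)")
                names.Add(n.Groups(1).Value)
            Next
        End If
        Return names
    End Function

    Sub Main()
        Dim plain As String = "<< /Length 42 /Filter /FlateDecode >>"
        Dim chained As String = "<< /Filter [ /ASCII85Decode /FlateDecode ] /Length 42 >>"

        ' Prints: FlateDecode
        Console.WriteLine(String.Join(", ", GetFilterNames(plain).ToArray()))
        ' Prints: ASCII85Decode, FlateDecode
        Console.WriteLine(String.Join(", ", GetFilterNames(chained).ToArray()))
    End Sub
End Module
```

In a PDF, the /Filter entry is either one name or an array of names, listed in the order in which they must be applied to decode the data. With a helper like this, the simple inflate path would apply only when the list contains exactly one entry, FlateDecode.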

Points of interest

  • The code does not look for encryption.
  • It only inflates to a stream or to ASCII text.
  • While testing a file that is not compressed, I noticed that it has sections marked as FlateDecode, yet inflating them throws an invalid header exception. I don't know why that is.
  • This has only been tested on PDFs created by Adobe LiveCycle Designer ES 8.2.
  • Code examples are in VB.NET and all libraries are in C#.
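
One way to investigate the invalid header exception mentioned above is to look at the first two bytes of the stream before inflating it. The sketch below checks whether they form a valid zlib header as defined in RFC 1950. LooksLikeZlib is a hypothetical helper, not part of PDFParser, and the idea that the failing streams do not start with a zlib header (for example because of a leftover line break or encryption) is only an assumption, not a confirmed diagnosis.

```vb
Imports System

Module ZlibHeaderSketch
    ' Hypothetical helper: True when the first two bytes form a valid
    ' zlib (RFC 1950) header for deflate-compressed data.
    Function LooksLikeZlib(ByVal data As Byte()) As Boolean
        If data Is Nothing OrElse data.Length < 2 Then Return False
        Dim cmf As Integer = data(0)
        Dim flg As Integer = data(1)
        ' The low four bits of CMF hold the compression method; 8 means deflate.
        If (cmf And &HF) <> 8 Then Return False
        ' CMF and FLG, read as one big-endian 16-bit number, must be a multiple of 31.
        Return ((cmf * 256 + flg) Mod 31) = 0
    End Function

    Sub Main()
        ' A common zlib header: prints True.
        Console.WriteLine(LooksLikeZlib(New Byte() {&H78, &H9C}))
        ' A stray CR LF in front of the header: prints False.
        Console.WriteLine(LooksLikeZlib(New Byte() {&HD, &HA, &H78, &H9C}))
    End Sub
End Module
```

Logging the result for each wrapper before calling the inflate method would show whether the exception lines up with streams whose bytes do not begin with a zlib header.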

History

  • 06-17-09 - First release.

License

This article, along with any associated source code and files, is licensed under The Mozilla Public License 1.1 (MPL 1.1).


Written By
Software Developer
United States
Graduate of University of Louisiana at Lafayette in computer science.
