
100% .NET component for rendering PDF documents

15 Mar 2005, CPOL, 7 min read
Describes how to use the PDFRasterizer.NET control to convert PDF to raster images, to display PDF in your Windows application and to silently print PDF documents.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

PDFRasterizer.NET is a component for rendering PDF documents and is written entirely in C#. It has no dependencies other than the .NET Framework, and it is packaged as a single assembly, which makes deployment truly easy. The component draws to any System.Drawing.Graphics object. Because a Graphics object may represent a raster image, an Enhanced Metafile, a printer or the surface of your Windows form or control, this is the most generic and powerful render target that we could have chosen. It does not introduce any vendor-specific image classes or viewing controls that bring their own idiosyncrasies and lock you in.

This article describes how to use the PDFRasterizer.NET component to:

  1. convert PDF documents to raster images such as BMP, GIF, PNG, etc.
  2. display PDF documents in your WinForms application (with and without an EMF)
  3. programmatically print PDF documents

An Overview of the PDFRasterizer.NET Classes

The PDFRasterizer.NET object model is very simple and consists of just 4 classes: Document, Pages, Page and ExtractedImageInfo. ExtractedImageInfo is used to extract images from the PDF document. See method Page.ExtractImages() in the class reference for details.

Document

This is the top-level class that represents the PDF document that you wish to render. You construct it like this:

C#
using ( FileStream file = new FileStream( "some.pdf", FileMode.Open, 
                                                      FileAccess.Read ) )
{
   Document document = new Document( new BinaryReader( file ) );
   // ... now you can enumerate the pages and draw them
}

An overload is available to provide a password. Document has read-only properties to retrieve document information such as the Author, Subject, etc. The property Pages returns the collection of pages inside this document.

Pages

This is the collection of pages inside the PDF document as returned by Document.Pages. It has a property to retrieve the number of pages and to retrieve a page by index.

Page

Not surprisingly, this class represents a single PDF page and is the most interesting class. The following code demonstrates how to enumerate the pages:

C#
Document document;
// ... see previous code for how to obtain a document object

for ( int pageIndex = 0; pageIndex < document.Pages.Count; pageIndex++ ) 
{ 
   Page page = document.Pages[ pageIndex ]; 
   // ... see further for how to draw the page
}

Once you have a Page object, you can tell it to draw to a System.Drawing.Graphics object:

C#
Page page;
// ... see previous code for how to obtain a page object

System.Drawing.Graphics graphics;
// ... see further for different ways to obtain or create a 
// System.Drawing.Graphics object

page.Draw( graphics );

That looks pretty simple, doesn't it? The key part is how you create the Graphics object. This depends on the type of render target that you choose. In the next sections I will describe different ways to instantiate a Graphics object.

Convert PDF to a Raster Image (BMP, PNG, TIFF, etc.)

See the ConvertToImage_cs and ConvertToImage_vb samples that are included with the PDFRasterizer.NET application for full source code.

Image 1: PDF-to-image converter sample (included with the PDFRasterizer.NET installation)

The conversion of a PDF page to a raster image involves the following steps:

  1. Create a Bitmap object at the correct resolution
  2. Wrap a Graphics object around the bitmap
  3. Apply a scale transform to the Graphics object to account for the resolution
  4. Draw the Page to the Graphics object
  5. Save the Bitmap using any GDI+ supported image format
C#
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
// also import the PDFRasterizer.NET namespace that declares Document and Page

using ( FileStream file = new FileStream( "test.pdf", FileMode.Open, 
                                                      FileAccess.Read ) )
{
   // open the PDF document
   Document document = new Document( new BinaryReader( file ) ); 
  
   // get the first page
   Page page = document.Pages[0];   

   // you must apply a scale to the PDF graphics proportional to the 
   // resolution: one PDF unit is one point (1/72 inch)
   float dpi = 300;
   float scale =  dpi / 72f;
   
   using ( Bitmap bitmap = new Bitmap( (int) ( scale * page.Width ), 
                                       (int) ( scale * page.Height ) ) )
   // wrap a graphics around the bitmap (disposed together with it)
   using ( Graphics graphics = Graphics.FromImage( bitmap ) )
   {
      // a new Bitmap starts out transparent; begin with a white page
      graphics.Clear( Color.White );

      // apply the scale that accounts for the resolution
      graphics.ScaleTransform( scale, scale );
      
      // do the actual rendering
      page.Draw( graphics ); 
  
      // save the result as a bmp - could be any format
      bitmap.Save( "test.bmp", ImageFormat.Bmp );
   }
}

The scale transformation deserves some explanation. Suppose you want to render a PDF page with page size Letter. This format measures 8.5 x 11 inches (width x height), which corresponds to 612 x 792 points because an inch has 72 points. If you draw this page at a resolution of 72 dots per inch (DPI), the bitmap has 612 (8.5 x 72) columns and 792 (11 x 72) rows of pixels. Now suppose that you double the resolution to 144 DPI. The bitmap then needs 8.5 x 144 = 1224 columns and 11 x 144 = 1584 rows of pixels. This is why the width and height arguments of the Bitmap constructor are multiplied by the scale (DPI / 72).
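
The size calculation can be isolated in a small helper so that it can be checked on its own. This is a minimal sketch under stated assumptions: RenderMath and BitmapSize are hypothetical names, not part of PDFRasterizer.NET, and the page size is passed in points exactly as page.Width and page.Height are used in the sample above.

```csharp
using System;

// Hypothetical helper, not part of PDFRasterizer.NET: computes the bitmap size
// needed to hold a page whose width and height are given in PDF points.
public static class RenderMath
{
    // A PDF point is 1/72 inch.
    public const double PointsPerInch = 72.0;

    // Factor that maps one PDF point to device pixels at the given resolution.
    public static double ScaleFor(double dpi)
    {
        return dpi / PointsPerInch;
    }

    // Pixel dimensions of the bitmap, rounded to the nearest whole pixel.
    public static (int Width, int Height) BitmapSize(
        double pageWidthPt, double pageHeightPt, double dpi)
    {
        double scale = ScaleFor(dpi);
        int width = (int)Math.Round(scale * pageWidthPt);
        int height = (int)Math.Round(scale * pageHeightPt);
        return (width, height);
    }
}
```

For a Letter page at 144 DPI, BitmapSize(612, 792, 144) gives 1224 x 1584, the figures worked out above. Rounding instead of truncating is a choice made in this sketch: with a factor such as 300/72, which is not exactly representable in floating point, an (int) cast can lose a pixel.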

When Page.Draw executes, PDFRasterizer.NET makes a sequence of method calls on the Graphics object. It does this in its own coordinate system, in which one unit corresponds to one point (1/72 inch). To make sure that coordinate ( 612, 792 ) in PDF space ends up at ( 1224, 1584 ) in the coordinate space of the 144 DPI bitmap, a scale transform must be applied to the Graphics object before the PDF page draws to it. Hence the graphics.ScaleTransform() call before drawing the page.
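
The mapping described above is a plain 2-D scale matrix. The sketch below checks the arithmetic without GDI+ by using System.Numerics.Matrix3x2 as a stand-in for the matrix that Graphics.ScaleTransform multiplies into the world transform; DpiTransform and PdfToDevice are hypothetical names used only for illustration.

```csharp
using System.Numerics;

// Sketch: the same scale that Graphics.ScaleTransform applies, expressed with
// System.Numerics.Matrix3x2 so the arithmetic can be checked without GDI+.
public static class DpiTransform
{
    // Maps a point in PDF space (units of 1/72 inch) to pixel coordinates.
    public static Vector2 PdfToDevice(Vector2 pdfPoint, float dpi)
    {
        float scale = dpi / 72f;
        Matrix3x2 toDevice = Matrix3x2.CreateScale(scale, scale);
        return Vector2.Transform(pdfPoint, toDevice);
    }
}
```

At 144 DPI the bottom-right corner of a Letter page, ( 612, 792 ), maps to ( 1224, 1584 ), while the origin stays at ( 0, 0 ).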

Draw a PDF Page on your Windows control

Because you can draw a PDF page to any Graphics object, you can also draw it on the surface of your Windows control. The most obvious place to do this is the Control.Paint event handler. Typical code to draw a PDF page to the surface of your control looks like this:

C#
Page page;
Panel viewerPanel;

...

// this method is a member of a System.Windows.Forms.Form derived class
// this method handles the Paint event of a control member called viewerPanel
// page is a member of type Page that has been initialized
private void viewerPanel_Paint(object sender, 
                               System.Windows.Forms.PaintEventArgs e)
{
   e.Graphics.Clear( Color.White );
   if ( null != page )
   {
      page.Draw( e.Graphics );
      // outline the page with a thin gray border
      using ( Pen border = new Pen( Color.Gray ) )
      {
         e.Graphics.DrawRectangle( border, 0, 0, 
                                   (float) page.Width, (float) page.Height );
      }
   }
}

Note that normally you would call e.Graphics.TranslateTransform() and e.Graphics.ScaleTransform() before calling page.Draw() in order to position and size the PDF page.
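
One way to choose the translation and scale is to fit the whole page into the control and center it. The helper below is a hypothetical sketch (PageLayout and FitToPanel are not part of PDFRasterizer.NET); it assumes the panel size is in pixels and the page size in points, as in the Paint handlers in this article.

```csharp
using System;

// Hypothetical helper, not part of PDFRasterizer.NET: computes the scale and
// offsets that fit a page into a panel while preserving its aspect ratio and
// centering it along the axis that has room to spare.
public static class PageLayout
{
    public static (float Scale, float OffsetX, float OffsetY) FitToPanel(
        double panelWidth, double panelHeight,
        double pageWidthPt, double pageHeightPt)
    {
        // the smaller ratio keeps the whole page visible
        double scale = Math.Min(panelWidth / pageWidthPt,
                                panelHeight / pageHeightPt);

        // leftover space is split equally on both sides
        double offsetX = (panelWidth - pageWidthPt * scale) / 2;
        double offsetY = (panelHeight - pageHeightPt * scale) / 2;

        return ((float)scale, (float)offsetX, (float)offsetY);
    }
}
```

In the Paint handler the result would be applied with e.Graphics.TranslateTransform( fit.OffsetX, fit.OffsetY ) followed by e.Graphics.ScaleTransform( fit.Scale, fit.Scale ) before page.Draw( e.Graphics ). With the default MatrixOrder.Prepend, that call order scales the page first and then moves it, so the offsets are in device pixels.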

The drawback of this simple approach is that the PDF page is drawn each time the Paint event is fired. For complex PDF pages this can be very time consuming. A better way is to render the PDF page to an Enhanced Metafile once and then play that EMF file repeatedly in your Paint event handler. This is discussed next.

Use EMF to Record and Play a PDF Page

Image 2: PDF viewer sample (included with the PDFRasterizer.NET installation)

See the ViewPDF_cs and ViewPDF_vb samples that are included with the PDFRasterizer.NET application for full source code.

An enhanced metafile (EMF) can be thought of as a 'recording' of method calls to a Graphics object. Once recorded, you can play the method calls back to a Graphics object any number of times. Unlike a raster format such as BMP or PNG, an EMF preserves the vector graphics. This makes it an excellent format for repeatedly drawing a PDF page at different zoom levels: you simply apply a ScaleTransform according to the zoom level and then play the EMF.

Create an EMF file

The following method creates a System.Drawing.Imaging.Metafile from a Page object.

C#
private Metafile createMetafile( Page page ) 
{ 
   Metafile metafile = null; 
   
   // create a Metafile object that is compatible with the surface of this 
   // form
   using ( Graphics graphics = this.CreateGraphics() )
   { 
      System.IntPtr hdc = graphics.GetHdc(); 
      metafile = new Metafile( hdc, new Rectangle( 0, 0, 
           (int) page.Width, (int) page.Height ), MetafileFrameUnit.Point ); 
      graphics.ReleaseHdc( hdc );
   }

   // draw to the metafile
   using ( Graphics metafileGraphics = Graphics.FromImage( metafile ) )
   {
      // smooth the output
      metafileGraphics.SmoothingMode = SmoothingMode.AntiAlias;
      page.Draw( metafileGraphics );
   }

   return metafile;
}

Play an EMF file

The createMetafile method is called from the Paint event handler whenever it needs the metafile for a page. Each metafile is created the first time it is needed (lazily) and then cached. The following code shows the Paint event handler. The metafiles variable is a member of type Metafile[] that is initialized to an array of null references when the PDF document is opened.

C#
Document document;    // initialized when user selects PDF document
Metafile[] metafiles; // initialized when user selects PDF document
int selectedPage;     // index of the page shown in viewerPanel

...

// this method is a member of a System.Windows.Forms.Form derived class
// this method handles the Paint event of a control member called viewerPanel
private void viewerPanel_Paint(object sender, 
                               System.Windows.Forms.PaintEventArgs e)
{
   if ( null == document ) return;
         
   e.Graphics.Clear( Color.White );

   // create a metafile for the selected page if not already
   if ( null == metafiles[ selectedPage ] )
   {
      metafiles[ selectedPage ] = createMetafile( 
                                           document.Pages[ selectedPage ] );
   }
       
   // scale to fit the page  
   Page page = document.Pages[ selectedPage ];
   float scale = (float) Math.Min( viewerPanel.Width / page.Width, 
                                   viewerPanel.Height / page.Height );
   e.Graphics.ScaleTransform( scale, scale );
   
   // play the metafile to the surface of the control
   e.Graphics.EnumerateMetafile( metafiles[ selectedPage ], 
                  new Point( 0, 0 ), 
                  new Graphics.EnumerateMetafileProc( MetafileCallback ) );
}

The MetafileCallback function is boilerplate code and not relevant to the working of PDFRasterizer.NET. See the sample projects ViewPDF_EMF_cs and ViewPDF_EMF_vb for its implementation.
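
The Paint handler above creates each metafile on first use and keeps it in the metafiles array. The same create-on-first-use pattern can be factored into a small generic class. This is a hypothetical sketch (LazyPageCache is not part of PDFRasterizer.NET); it adds a Clear method that disposes cached entries, since Metafile derives from Image and implements IDisposable.

```csharp
using System;

// Hypothetical helper, not part of PDFRasterizer.NET: caches one expensive
// object per page and creates it only the first time it is requested.
public sealed class LazyPageCache<T> where T : class
{
    private readonly T[] items;
    private readonly Func<int, T> factory;

    public LazyPageCache(int pageCount, Func<int, T> factory)
    {
        items = new T[pageCount]; // every entry starts out as null
        this.factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    public T Get(int pageIndex)
    {
        // create the entry on first request, reuse it afterwards
        if (items[pageIndex] == null)
        {
            items[pageIndex] = factory(pageIndex);
        }
        return items[pageIndex];
    }

    // Disposes cached entries (e.g. Metafile objects) and resets the cache,
    // for example when another PDF document is opened.
    public void Clear()
    {
        for (int i = 0; i < items.Length; i++)
        {
            (items[i] as IDisposable)?.Dispose();
            items[i] = null;
        }
    }
}
```

In the viewer, new LazyPageCache<Metafile>( document.Pages.Count, i => createMetafile( document.Pages[ i ] ) ) could replace the metafiles array and its null check, with Clear() called before a new document is loaded.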

Print a PDF Page

See the PrintPDF_cs and PrintPDF_vb samples that are included with the PDFRasterizer.NET application for full source code.

Image 3: Print PDF sample (included with the PDFRasterizer.NET installation)

Printing involves the following steps:

  1. create a PrintDocument instance
  2. set the PrinterSettings and DefaultPageSettings properties of the PrintDocument (e.g. by showing the PageSetupDialog)
  3. assign an event handler to the PrintDocument.PrintPage event
  4. call the PrintDocument.Print method to start printing
  5. handle the PrintDocument.PrintPage event by drawing to the provided Graphics object

Because in step 5 the application is given a Graphics object that draws to the printer, we can pass this Graphics object directly to the Page.Draw method. This preserves all vector graphics on the PDF page; scan conversion is done by the printer.

The following method is called when the user selects 'Print' from the viewer application. The variable pagesList is a ListBox that displays all the pages. The variable selectedPages is an IEnumerator that will be used to print each selected page from top to bottom.

C#
ListBox pagesList;         // holds the pages in the selected PDF document
IEnumerator selectedPages; // used to enumerate the selected pages in the 
                           // listbox

...

// this method is a member of a System.Windows.Forms.Form derived class
void print()
{
   if ( null == document ) return;

   // get the selected pages
   selectedPages = pagesList.SelectedIndices.GetEnumerator();
   selectedPages.Reset();

   // move enumerator to the first selected page
   if ( selectedPages.MoveNext() )
   {
      // create a print document with the same name as the PDF document
      PrintDocument printDocument = new PrintDocument();
      printDocument.DocumentName = document.Title;
         
      // query the user for printer and page settings
      PageSetupDialog setupDialog = new PageSetupDialog();
      setupDialog.Document = printDocument;
      if ( DialogResult.OK == setupDialog.ShowDialog() )
      {
         printDocument.DefaultPageSettings = setupDialog.PageSettings;
         printDocument.PrinterSettings = setupDialog.PrinterSettings;

         // setup event handler and start printing
         printDocument.PrintPage += new PrintPageEventHandler( printPage );
         printDocument.Print();
      }
   }
}

After the printDocument.Print() method is called, the PrintPage event handler will be called for each page to print. This event handler is implemented as follows:

C#
// this is the PrintDocument.PrintPage event handler
void printPage( object sender, PrintPageEventArgs e )
{
   // PDFRasterizer.NET draws in points (1/72 inch), so measure in points
   e.Graphics.PageUnit = GraphicsUnit.Point;
   
   // get the current index from the enumerator     
   if ( null != selectedPages.Current )
   {
      int pageIndex = (int) selectedPages.Current;
      Page page = document.Pages[ pageIndex ];
      
      // draw the page to the printer
      page.Draw( e.Graphics );
   }

   // more pages?
   e.HasMorePages = selectedPages.MoveNext();
}

The selectedPages enumerator, of type IEnumerator, yields the index of the current selected page. This index is used to select the corresponding Page from the Pages collection. Next, the page draws to the provided Graphics object, which represents the printer. Finally, the event handler reports whether there are more pages by trying to move to the next selected page and storing the result in e.HasMorePages.
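
Because PrintDocument drives the loop, the interplay between print() and printPage() can be hard to follow. The sketch below is a hypothetical, printer-free simulation (PrintLoopSimulation is not a .NET or PDFRasterizer.NET type) of the contract described above: printing starts only if the first MoveNext succeeds, each PrintPage call handles selectedPages.Current, and the loop continues while the handler sets HasMorePages to true.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical simulation: mimics how PrintDocument keeps raising PrintPage
// while the handler reports HasMorePages = true. No printer is involved.
public static class PrintLoopSimulation
{
    public static List<int> Run(IList selectedIndices)
    {
        var printed = new List<int>();
        IEnumerator selectedPages = selectedIndices.GetEnumerator();

        // print(): only start printing if at least one page is selected
        if (!selectedPages.MoveNext())
        {
            return printed;
        }

        bool hasMorePages;
        do
        {
            // printPage(): "draw" the page at the current selected index ...
            printed.Add((int)selectedPages.Current);

            // ... then report whether another selected page follows
            hasMorePages = selectedPages.MoveNext();
        }
        while (hasMorePages);

        return printed;
    }
}
```

A selection of indices 0, 2 and 5 is printed in exactly that order, and an empty selection prints nothing, which is why print() calls MoveNext before it creates the PrintDocument.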

Please Send us PDFs that do not Render Correctly

PDF is a very rich format with many different ways to accomplish the same output. In addition there are many producers that each have their own interpretation of the specification. Consequently, you may encounter documents that we do not render correctly. Please send them to support@tallcomponents.com.

TallComponents

You can visit us at http://www.tallcomponents.com?ref=34. Here you can download evaluations for all our PDF-related .NET components and get support.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Netherlands
Worked for some years as a software engineer, architect and project leader for different software companies. Works at TallComponents, vendor of class libraries for creating, manipulating and rendering PDF documents.
