
Splitting and Merging PDF Files in C# Using iTextSharp

10 Dec 2013 | CPOL | 5 min read
Splitting and merging PDF files in C# using the iTextSharp library.

I recently posted about using PdfBox.net to manipulate Pdf documents in your C# application. This time, I take a quick look at iTextSharp, another library for working with Pdf documents from within the .NET framework. 


What is iTextSharp?

iTextSharp is a direct .NET port of the open source iText Java library for PDF generation and manipulation. As the project’s summary page on SourceForge states, iText “. . . can be used to create PDF Documents from scratch, to convert XML to PDF . . . to fill out interactive PDF forms, to stamp new content on existing PDF documents, to split and merge existing PDF documents, and much more.”

iTextSharp presents a formidable set of tools for developers who need to create and/or manipulate PDF files. This does come with a cost, however. The PDF file format itself is complex; therefore, programming libraries that seek to provide a flexible interface for working with PDF files are, of necessity, complex as well. iText is no exception.

I noted in my previous post on PdfBox that PdfBox was a little easier for me to get up and running with, at least for basic tasks such as splitting and merging existing PDF files. I also noted that iText looked to be a little more complex, and I was correct. However, iTextSharp does not suffer from some of the performance drawbacks inherent to PdfBox, at least on the .NET platform.

Superior Performance vs. PdfBox

As I observed in my previous post, PdfBox.net is NOT a direct port of the PdfBox Java library; instead, it is a Java library running within .NET using IKVM. While I found it very cool to be able to run Java code in a .NET context, there was a serious performance hit. It was most noticeable the first time the PdfBox library was called, when the massive IKVM library spun up what amounts to a .NET implementation of the Java Virtual Machine, within which the Java code of the PdfBox library is then executed.

Needless to say, iTextSharp, being a native .NET port, does not suffer this limitation. The library itself is relatively lightweight and fast.
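If you would rather check the startup cost on your own machine than take my word for it, one simple approach is to time the same operation twice in one process: the first call pays one-time costs such as assembly loading and JIT compilation, while the second shows the steady-state cost. The following is a minimal sketch; `ReaderTiming.TimeOpen` is a hypothetical helper of my own, not part of iTextSharp, and it simply wraps `System.Diagnostics.Stopwatch` around opening a file with `PdfReader`.

```csharp
using System.Diagnostics;
using iTextSharp.text.pdf;

public static class ReaderTiming
{
    // Hypothetical helper (not part of iTextSharp): opens pdfPath with PdfReader,
    // reports the page count, and returns the elapsed wall-clock milliseconds.
    public static long TimeOpen(string pdfPath, out int pageCount)
    {
        Stopwatch watch = Stopwatch.StartNew();
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            // Reading the page count confirms the file was actually parsed:
            pageCount = reader.NumberOfPages;
        }
        finally
        {
            reader.Close();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Calling `TimeOpen` twice in a row on the same file and comparing the two results gives a rough picture of the first-call overhead. The same pattern could be wrapped around the PdfBox.net calls from my earlier post for a side-by-side comparison; single runs are noisy, so repeat the measurement a few times.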

Extracting and Merging Pages from an Existing Pdf File

One of the most common tasks we need to do is extract pages from one Pdf into a new file. We’ll take a look at some relatively basic sample code which does just that, and get a feel for using the iTextSharp programming model.

In the following code sample, the primary iTextSharp classes we will be using are the PdfReader, Document, PdfCopy, and PdfImportedPage classes. 

My simplified understanding of how this works is as follows: The PdfReader instance contains the content of the source PDF file. The Document instance, created with a page size taken from the PdfReader, essentially becomes a container into which pages extracted from the source file will be copied. The PdfCopy instance ties that Document to a new output FileStream, and each PdfImportedPage obtained from the PdfReader is added to the output through PdfCopy. Note that the Document class is iText's general-purpose document model rather than a PDF-specific object; it is the PdfCopy writer attached to it that produces a properly formatted PDF file. When the Document is closed, the result is written to the FileStream and saved to disk at the location specified by the destination file name.
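The same reader, document, and copy pipeline also covers merging, the other half of this article's title: open a PdfReader for each source file and push all of its pages through a single PdfCopy. Here is a minimal sketch under my own assumptions; the `PdfMergeSketch` class and `MergeFiles` method are hypothetical names, while `GetImportedPage`, `AddPage`, and `FreeReader` are existing iTextSharp 5.x methods.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfMergeSketch
{
    // Hypothetical helper: copies every page of each source file, in the order
    // given, into one new output file.
    public static void MergeFiles(string outputPdfPath, params string[] sourcePdfPaths)
    {
        if (sourcePdfPaths == null || sourcePdfPaths.Length == 0)
        {
            // iText refuses to close a document that received no pages.
            throw new ArgumentException("At least one source file is required.", nameof(sourcePdfPaths));
        }

        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, new FileStream(outputPdfPath, FileMode.Create));
        document.Open();

        foreach (string sourcePath in sourcePdfPaths)
        {
            PdfReader reader = new PdfReader(sourcePath);
            try
            {
                // iText page numbers are 1-based.
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }
                // Write out anything still held for this reader before closing it:
                copy.FreeReader(reader);
            }
            finally
            {
                reader.Close();
            }
        }

        // Closing the Document closes the PdfCopy writer and the output FileStream.
        document.Close();
    }
}
```

For example, `PdfMergeSketch.MergeFiles("combined.pdf", "part1.pdf", "part2.pdf")` would produce a file containing all pages of part1.pdf followed by all pages of part2.pdf.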

You can download the iTextSharp source code and binaries as a single package from the Files page at the iTextSharp project site. Just click on the “Download itextsharp-all-5.4.0.zip” link. Extract the files from the .zip archive, and stash them somewhere convenient. Next, add a reference in your project to itextsharp.dll, browsing to the folder where you stashed the extracted contents of the iTextSharp download.
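Once the reference is set, a quick way to confirm that everything compiles and runs, and to get a multi-page file for trying out the extraction methods, is to generate a throwaway PDF. This is a hypothetical helper of my own (`SamplePdfFactory` is not part of iTextSharp). It uses `PdfWriter` rather than `PdfCopy`, since it creates new content instead of copying existing pages.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class SamplePdfFactory
{
    // Hypothetical helper: writes a PDF with pageCount pages, each holding
    // a single line of text identifying its page number.
    public static void CreateSamplePdf(string outputPdfPath, int pageCount)
    {
        if (pageCount < 1)
        {
            // iText cannot close a document that has no pages.
            throw new ArgumentOutOfRangeException(nameof(pageCount), "At least one page is required.");
        }

        Document document = new Document(PageSize.LETTER);
        PdfWriter.GetInstance(document, new FileStream(outputPdfPath, FileMode.Create));
        document.Open();

        for (int page = 1; page <= pageCount; page++)
        {
            if (page > 1)
            {
                document.NewPage(); // start a fresh page for every entry after the first
            }
            document.Add(new Paragraph("This is page " + page));
        }

        // Closing the document flushes the PDF and closes the FileStream.
        document.Close();
    }
}
```

Calling `SamplePdfFactory.CreateSamplePdf("sample.pdf", 5)` should leave a five-page sample.pdf in the working directory.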

NOTE: The complete example code for this post is available at my GitHub repo.

I went ahead and created a project named iTextTools, with a class file named PdfExtractorUtility. Add the following using statements at the top of the file:

Set up references and Using Statements to use iTextSharp

C#
using iTextSharp.text;
using iTextSharp.text.pdf;
using System;
// CLASS DEPENDS ON iTextSharp: http://sourceforge.net/projects/itextsharp/

namespace iTextTools
{
    public class PdfExtractorUtility
    {

    }
}

 

First, I’ll add a simple method to extract a single page from an existing PDF file and save it to a new file:

Extract Single Page from Existing PDF to a new File:

C#
public void ExtractPage(string sourcePdfPath, string outputPdfPath, 
    int pageNumber, string password = "")
{
    PdfReader reader = null;
    Document document = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage = null;

    try
    {
        // Initialize a new PdfReader instance with the contents of the source Pdf file.
        // An owner password is only needed if the source file is protected:
        reader = string.IsNullOrEmpty(password)
            ? new PdfReader(sourcePdfPath)
            : new PdfReader(sourcePdfPath, System.Text.Encoding.UTF8.GetBytes(password));
 
        // Capture the correct size and orientation for the page:
        document = new Document(reader.GetPageSizeWithRotation(pageNumber));
 
        // Initialize an instance of the PdfCopy class with the 
        // document and an output file stream:
        pdfCopyProvider = new PdfCopy(document, 
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));

        document.Open();
 
        // Extract the desired page number:
        importedPage = pdfCopyProvider.GetImportedPage(reader, pageNumber);
        pdfCopyProvider.AddPage(importedPage);

        // Closing the Document also closes the PdfCopy writer and its output stream:
        document.Close();
    }
    finally
    {
        // Release the source file even if an exception propagates
        // (no catch block, so the original stack trace is preserved):
        if (reader != null)
        {
            reader.Close();
        }
    }
}

 

As you can see, simply pass in the path to the source document, an output file path, and the page number to be extracted, and you’re done.
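None of the extraction methods in this post check that the requested pages actually exist in the source file, and iText numbers pages starting from 1. A small guard, called before any of them, makes an out-of-range request fail with a clear message. This is a minimal sketch; `PdfPageGuard.EnsurePagesExist` is a hypothetical helper of my own, built only on `PdfReader.NumberOfPages`.

```csharp
using System;
using iTextSharp.text.pdf;

public static class PdfPageGuard
{
    // Hypothetical helper: returns the source file's page count and throws if
    // any requested page lies outside 1..NumberOfPages.
    public static int EnsurePagesExist(string sourcePdfPath, params int[] pageNumbers)
    {
        PdfReader reader = new PdfReader(sourcePdfPath);
        try
        {
            int pageCount = reader.NumberOfPages;
            foreach (int pageNumber in pageNumbers)
            {
                if (pageNumber < 1 || pageNumber > pageCount)
                {
                    throw new ArgumentOutOfRangeException(nameof(pageNumbers),
                        "Page " + pageNumber + " is outside the range 1.." + pageCount + ".");
                }
            }
            return pageCount;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Because it takes a `params int[]`, the same guard works for a single page, for the two ends of a contiguous range, or for a whole array of non-contiguous pages.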

If we want to be able to extract a range of contiguous pages, we might add another method defining a start and end point:

Extract a Range of Pages from Existing PDF to a new File:

C#
public void ExtractPages(string sourcePdfPath, string outputPdfPath, 
    int startPage, int endPage)
{
    PdfReader reader = null;
    Document sourceDocument = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage = null;

    try
    {
        // Initialize a new PdfReader instance with the contents of the source Pdf file:
        reader = new PdfReader(sourcePdfPath);
 
        // For simplicity, I am assuming all the pages share the same size
        // and rotation as the first page of the range:
        sourceDocument = new Document(reader.GetPageSizeWithRotation(startPage));
 
        // Initialize an instance of the PdfCopy class with the source 
        // document and an output file stream:
        pdfCopyProvider = new PdfCopy(sourceDocument, 
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
 
        sourceDocument.Open();
 
        // Walk the specified range (inclusive) and add the page copies to the output file:
        for (int i = startPage; i <= endPage; i++)
        {
            importedPage = pdfCopyProvider.GetImportedPage(reader, i);
            pdfCopyProvider.AddPage(importedPage);
        }

        // Closing the Document also closes the PdfCopy writer and its output stream:
        sourceDocument.Close();
    }
    finally
    {
        // Release the source file even if an exception propagates:
        if (reader != null)
        {
            reader.Close();
        }
    }
}
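The range method's comment assumes every page shares the first page's size and rotation. To check whether that actually holds for a particular source file, a diagnostic along the following lines lists each page's size and rotation using `PdfReader.GetPageSizeWithRotation` and `PdfReader.GetPageRotation`. `PdfPageInspector.AllPagesMatchFirst` is a hypothetical name of my own; sizes are reported in PDF points (1/72 inch).

```csharp
using System;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfPageInspector
{
    // Hypothetical helper: prints width x height (rotation applied) and rotation
    // for each page, and returns true when every page matches the first page's size.
    public static bool AllPagesMatchFirst(string sourcePdfPath)
    {
        PdfReader reader = new PdfReader(sourcePdfPath);
        try
        {
            Rectangle first = reader.GetPageSizeWithRotation(1);
            bool allMatch = true;
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                Rectangle size = reader.GetPageSizeWithRotation(i);
                Console.WriteLine("Page {0}: {1} x {2} pt, rotation {3}",
                    i, size.Width, size.Height, reader.GetPageRotation(i));
                if (size.Width != first.Width || size.Height != first.Height)
                {
                    allMatch = false;
                }
            }
            return allMatch;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

If this returns false for a file you plan to split, it is worth inspecting the output of the extraction methods closely for the pages whose size or orientation differs.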

 

What if we want non-contiguous pages from the source document? Well, we might overload the above method with one which accepts an array of ints representing the desired pages:

Extract multiple non-contiguous pages from Existing PDF to a new File:

C#
public void ExtractPages(string sourcePdfPath, 
    string outputPdfPath, int[] extractThesePages)
{
    PdfReader reader = null;
    Document sourceDocument = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage = null;

    try
    {
        // Initialize a new PdfReader instance with the 
        // contents of the source Pdf file:
        reader = new PdfReader(sourcePdfPath);
 
        // For simplicity, I am assuming all the pages share the same size
        // and rotation as the first requested page:
        sourceDocument = new Document(reader.GetPageSizeWithRotation(extractThesePages[0]));
 
        // Initialize an instance of the PdfCopy class with the source 
        // document and an output file stream:
        pdfCopyProvider = new PdfCopy(sourceDocument,
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));

        sourceDocument.Open();
 
        // Walk the array and add the page copies to the output file:
        foreach (int pageNumber in extractThesePages)
        {
            importedPage = pdfCopyProvider.GetImportedPage(reader, pageNumber);
            pdfCopyProvider.AddPage(importedPage);
        }

        // Closing the Document also closes the PdfCopy writer and its output stream:
        sourceDocument.Close();
    }
    finally
    {
        // Release the source file even if an exception propagates:
        if (reader != null)
        {
            reader.Close();
        }
    }
}
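Typing out an int[] by hand gets tedious, and if the page list comes from a user, a print-dialog style string such as "1-3, 7, 10-11" is more natural. This hypothetical helper (my own naming, plain C# with no iTextSharp dependency) converts such a string into the 1-based page array that the overload above expects.

```csharp
using System;
using System.Collections.Generic;

public static class PageRangeParser
{
    // Hypothetical helper: converts a string such as "1-3, 7, 10-11" into
    // {1, 2, 3, 7, 10, 11}, keeping the order in which pages are listed.
    public static int[] Parse(string spec)
    {
        var pages = new List<int>();
        foreach (string rawPart in spec.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string part = rawPart.Trim();
            if (part.Length == 0)
            {
                continue; // skip whitespace-only entries, as in "1, ,2"
            }

            // Either a single page number or an inclusive "start-end" range:
            int dash = part.IndexOf('-');
            int start = dash < 0 ? int.Parse(part) : int.Parse(part.Substring(0, dash).Trim());
            int end = dash < 0 ? start : int.Parse(part.Substring(dash + 1).Trim());

            if (start < 1 || end < start)
            {
                throw new FormatException("Invalid page or range: " + part);
            }
            for (int page = start; page <= end; page++)
            {
                pages.Add(page);
            }
        }
        return pages.ToArray();
    }
}
```

For example, `new PdfExtractorUtility().ExtractPages("source.pdf", "selected.pdf", PageRangeParser.Parse("1-3, 7"))` would request pages 1, 2, 3 and 7. `Parse` does not check the numbers against the source file's page count, so a separate check against `PdfReader.NumberOfPages` is still worthwhile.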

 

Scratching the Surface

Obviously, the examples above are a simplistic first exploration of what appears to be a powerful library. What I notice about iText in general is that, unlike some APIs, the path to achieving your desired result is often not intuitive. I believe this is largely due to the nature of the PDF file format, and possibly to the structure of the lower-level libraries upon which iTextSharp is built.

That said, there is without a doubt much to be learned by exploring the iTextSharp source code. Additionally, there are a number of resources to assist the enterprising developer in using this library:

Additional Resources for iTextSharp

Lastly, there is a book authored by one of the primary contributors to the iText project, Bruno Lowagie:

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer XIV Solutions
United States
My name is John Atten, and my username on many of my online accounts is xivSolutions. I am fascinated by all things technology and software development. I work mostly with C#, JavaScript/Node.js, various flavors of databases, and anything else I find interesting. I am always looking for new information, and value your feedback (especially where I got something wrong!)
