I need to find and replace a placeholder string in a PDF file. The PDF file is loaded with the iText library, and I have been trying to follow some code samples I have dug up, most of them written for the original Java implementation.

The problem is that the samples don't work for my PDF file. I get a PdfDictionary with PdfObjects, but when I try to filter out the objects that contain text I get no results. I know that there is text in there, because I first looked at the contents of the file with a PDF parser. The parser will not let me make changes and write them back, but at least I know that there is something in there that can be found.

Taking a closer look at the PdfDictionary object, I found only one flavor of PdfObject in it: PdfIndirectReference. The name suggests that I must resolve these references to get objects which I can examine and modify, but I can't find any sample code for that.

What I have tried:

I have to work with an improvised setup with several computers and remote desktops at the moment, so I can't just post my experimental code right now. This is what I have:

1) Open a PdfReader (works)
2) Get a PdfDocument object with the reader (works)
3) Iterate through the pages of the document and get a PdfPage object (works)
4) (For each page) get a PdfDictionary from the page object (works)
5) Get Pdf objects from the dictionary with dictionary.Get(PdfName.Contents) (works)
6) Normally I would just have to iterate over the results from step 5), but I only get PdfIndirectReference objects. How can I resolve and edit these references?

MemoryStream stream;
PdfReader reader;
PdfDocument document;
Dictionary<String, PdfFormField> fields;
PdfPage page;
PdfDictionary dict;
PdfStream content;
int pages;
int i;

using (stream = new MemoryStream(BinaryFile))
{
    using (reader = new PdfReader(stream))
    {
        using (document = new PdfDocument(reader))
        {
            pages = document.GetNumberOfPages();
            for (i = 1; i <= pages; i++)
            {
                page = document.GetPage(i);
                dict = page.GetPdfObject();
                var xcontent = dict.Get(PdfName.Contents);
                if (xcontent != null)
                {
                    PdfArray thearray= xcontent as PdfArray;
                    foreach (PdfObject obj in thearray)
                    {
                        // these objects actually are PdfIndirectReferences
                        // converting them leads nowhere, so here is the point
                        // where I would have to resolve the reference and use whatever
                        // objects I might obtain that way.
                        PdfStream strm = obj as PdfStream;
                        if(strm != null)
                        {
                            byte[] data = strm.GetBytes();
                            UTF8Encoding enc = new UTF8Encoding();

                            string test = enc.GetString(data);
                        }
                    }
                }
            }
        }
    }
}
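
For illustration, here is a minimal sketch of how the references from step 6) might be resolved. It assumes iText 7 for .NET (namespace iText.Kernel.Pdf), where PdfIndirectReference offers GetRefersTo(). The class and method names (ContentStreams, Resolve, ForPage) are hypothetical helpers made up for this example, not part of iText.

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;

// Hypothetical helper class, not part of iText.
static class ContentStreams
{
    // If obj is an indirect reference, return the object it points to;
    // otherwise return obj unchanged.
    static PdfObject Resolve(PdfObject obj)
    {
        var reference = obj as PdfIndirectReference;
        return reference != null ? reference.GetRefersTo() : obj;
    }

    // Collect the content streams of one page. /Contents may be a single
    // stream or an array of streams, and either may be stored indirectly.
    public static List<PdfStream> ForPage(PdfPage page)
    {
        var result = new List<PdfStream>();
        PdfObject contents = Resolve(page.GetPdfObject().Get(PdfName.Contents));

        var single = contents as PdfStream;
        if (single != null)
        {
            result.Add(single);
            return result;
        }

        var array = contents as PdfArray;
        if (array != null)
        {
            foreach (PdfObject item in array)
            {
                // Iterating the array can yield PdfIndirectReference objects,
                // so each element is resolved before the cast.
                var stream = Resolve(item) as PdfStream;
                if (stream != null)
                {
                    result.Add(stream);
                }
            }
        }
        return result;
    }
}
```

Inside the page loop this could be called as ContentStreams.ForPage(page), followed by GetBytes() on each stream as in the code above. PdfPage also appears to expose GetContentStreamCount() and GetContentStream(int), which may make the manual resolution unnecessary. Note that the placeholder is not guaranteed to appear as one contiguous, readable string in the decoded bytes: text can be split across several operators or encoded per font, so a plain byte search may still find nothing.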
Updated 23-Mar-20 1:57am
Comments
ZurdoDev 23-Mar-20 7:35am    
It would help if you clicked Improve question and showed just some relevant code.
CodeWraith 23-Mar-20 7:58am    
As you wish, but I doubt it will help very much. Everything so far is OK, but what can I do with the indirect references from there on?
ZurdoDev 23-Mar-20 8:03am    
It always helps to make sure we understand what you are saying.

This might help: https://stackoverflow.com/questions/37014984/how-to-read-text-of-appearance-stream
CodeWraith 23-Mar-20 8:37am    
Thanks. I already took a first look, and it looks like I'm going in circles. The problem is always that I have to ask the document for the objects I want to see, and 'Contents' only yields the indirect references. I would be very happy to get to the point directly, but I have no idea where the text is actually stored in the document or how to ask for it.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


