Tip/Trick

A SOA Approach to Dynamic DOCX-PDF Report Generation - Part 2

16 Jan 2012 | CPOL | 3 min read
Generating automated PDF reports based on Docx templates and Business-Logic XML-serialized data

Introduction

My previous article, "A SOA approach to dynamic DOCX-PDF report generation - Part 1", showed how to generate Docx reports automatically in a client-server architecture without depending on MS Office. This part looks at printing those docx files to PDF from managed code and transmitting the PDF bytes over HTTP.

The PDF conversion relies on the Bullzip PDF Printer, a free, full-featured, programmable and very well documented PDF printer that can print any file to PDF, including Docx files.

PDF is probably the most widely used format for exchanging documents between platforms, so most data-centric applications need to produce PDF reports of some kind.

1. Installing the PDF Printer

The first thing to do is to download and install the Bullzip PDF Printer. The installer creates a PDF printer in the system and places a help file in the installation directory. Read through the help file to learn how to use the Bullzip.PdfWriter namespace.

2. Adding the PDF Conversion to an Existing Visual Studio Solution

First of all, we need to reference the Bullzip assembly from the solution. Conveniently, the assembly can be found in the GAC, so just go to Add Reference -> .NET and find BullZip PDF Writer. This adds the Bullzip.PdfWriter assembly to the solution; its classes and methods are exposed under the Bullzip.PdfWriter namespace.

The next step is configuring the PDF printer. This is done through a .ini file; I won't go into the details here, as the Bullzip documentation covers them at length. The printer settings are managed by a class called PdfSettings, whilst the PDF creation methods are in a class called PdfUtil. With that in place, we can start converting to PDF!

3. Converting to PDF

Here's what the test application does:

  1. Includes some docx templates with sample data in a templates directory
  2. Generates customized docx reports based on the docx templates and some XML-serialized Business-Logic data whose structure corresponds to the custom XML parts in the docx templates
  3. Saves the docx reports into a temporary directory
  4. Prints the docx reports into PDF
  5. Sends the PDF bytes through HTTP
  6. Destroys the docx and PDF files

The PrintToPdf method below loads the printer settings from an ".ini" file, sends a docx file from the temporary directory to the PDF printer, waits for the PDF file to be created, reads its bytes, and then deletes both the docx and the PDF.

C#
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using System.Diagnostics;
using System.ComponentModel;
using System.Configuration;
using System.ServiceModel;
using Bullzip.PdfWriter;

namespace DocxGenerator.SL.WCF
{
    public class PdfMaker
    {        
        internal static byte[] PrintToPdf(string appFolder, string tempDocxFileName)
        {
            try
            {
                string tempFolder = appFolder + @"\temp";
                string tempDocxFilePath = tempFolder + @"\" + tempDocxFileName;
                
                PdfSettings pdfSettings = new PdfSettings();
                pdfSettings.PrinterName = ConfigurationManager.AppSettings["PdfPrinter"];

                string settingsFile = pdfSettings.GetSettingsFilePath(PdfSettingsFileType.Settings);
                pdfSettings.LoadSettings(appFolder + @"\App_Data\printerSettings.ini");
                pdfSettings.SetValue("Output", tempFolder + @"\<docname>.pdf");
                pdfSettings.WriteSettings(settingsFile);

                PdfUtil.PrintFile(tempDocxFilePath, pdfSettings.PrinterName);
                string tempPdfFilePath = 
                         tempFolder + @"\Microsoft Word - " + tempDocxFileName + ".pdf";
                
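                // Poll for the PDF, but give up after 60 one-second attempts
                // (an arbitrary limit) so a stuck print job cannot block forever.
                bool fileCreated = false;
                for (int attempt = 0; attempt < 60 && !fileCreated; attempt++)
                {
                    fileCreated = PdfUtil.WaitForFile(tempPdfFilePath, 1000);
                }
                if (!fileCreated)
                {
                    throw new TimeoutException("PDF was not created in time: " + tempPdfFilePath);
                }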

                byte[] pdfBytes = File.ReadAllBytes(tempPdfFilePath);

                File.Delete(tempDocxFilePath);
                File.Delete(tempPdfFilePath);

                return pdfBytes;
            }
            catch (Exception ex)
            {
                throw new FaultException("WCF ERROR!\r\n" + ex.Message);
            }
        }
    }
}

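One thing the listing above does not cover is cleanup on failure: if printing or reading throws, the temporary docx (and possibly the PDF) stays on disk. The following is a minimal sketch of a cleanup helper that could be called from a finally block. The names TempFileCleanup and DeleteQuietly are hypothetical and are not part of Bullzip or of the original solution.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original solution.
public static class TempFileCleanup
{
    // Tries to delete each path and returns how many files were removed.
    // Failures are swallowed so that cleanup never masks the original error.
    public static int DeleteQuietly(params string[] paths)
    {
        if (paths == null) return 0;

        int deleted = 0;
        foreach (string path in paths)
        {
            if (string.IsNullOrEmpty(path)) continue;
            try
            {
                if (File.Exists(path))
                {
                    File.Delete(path);
                    deleted++;
                }
            }
            catch (IOException) { /* file is still locked by another process */ }
            catch (UnauthorizedAccessException) { /* read-only or no permission */ }
        }
        return deleted;
    }
}
```

In PrintToPdf, this could be called as TempFileCleanup.DeleteQuietly(tempDocxFilePath, tempPdfFilePath) inside a finally block, provided the two path variables are declared before the try.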
Points of Interest

This article only illustrates what can be achieved with this architecture. With a little head-scratching, you can extend it into a PDF conversion server (a free alternative to Adobe Distiller, anyone?), a scheduled batch printer, an archiving system, etc.

If integrated into the SOA report generation solution mentioned above, this lets you get rid of the docx files and use PDF as the document exchange format.

Have fun!

Software Environment Required on the Server

This is the required software environment for the implementation of this solution on a server:

  1. Microsoft Word Viewer

    The viewer opens the Word document just long enough to send it to the PDF printer queue, then closes it.

  2. Bullzip PDF Printer

    This is the PDF printer which transforms the .docx documents to .pdf files.

As an enterprise solution this setup is not ideal, but it can be made stable by writing safe code that never lets MS Word or the print queue hang. In my personal experience, it DOES WORK and it IS STABLE.
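What "safe code" means in practice is not spelled out here. As one possible sketch, assuming the main risk is waiting forever for a PDF that never appears, a bounded wait could look like the following. FileWaiter and WaitUntilReady are hypothetical names, not Bullzip APIs, and the 200 ms default polling interval is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

// Hypothetical helper illustrating a bounded wait; not a Bullzip API.
public static class FileWaiter
{
    // Returns true once the file exists and can be opened exclusively
    // (the writer has released it), or false when the timeout elapses.
    public static bool WaitUntilReady(string path, TimeSpan timeout, int pollMilliseconds = 200)
    {
        Stopwatch clock = Stopwatch.StartNew();
        do
        {
            if (IsReady(path)) return true;
            Thread.Sleep(pollMilliseconds);
        } while (clock.Elapsed < timeout);

        // One last check after the deadline has passed.
        return IsReady(path);
    }

    private static bool IsReady(string path)
    {
        if (!File.Exists(path)) return false;
        try
        {
            // FileShare.None fails while another process still holds the file open.
            using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException) { return false; }
        catch (UnauthorizedAccessException) { return false; }
    }
}
```

A caller would treat a false result as a failed conversion, report the error, and clean up, rather than looping again. The same idea of a hard deadline applies to any external process started for printing.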

History

The previous article, which led to this one and is a must-read for understanding the SOA integration concepts: "A SOA approach to dynamic DOCX-PDF report generation - Part 1".

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Engineer
United Kingdom
I've been involved in object-oriented software development since 2006, when I graduated in Information and TLC Engineering. I've been working for several software companies / departments, mainly on Microsoft and Sun/Oracle technologies. My favourite programming language is C#, next comes Java.
I love design patterns and when I need to resolve a problem, I try to get the best solution, which is often not the quickest one.

"On the best teams, different individuals provide occasional leadership, taking charge in areas where they have particular strengths. No one is the permanent leader, because that person would then cease to be a peer and the team interaction would begin to break down. The structure of a team is a network, not a hierarchy ..."
My favourite team work quotation by DeMarco - Lister in Peopleware

Comments and Discussions

 
Question: thank you
kocokolo, 18-Jun-12 19:28
Answer: Re: thank you
Erion Pici, 24-Oct-12 3:28
General: Reason for my vote of 5: Hard to find this content anywhere. ...
G Ryno, 19-Dec-11 6:47
General: Re: Reason for my vote of 5: Hard to find this content anywhere. ...
Erion Pici, 24-Oct-12 3:28
General: [My vote of 1] This produces text only pdf generation (no formatting/style) or executes winword on the server = fail
goggles8, 13-Jan-12 12:15
General: Re: [My vote of 1] This produces text only pdf generation (no formatting/style) or executes winword on the server = fail
Erion Pici, 14-Jan-12 7:34
General: Re: [My vote of 1] This produces text only pdf generation (no formatting/style) or executes winword on the server = fail
goggles8, 16-Jan-12 13:22
General: Re: [My vote of 1] This produces text only pdf generation (no formatting/style) or executes winword on the server = fail
Erion Pici, 16-Jan-12 21:04
General: Re: [My vote of 1] This produces text only pdf generation (no formatting/style) or executes winword on the server = fail
goggles8, 17-Jan-12 10:33
Erion Pici wrote:
In the company where I work we use the same principle to print Word documents to PDF and, although I agree in part with your objection that this isn't an ideal enterprise solution, I still believe, out of personal experience, that this is a stable solution if implemented carefully. By "carefully" I mean writing safe code which doesn't permit any process to hang.

Unfortunately this has not been my personal experience with this type of thing in the past (I have had many problems with it). I feel that I am not alone. Look at the BullZip forums and the complaints/problems there with this type of thing (a Google search also yields a lot of people having problems with this).

Here are some major problems with implementing something like this.

1. It is not known what kind of scalability/performance the implementer needs.
2. What OS is being used (some handle things like this much better than others in my experience).
3. What kind of hardware is this running on, what else is running on this server, what kind of network latency can be expected, etc. (you get the point: lots of variables that may affect the stability of this type of scenario)?

This setup may work fine for you at your company, but do you have to generate 40k+ PDFs per hour? In my personal experience, automating applications to do things like this on a server works OK in a controlled environment, but in production it is not resilient. If a high load or low resource condition occurs, other software is installed, system updates are done, etc., there may be problems.

Using application automation along with potentially blasting the print spooler subsystem (not exactly known for stability) with a high quantity of jobs is a double threat for problems. I don't see this as worth the potential problems when you can purchase an actual .NET managed DOCX->PDF rendering conversion library for between $500 and $1000 (if it saves 2-3 days of troubleshooting in its lifetime, it has paid for itself).

Thank you for adding the extra information (although I don't consider it enough of a warning). The only other suggestion that I may add is that you supply some code or guideline for "writing safe code which doesn't permit any process to hang".

modified 17-Jan-12 20:55pm.
