
Embed HTML in a Word Document using Aspose.Words

23 Jul 2014 | CPOL | 2 min read
Embed HTML in a Word Document


Introduction

Aspose.Words for .NET is a class library that enables your applications to perform a wide range of document processing tasks. Aspose.Words supports DOC, DOCX, RTF, HTML, OpenDocument, PDF, XPS, EPUB and other formats. With Aspose.Words you can generate, modify, convert, render and print documents without using Microsoft Word (1). It is a powerful tool that does not require Word to be installed on the machine.

Background

We chose not to use the DOM approach with Aspose.Words for many reasons. We needed to take HTML output by TextControl (TxText) and insert it into a table cell or into the body of the document. I posted on the Aspose forum and was told this wasn't possible. I came up with the solution below, which is an undocumented way to import HTML using the Aspose.Words object-oriented approach.

Original forum post that prompted this article:

http://www.aspose.com/community/forums/thread/551630/how-to-insert-html-contend-in-a-run-or-table.cell.aspx

Using the code

A great feature of Aspose.Words is the ability to move nodes from one document to another. You import the node into the destination document and then append it to an existing node. The trick is to append the imported node to a parent node in the existing document that is allowed to contain it.

The following map shows which nodes can be children or parents of which other nodes (2):

Aspose.Words Node Hierarchy

The following method inserts HTML into an Aspose.Words Body node. We could not find a way to import HTML directly into an Aspose document after it had been created; the only approach we found was to load the HTML by passing load options to the Document constructor. We needed to insert HTML at several locations throughout the document after it was created. To do this, you create a brand-new document from the HTML using the load options. Once it has been created, you can extract its child nodes and import them into the existing document wherever you choose. We ran several performance tests and found that creating an additional document just to import the HTML did not noticeably affect performance.

C#
// Assumes: using Aspose.Words; (recent versions also need using Aspose.Words.Loading; for LoadOptions)
// _doc is the existing target document, stored in a field of the enclosing class.
private void addHTMLIntoNode(string html, Aspose.Words.Body pNode)
{
    // Encode as UTF-8; Encoding.ASCII would replace every non-ASCII character with '?'.
    byte[] bytes = System.Text.Encoding.UTF8.GetBytes(html);

    // Open the stream.
    using (System.IO.MemoryStream mStream = new System.IO.MemoryStream(bytes))
    {
        // Tell Aspose.Words the stream contains UTF-8 HTML.
        // To resolve images with relative URIs, also set loadOptions.BaseUri to their base folder.
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.LoadFormat = LoadFormat.Html;
        loadOptions.Encoding = System.Text.Encoding.UTF8;

        // Temporary document that holds the parsed HTML.
        Document doc = new Document(mStream, loadOptions);

        // Import the top-level nodes (paragraphs, tables, ...) of each section body in
        // document order. Paragraphs inside table cells and nested tables are imported
        // together with their parent table, so they are not imported a second time.
        foreach (Section section in doc.Sections)
        {
            foreach (Node n in section.Body.GetChildNodes(NodeType.Any, false).ToArray())
            {
                Node newNode = _doc.ImportNode(n, isImportChildren: true, importFormatMode: ImportFormatMode.KeepSourceFormatting);

                pNode.AppendChild(newNode);
            }
        }
    }
}
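A note on encoding: `Encoding.ASCII` cannot represent characters outside the 7-bit ASCII range and replaces each of them with `?`, so converting the HTML string to bytes with ASCII corrupts accented letters, currency symbols and similar text before Aspose.Words ever parses it. UTF-8 round-trips such text unchanged. The small stdlib-only program below illustrates the difference; the `EncodingCheck` class and `RoundTrip` helper are illustrative names only, not part of Aspose.Words.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Encodes text to bytes with the given encoding, then decodes the bytes back to a string.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // "Café 5 €" written with escapes so the source file encoding does not matter.
        const string html = "<p>Caf\u00e9 5 \u20ac</p>";

        Console.WriteLine(RoundTrip(html, Encoding.ASCII));        // prints <p>Caf? 5 ?</p>
        Console.WriteLine(RoundTrip(html, Encoding.UTF8) == html); // prints True
    }
}
```

If your HTML may contain non-ASCII text, encode it with UTF-8 (or whatever encoding the HTML declares) before handing the stream to Aspose.Words.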

Output Report:

Helpful Extension Methods:

Using the Aspose.Words object model directly was difficult, and the resulting code felt scattered. I wrote extension methods that wrap the Aspose.Words calls and allow for cleaner, more readable code. See the full solution for the extension methods.

Adding a table, a row, and cells containing text without the extension methods is neither very readable nor maintainable:

var table = new Aspose.Words.Tables.Table(_doc);
fSection.Body.AppendChild(table);

var row = new Aspose.Words.Tables.Row(_doc);
table.Rows.Add(row);

var cellOne = new Aspose.Words.Tables.Cell(_doc);
row.Cells.Add(cellOne);

// The paragraph belongs inside the cell, not in the section body.
var paraOne = new Paragraph(_doc);
cellOne.AppendChild(paraOne);

Run runOne = new Run(_doc, "Cell One");
runOne.Font.Name = "Microsoft Sans Serif";
runOne.Font.Size = 9;
runOne.Font.Bold = true;

paraOne.AppendChild(runOne);

var cellTwo = new Aspose.Words.Tables.Cell(_doc);
row.Cells.Add(cellTwo);

var paraTwo = new Paragraph(_doc);
cellTwo.AppendChild(paraTwo);

Run runTwo = new Run(_doc, "Cell Two");
runTwo.Font.Name = "Microsoft Sans Serif";
runTwo.Font.Size = 9;
runTwo.Font.Bold = true;

paraTwo.AppendChild(runTwo);

Adding the same table, row, and cells WITH the extension methods results in centralized, readable, and highly maintainable code:

var row = fSection.NewTable().NewRow();
row.NewCell().NewText("Cell 1");
row.NewCell().NewText("Cell 2");
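The extension methods themselves are not included in this tip. Below is a minimal sketch of how `NewTable`, `NewRow`, `NewCell` and `NewText` might be written; it is an assumption, not the author's actual code. The class name `AsposeTableExtensions` and the optional `bold` parameter are hypothetical, and the font settings (Microsoft Sans Serif, 9 pt, bold) are copied from the verbose example above.

```csharp
using Aspose.Words;
using Aspose.Words.Tables;

// Hypothetical helpers matching the method names used in the example above.
public static class AsposeTableExtensions
{
    // Defaults taken from the verbose example; centralized so they can be changed in one place.
    private const string DefaultFontName = "Microsoft Sans Serif";
    private const double DefaultFontSize = 9;

    // Appends a new, empty table to the end of the section body.
    public static Table NewTable(this Section section)
    {
        var table = new Table(section.Document);
        section.Body.AppendChild(table);
        return table;
    }

    // Appends a new row to the table.
    public static Row NewRow(this Table table)
    {
        var row = new Row(table.Document);
        table.Rows.Add(row);
        return row;
    }

    // Appends a new cell to the row.
    public static Cell NewCell(this Row row)
    {
        var cell = new Cell(row.Document);
        row.Cells.Add(cell);
        return cell;
    }

    // Adds a paragraph containing one formatted run to the cell.
    // Each call adds a new paragraph, so calling it twice gives two lines of text.
    public static Run NewText(this Cell cell, string text, bool bold = true)
    {
        var paragraph = new Paragraph(cell.Document);
        cell.AppendChild(paragraph);

        var run = new Run(cell.Document, text);
        run.Font.Name = DefaultFontName;
        run.Font.Size = DefaultFontSize;
        run.Font.Bold = bold;
        paragraph.AppendChild(run);
        return run;
    }
}
```

Because each helper returns the node it creates, the calls chain naturally (`fSection.NewTable().NewRow()`), and the paragraph is always appended inside its cell rather than to the section body.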

References

(1) http://www.aspose.com/docs/display/wordsnet/Introducing+Aspose.Words+for+.NET

(2) Image taken from the Aspose.Words documentation.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States
