Using the Microsoft Office Interop API to Convert .doc, .docx, .dot, .dotx, .rtf, .xls and .xlsx Files to HTML

13 Dec 2012 | CPOL | 3 min read
Convert Word documents, Excel sheets to HTML files using Microsoft Office Interop API and render the result back to a client browser.

Table of Contents 

  • Introduction
  • Microsoft Office Interop library
  • Adding the reference of Microsoft Interop libraries
  • Using the code
  • Access the Converter functionality
  • Summary
  • Disclaimer

Introduction

This article shows how to use the Microsoft Office Interop APIs to convert Word documents, Excel sheets and document templates to an HTML file and render the result in a client browser. Developers sometimes find it difficult to convert Excel sheets and documents to equivalent HTML, and the Office Interop API comes in very handy for that job.

Microsoft Office Interop library  

Before using the Microsoft Office Interop APIs, you have to install Microsoft Office on your system. The Interop APIs automate the installed Office applications, so they cannot run without MS Office. If you do not have MS Office installed, install it first.
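
One way to check this prerequisite from code is to ask Windows whether the Word and Excel COM servers are registered. The following is only a minimal sketch and is not part of the article's download: the class name OfficeCheck is hypothetical, and it relies on Type.GetTypeFromProgID, which returns null when a ProgID is not registered. COM is a Windows feature, so on other platforms the sketch simply reports false.

```csharp
using System;

// Hypothetical helper for illustration; not part of the original project.
public static class OfficeCheck
{
    // Returns true when the given COM ProgID (for example "Word.Application"
    // or "Excel.Application") is registered on this machine.
    public static bool IsComServerRegistered(string progId)
    {
        try
        {
            return Type.GetTypeFromProgID(progId) != null;
        }
        catch (PlatformNotSupportedException)
        {
            // COM is not available on this platform, so Office automation is not either.
            return false;
        }
    }
}
```

Calling OfficeCheck.IsComServerRegistered("Word.Application") before creating the Word application object lets the page show a clear message instead of failing with a COM exception.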

Download Microsoft office 

Adding the reference of Microsoft Interop libraries

Once MS Office is installed, add references to the required Microsoft Office Interop libraries:

  1. Microsoft Office Excel library
  2. Microsoft Office Word library
  3. Microsoft Office object library

In this article I show how to convert Word document files and Excel files to an HTML file, so we only need to add references to the three libraries above.

Steps to add library references

  1. Right-click the References folder in your solution.
  2. Click Add Reference.
  3. Click the COM tab.
  4. Select the Microsoft Office 8.0 or 14.0 object library, then hold down the Ctrl key and also select the Microsoft Office Excel library and the Microsoft Office Word library.
  5. Click the OK button.

Note: The assembly version can differ; it depends on the Office version installed on your machine.

Image 1

Using the code  

Before building the code, make sure MS Office is installed on your machine. You also need to configure CKEditor, because I use CKEditor to display the HTML content generated from the document or Excel sheet. Add the following configuration to the pages settings in your web.config file.

XML
<controls>
    <add tagPrefix="CKEditor" assembly="CKEditor.NET" namespace="CKEditor.NET"/>
</controls>

DocToHtml class  

Word document to HTML conversion is implemented in the DocToHtml class. Below is a snippet of the actual code, which converts the doc file to an HTML string.

C#
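public StringBuilder Convert()
{
    Application objWord = new Application();

    // Remove any stale output left by a previous conversion.
    if (File.Exists(FileToSave))
    {
        File.Delete(FileToSave);
    }
    try
    {
        objWord.Documents.Open(FileName: FullFilePath);
        objWord.Visible = false;
        if (objWord.Documents.Count > 0)
        {
            Microsoft.Office.Interop.Word.Document oDoc = objWord.ActiveDocument;
            // wdFormatFilteredHTML has the value 10 that the original code passed as a literal.
            oDoc.SaveAs(FileName: FileToSave,
                FileFormat: Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatFilteredHTML);
            oDoc.Close(SaveChanges: false);
        }
    }
    finally
    {
        // Always shut Word down, even if the conversion throws.
        objWord.Application.Quit(SaveChanges: false);
    }
    // Read the generated HTML file back as a string.
    return base.ReadConvertedFile();
}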

XlsToHtml class 

Excel sheet to HTML conversion is implemented in the XlsToHtml class. Below is a snippet of the actual code, which converts the Excel file to an HTML string.

C#
public StringBuilder Convert()
{
    Application excel = new Application();

    // Remove any stale output left by a previous conversion.
    if (File.Exists(FileToSave))
    {
        File.Delete(FileToSave);
    }
    try
    {
        excel.Workbooks.Open(Filename: FullFilePath);
        excel.Visible = false;
        if (excel.Workbooks.Count > 0)
        {
            IEnumerator wsEnumerator = excel.ActiveWorkbook.Worksheets.GetEnumerator();
            object format = Microsoft.Office.Interop.Excel.XlFileFormat.xlHtml;
            // Only the first worksheet is saved, as in the original loop,
            // which broke out after its first iteration.
            if (wsEnumerator.MoveNext())
            {
                Microsoft.Office.Interop.Excel.Worksheet wsCurrent = (Microsoft.Office.Interop.Excel.Worksheet)wsEnumerator.Current;
                wsCurrent.SaveAs(Filename: FileToSave, FileFormat: format);
            }
            excel.Workbooks.Close();
        }
    }
    finally
    {
        // Always shut Excel down, even if the conversion throws.
        excel.Application.Quit();
    }
    // Read the generated HTML file back as a string.
    return base.ReadConvertedFile();
}
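
Both Convert methods end by calling base.ReadConvertedFile(), whose body is not shown in this article. As a rough sketch only, a shared base class could look like the following. The property names FullFilePath and FileToSave and the IConverter interface (described in the next section) come from the article's snippets; the class names ConverterBase and AlreadyConvertedFile and the body of ReadConvertedFile are assumptions made for illustration.

```csharp
using System.IO;
using System.Text;

// Contract that, according to the article, both converters implement.
public interface IConverter
{
    StringBuilder Convert();
}

// Hypothetical base class: the name and the method body are assumptions.
public abstract class ConverterBase : IConverter
{
    public string FullFilePath { get; set; }   // source document to convert
    public string FileToSave { get; set; }     // HTML file the converter writes

    public abstract StringBuilder Convert();

    // Reads the generated HTML file back into memory.
    // File.ReadAllText assumes UTF-8 unless the file carries a byte order mark;
    // Office may write HTML in a different encoding.
    protected StringBuilder ReadConvertedFile()
    {
        return new StringBuilder(File.ReadAllText(FileToSave));
    }
}

// Hypothetical stand-in that skips Office entirely and just returns an
// existing HTML file; handy for exercising the plumbing without Office.
public class AlreadyConvertedFile : ConverterBase
{
    public override StringBuilder Convert()
    {
        return ReadConvertedFile();
    }
}
```

With a base class shaped like this, the Convert methods in DocToHtml and XlsToHtml would need the override keyword.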

ConverterLocator class 

To call the right converter based on the file extension, we need a converter locator that returns the actual converter service.

For example, if I upload an xls file, the ConverterLocator must return an instance of the XlsToHtml class; if the uploaded file is a document, the ConverterLocator returns an instance of the DocToHtml class.

Both the XlsToHtml and DocToHtml classes implement the IConverter interface, which declares the Convert method.

C#
public static IConverter Converter(string fullFilePath, string fileToSave)
{
    // Stays null when the extension is not one of the supported types.
    IConverter converter = null;
    string ext = fullFilePath.Split('.').Last().ToLower();
    switch (ext)
    {
        // Word documents, templates and rich text files are handled by Word.
        case "doc":
        case "docx":
        case "dot":
        case "dotx":
        case "rtf":
            converter = new DocToHtml { FileToSave = fileToSave, FullFilePath = fullFilePath };
            break;
        // Excel workbooks are handled by Excel.
        case "xls":
        case "xlsx":
            converter = new XlsToHtml { FileToSave = fileToSave, FullFilePath = fullFilePath };
            break;
    }
    return converter;
}
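
The locator takes the extension with Split('.').Last(), so a file named just "doc" with no extension at all would be treated as a Word document. A possible alternative, shown here only as a sketch, is to use Path.GetExtension with a case-insensitive lookup table. The names OfficeKind and ExtensionMap are hypothetical and not part of the original project; the list of extensions is the one used by the locator.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical names for illustration; not part of the original project.
public enum OfficeKind { Unsupported, Word, Excel }

public static class ExtensionMap
{
    // Same extensions as the locator's switch, keyed with the leading dot
    // that Path.GetExtension returns. Lookups ignore case.
    private static readonly Dictionary<string, OfficeKind> Kinds =
        new Dictionary<string, OfficeKind>(StringComparer.OrdinalIgnoreCase)
        {
            { ".doc", OfficeKind.Word },
            { ".docx", OfficeKind.Word },
            { ".dot", OfficeKind.Word },
            { ".dotx", OfficeKind.Word },
            { ".rtf", OfficeKind.Word },
            { ".xls", OfficeKind.Excel },
            { ".xlsx", OfficeKind.Excel }
        };

    public static OfficeKind KindOf(string fullFilePath)
    {
        // GetExtension returns null for a null path and "" when there is no extension.
        string ext = Path.GetExtension(fullFilePath);
        OfficeKind kind;
        if (ext != null && Kinds.TryGetValue(ext, out kind))
        {
            return kind;
        }
        return OfficeKind.Unsupported;
    }
}
```

The locator could then switch on the returned OfficeKind and create DocToHtml or XlsToHtml accordingly, returning null (or throwing) for Unsupported.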

Access the Converter functionality

Everything is now in place. All that remains is to call the converter to turn the document or Excel file into HTML and render the result in the browser.

Below is the snippet of code that calls the IConverter service.

C#
private void ConvertAndLoadDocumentInEditor()
{
    // Prefix every upload with a timestamp so file names do not collide.
    string randomName = DateTime.Now.ToFileTime().ToString();

    // Physical path of the temporary folder (Path requires using System.IO).
    string tempFolder = Server.MapPath("~/_Temp");

    // Complete path of the uploaded file.
    string filePath = Path.Combine(tempFolder, randomName + flDocument.FileName);

    // Name of the HTML file to generate: same base name, .html extension.
    string generatedName = randomName +
      Path.GetFileNameWithoutExtension(flDocument.FileName) + ".html";

    flDocument.SaveAs(filePath);

    // The converter needs the full path of the file to save as.
    string fileToSave = Path.Combine(tempFolder, generatedName);

    // Get the IConverter instance; null means the extension is not supported.
    IConverter doc = ConverterLocator.Converter(filePath, fileToSave);
    if (doc == null)
    {
        return;
    }

    // Run the conversion and set the editor text to the converted HTML.
    editor.Text = doc.Convert().ToString().Replace("�", "");
}

For demo purposes, I created a Word document and converted it to an HTML file using Microsoft Word Interop.

Here is the Word document that I created for the demo.

Image 2

And here is the converted HTML, displayed in CKEditor.

Image 3

Summary

You have now walked through how to convert a Microsoft Word document into an HTML document and display the result in the browser. With the Interop API you can also do other kinds of work, such as generating documents and Excel sheets on the fly from code. This demo is only an introduction to the Microsoft Interop API; it can handle much more complex tasks.

Disclaimer 

This project is based solely on my own study, knowledge and research, not on any other project. I used the Microsoft Office Interop API to write this article. Please note that running Microsoft Office on a web server is not the best approach, because Microsoft does not recommend it. Instead, Microsoft recommends Open XML for Office-related functionality on a web server. With Open XML you can do nearly everything that you do with MS Word or MS Excel.
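
To give a feel for the Open XML route, here is a small sketch that reads the plain text of a .docx file without starting Word. It assumes the DocumentFormat.OpenXml NuGet package (the Open XML SDK) is referenced; the class name OpenXmlSketch is hypothetical. It only extracts text, it does not produce HTML, and it does not handle the older binary .doc format.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; assumes the Open XML SDK is referenced.
public static class OpenXmlSketch
{
    // Returns the concatenated text of the document body, or an empty
    // string when the package has no main document part or body.
    public static string ReadDocxText(string path)
    {
        // Open read-only; no Office installation is needed for this.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            MainDocumentPart main = doc.MainDocumentPart;
            if (main == null || main.Document == null || main.Document.Body == null)
            {
                return string.Empty;
            }
            return main.Document.Body.InnerText;
        }
    }
}
```

Because this runs entirely in-process on the package file, it avoids launching an Office application on the server for each request.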

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Nagarro Softwares
India
I am Vijay Tanwar, a software engineer with a passion for programming. I love programming in C#, and I love to wrap up more and more things in a few lines of code. My favorite languages are C# and JavaScript, and both are fully object oriented. I would like to become a .NET architect.
