Comments by Cansid (Top 3 by date)

Cansid 23-Sep-15 5:05am View    
You can use the alternative format chunk ("altChunk") feature of DOCX files to achieve this.
A DOCX document can contain placeholders, called altChunks, each of which references an HTML file stored inside the DOCX package itself.
You can read more about this, and about how to do it with the OpenXML SDK, at the following link:
How to Use altChunk for Document Assembly[^]
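
For a rough idea of what that looks like, here is a minimal sketch using the OpenXML SDK (DocumentFormat.OpenXml). The class name AltChunkSketch, the method AppendHtml and the chunk id "htmlChunk1" are made up for illustration, and I'm assuming the HTML is a complete document that can be stored as UTF-8:

```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class AltChunkSketch
{
    // Hypothetical helper: embeds an HTML document in an existing DOCX
    // and references it from the body through an altChunk element.
    public static void AppendHtml(string docxPath, string html)
    {
        using (var document = WordprocessingDocument.Open(docxPath, true))
        {
            MainDocumentPart mainPart = document.MainDocumentPart;
            Body body = mainPart.Document.Body;

            // The id is an assumption; it only has to be unique within the main part.
            string chunkId = "htmlChunk1";

            // Store the HTML inside the package as an alternative format part.
            AlternativeFormatImportPart chunkPart =
                mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, chunkId);
            using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunkPart.FeedData(htmlStream);
            }

            // Reference the stored part from the body. Section properties,
            // when present, must stay the last child of the body.
            var altChunk = new AltChunk { Id = chunkId };
            SectionProperties sectionProperties = body.Elements<SectionProperties>().LastOrDefault();
            if (sectionProperties != null)
                body.InsertBefore(altChunk, sectionProperties);
            else
                body.AppendChild(altChunk);

            mainPart.Document.Save();
        }
    }
}
```

You would call it like AltChunkSketch.AppendHtml(path, "<html><body><p>Hello</p></body></html>"). The saved file only holds the embedded HTML plus the altChunk reference, nothing is converted at this point.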

There are also other approaches that import altChunks without the OpenXML SDK:
Appending HTML and RTF content to the DOCX with MadMilkman.Docx[^]
HTML as a Source for a DOCX File[^]

But note that this approach has a drawback: it does not convert the HTML itself but relies on MS Word to do the conversion when the document is opened. Until you open a document that contains altChunk elements in MS Word, it will not have "normal" (WordprocessingML markup) content.

If you need a "real" conversion, then you can try the approach from this article:
Convert HTML to / from Word document in C# and VB.NET[^]
This article uses a .NET library for Word processing[^].
Cansid 23-Sep-15 4:39am View    
Sharma, I know this is probably too late to help you, but I want to add to Sergey's answer regarding DOCX to TXT.

First, if you or anyone else is planning to use Office Interop on the server side, I would encourage you to really reconsider; it will definitely result in a lot of headaches...

Second, extracting DOCX content into a TXT file is not that complicated (unlike with PDF files), and you can find solutions that do not require any third-party libraries (such as the Open XML SDK, which is not a lightweight library at all...).

For example, see this CodeProject article:
Find Text in Word Documents[^]
It uses only the System.IO.Packaging and System.Xml namespaces. All you need in order to use it is the following code (and the two accompanying classes, DocxReader and DocxToStringConverter):

using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    string docxText = new DocxToStringConverter(stream).Convert();

    // Do something with DOCX text ...
}
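
If you are curious what such a converter does under the hood, here is a rough sketch of the idea. This is not the article's DocxReader or DocxToStringConverter; the class DocxTextSketch and its ExtractText method are my own illustration, built on System.IO.Packaging and LINQ to XML. It assumes a regular (transitional) DOCX file and ignores tabs, line breaks, headers and footers, and paragraphs nested in text boxes may be repeated:

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Text;
using System.Xml.Linq;

static class DocxTextSketch
{
    // WordprocessingML namespace used by transitional DOCX files.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Relationship type that points from the package root to the main document part.
    const string OfficeDocumentRelationship =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // Hypothetical helper: returns the body text, one line per paragraph.
    public static string ExtractText(Stream docxStream)
    {
        using (Package package = Package.Open(docxStream, FileMode.Open, FileAccess.Read))
        {
            // Locate the main part (usually /word/document.xml) through its relationship.
            PackageRelationship relationship =
                package.GetRelationshipsByType(OfficeDocumentRelationship).First();
            Uri partUri = PackUriHelper.ResolvePartUri(relationship.SourceUri, relationship.TargetUri);
            PackagePart mainPart = package.GetPart(partUri);

            using (Stream partStream = mainPart.GetStream(FileMode.Open, FileAccess.Read))
            {
                XDocument xml = XDocument.Load(partStream);
                var text = new StringBuilder();

                // Each w:p is a paragraph; its visible text lives in w:t elements.
                foreach (XElement paragraph in xml.Descendants(W + "p"))
                {
                    foreach (XElement textElement in paragraph.Descendants(W + "t"))
                        text.Append(textElement.Value);
                    text.AppendLine();
                }

                return text.ToString();
            }
        }
    }
}
```

It is used the same way as the snippet above: open the file as a stream and call DocxTextSketch.ExtractText(stream).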
Cansid 22-Sep-15 6:07am View    
Abdul, here is something that I'm using:

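// Set this to your HTML markup before converting; GetBytes throws on null.
string htmlText = null;
var inputOptions = LoadOptions.HtmlDefault;
var outputOptions = SaveOptions.PdfDefault;

// Load the HTML from an in-memory stream and write the PDF to the response.
using (var htmlStream = new MemoryStream(inputOptions.Encoding.GetBytes(htmlText)))
{
    DocumentModel.Load(htmlStream, inputOptions)
        .Save(this.Response, outputOptions);
}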

The code I used is from this article on converting HTML into a PDF with C#. You may have noticed that both the input HTML and the output PDF can be a physical file or a stream. Also note that I passed the Response to the Save method, which exports the PDF directly to the ASP.NET client.