Convert HTML to MHTML using ASP.NET

14 Jun 2004
An article on how to convert an HTML document with images to an MHTML document

Introduction

Ever wanted to make a report out of an HTML document and send it to the client for offline use in Word or Excel? An RFC-compliant multipart MIME message (an MHTML web archive, specified in RFC 2557) is a single file containing the document together with all related material, such as linked documents and images, serialized to their inline Base64 representations. There is no native support for creating MHTML archives in .NET, but thanks to the Windows CDO library this is easily accomplished.

The code

The project contains three classes: mht, mhtImage and mhtImageCollection. The mht class contains the conversion functions, such as convertWebControlToMHTString, which takes a WebControl and a collection of images and returns the resulting MHT archive as a string. Use this function when the WebControl being converted depends on user-specific Session and Application variables for rendering, or when you use dynamically created images.

VB
Public Function convertWebControlToMHTString(ByVal control As WebControl, _
  ByVal MHTimages As mhtImageCollection) As String
  'Render WebControl to html
  Dim html As String = getHtml(control)

  'If WebControl has images, make the html Word compatible
  If Not MHTimages Is Nothing Then
    fixImageLocation(html, MHTimages)
  End If

  Dim msg As New CDO.MessageClass
  Dim stm As ADODB.Stream = Nothing
  Dim MS As System.IO.MemoryStream = Nothing

  Dim iBp As CDO.IBodyPart

  'Make a multipart mhtml document
  Dim mainBody As CDO.IBodyPart
  mainBody = msg
  mainBody.ContentMediaType = "multipart/related"

  'Make the html part of the document
  iBp = mainBody.AddBodyPart()
  iBp.ContentMediaType = "text/html"
  iBp.ContentTransferEncoding = "quoted-printable"
  stm = iBp.GetDecodedContentStream
  stm.WriteText(html)
  stm.Flush()

  'Make the image parts of the document
  If Not MHTimages Is Nothing Then
    Dim oMhtImage As mhtImage
    For Each oMhtImage In MHTimages
      iBp = mainBody.AddBodyPart()
      With iBp
        .ContentMediaType = "image/" + _
          oMhtImage.ImageFormat.ToString().ToLower()
        .ContentTransferEncoding = "base64"

        'ContentLocation must be the same as in the
        'html part to make them linked
        .Fields.Append("urn:schemas:mailheader:content-location", _
          DataTypeEnum.adBSTR, , , oMhtImage.ContentLocation)
        .Fields.Update()
        .Fields.Refresh()
      End With

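      'Reset so Finally never closes a stream left over from an earlier part
      stm = Nothing
      Try
        MS = New System.IO.MemoryStream
        oMhtImage.Image.Save(MS, oMhtImage.ImageFormat)
        Dim bytearray As Byte() = MS.ToArray()
        stm = iBp.GetDecodedContentStream
        stm.Write(bytearray)
        stm.Flush()
      Finally
        If Not MS Is Nothing Then MS.Close()
        If Not stm Is Nothing Then stm.Close()
      End Try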
    Next
  End If

  stm = mainBody.GetStream()
  Return stm.ReadText(stm.Size)
End Function
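
The function above calls getHtml, whose body is not listed in the article. A minimal sketch of how such a helper could render a control to a string is shown below. The name renderToHtml and the module name are assumptions made for illustration, and this is not necessarily how the downloadable project does it.

```vb
Imports System.IO
Imports System.Web.UI
Imports System.Web.UI.WebControls

Public Module MhtRenderingSketch
    ' Hypothetical stand-in for the article's getHtml helper (not the author's code).
    ' Renders the control and its children into an in-memory string.
    Public Function renderToHtml(ByVal control As WebControl) As String
        Dim sw As New StringWriter()
        Dim writer As New HtmlTextWriter(sw)
        Try
            ' RenderControl writes the control's markup to the supplied writer
            control.RenderControl(writer)
            writer.Flush()
            Return sw.ToString()
        Finally
            ' Closing the HtmlTextWriter also closes the underlying StringWriter
            writer.Close()
        End Try
    End Function
End Module
```

Some server controls expect to be rendered inside the page's server form and may need extra handling when rendered on their own like this.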

The convertWebPageToMHTString function converts the HTML document at a given URL to an MHT archive, all images included. Use this function for public HTML documents that do not depend on user-specific Session and Application variables.

VB
Public Function convertWebPageToMHTString(ByVal url As String) As String
    Dim msg As New CDO.MessageClass
    Dim stm As ADODB.Stream = Nothing

    Try
        msg.MimeFormatted = True
        msg.CreateMHTMLBody(url, CDO.CdoMHTMLFlags.cdoSuppressNone, "", "")
        stm = msg.GetStream()
        Return stm.ReadText(stm.Size)
    Finally
        'stm is still Nothing if CreateMHTMLBody or GetStream failed
        If Not stm Is Nothing Then stm.Close()
    End Try
End Function

The fixImageLocation function prepends the string "http://" to each ContentLocation that does not already contain a ":", and to every matching reference in the HTML, for Word compatibility.

VB
Private Sub fixImageLocation( _
      ByRef html As String, ByRef MHTimages As mhtImageCollection)
    Dim curContentLocation As String
    Dim curIndex As Integer
    Dim oMhtImage As mhtImage
    For Each oMhtImage In MHTimages
        curContentLocation = oMhtImage.ContentLocation
        If curContentLocation.IndexOf(":") = -1 Then
            curIndex = html.IndexOf(curContentLocation)
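            While curIndex <> -1
                html = html.Insert(curIndex, "http://")
                'Resume after the inserted prefix and this occurrence; without
                'the prefix length, a location of 7 characters or fewer would
                'be found again and the loop would never end
                curIndex = html.IndexOf(curContentLocation, curIndex + _
                    "http://".Length + curContentLocation.Length)
            End While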
            oMhtImage.ContentLocation = "http://" + curContentLocation
        End If
    Next
End Sub
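
To make the effect on the HTML concrete, here is a small string-only sketch of the same search-and-insert loop. The module and function names (FixImageLocationSketch, prefixLocation) are hypothetical and not part of the article's project. The search resumes past both the inserted prefix and the matched location, so a short location such as "a.gif" is not matched twice.

```vb
Imports System

Public Module FixImageLocationSketch
    Private Const Prefix As String = "http://"

    ' Hypothetical helper: returns html with Prefix inserted before every
    ' occurrence of location. Locations that already contain ":" are left
    ' untouched, mirroring the check in fixImageLocation.
    Public Function prefixLocation(ByVal html As String, _
                                   ByVal location As String) As String
        ' An empty location would match at every position, so bail out early
        If location.Length = 0 OrElse location.IndexOf(":") <> -1 Then
            Return html
        End If

        Dim index As Integer = html.IndexOf(location)
        While index <> -1
            html = html.Insert(index, Prefix)
            ' Skip the prefix just inserted plus the occurrence itself
            index = html.IndexOf(location, index + Prefix.Length + location.Length)
        End While
        Return html
    End Function
End Module
```

Calling prefixLocation on an HTML string holding two img tags that both use src="a.gif", with "a.gif" as the location, yields the same two tags with src="http://a.gif".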

The mhtImage class holds the information for one image. The Image property contains the actual image. The ContentLocation property contains the path to the image and must be exactly the same as the image's source in the HTML part. The ImageFormat property contains the image format (jpg, gif, bmp...).

The mhtImageCollection class contains a collection of mhtImages.
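
The two classes are not listed in the article. The sketch below shows one plausible shape for them, inferred only from the property names described above and from how they are used in the sample code (a three-argument constructor, an add method, and For Each support). The private field names, the CollectionBase base class and the Item indexer are assumptions.

```vb
Imports System.Collections

' Sketch only: reconstructed from the article's description, not the author's code.
Public Class mhtImage
    Private _image As System.Drawing.Image
    Private _contentLocation As String
    Private _imageFormat As System.Drawing.Imaging.ImageFormat

    Public Sub New(ByVal img As System.Drawing.Image, _
                   ByVal location As String, _
                   ByVal format As System.Drawing.Imaging.ImageFormat)
        _image = img
        _contentLocation = location
        _imageFormat = format
    End Sub

    ' The actual image data
    Public Property Image() As System.Drawing.Image
        Get
            Return _image
        End Get
        Set(ByVal value As System.Drawing.Image)
            _image = value
        End Set
    End Property

    ' Must match the image source used in the html part
    Public Property ContentLocation() As String
        Get
            Return _contentLocation
        End Get
        Set(ByVal value As String)
            _contentLocation = value
        End Set
    End Property

    ' Format used when the image is saved into its MIME part
    Public Property ImageFormat() As System.Drawing.Imaging.ImageFormat
        Get
            Return _imageFormat
        End Get
        Set(ByVal value As System.Drawing.Imaging.ImageFormat)
            _imageFormat = value
        End Set
    End Property
End Class

' CollectionBase supplies Count and enumeration, so For Each works
Public Class mhtImageCollection
    Inherits CollectionBase

    Public Sub add(ByVal item As mhtImage)
        List.Add(item)
    End Sub

    Default Public ReadOnly Property Item(ByVal index As Integer) As mhtImage
        Get
            Return DirectCast(List(index), mhtImage)
        End Get
    End Property
End Class
```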

Using the code

The following example makes an MHT archive from a Panel WebControl containing one image.

VB
Dim oMhtCol As New mhtImageCollection
oMhtCol.add(New mhtImage(System.Drawing.Image.FromFile( _
  Server.MapPath("/mhtml/images/myComputer.jpg")), _
  "images/myComputer.jpg", System.Drawing.Imaging.ImageFormat.Jpeg))
sendMHTFile(convertWebControlToMHTString(Panel1, oMhtCol), "myFirstMht.mht")
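
The example calls sendMHTFile, which is not listed in the article. A minimal sketch of what such a helper might do is shown below: it writes the archive to the HTTP response as a download. The content type, the attachmentHeader helper and the use of HttpContext.Current are all assumptions, and the version in the downloadable project may differ.

```vb
Imports System.Web

Public Module SendMhtSketch
    ' Hypothetical helper: builds the Content-Disposition value that asks
    ' the browser to save the response under the given file name.
    Public Function attachmentHeader(ByVal fileName As String) As String
        Return "attachment; filename=""" & fileName & """"
    End Function

    ' Hypothetical version of sendMHTFile (not the author's code).
    ' Must be called while an ASP.NET request is being processed.
    Public Sub sendMHTFile(ByVal mht As String, ByVal fileName As String)
        Dim response As HttpResponse = HttpContext.Current.Response
        response.Clear()
        ' Assumption: message/rfc822 is a commonly used type for .mht files
        response.ContentType = "message/rfc822"
        response.AddHeader("Content-Disposition", attachmentHeader(fileName))
        response.Write(mht)
        ' End the response so the page's own markup is not appended
        response.End()
    End Sub
End Module
```

If the goal is to open the archive directly in Word or Excel rather than save it, a different content type may be more appropriate; that choice is left open here.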

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Written By
Web Developer
Sweden
Software developer

Comments and Discussions

 
[General] Re: How to write the line "Fields.Append..." in c# (Beginner_C#, 17-Aug-06 5:19)
[General] Re: How to write the line "Fields.Append..." in c# (3R1CK_MX, 22-Dec-06 6:58)
[General] Convert to MHT in C# (shaanr, 30-Sep-04 10:01)
[General] embed images in Word 2000 (Anonymous, 24-Aug-04 2:13)
[General] Re: embed images in Word 2000 (Magnus_, 14-Sep-04 2:51)
[General] Re: embed images in Word 2000 (Magnus_, 16-Oct-04 1:24)
[General] Problem running Example (WorthR, 18-Aug-04 13:04)

[General] Error trying to build/run examples (WorthR, 18-Aug-04 13:00)

This is very good stuff!

I'd like to adapt it, to be able to generate an MHT archive by adding multiple HTML pages. To get started, I copied some HTML files into a "Reports" subdirectory, and changed the button code as follows:

Private Sub ButtonConvert_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ButtonConvert.Click
    Dim MHTString As String
    MHTString = convertWebPageToMHTString("http://localhost/mhtml/reports/Weekly Counts.html")
    'HTMLString = HTMLString + convertWebPageToMHTString("http://localhost/mhtml/reports/JMSReports1.html") (I'll uncomment this once I get the first one to work)
    sendMHTFile(MHTString, "myFirstMht.mht")
End Sub

My thought was to create an Add method to be able to append multiple HTML pages to an MHT archive. Unfortunately, I'm having trouble getting started. I can't get the project to build and run. It builds ok, but when it executes, I get the following error:

===========================================================================

Configuration Error
Description: An error occurred during the processing of a configuration file required to service this request. Please review the specific error details below and modify your configuration file appropriately.

Parser Error Message: Access is denied: 'Interop.CDO'.

Source Error:


Line 196: <add assembly="System.EnterpriseServices, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a">
Line 197: <add assembly="System.Web.Mobile, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a">
Line 198: <add assembly="*">
Line 199:
Line 200:


Source File: c:\winnt\microsoft.net\framework\v1.1.4322\Config\machine.config Line: 198

Assembly Load Trace: The following information can be helpful to determine why the assembly 'Interop.CDO' could not be loaded.


=== Pre-bind state information ===
LOG: DisplayName = Interop.CDO
(Partial)
LOG: Appbase = file:///c:/inetpub/wwwroot/mhtml
LOG: Initial PrivatePath = bin
Calling assembly : (Unknown).
===

LOG: Policy not being applied to reference at this time (private, custom, partial, or location-based assembly bind).
LOG: Post-policy reference: Interop.CDO
LOG: Attempting download of new URL file:///C:/WINNT/Microsoft.NET/Framework/v1.1.4322/Temporary ASP.NET Files/mhtml/3958a62f/4eef8a67/Interop.CDO.DLL.
LOG: Attempting download of new URL file:///C:/WINNT/Microsoft.NET/Framework/v1.1.4322/Temporary ASP.NET Files/mhtml/3958a62f/4eef8a67/Interop.CDO/Interop.CDO.DLL.
LOG: Attempting download of new URL file:///c:/inetpub/wwwroot/mhtml/bin/Interop.CDO.DLL.
LOG: Policy not being applied to reference at this time (private, custom, partial, or location-based assembly bind).
LOG: Post-policy reference: Interop.CDO, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null

==========================================================================

It looks like there's something I need to do to enable use of CDO.DLL, but I don't know what it is.

Thanks in advance for all help.

[General] Nice example (Zygnus, 15-Aug-04 21:23)
[General] Forms Authentication (Noel H, 22-Jul-04 20:13)
[General] Re: Forms Authentication (Magnus_, 22-Jul-04 20:36)
[General] conver mht to html (Anonymous, 9-Jul-04 23:22)
[General] Re: conver mht to html (Anonymous, 9-Jul-04 23:36)
[General] Re: conver mht to html (geblack, 1-Jul-05 7:45)
[Question] MHTML, help???? (Member 1226235, 9-Jul-04 3:17)
[Answer] Re: MHTML, help???? (Magnus_, 9-Jul-04 12:13)
[General] Re: MHTML, help???? (Member 1226235, 14-Jul-04 1:22)
[General] Re: MHTML, help???? (Magnus_, 14-Jul-04 7:34)
[General] problem with embedding the images (ashish_sharma, 30-Jun-04 19:44)
[General] Re: problem with embedding the images (Kalle Johansson, 1-Jul-04 23:39)
[General] Back to HTML (Gfw, 21-Jun-04 14:56)
[General] Re: Back to HTML (Magnus_, 23-Jun-04 9:42)
[General] Re: Back to HTML (Anonymous, 15-Jul-04 23:54)
[General] Re: Back to HTML (Anonymous, 23-Jul-04 11:11)
[General] Re: Back to HTML (Anonymous, 24-Feb-05 5:49)
