Introduction
This article explains an easy, robust way to convert rich text to HTML using VB.NET and Microsoft Office Automation.
Background
This all started out because I needed to take the contents of a RichTextBox in an application I had developed and insert it into the body of an email. We're a Microsoft shop all around, so I could depend on Outlook 2007 to be the email client for all users, and I assumed (poorly) that I would be able to insert rich text into an Outlook email with little or no problem. Silly me.
Once I figured out that Outlook would not let me insert rich text into a message body, even though it was using Word as its editor, I set about trying to convert my RTF to HTML. I assumed (again) that there must be some simple, straightforward way to do it without parsing all the RTF and handling every formatting tag myself. An exhaustive search of the internet turned up several third-party tools: some were free, most parsed the RTF themselves and seemed a little incomplete, and none of them really fit the bill when it came to simplicity.
I started fooling around with Office automation, thinking that if Microsoft didn't supply direct access to its RTF-to-HTML conversion process, perhaps it would supply indirect access. Sure enough, after fiddling with Word for a while, I figured out how to use Word as a translator and convert RTF directly to HTML in one short function. So here, for the benefit of all the other wage slaves out there struggling with a similar problem, is how I did it. Nothing earth-shattering, but a very handy function to have in your back pocket.
Using the Code
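Basically, just throw this function into your VB.NET project. You'll need a reference to the Microsoft Word 12.0 Object Library (COM object). Other versions of the Word library may do just as well, but this is the one I've used. Because the function works through the Windows clipboard, the project also needs a reference to System.Windows.Forms, the code must run on an STA thread (as a Windows Forms UI thread does by default), and whatever is currently on the clipboard will be overwritten.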
Public Function sRTF_To_HTML(ByVal sRTF As String) As String
    Dim MyWord As Microsoft.Office.Interop.Word.Application = Nothing
    Dim oDoNotSaveChanges As Object = _
        Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges
    Dim sReturnString As String = ""
    Try
        ' Start a hidden instance of Word and create a blank document
        MyWord = CType(CreateObject("Word.Application"), _
            Microsoft.Office.Interop.Word.Application)
        MyWord.Visible = False
        MyWord.Documents.Add()

        ' Put the RTF on the clipboard and paste it into the document
        Dim doRTF As New System.Windows.Forms.DataObject
        doRTF.SetData(System.Windows.Forms.DataFormats.Rtf, sRTF)
        System.Windows.Forms.Clipboard.SetDataObject(doRTF)
        MyWord.Windows(1).Selection.Paste()

        ' Select everything and copy it back; Word also places an HTML
        ' version of the selection on the clipboard
        MyWord.Windows(1).Selection.WholeStory()
        MyWord.Windows(1).Selection.Copy()
        Dim sConvertedString As String = _
            CStr(System.Windows.Forms.Clipboard.GetData( _
                System.Windows.Forms.DataFormats.Html))

        If sConvertedString IsNot Nothing Then
            ' Skip the clipboard header that precedes the <html> tag
            Dim iStart As Integer = sConvertedString.IndexOf("<html")
            If iStart >= 0 Then
                sConvertedString = sConvertedString.Substring(iStart)
            End If
            ' Remove the stray "Â" characters that show up next to
            ' non-breaking spaces
            sConvertedString = sConvertedString.Replace("Â", "")
            sReturnString = sConvertedString
        End If
    Catch ex As Exception
        MsgBox("Error converting Rich Text to HTML: " & ex.Message)
    Finally
        ' Close Word without saving, whether or not the conversion succeeded
        If MyWord IsNot Nothing Then
            MyWord.Quit(oDoNotSaveChanges)
            MyWord = Nothing
        End If
    End Try
    Return sReturnString
End Function
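The HTML that Word puts on the clipboard (the CF_HTML format) begins with a short plain-text header of lines such as "StartHTML:00000097", and the function above finds the start of the actual HTML by searching for the first "<html" tag. A possible alternative, sketched below under the assumption that the header includes a StartHTML line (Word's output normally does), is to read that offset directly and fall back to the tag search only when it is missing. StartHTML is a byte offset, but every byte before it belongs to the ASCII header, so it is also a valid character index into the string. The module and function names (CfHtmlHelper, ExtractHtmlDocument) are hypothetical.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

' Hypothetical helper module; the names are illustrative, not from the original article.
Public Module CfHtmlHelper

    ' Returns the HTML document portion of a CF_HTML clipboard string.
    ' CF_HTML starts with ASCII header lines such as "StartHTML:00000097",
    ' where StartHTML is the byte offset of the HTML. Because every byte
    ' before that offset belongs to the ASCII header, the byte offset is
    ' also a valid character index into the decoded string.
    Public Function ExtractHtmlDocument(ByVal cfHtml As String) As String
        If String.IsNullOrEmpty(cfHtml) Then Return String.Empty

        ' Look for the StartHTML header line; its value may be -1 when unused.
        Dim m As Match = Regex.Match(cfHtml, "^StartHTML:(-?\d+)", RegexOptions.Multiline)
        Dim start As Integer
        If m.Success AndAlso
            Integer.TryParse(m.Groups(1).Value, NumberStyles.AllowLeadingSign,
                             CultureInfo.InvariantCulture, start) AndAlso
            start >= 0 AndAlso start < cfHtml.Length Then
            Return cfHtml.Substring(start)
        End If

        ' Fall back to searching (case-insensitively) for the first "<html" tag.
        Dim idx As Integer = cfHtml.IndexOf("<html", StringComparison.OrdinalIgnoreCase)
        If idx >= 0 Then Return cfHtml.Substring(idx)

        ' Neither a usable header nor an <html> tag: return the input unchanged.
        Return cfHtml
    End Function

End Module
```

Inside sRTF_To_HTML, the header-stripping step could then become sConvertedString = CfHtmlHelper.ExtractHtmlDocument(sConvertedString), which also avoids calling Substring with -1 if the tag is ever missing.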
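Once you have the HTML, putting it into an Outlook email is straightforward. This snippet needs a reference to the Microsoft Outlook 12.0 Object Library; RichTextBox1 is a placeholder for whichever RichTextBox holds your text.
Dim myotl As Microsoft.Office.Interop.Outlook.Application
Dim myMItem As Microsoft.Office.Interop.Outlook.MailItem
myotl = CType(CreateObject("Outlook.Application"), _
    Microsoft.Office.Interop.Outlook.Application)
myMItem = CType(myotl.CreateItem( _
    Microsoft.Office.Interop.Outlook.OlItemType.olMailItem), _
    Microsoft.Office.Interop.Outlook.MailItem)
myMItem.Subject = _
    "This email was converted from rich text to HTML using a simple function in VB.net"
myMItem.Display(False)
myMItem.BodyFormat = Microsoft.Office.Interop.Outlook.OlBodyFormat.olFormatHTML
' Convert the RichTextBox contents and use the result as the message body
myMItem.HTMLBody = sRTF_To_HTML(RichTextBox1.Rtf)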
Points of Interest
One word of warning: the HTML produced by this conversion is very verbose. Even simple formatting turns into many lines of HTML. That said, it has converted thousands of pages of data here where I work without a single error.
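If the size of the output matters, one option (not something the original function does) is to post-process the string and drop markup that only Office reads. The sketch below assumes Word's usual output: "downlevel-hidden" conditional comments such as <!--[if gte mso 9]>...<![endif]--> and empty <o:p></o:p> elements. It deliberately leaves "downlevel-revealed" blocks alone, because Word puts list bullets and numbers inside them. The names WordHtmlCleanup and TrimWordHtml are hypothetical, and since Outlook itself renders with Word, it is worth checking the result in your mail client, especially if the RTF contains images.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical post-processing helper; not part of the original article.
Public Module WordHtmlCleanup

    ' Matches downlevel-hidden conditional comments, for example
    ' <!--[if gte mso 9]><xml>...</xml><![endif]-->.
    ' Downlevel-revealed markers such as <![if !supportLists]> or
    ' <!--[if !supportLists]--> are not matched: the first lacks the "<!--"
    ' opener, and the second closes with "]-->" rather than "]>". The list
    ' bullets and numbers they wrap therefore survive.
    Private ReadOnly ConditionalComment As New Regex(
        "<!--\[if[^\]]*\]>.*?<!\[endif\]-->",
        RegexOptions.Singleline Or RegexOptions.IgnoreCase)

    Public Function TrimWordHtml(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return String.Empty

        ' Drop Office-only conditional blocks (document settings XML and similar).
        Dim result As String = ConditionalComment.Replace(html, String.Empty)

        ' Drop empty Office paragraph markers. <o:p>&nbsp;</o:p> is kept,
        ' since Word uses it to keep blank paragraphs from collapsing.
        result = result.Replace("<o:p></o:p>", String.Empty)

        Return result
    End Function

End Module
```

It would be applied to the converted string before assigning it, for example myMItem.HTMLBody = WordHtmlCleanup.TrimWordHtml(sRTF_To_HTML(RichTextBox1.Rtf)), where RichTextBox1 is again a placeholder.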
I am still surprised that Microsoft does not make RTF-to-HTML conversion readily available in its development libraries. It seems like a logical and intuitive function to provide. Still, at least there's a workaround.