Click here to Skip to main content
15,867,308 members
Articles / Mobile Apps / Windows Mobile
Tip/Trick

Convert RTF to Plain Text (Revised Again)

Rate me:
Please Sign up or sign in to vote.
4.91/5 (15 votes)
9 Apr 2016CPOL1 min read 266K   16   100
Handling for hex expressions and the trailing '}'

Introduction

Most solutions to convert RTF to plain text with pure T-SQL don't handle special characters like German umlauts and all the other special characters above ASCII(128) because they are not embedded in RTF tags but noted as escaped hex values. Also most of these solutions leave a trailing '}' at the end of the converted text. This revised procedure will solve these problems.

Background

Searching the web for a T-SQL procedure to convert RTF-formatted text to plain text, you'll find a lot of matches. Mainly, there are 2 methods described: the first one uses the RichtextCtrl control with the need to reconfigure SQL server settings to allow access to OLE/COM which might be a problem in environments with high security guidelines (e.g. http://www.experts-exchange.com/Database/MS-SQL-Server/Q_27633014.html). The second one will be found in some slightly different versions which all produce results with restrictions as described above (e.g. http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=90034).

Using the Code

Add the following SQL function to your database:

USE [<YourDatabaseNameHere>]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[RTF2Text]
(
    @rtf nvarchar(max)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @Pos1 int;
    DECLARE @Pos2 int;
    DECLARE @hex varchar(316);
    DECLARE @Stage table
    (
        [Char] char(1),
        [Pos] int
    );

    INSERT @Stage
        (
           [Char]
         , [Pos]
        )
    SELECT SUBSTRING(@rtf, [Number], 1)
         , [Number]
      FROM [master]..[spt_values]
     WHERE ([Type] = 'p')
       AND (SUBSTRING(@rtf, Number, 1) IN ('{', '}'));

    SELECT @Pos1 = MIN([Pos])
         , @Pos2 = MAX([Pos])
      FROM @Stage;

    DELETE
      FROM @Stage
     WHERE ([Pos] IN (@Pos1, @Pos2));

    WHILE (1 = 1)
        BEGIN
            SELECT TOP 1 @Pos1 = s1.[Pos]
                 , @Pos2 = s2.[Pos]
              FROM @Stage s1
                INNER JOIN @Stage s2 ON s2.[Pos] > s1.[Pos]
             WHERE (s1.[Char] = '{')
               AND (s2.[Char] = '}')
            ORDER BY s2.[Pos] - s1.[Pos];

            IF @@ROWCOUNT = 0
                BREAK

            DELETE
              FROM @Stage
             WHERE ([Pos] IN (@Pos1, @Pos2));

            UPDATE @Stage
               SET [Pos] = [Pos] - @Pos2 + @Pos1 - 1
             WHERE ([Pos] > @Pos2);

            SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
        END

    SET @rtf = REPLACE(@rtf, '\pard', '');
    SET @rtf = REPLACE(@rtf, '\par', '');
    SET @rtf = STUFF(@rtf, 1, CHARINDEX(' ', @rtf), '');

    WHILE (Right(@rtf, 1) IN (' ', CHAR(13), CHAR(10), '}'))
      BEGIN
        SELECT @rtf = SUBSTRING(@rtf, 1, (LEN(@rtf + 'x') - 2));
        IF LEN(@rtf) = 0 BREAK
      END
    
    SET @Pos1 = CHARINDEX('\''', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            IF @Pos1 > 0
                BEGIN
                    SET @hex = '0x' + SUBSTRING(@rtf, @Pos1 + 2, 2);
                    SET @rtf = REPLACE(@rtf, SUBSTRING(@rtf, @Pos1, 4), _
CHAR(CONVERT(int, CONVERT (binary(1), @hex,1))));
                    SET @Pos1 = CHARINDEX('\''', @rtf);
                END
        END

    SET @rtf = @rtf + ' ';

    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            SET @Pos2 = CHARINDEX(' ', @rtf, @Pos1 + 1);

            IF @Pos2 < @Pos1
                SET @Pos2 = CHARINDEX('\', @rtf, @Pos1 + 1);

            IF @Pos2 < @Pos1
                BEGIN
                    SET @rtf = SUBSTRING(@rtf, 1, @Pos1 - 1);
                    SET @Pos1 = 0;
                END
            ELSE
                BEGIN
                    SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
                    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);
                END
        END

    IF RIGHT(@rtf, 1) = ' '
        SET @rtf = SUBSTRING(@rtf, 1, LEN(@rtf) -1);

    RETURN @rtf;
END

When copying the above code to SQL don't forget to remove the underscore (wich is only required in codeproject to break long lines)!

To convert any RTF-formatted content, call the function above passing the RTF content as parameter of type nvarchar(max):

SQL
SELECT [<YourRTFColumnNameHere>]
     , [dbo].[RTF2Text]([<YourRTFColumnNameHere>]) AS [TextFromRTF]
  FROM [dbo].[<YourDatabaseNameHere>]

The function returns the converted text as nvarchar(max) too.

More improvements may be added. If you find any RTF part that isn't covered by the function above, please drop a line here.

Thanks

Thanks to all the authors in the web that have posted their solutions until now and therefore deserve the applause. I simply enhanced these solutions to complete the basic conversion.

Thanks also to all users here posting their tips to make the procedure more robust.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Germany Germany
30+ years experience as developer with VB.NET, VB, VBA, VBScript, C#, WPF, WinForms, JavaScript, jQuery, PHP, Delphi, ADO, ADO.NET, ASP.NET, Silverlight, HTML, CSS, XAML, XML, T-SQL, MySQL, MariaDb, MS-ACCESS, dBase, OLE/COM, ActiveX, SEPA/DTAUS, ZUGFeRD, DATEV Format and DATEVconnect, DSGVO, TNT Web-API, MS-Office Addins, etc., including:
- 10+ years experience as developer and freelancer
- 10+ years experience as team leader
- 13+ years experience with CRM solutions

Comments and Discussions

 
QuestionAmazing!... with one caveat Pin
StevePilon12-Jan-24 5:34
StevePilon12-Jan-24 5:34 
QuestionAmazing Solution!! Pin
Member 1615255530-Nov-23 9:10
Member 1615255530-Nov-23 9:10 
QuestionI see chinese characters instead of my text Pin
Member 160446645-Jul-23 22:16
Member 160446645-Jul-23 22:16 
AnswerRe: I see chinese characters instead of my text Pin
NightWizzard7-Jul-23 0:12
NightWizzard7-Jul-23 0:12 
GeneralRe: I see chinese characters instead of my text Pin
Member 1604466411-Jul-23 3:32
Member 1604466411-Jul-23 3:32 
QuestionGreat code but fails when there is text in between { and } as in my example below. Could you help? Pin
sureshppaul15-Mar-21 0:09
sureshppaul15-Mar-21 0:09 
First of all thanks for the nice code as it solves the problem in almost 80% of the case.
But many cases it fails and I think the reason is that there is text between the curley brackets { and } but I think this code assumes it is all formatting and removes all of them (please forgive if I'm wrong).

An example is given below

{\rtf1\ansi\deff0\uc1\ansicpg1252\deftab254{\fonttbl{\f0\fnil\fcharset1 Arial;}{\f1\fnil\fcharset2 Wingdings;}{\f2\fnil\fcharset2 Symbol;}}{\colortbl\red0\green0\blue0;\red255\green0\blue0;\red0\green128\blue0;\red0\green0\blue255;\red255\green255\blue0;\red255\green0\blue255;\red128\green0\blue128;\red128\green0\blue0;\red0\green255\blue0;\red0\green255\blue255;\red0\green128\blue128;\red0\green0\blue128;\red255\green255\blue255;\red192\green192\blue192;\red128\green128\blue128;\red0\green0\blue0;}\wpprheadfoot1\paperw12240\paperh15840\margl1880\margr1880\margt1440\margb1440\headery720\footery720\endnhere\sectdefaultcl{\*\generator WPTools_7.270;}{\plain\fs22\cf0\b Alarm Clause : \par  \pard\plain\plain\fs22\cf0\par  \plain\fs20\cf0 This insurance is issued subject to the condition that the premises shown on the policy schedule is equipped with a burglary alarm linked to a central station.\par  \plain\fs20\cf0  \par  \plain\fs20\cf0 You are obliged to:\par  \pard\plain\plain\fs20\cf0\par  \plain\fs20\cf0 1.\tab Have a service agreement with a security company. This service agreement needs to be in\par  \plain\fs20\cf0\tab place throughout the term of this insurance policy. \par  \pard\plain\plain\fs20\cf0\par  \plain\fs20\cf0 2.\tab Maintain the security system in a working state and use it. \par  \plain\fs20\cf0  \par  \plain\fs20\cf0 3.\tab Take all possible measures to avoid theft loss or damage during the period of a fault of the\par  \plain\fs20\cf0\tab system. \par  \pard\plain\plain\fs20\cf0\par  \plain\fs20\cf0 Should you fail to comply with the above mentioned measures, you  will lose your right to indemnity under this policy.\par  }}


If we put this in notepad and then change the extension to .rtf, it will show the below content

Alarm Clause : 
 
 This insurance is issued subject to the condition that the premises shown on the policy schedule is equipped with a burglary alarm linked to a central station.
  
 You are obliged to:
 
 1.	Have a service agreement with a security company. This service agreement needs to be in
 	place throughout the term of this insurance policy. 
 
 2.	Maintain the security system in a working state and use it. 
  
 3.	Take all possible measures to avoid theft loss or damage during the period of a fault of the
 	system. 
 
 Should you fail to comply with the above mentioned measures, you  will lose your right to indemnity under this policy.


But when we use the function it gives NULL as output

And going through the code what I understood is, the code is trying to remove everything between consecuting curley brackets which causes the actual text to be removed in above case.

Could you please suggest a solution for this?
QuestionMessage Closed Pin
28-Feb-21 11:21
runbikelive28-Feb-21 11:21 
QuestionStruggling with embeded urls Pin
SparkythePilgrim9-Jun-20 15:34
SparkythePilgrim9-Jun-20 15:34 
QuestionStrips first word on non-RTF values Pin
Member 1469268914-Jan-20 4:54
Member 1469268914-Jan-20 4:54 
QuestionThis is great - thanks very much [+small suggestion] Pin
Member 1412753123-Jan-19 6:22
Member 1412753123-Jan-19 6:22 
QuestionA little help on this RTF Pin
Member 114390623-Dec-18 23:56
Member 114390623-Dec-18 23:56 
SuggestionVery nice component Pin
Mystcreater1-Aug-18 6:58
Mystcreater1-Aug-18 6:58 
QuestionThe code does not process group {} Pin
Win32nipuh29-Apr-18 23:10
professionalWin32nipuh29-Apr-18 23:10 
QuestionGreat, but problem with unicode Pin
Win32nipuh16-Apr-18 4:01
professionalWin32nipuh16-Apr-18 4:01 
Questionoutput look like chinese.. Pin
sieb12316-Apr-18 4:04
sieb12316-Apr-18 4:04 
QuestionBest approach is to use a .NET component such as System.Windows.Forms.RichTextBox Pin
Code_1_Dreamer22-Mar-18 10:47
Code_1_Dreamer22-Mar-18 10:47 
AnswerRe: Best approach is to use a .NET component such as System.Windows.Forms.RichTextBox Pin
Win32nipuh16-Apr-18 4:02
professionalWin32nipuh16-Apr-18 4:02 
QuestionSome parsing errors when using line breaks /line Pin
Norbert Haberl7-Mar-18 4:44
Norbert Haberl7-Mar-18 4:44 
AnswerRe: Some parsing errors when using line breaks /line Pin
Win32nipuh24-Apr-18 21:42
professionalWin32nipuh24-Apr-18 21:42 
Questioncompatibility with SQL 11.0..... Pin
calwil6-Feb-18 8:56
calwil6-Feb-18 8:56 
QuestionSome cases in which it fails Pin
Member 132487038-Jun-17 6:46
Member 132487038-Jun-17 6:46 
AnswerRe: Some cases in which it fails Pin
NightWizzard8-Jun-17 23:01
NightWizzard8-Jun-17 23:01 
AnswerRe: Some cases in which it fails Pin
blhf24-Apr-18 17:15
blhf24-Apr-18 17:15 
AnswerRe: Some cases in which it fails Pin
ronschuster9-Jul-19 7:04
ronschuster9-Jul-19 7:04 
Questionbullets and table Pin
Syed Sumair19-Mar-17 3:27
Syed Sumair19-Mar-17 3:27 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.