Tip/Trick

Convert RTF to Plain Text (Revised Again)

9 Apr 2016 | CPOL | 1 min read
Handling for hex expressions and the trailing '}'

Introduction

Most solutions for converting RTF to plain text with pure T-SQL don't handle German umlauts and the other special characters outside the 7-bit ASCII range (codes 128 and above). RTF does not store these characters as-is inside its tags but as escaped hex values, for example \'fc for 'ü'. Most of these solutions also leave a trailing '}' at the end of the converted text. This revised function solves both problems.
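A minimal sketch of what happens to a single escape such as \'fc: the two hex digits are turned into a binary(1) value, and that byte is turned into a character with CHAR(). It assumes SQL Server 2008 or later (CONVERT style 1 for '0x...' strings) and a database default collation with code page 1252; the variable names are made up for this illustration.

```sql
-- Decode one RTF hex escape (\'fc) step by step.
-- Assumption: SQL Server 2008+ and a code page 1252 (Latin1) default collation,
-- where byte 0xFC is 'ü'. Under another code page CHAR() yields another character.
DECLARE @escape varchar(4) = '\''fc';                       -- the 4 characters \'fc
DECLARE @hex    varchar(4) = '0x' + SUBSTRING(@escape, 3, 2); -- '0xfc'
DECLARE @byte   binary(1)  = CONVERT(binary(1), @hex, 1);     -- 0xFC (style 1 keeps the 0x prefix)

SELECT @hex                       AS [HexString]
     , CONVERT(int, @byte)        AS [CharCode]   -- 252
     , CHAR(CONVERT(int, @byte))  AS [Decoded];   -- 'ü' under code page 1252
```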

Background

Searching the web for a T-SQL procedure that converts RTF-formatted text to plain text turns up a lot of matches. Mainly, two methods are described. The first uses the RichtextCtrl ActiveX control, which requires reconfiguring SQL Server to allow OLE/COM access; that can be a problem in environments with strict security guidelines (e.g. http://www.experts-exchange.com/Database/MS-SQL-Server/Q_27633014.html). The second exists in several slightly different versions, all of which produce results with the limitations described above (e.g. http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=90034).
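For comparison, a COM-based approach from T-SQL typically goes through the OLE Automation procedures (sp_OACreate and related), which are disabled by default. Enabling them takes a server-wide configuration change like the one below, which needs the ALTER SETTINGS permission (held by sysadmin and serveradmin). This is shown only to illustrate the security concern; the RTF2Text function does not need it.

```sql
-- Server-wide switch needed by OLE/COM-based solutions (NOT needed by RTF2Text).
-- 'Ole Automation Procedures' is an advanced option, so those are made visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ole Automation Procedures', 1;
RECONFIGURE;
```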

Using the Code

Add the following SQL function to your database:

USE [<YourDatabaseNameHere>]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[RTF2Text]
(
    @rtf nvarchar(max)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @Pos1 int;
    DECLARE @Pos2 int;
    DECLARE @hex varchar(316);
    DECLARE @Stage table
    (
        [Char] char(1),
        [Pos] int
    );

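    -- Collect the position of every brace. master..spt_values (type 'p')
    -- only holds the numbers 0-2047, so it is cross-joined with itself to
    -- provide positions for RTF content longer than 2047 characters
    -- (up to about four million characters).
    INSERT @Stage
        (
           [Char]
         , [Pos]
        )
    SELECT SUBSTRING(@rtf, n.[Number], 1)
         , n.[Number]
      FROM (
            SELECT TOP (ISNULL(LEN(@rtf + 'x') - 1, 0))
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [Number]
              FROM [master]..[spt_values] v1
                CROSS JOIN [master]..[spt_values] v2
             WHERE (v1.[Type] = 'p')
               AND (v2.[Type] = 'p')
            ORDER BY [Number]
           ) n
     WHERE (SUBSTRING(@rtf, n.[Number], 1) IN ('{', '}'));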

    SELECT @Pos1 = MIN([Pos])
         , @Pos2 = MAX([Pos])
      FROM @Stage;

    DELETE
      FROM @Stage
     WHERE ([Pos] IN (@Pos1, @Pos2));

    WHILE (1 = 1)
        BEGIN
            SELECT TOP 1 @Pos1 = s1.[Pos]
                 , @Pos2 = s2.[Pos]
              FROM @Stage s1
                INNER JOIN @Stage s2 ON s2.[Pos] > s1.[Pos]
             WHERE (s1.[Char] = '{')
               AND (s2.[Char] = '}')
            ORDER BY s2.[Pos] - s1.[Pos];

            IF @@ROWCOUNT = 0
                BREAK

            DELETE
              FROM @Stage
             WHERE ([Pos] IN (@Pos1, @Pos2));

            UPDATE @Stage
               SET [Pos] = [Pos] - @Pos2 + @Pos1 - 1
             WHERE ([Pos] > @Pos2);

            SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
        END

    SET @rtf = REPLACE(@rtf, '\pard', '');
    SET @rtf = REPLACE(@rtf, '\par', '');
    SET @rtf = STUFF(@rtf, 1, CHARINDEX(' ', @rtf), '');

    WHILE (Right(@rtf, 1) IN (' ', CHAR(13), CHAR(10), '}'))
      BEGIN
        SELECT @rtf = SUBSTRING(@rtf, 1, (LEN(@rtf + 'x') - 2));
        IF LEN(@rtf) = 0 BREAK
      END
    
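    -- Replace each escaped hex value (\'xx) by the character it encodes.
    -- CHAR() interprets the byte in the code page of the database's default
    -- collation (e.g. \'fc becomes ü under a Latin1/1252 collation).
    SET @Pos1 = CHARINDEX('\''', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            IF SUBSTRING(@rtf, @Pos1 + 2, 2) LIKE '[0-9A-Fa-f][0-9A-Fa-f]'
                BEGIN
                    SET @hex = '0x' + SUBSTRING(@rtf, @Pos1 + 2, 2);
                    SET @rtf = REPLACE(@rtf, SUBSTRING(@rtf, @Pos1, 4),
                                       CHAR(CONVERT(int, CONVERT(binary(1), @hex, 1))));
                END
            ELSE
                -- Malformed or truncated escape: drop only the \' marker
                -- instead of failing with a conversion error
                SET @rtf = STUFF(@rtf, @Pos1, 2, '');

            SET @Pos1 = CHARINDEX('\''', @rtf);
        END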

    SET @rtf = @rtf + ' ';

    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            SET @Pos2 = CHARINDEX(' ', @rtf, @Pos1 + 1);

            IF @Pos2 < @Pos1
                SET @Pos2 = CHARINDEX('\', @rtf, @Pos1 + 1);

            IF @Pos2 < @Pos1
                BEGIN
                    SET @rtf = SUBSTRING(@rtf, 1, @Pos1 - 1);
                    SET @Pos1 = 0;
                END
            ELSE
                BEGIN
                    SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
                    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);
                END
        END

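    -- Remove the space appended above. LEN() ignores trailing spaces, so
    -- LEN(@rtf + 'x') - 2 is the length of @rtf minus its last character
    -- (LEN(@rtf) - 1 would cut off a real character, or fail for ' ').
    IF RIGHT(@rtf, 1) = ' '
        SET @rtf = SUBSTRING(@rtf, 1, LEN(@rtf + 'x') - 2);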

    RETURN @rtf;
END
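The brace scan at the start of the function takes its character positions from master..spt_values, an undocumented system table. A quick check shows how many positions its type 'P' rows provide on their own (the numbers 0 to 2047 on SQL Server 2005 and later), so a single pass over them cannot reach braces beyond position 2047:

```sql
-- Range of the numbers stored as type 'P' rows in master..spt_values.
-- Expected on SQL Server 2005 and later: 0, 2047, 2048.
SELECT MIN([Number]) AS [FirstPos]
     , MAX([Number]) AS [LastPos]
     , COUNT(*)      AS [Positions]
  FROM [master]..[spt_values]
 WHERE ([Type] = 'p');
```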

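If your copy of the code above contains an underscore ( _ ) at the end of a line, remove it before running the script. It was only used on CodeProject to break a long line; T-SQL statements can span several lines without any continuation character.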

To convert RTF-formatted content, call the function above and pass the RTF content as a parameter of type nvarchar(max):

SQL
SELECT [<YourRTFColumnNameHere>]
     , [dbo].[RTF2Text]([<YourRTFColumnNameHere>]) AS [TextFromRTF]
  FROM [dbo].[<YourTableNameHere>]

The function returns the converted text as nvarchar(max) too.
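To try the function without a table, you can pass an RTF literal directly. The sample below is a small, hand-written WordPad-style RTF document made up for illustration (it is not from the article): a font table group, paragraph marks and hex-escaped umlauts. How the umlauts come out depends on the code page of the database's default collation (ü and ß under a Latin1/1252 collation).

```sql
-- Hypothetical sample RTF: font table group, \pard/\par marks and
-- hex-escaped umlauts (\'fc = ü, \'df = ß in code page 1252).
-- Inside a T-SQL string literal each \' is written as \'' (quote doubled).
DECLARE @sample nvarchar(max) =
    N'{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil\fcharset0 Arial;}}'
  + N'\viewkind4\uc1\pard\f0\fs20 Gr\''fc\''dfe aus M\''fcnchen\par'
  + N'}';

SELECT @sample                   AS [RTF]
     , [dbo].[RTF2Text](@sample) AS [TextFromRTF];
```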

More improvements may follow. If you find any RTF construct that the function above doesn't handle, please leave a comment below.

Thanks

Thanks to all the authors on the web who have posted their solutions so far; they deserve the applause. I simply enhanced their solutions to complete the basic conversion.

Thanks also to all users here posting their tips to make the procedure more robust.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Germany
30+ years of experience as a developer with VB.NET, VB, VBA, VBScript, C#, WPF, WinForms, JavaScript, jQuery, PHP, Delphi, ADO, ADO.NET, ASP.NET, Silverlight, HTML, CSS, XAML, XML, T-SQL, MySQL, MariaDb, MS-ACCESS, dBase, OLE/COM, ActiveX, SEPA/DTAUS, ZUGFeRD, DATEV Format and DATEVconnect, DSGVO, TNT Web-API, MS-Office Addins, etc., including:
- 10+ years of experience as a developer and freelancer
- 10+ years of experience as a team leader
- 13+ years of experience with CRM solutions

Comments and Discussions

 
Re: Big thanks (NightWizzard, 10-Mar-17 4:43)
Invalid length parameter passed to the LEFT or SUBSTRING function FIX (Member 12927765, 29-Dec-16 17:45)
Re: Invalid length parameter passed to the LEFT or SUBSTRING function FIX (NightWizzard, 29-Dec-16 21:59)
Convert data type error (User 12624269, 27-Jul-16 22:14)
Re: Convert data type error (NightWizzard, 27-Jul-16 23:38)
Re: Convert data type error (User 12624269, 28-Jul-16 1:04)
Re: Convert data type error (NightWizzard, 28-Jul-16 7:29)
Re: Convert data type error (deafsquad, 19-Jan-17 4:12)
There is one point where special characters are converted from the hex code that RTF uses to store them:
SET @rtf = REPLACE(@rtf, SUBSTRING(@rtf, @Pos1, 4), CHAR(CONVERT(int, CONVERT (binary(1), @hex,1))));

I have a lot of data, and RTF editors easily produce malformed content when you play around with them, so I needed to insert the following REPLACE before the line above:
SET @rtf = REPLACE(@rtf,'\\''','');

My RTF code contained duplicate items like \'\' that caused the conversion error; this resolved that issue.

Then I ran into another problem: when \' is followed by only one character (because it was truncated for whatever reason), the @hex variable holds only three characters, like 0xc, which also triggers the conversion error. This was resolved with the following code:
if len(@hex) = 4
  BEGIN
    SET @rtf = REPLACE(@rtf, SUBSTRING(@rtf, @Pos1, 4), CHAR(CONVERT(int, CONVERT (binary(1), @hex,1))));
  END
else
  BEGIN
    BREAK
  END


modified 20-Jan-17 2:02am.

Re: Convert data type error (NightWizzard, 19-Jan-17 7:20)
getting error (Member 11913283, 27-Jul-16 5:20)
Re: getting error (NightWizzard, 27-Jul-16 7:17)
Re: getting error (zapp442, 16-Sep-16 1:44)
Re: getting error (NightWizzard, 16-Sep-16 5:39)
Re: getting error (Member 11913283, 5-Oct-16 4:59)
Re: getting error (NightWizzard, 5-Oct-16 7:10)
Converting RTF to plain text (for RTF documents that can be read by Microsoft Word) (Member 12450229, 10-Apr-16 15:19)
Re: Converting RTF to plain text (for RTF documents that can be read by Microsoft Word) (NightWizzard, 10-Apr-16 22:15)
Re: Converting RTF to plain text (for RTF documents that can be read by Microsoft Word) (NightWizzard, 11-Apr-16 6:34)
There are some \f1 still left over (ozz.project, 9-Apr-16 4:44)
Re: There are some \f1 still left over (NightWizzard, 9-Apr-16 6:32)
Re: There are some \f1 still left over (NightWizzard, 11-Apr-16 6:33)
Many thanks! (chaprot, 1-Apr-16 5:49)
Re: Many thanks! (NightWizzard, 1-Apr-16 6:01)
fine work! (Member 12375141, 7-Mar-16 3:44)

