The Shady Side of Rich Text

26 Apr 2007, CPOL, 14 min read
Shading and Syntax Highlighting a Rich Text selection

Introduction

I developed an interest in Rich Text back in my Delphi days. The .NET Framework certainly provides much better support for Rich Text but, interestingly, text shading is not supported, and it was my interest in shading which prompted this article.

By shading I mean the constant width background shading used by MSDN and others to highlight code examples. Articles such as this also use shading, for the same purpose. Oddly, shading comes as standard with MS Word – even my 1997 version.

Shaded text is always presented in a mono-spaced font, such as Courier New or Lucida Sans Typewriter, and non-shaded text, for contrast, is commonly presented in a proportionally spaced font.

For some time I have maintained what is essentially a customized C# help file, drawing on Code Project articles, MSDN, my own code (occasionally!) and other sources, and I wanted to be able to implement shading whether or not it was used in the source material. I was also frustrated by the loss of formatting which can occur when a code snippet is ported across to a Rich Text Box, and so this article started to take shape.

It turns out that the shading effect can be achieved quite simply, and I will explain how it can be done.

I will go on to show how to incorporate syntax highlighting – a bit more of a challenge but quite straightforward once one appreciates what needs to be done. The color scheme can be set to match your IDE settings, or anything else you might choose.

This article will show how to apply:

  • shading with no syntax highlighting,
  • syntax highlighting with no shading, and
  • shading with syntax highlighting

to a text selection in a Rich Text Box.

All is done with a single mouse click.

The approach

Regular Expressions are at the heart of this exercise. Though I use Regular Expressions in a very minor way in implementing shading, the technique is used extensively with syntax highlighting, and indeed it is hard to see how it would be possible to perform syntax highlighting without using Regular Expressions.

Those new to Regular Expressions would benefit from reading on if only to see how Regular Expressions are used in a real world example, rather than the contrived examples which textbooks commonly have no choice but to use.

Where to start?

This is what we are setting out to achieve:

Screenshot - ShadyImage1.gif

To be able to format a Rich Text selection we need to be able to edit the underlying escape sequences from which the Rich Text control rt1 will derive its formatted contents. That underlying version of the Rich Text is plain text and is easily edited, or at least it will be after we deal with some preliminaries.

Let us start by clearing rt1 and clicking Show/Hide RTF. t1 now holds the .Rtf version of what we see in rt1 (which of course is empty). The code is:

C#
t1.Text = rt1.Rtf;

The encoded, empty rt1 appears as:

{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Verdana;}}
\viewkind4\uc1\pard\lang1033\f0\fs17\par
}

You see that a Rich Text Box has at least one font defined. The first font in the font table will be identified as \f0 and in this case that font is Verdana, because that is the font property I have set for rt1.

You may define as many fonts as you like. If you specify a font which the system is unable to implement, it will (as far as I am aware) substitute the font defined as \f0, and failing that it would use whatever the system default is.

Although we will not need to build a font table, the following is what a multi-font table might look like:

{\fonttbl{\f0\fnil\fcharset0 Verdana;} {
    \f1\fnil\fcharset0 Lucida sans Typewriter;} {\f2\fnil\fcharset0 Arial;} {
    \f3\fnil\fcharset0 Terminal;} {…} {…}}

We can always find the font table, if we need to, because it starts in a defined way and terminates with double curly brackets:

{\fonttbl … … … … }}

More important than the font table, for our purposes, is the (optional) color table. The rt1 text, in the absence of any instruction to the contrary, will be black, the default text color. If you start typing, the text will certainly be black. (Note the absence of a color table in the case of an empty rt1, though had you set the text color property to other than black, you would see a color table with one entry).

We are going to need a color table for two reasons – we need to specify a shade color and we need at least four more colors for syntax highlighting.

I will show how to introduce a color table but first we must deal with a problem which will immediately arise. Colors are referenced according to their position in the color table. The escape sequences \cf1, \cf2, \cf3 … reference consecutive entries in the color table. (\cf0 is not explicitly defined, and can be regarded as black – though you are free to re-define say \cf7 as black also, if you wish. You can do what you like. Though there would be no point to it, you could define three differently indexed but identical green colors).

Our problem is that, having defined a color table, whenever we examine the background rt1 encoding, we will see that only those colors which have been invoked for the current rt1 selection appear in the table. Any unused or not yet used colors are no longer in the table, and the index values will have changed. My color table specifies \cf1 for black, \cf5 for green, \cf6 for blue and so on. I need to be able to color keywords blue by invoking an escape sequence which is synchronized with my nominated color table. I cannot allow blue to change to green because the table has been compromised.

The way around this is to utilize a second Rich Text Box rt2. (There are other reasons why it is advantageous to use a second Rich Text Box, unrelated to color referencing). You will see how this enables us to define an invariant color table.

Once the formatted text has been pasted back into rt1 we don't know (and don't care) how the color indexing might have changed.

Now we can start

A code snippet in rt1, having been selected, is programmatically copied to rt2. The first advantage in doing this is the rt1 selection will remain selected and we can later copy the formatted text back on top of that selection without having to re-select the text, which could be messy.

We will have set the rt2 font property to the mono-spaced font of our choice and we can now pad out each line to the intended width of the shading, which will give us a uniform right hand edge (the shading effect can be set to full width, or something less). The .Rtf version of rt2 will be copied to a string workstring and all processing will be done via workstring. (There is no reason for rt2 to ever be visible):

C#
private string ConstructWorkstring()
{
    if (rt1.SelectedText != "")
    {
        rt2.Text = rt1.SelectedText;

        // Pad out the rt2 text with spaces
        string bufferString = "";
        string[] bufferArray = rt2.Text.Split(LF);

        char padder = Space;

        int i = -1;
        while (i < (bufferArray.Length - 2))  // All but the last line
        {
            i++;
            bufferString += bufferArray[i].PadRight(columns, padder) + LF;
        }

        // We don't want a LF tacked onto the last line
        i++;
        bufferString += bufferArray[i].PadRight(columns, padder);
        rt2.Text = bufferString;

        string workstring = rt2.Rtf;
        return workstring;
    }
    else
    {
        MessageBox.Show("You must select some text ...", " Error ...",
            MessageBoxButtons.OK);
        return "";
    }
}
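
The padding step on its own, separated from the two Rich Text Boxes, might be sketched as follows. This is only an illustration: the names `LinePadder` and `PadLines` are hypothetical, and the column width is passed in as a parameter rather than taken from the demo's `columns` field. It assumes lines are separated by a single line feed character.

```csharp
using System;
using System.Linq;

public static class LinePadder
{
    // Pads every line of 'text' with spaces up to 'columns' characters and
    // rejoins the lines with '\n', so no line feed follows the last line.
    // Lines already longer than 'columns' are left unchanged by PadRight.
    public static string PadLines(string text, int columns)
    {
        string[] lines = text.Split('\n');
        return string.Join("\n", lines.Select(line => line.PadRight(columns, ' ')));
    }
}
```

Using `string.Join` sidesteps the question of which line is the last one, because the separator is only ever placed between lines.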

We need a Color Table …

This is the format of a color table:

{\colortbl ;\red#\green#\blue#;\red#\green#\blue#;\red#\green#\blue#;… … …}

where # is a decimal number in the range 0 ... 255.

Presenting it a little differently, the color table looks like this (ignore the comments):

{\colortbl ;
\red000\green000\blue000;      … … \cf1  Black
\red128\green128\blue128;      … … \cf2  Gray
\red238\green238\blue238;      … … \cf3  Shade
…
…
}

… and here is the Color Table construct:

C#
// Add a color table
workstring = CreateColorTable(workstring);

(Although I know there is no existing color table, I will, for completeness, write code which will remove it if it does exist).

C#
private string CreateColorTable(string s)
{
    // Remove any existing Color Table ...
    string re = @"{\\colortbl .*;}";
    Regex r = new Regex(re);
    s = r.Replace
              (s,
              "");

    // ...  and insert a new one
    re = ";}}";
    r = new Regex(re);
    return s = r.Replace
                     (s,
                     re + @"{\colortbl ;" + colorDefinitions + @"}");
}

The second Replace operation locates the end of the font table and appends the color table. The colors making up colorDefinitions have been declared as string constants in the form \red#\green#\blue#;.
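
As a rough sketch of how the `colorDefinitions` string and the insertion point fit together, the following builds entries in the `\red#\green#\blue#;` form and places a color table after the font table. The names `ColorTableSketch`, `Entry` and `InsertAfterFontTable` are hypothetical. Note that it uses `IndexOf` rather than a Regular Expression, and so only acts on the first `;}}` it finds; it assumes that first occurrence is the end of the font table.

```csharp
using System;

public static class ColorTableSketch
{
    // Formats one color table entry, e.g. \red238\green238\blue238;
    public static string Entry(int red, int green, int blue)
    {
        return @"\red" + red + @"\green" + green + @"\blue" + blue + ";";
    }

    // Appends a color table directly after the font table, which is
    // assumed to end at the first occurrence of ";}}".
    public static string InsertAfterFontTable(string rtf, string colorDefinitions)
    {
        const string fontTableEnd = ";}}";
        int position = rtf.IndexOf(fontTableEnd, StringComparison.Ordinal);
        if (position < 0)
        {
            return rtf; // no font table found: leave the text untouched
        }
        int insertAt = position + fontTableEnd.Length;
        return rtf.Insert(insertAt, @"{\colortbl ;" + colorDefinitions + "}");
    }
}
```

With three entries (black, gray and the shade color) the shade color ends up at index 3, which is the index the shading step relies on.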

(Alternative means of creating a color table do not bear thinking about).

workstring now contains the plain text, encoded, version of the selected Rich Text, complete with font and color tables.

Applying Shade

Now we can shade the selected text. Remember, we are working with plain text now, and that text includes our chosen font table, color table and the rt1 text selection, together with the header escape sequences which we don't have to construct – they are supplied.

The new color table is stable because it is just plain text, and it will remain unchanged during our editing. When finished, the edited encoding will be copied back to rt1, where it will replace the (still current) text selection.

The escape sequence which will shade the selection is \highlight3 where the numeric is the index in my color table of the desired shade color (I have hardwired this, but the color index could be easily kept under program control).

(Remember, when running the demo, that if you do not have the nominated mono-spaced font, something else will be substituted and it will probably be proportionally spaced, so make sure you use Courier New or one of the other common mono-spaced fonts).

The following is the shading code:

C#
Regex r = new Regex(@"\\f0");
workstring = r.Replace(
                   workstring,
                   @"\f0" + @"\highlight3");

The escape sequence \highlight3 has the effect of setting the text background color to the third color in the color table. Specifically, \f0\fs#<text> is replaced by \f0\highlight3\fs#<text>. If you examine the encoded background, however, after the highlighting has been implemented, you will see that the color indexes (not the colors) have changed.

(You may have noticed that the backslash in the Regular Expression (\\f0) is escaped, but not the backslash in the Replacement. Take a moment to think about why this is necessary).
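
The escaping question can be tried out in isolation. In a .NET Regular Expression pattern the backslash is a metacharacter, so matching a literal backslash takes two; in a replacement string only `$` is special, so a single backslash is taken literally. The sketch below applies the same substitution to a short body fragment; the class name `ShadeSketch` and the sample fragment are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShadeSketch
{
    // Inserts \highlight<index> after every \f0 in the given RTF fragment.
    public static string Shade(string rtfFragment, int shadeIndex)
    {
        // Pattern: the verbatim string @"\\f0" is the regex \\f0, which
        // matches the three literal characters \f0.
        Regex r = new Regex(@"\\f0");
        // Replacement: backslashes are literal here, so one is enough.
        return r.Replace(rtfFragment, @"\f0\highlight" + shadeIndex);
    }
}
```

Passing the shade index as a parameter is one way of keeping it under program control rather than hardwired.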

That is all there is to shading. To complete the operation, all we need do is:

C#
rt1.SelectedRtf = workstring;

(If you select some text in rt1 and invoke Shade only, you will find on clicking Show/Hide RTF that the escape sequence for shading appears as \highlight1, not \highlight3. Although the table index has changed (there is only one color defined in the table, the shade color, so its index is now 1), the correct color is still referenced. This is a nice example of what I was referring to earlier).

Syntax Highlighting

We can incorporate syntax highlighting in five or six simple steps.

We will, in this order, color Keywords, Class names, Characters, Literals and Comments. Two styles of comments - those which start with // and those which are in the form of blocks encased by /* and */ are catered for. I have ignored the /// construct because I don't need it. It would, however, be quite easy to include.

Keywords, Class names, Characters and Literals embedded in Comments (and Comments embedded in Literals) must not be highlighted, and you will see how this is done. We get into a bit of a bind, however, dealing with Literals in Comments as against Comments in Literals, so I include a small clean up routine which looks after that. I will show that code later.

Let us start with Keywords:

Highlighting Keywords

Because they are a discrete set, I store the C# Keywords as a resource string. The keywords are 77 in number and can be picked up here.

The keywords cannot be used in the form they are supplied, however. For example, the first three keywords in an alphabetic listing are abstract, as and base. The abstract part of abstraction, the as part of has and the base part of baseless would be highlighted. To prevent this happening, each entry in the keyword list is placed between a pair of word boundary tags \b. The entries are converted to \babstract\b, \bas\b, \bbase\b, and so on.

Further, to enable the keyword list to be scanned via a single Regular Expression, entries are OR'ed together with the | character. The converted resource string now looks like:

\babstract\b|\bas\b|\bbase\b| … … … |\bwhile\b

so the pattern to be matched, in plain language, is abstract OR as OR base OR … … OR while
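
The conversion from a plain word list to the OR'ed pattern might look something like this. It is a sketch only: the name `KeywordPatternSketch` is hypothetical, and it assumes the keywords arrive as a single space-separated string, which may not be how the demo's resource string is laid out.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordPatternSketch
{
    // Turns "abstract as base" into \babstract\b|\bas\b|\bbase\b
    public static string BuildPattern(string spaceSeparatedWords)
    {
        string[] words = spaceSeparatedWords.Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // Regex.Escape leaves plain words untouched; it is there only as a
        // precaution in case a list entry ever contains a metacharacter.
        return string.Join("|", words.Select(w => @"\b" + Regex.Escape(w) + @"\b"));
    }
}
```

A pattern built this way matches `as` in `x as object` but not the `as` inside `has`, which is the behavior the word boundaries are there to provide.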

The following code implements keyword highlighting:

C#
// Keyword
Regex r = new Regex(keywordList);
workstring = r.Replace(
                   workstring,
                   new MatchEvaluator(KeywordHandler));

Note that the replacement string is not directly specified. Each time a match occurs (each time a keyword is encountered) the match is passed to the MatchEvaluator(KeywordHandler) delegate:

C#
// Keyword handler: wraps each matched keyword in the keyword color (KY)
// and restores the normal text color (TX) after it
static string KeywordHandler(Match m)
{
    return KY + m.Value + TX;
}

KY translates to \cf6 (which is blue in my color table) and TX to \cf1, which is black. The effect is to color all keywords blue (including any embedded in comments and literals – they will be cleaned up later).

Consider the keyword public, for example. Wherever it appears in workstring it has been changed to \cf6 public\cf1, and so on for the other keywords.
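
The mechanics of the MatchEvaluator can be sketched without any controls. In the version below the two color sequences are passed in as parameters instead of being read from the `KY` and `TX` constants, and a lambda stands in for the named handler; `KeywordColorSketch` and `Highlight` are hypothetical names. The usage that follows assumes each sequence ends with a space, which RTF treats as the delimiter of the control word and not as text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordColorSketch
{
    // Wraps every match of 'keywordPattern' in 'keywordColor' ... 'textColor'.
    public static string Highlight(
        string text, string keywordPattern, string keywordColor, string textColor)
    {
        // The lambda is converted to a MatchEvaluator and is called once
        // per match; its return value replaces the matched text.
        MatchEvaluator handler = m => keywordColor + m.Value + textColor;
        return Regex.Replace(text, keywordPattern, handler);
    }
}
```

For example, `Highlight("public void Run()", @"\bpublic\b|\bvoid\b", @"\cf6 ", @"\cf1 ")` wraps both `public` and `void` and leaves `Run()` alone.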

We will leave Class names to one side for the moment, because it is not really possible to put together a complete list of Class names. I will suggest a compromise later.

The full code for syntax highlighting is as follows:

C#
private string SyntaxHighlight(string workstring)
{
    // Keyword
    Regex r = new Regex(keywordList);
    workstring = r.Replace(
                       workstring,
                       new MatchEvaluator(KeywordHandler));
            
    //Class name
    r = new Regex(formattedClassList);
    workstring = r.Replace(
                       workstring,
                       new MatchEvaluator(KeyclassHandler));

    // Character
    r = new Regex(@"'.?'");
    workstring = r.Replace(
                       workstring,
                       new MatchEvaluator(CharacterHandler));

    // Literal
    r = new Regex(@"@?""[^""]*""");
    workstring = r.Replace(
                       workstring,
                       new MatchEvaluator(LiteralHandler));

    // Comment (Type 1): // ... ... 
    r = new Regex(@"//.*\\par");
    workstring = r.Replace(
                       workstring,
                       new MatchEvaluator(CommentHandler));

    // Comment (Type 2): /* ... ... */
    r = new Regex(@"/\*.*?\*/", RegexOptions.Singleline);
    workstring = r.Replace(
                       workstring,
                       new MatchEvaluator(CommentHandler));

    // Any comments embedded in literals will have been
    // highlighted and we need to clean up such instances
    r = new Regex(@""".*/\*.*?\*/.*\\par");
    workstring = r.Replace(
                       workstring,
                       new MatchEvaluator(CleanupHandler));
            
    return workstring;
}

The delegates are straightforward and essentially reduce to inserting and removing \cf# escape sequences.
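
To give a feel for what "inserting and removing" means in practice, here is one way a comment handler could work. It is not the demo's `CommentHandler`: the names are hypothetical, the colors are passed in as parameters, and it assumes every earlier `\cf#` sequence was written with a single trailing space as its delimiter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentColorSketch
{
    // Matches any \cf<digits> control word plus its optional delimiting space.
    static readonly Regex ColorCode = new Regex(@"\\cf\d+ ?");

    // Removes colors applied earlier inside each /* ... */ comment, then
    // wraps the whole comment in 'commentColor' and restores 'textColor'.
    public static string RecolorBlockComments(
        string text, string commentColor, string textColor)
    {
        Regex blockComment = new Regex(@"/\*.*?\*/", RegexOptions.Singleline);
        return blockComment.Replace(
            text,
            m => commentColor + ColorCode.Replace(m.Value, "") + textColor);
    }
}
```

Stripping the inner sequences first is what stops a keyword inside a comment from keeping its keyword color.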

Highlighting Class Names

Though I speak of class names, you will see by looking at your IDE that words other than class names are also colored. The default highlight color for these other user types is the same as the default for class names. When I speak of class names, therefore, I should be understood to be including these other user types, because there is merit in highlighting them also.

Highlighting of class names, if it is considered worth doing, requires some thought. Though there are no programming difficulties, we are dealing with an open ended, undefined list. Further, how do we know whether a word is a class name?

There is no way that I can think of to define a class name list, as we are able to do with keywords. There could be thousands of library class names and, furthermore, a programmer can conjure up new classes and name them at will so we have to decide how we will handle this situation.

My solution is to have a class name list which is maintained in isolated storage and, whenever we see a class name not in the list, we add it to the list. Class names are easily recognized in code because, leaving aside the ones we are all familiar with, in my experience the Pascal naming convention is always used, and, were that rule to fail, context and usage will serve to identify class names.

In this demonstration program I maintain the class names list in the form: Classname2 Classname17 Classname5 Classname86 … … which is an alias for: \bClassname2\b|\bClassname17\b|\bClassname5\b| … … and so forth.

The first form I call the unformattedClassList and the second formattedClassList. The formatted list is synchronized to the unformatted list.

The unformatted list, which you can see in rt3, can be hand edited or it can be updated by simply double clicking a recognized class name in rt1. I also provide for sorting the list so that you may more easily look through it.

Similarly you can remove a class name from rt3 by double-clicking the name in rt3.
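
The add, remove and synchronize operations could be sketched as below. The names `ClassListSketch`, `Toggle` and `Format` are hypothetical and the demo's own code may be organized quite differently. As a simplification, this version always keeps the list sorted, whereas the demo treats sorting as a separate step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClassListSketch
{
    static readonly char[] Space = { ' ' };

    // Adds the name if it is missing, removes it if it is present, and
    // returns the updated space-separated list in sorted order.
    public static string Toggle(string unformattedClassList, string className)
    {
        var names = new SortedSet<string>(
            unformattedClassList.Split(Space, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.Ordinal);
        if (!names.Remove(className))
        {
            names.Add(className);
        }
        return string.Join(" ", names);
    }

    // Regenerates the formatted list from the unformatted one.
    public static string Format(string unformattedClassList)
    {
        return string.Join("|", unformattedClassList
            .Split(Space, StringSplitOptions.RemoveEmptyEntries)
            .Select(name => @"\b" + name + @"\b"));
    }
}
```

Deriving the formatted list from the unformatted one every time, rather than editing both, is a simple way to keep the two synchronized.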

This way of handling class names has the advantage of simplicity and works well.

Some comments on the demo application

The demo application comes with a resource string which is loaded as the source material on the first run. You can experiment with and save your own source. Your source can be saved to isolated storage, so you never need to go looking for it. The isolated storage files are created on the first run, and I have included two buttons (T1 and T2) which enable you to examine the list of isolated storage files and to delete them, should you wish to force a "first run".

Though I have done quite rigorous testing, I cannot say that all permutations and combinations have been gone through, so I would be happy to deal with any issues which arise.

Regular Expressions

There is a wealth of material available on the internet – including many Code Project articles - and I have not found it necessary to buy a text, so I am not able to recommend one. To those new to regular expressions, they feature large in the Perl world and, although there are syntactic differences between their Perl and C# usage, the differences do not matter too much. The most comprehensive treatment of this topic, in my experience, is to be found in the Perl literature.

Rich Text Format

When I first got interested in rich text I bought the RTF Pocket Guide (O'Reilly) and it is my companion whenever I am wrestling with this topic. My edition was published in 2003 and I imagine it would still be in print. Its price at that time was $US 12.95 and I strongly recommend it.

Isolated Storage

Again, all that you need is available on the internet, and the topic has been well covered in Code Project and MSDN articles. Google will take you where you want to go.

Conclusion

It is likely that some readers will be looking at one or more of the three topics – Rich Text, Regular Expressions and Isolated Storage – for the first time and I hope the code shown here is of some value to them. I decided it was not practical to include in this article an explanation of the rather cryptic regular expressions I have used here. The length of the article would have doubled and it would perhaps still not have been an adequate treatment. I might submit at a later time an in-depth article dealing with this extremely powerful tool.

Finally, here is an illustration of a range of examples where syntax highlighting can be seen working. The list is not exhaustive:

Screenshot - ShadyImage2.gif

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Australia
An old assembly language programmer (8051 microcontroller) and a recent convert from Delphi. Retired ten years ago and rather enjoy writing C# code.

Comments and Discussions

 
Re: Fascinating but why RTF?
Peter Wone, 1-May-07 13:06
I think your article struck a chord with me because I had to do precisely this in 2000. At the time I was working with Java, but my solution - incredibly similar to yours - was so successful it was immediately ported to (you guessed it) Delphi.

Although RTF and HTML look very different, in fact they are extremely similar. Both are markup languages from which a document object model is constructed. Both can specify attributes inline. Both can define named styles and use those to apply sets of attribute values.

In both cases it is smarter to use named styles than inline property values.

Now that I spell it out like this, it occurs to me that it's six of one and half a dozen of the other. It is slightly easier to extract the content of two HTML documents because it's delineated by ... and HTML has the particular advantage that you can move the styles out into separate files and manage them independently.

Probably the single most important difference is that the RTF widget in wide use (it's a COM object - the one Delphi uses is basically a wrapper around this COM object) screws up tables, whereas the multitude of HTML renderers all handle tables extremely well.

Maybe we should think about an RTF to HTML converter...?

PeterW
--------------------
If you can spell and use correct grammar for your compiler, what makes you think I will tolerate less?
