Posted 18 Jan 2015

Text Template Transformation Engine / Code Generation Tool in C#

8 May 2021 | MIT | 10 min read
A simple drop-in function that provides T4 like template based text generation.

Introduction

This drop-in function provides a simple template based text generation engine. It basically allows code to be encapsulated in special tags that can manipulate or insert text. If you have worked with classic ASP, JSP, PHP or T4 templates, you're probably familiar with text template transformations. In classic ASP, code is wrapped in <%...%> for example:

ASP
<body>Current time:<br /><%Response.Write Now()%></body>

In T4, it is wrapped like:

ASP
Current time:<#= DateTime.Now #>

And finally in this project, we do it like:

ASP
Current time: [[=DateTime.Now ]]

or:

ASP
Current time: /*=DateTime.Now :*/

Text transformations, also called dynamic text, allow the use of programming methods to modify text. Common uses include repeating sections of text, filling in fields on an ASP page, showing a username or account code in an email, or writing the numbers 1 to 1000 on a webpage.

This project is similar to Microsoft's T4, but simpler. It is similar because it uses encapsulated C# code to inject text. Microsoft Visual Studio's T4 is more powerful, and this project is not meant to be a replacement... at least not inside of Visual Studio! There are a couple of issues when using T4 templates in third-party applications. The foremost is licensing: the T4 DLL is not redistributable. There are awkward ways around this, such as installing Microsoft packages that include the DLL or installing Visual Studio Express, but that is messy. This project, by contrast, is not even a DLL; it is a simple drop-in function. Another issue is that T4 does not mesh well with many syntax highlighting and code completion tools. To work around this, I created comment-style tags such as /*: code-here :*/. This allows the templating system to be used directly in C#/C++ files without causing havoc with design-time error checking.

Here are some transformation examples. The intermediate step shows the code that gets executed to create the final output. These examples use the [[ ]], [[~ and [[= tags to encapsulate code.

Original | Intermediate Step | Final Output
1[[for(int i=0; i<9; i++){]]0[[}]] | Write("1"); for(int i=0; i<9; i++){ Write("0");} | 1000000000
1[[~for(int i=0; i<9; i++)]]0 | Write("1"); for(int i=0; i<9; i++) Write("0"); | 1000000000
Printed [[=DateTime.Now]] | Write("Printed "); Write(DateTime.Now); | Printed 1/4/15 2:36PM
[[=i++]]. A [[=i++]]. B | Write(i++); Write(". A "); Write(i++); Write(". B"); | 1. A 2. B

The last row assumes that an int i = 1 was declared earlier in the template.

 

Example of running the included source files (the top part is the input to the function and the bottom part is its output; it is as simple as that):

Image 1

Background

This function was built because I needed a simple text template transformation engine for an AMD GCN assembly language project I am working on. In assembly, a pre-compile, macro-like feature is very useful, almost required. Very often, you run into situations such as having to unroll a loop. Since pure assembly languages do not support "for" or "while" constructs the way higher-level languages do, unrolling is often left to the programmer. Maintaining a few ugly lines of template code is much better than writing ten assembly statements fifty times.

For example:

C#
[[ for(int i = 0; i < 4; i++){ ]]
   Add R[[=i+20]], R4, [[=i]]; [[}]]

Would be transformed into...

Add R20, R4, 0;
Add R21, R4, 1;
Add R22, R4, 2;
Add R23, R4, 3;
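
For comparison, the program generated from this template boils down to an ordinary C# loop that appends text. The following is only a rough sketch of that idea; the names UnrollDemo and Unroll are hypothetical and do not appear in the article's source, and the whitespace is simplified to one instruction per line.

```csharp
using System;
using System.Text;

static class UnrollDemo
{
    // Builds the unrolled instructions the same way the generated code would:
    // literal text and expression values are appended in template order.
    public static string Unroll(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append("Add R").Append(i + 20).Append(", R4, ").Append(i).Append(";\n");
        }
        return sb.ToString();
    }
}
```

Calling UnrollDemo.Unroll(4) returns the four Add lines shown above. The template version keeps the assembly text readable in place, which is the main reason to prefer it over hand-written string building.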

Originally, I was planning on using Microsoft's T4, but after some investigation, I found it required a DLL that was not redistributable. Creating a text template transformation engine seemed easy and fun, so I set out to build one. The goal was to keep it as simple as possible because I might want to adapt it for different uses in the future, and a lot of extra code would make that difficult.

Using the Code

Just drop in the function or static class and then make a call to the function.

  1. First, copy the function into your application. Make the function public, private or internal as needed.
  2. Select the formatting you wish to use by uncommenting the style in the header. There are two formats:
    1. [[CODE]] , [[=EXPRESSION]] , [[~FULL_LINE_OF_CODE, and [[!SKIP_ME]] - easier to read (recommended)
    2. /*:CODE:*/, /*=EXPRESSION:*/, //:FULL_LINE_OF_CODE, and //!SKIP_ME - works better with c-like code completion and syntax highlighting

    3. Or, create your own
  3. Build some text (as a string) that needs to be converted. Use the following table for reference:
    Type | "[[..]]" Style | "/*:..:*/" Style | Comments
    Code Block | [[ code_here ]] | /*: code_here :*/ | normal usage
    Code Line | [[~code_here | //: code_here | terminates with line break
    Expression | [[=variable]] | /*= variable:*/ | wraps var in write(...)
    Comment Block | [[! comments ]] | /*! comments :*/ | excluded in final
    Comment Line | (none) | //! comments | ends with line break
    IDE Code Only | (none) | /**/ IDE code /**/ | dummy/filler IDE only code
  4. Call Expand(...) in your application. It takes two string parameters. The first string parameter should have the input text with the encapsulated C# commands. The second string parameter will hold the results. Lastly, Expand() returns true if successful or false if there are any compiler error(s).
    Usage: bool success = Expand(myInput, out myOutput);

  5. Debugging: Compile-time errors will be returned in the output parameter (instead of the results). The function lists each error with line and column information. Directly after the errors, the intermediate code is displayed for reference. If you would like to go further and track down runtime errors, copy the contents of the program variable and paste them into a new Visual Studio console project. A Main() is included, so the contents can simply be dropped into a file and run.
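
The calling pattern from steps 4 and 5 can be wrapped in a small helper. This is a sketch only: ExpandUsage, ExpandFn and ExpandOrThrow are hypothetical names introduced here, and the delegate simply mirrors the Expand(string, out string) signature so the snippet compiles on its own without the article's function.

```csharp
using System;

static class ExpandUsage
{
    // Hypothetical delegate with the same shape as Expand(string input, out string output).
    public delegate bool ExpandFn(string input, out string output);

    // Returns the expanded text, or throws with the compiler report on failure.
    public static string ExpandOrThrow(ExpandFn expand, string template)
    {
        string result;
        if (!expand(template, out result))
        {
            // On failure the out parameter holds the error list and the numbered source.
            throw new InvalidOperationException(result);
        }
        return result;
    }
}
```

With the article's function in scope, a call might look like ExpandUsage.ExpandOrThrow(Expand, "Printed [[=DateTime.Now]]"). Whether to throw or to display the report is a design choice; throwing keeps a failed expansion from being mistaken for generated text.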

How It Works

In a nutshell, the Expand() function takes a string, converts that string into a program (Step 1), compiles the program (Step 2), and finally runs that program to collect its output (Step 3).

Here is the entire code:

C#
public static bool Expand(string input, out string output)
{
    //////////////// Step 1 - Build the generator program ////////////////
    // For [[CODE]] , [[=EXPRESSION]] ,  [[~FULL_LINE_OF_CODE  &  [[!SKIP_ME]]
    // style uncomment the next 5 lines of code
    const string REG = @"(?<txt>.*?)" +      // grab any normal text
        @"(?<type>\[\[[!~=]?)" +             // get type of code block
        @"(?<code>.*?)" +                    // get the code or expression
        @"(\]\]|(?<=\[\[~[^\r\n]*?)\r\n)";   // terminate the code or expression
    const string NORM = @"[[", FULL = @"[[~", EXPR = @"[[=", TAIL = @"]]";

    //// For /*:CODE:*/ , /*=EXPRESSION:*/ , //:FULL_LINE_OF_CODE & //!SKIP_ME
    //// style uncomment the next 5 lines of code
    //const string REG ="(?<txt>.*?)" +          // grab any normal text
    //    @"((/\*\*/.*?/\*\*/)|(?<type>/(/!|\*:|\*=|/:|\*!))" + // get code 
    //    @"(?<code>.*?)" +                      // get the code or expression
    //    @"(:\*/|(?<=//[:|!][^\r\n]*)\r\n))";   // terminate the code or expression
    //const string NORM = @"/*:", FULL = @"//:", EXPR = @"/*=", TAIL = @":*/";               

    System.Text.StringBuilder prog = new System.Text.StringBuilder();
    prog.AppendLine(
@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
    class T44Class { 
    static StringBuilder sb = new StringBuilder();
    public static string Execute() {");
    foreach (System.Text.RegularExpressions.Match m in
        System.Text.RegularExpressions.Regex.Matches(input + NORM + TAIL, REG,
        System.Text.RegularExpressions.RegexOptions.Singleline))
    {
        // Emit the literal text before the tag as a verbatim string (quotes doubled).
        prog.Append(" Write(@\"" + m.Groups["txt"].Value.Replace("\"", "\"\"") + "\");");
        string txt = m.Groups["code"].Value;
        switch (m.Groups["type"].Value)
        {
            case NORM: prog.Append(txt); break;      // code block: copied as-is
            case FULL: prog.AppendLine(txt); break;  // full-line code: copied, then a line break
            case EXPR: prog.Append(" sb.Append(" + txt + ");"); break; // expression: value is appended
            // comment tags match no case, so their contents are dropped
        }
    }
    prog.AppendLine(
@"  return sb.ToString();}
static void Write<T>(T val) { sb.Append(val);}
static void Format(string format, params object[] args) { sb.AppendFormat(format,args);}
static void WriteLine(string val) { sb.AppendLine(val);}
static void WriteLine() { sb.AppendLine();} 
static void Main(string[] args) { Execute(); Console.Write(sb.ToString()); } }");
    string program = prog.ToString(); 

    //////////////// Step 2 - Compile the generator program ////////////////
    var res = (new Microsoft.CSharp.CSharpCodeProvider()).CompileAssemblyFromSource(
        new System.CodeDom.Compiler.CompilerParameters()
        {
            GenerateInMemory = true, // note: this is not "in memory"
            ReferencedAssemblies = { "System.dll", "System.Core.dll" } // for linq
        }
        , program);

    res.TempFiles.KeepFiles = false; //clean up files in temp folder

    // Print any errors with the source code and line numbers
    if (res.Errors.HasErrors)
    {
        int cnt = 1;
        output = "There are one or more errors in the template code:\r\n";
        foreach (System.CodeDom.Compiler.CompilerError err in res.Errors)
            output += "[Line " + err.Line + " Col " + err.Column + "] " + 
                        err.ErrorText + "\r\n";
        output += "\r\n================== Source (for debugging) =====================\r\n";
        output += "     0         10        20        30        40        50        60\r\n";
        output += "   1| " + System.Text.RegularExpressions.Regex.Replace(program, "\r\n",
            m => { cnt++; return "\r\n" + cnt.ToString().PadLeft(4) + "| "; });
        return false;
    }

    //////////////// Step 3 - Run the program to collect the output ////////////////
    var type = res.CompiledAssembly.GetType("T44Class");
    var obj = System.Activator.CreateInstance(type);
    output = (string)type.GetMethod("Execute").Invoke(obj, new object[] { });
    return true;
}

Step 1) Build the Generator Program - The input text, with embedded C# commands, is fed through a Regular Expression to parse out the different sections. The input text by nature is going to be in the format TEXT-CODE-TEXT-CODE…, so we process one TEXT-CODE pair at a time. Here is the RegEx used for deciphering each TEXT-CODE pair:

  • (?<txt>.*?) <- This captures any normal text, which will be output directly with Write(@"text here").
  • (?<type>\[\[[!~=]?) <- This gets the opening bracket and its type. It can be [[ , [[~ , [[= or [[!.
  • (?<code>.*?) <- This captures the code piece.
  • (\]\]|(?<=\[\[~[^\r\n]*?)\r\n) <- This captures the closing bracket, or the line break that ends a [[~ code line.

The goal is to convert the source text into a program so we can execute it. For each <txt>, we append a Write(@"txt") call, which in turn appends the text to a StringBuilder named sb. For each <code>, we write the code directly into the program; only expressions ([[=) are wrapped in sb.Append(). The opening and closing brackets, and anything that starts with "[[!", are stripped out and not copied over.

In this first step, the program header and footer are also added. In the header, we add some using statements, a class header, and function header. In the footer, we add some useful functions like “Write(...)” and “WriteLine(...)” and finally complete the class with a “}”.

One other item to note is that before we run the RegEx, an empty "[[]]" is appended to the end of the input (input + NORM + TAIL). The RegEx looks for TEXT-CODE chunks, which means the input must end with a CODE section. In this case, it is just an empty code block, "[[]]".
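
To see the TEXT-CODE chunking in isolation, the pattern can be run on its own and the captured groups listed. The sketch below reuses the "[[..]]" pattern from Expand(); the ChunkDemo class, the Chunks method and the "txt|type|code" output format are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ChunkDemo
{
    // Same pattern as the "[[..]]" style used in Expand().
    const string REG = @"(?<txt>.*?)" +
        @"(?<type>\[\[[!~=]?)" +
        @"(?<code>.*?)" +
        @"(\]\]|(?<=\[\[~[^\r\n]*?)\r\n)";

    // Splits a template into "txt|type|code" strings, one per TEXT-CODE chunk.
    public static List<string> Chunks(string input)
    {
        var result = new List<string>();
        // An empty "[[]]" is appended so trailing text is captured as a final chunk.
        foreach (Match m in Regex.Matches(input + "[[]]", REG, RegexOptions.Singleline))
        {
            result.Add(m.Groups["txt"].Value + "|" +
                       m.Groups["type"].Value + "|" +
                       m.Groups["code"].Value);
        }
        return result;
    }
}
```

For the input "Printed [[=DateTime.Now]]!" this yields two chunks: "Printed |[[=|DateTime.Now" and "!|[[|". The second chunk exists only because of the appended "[[]]"; without it, the trailing "!" would never be matched.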

Step 2) Compile the generator program - The program we built in Step 1 is then compiled using the .NET CSharpCodeProvider. Despite its name, GenerateInMemory does not keep the compiled output purely in RAM; files are still written to a temporary folder. TempFiles.KeepFiles = false is set to ensure these files are cleaned up. Also in this step, any compiler errors are collected and returned in the output.

Step 3) Run the program to collect the output – In the last step, we invoke the mini-program we generated and return its output.
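
The reflection call in Step 3 can be tried without compiling anything by pointing it at an ordinary class. In this sketch, FakeGenerated stands in for the compiled T44Class, and InvokeDemo and Run are hypothetical names; only Type.GetMethod, MethodInfo.Invoke and Activator.CreateInstance are the same calls the article's code relies on.

```csharp
using System;
using System.Reflection;

// Stand-in for the generated class; in Expand() the type comes from the compiled assembly.
public class FakeGenerated
{
    public static string Execute() { return "generated text"; }
}

static class InvokeDemo
{
    // Looks up a public parameterless method by name and returns its string result.
    public static string Run(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName);
        if (method == null)
            throw new MissingMethodException(type.Name, methodName);
        // Static methods ignore the target; instance methods need an object.
        object target = method.IsStatic ? null : Activator.CreateInstance(type);
        return (string)method.Invoke(target, new object[0]);
    }
}
```

InvokeDemo.Run(typeof(FakeGenerated), "Execute") returns "generated text". Checking IsStatic lets the same helper work whether Execute is declared as a static or an instance method.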

Sample Input/Output

Sample Input

This first example will write Hello World three times:

[[~ for(int i=0; i<3; i++){
Hello World [[ Write(i.ToString()+"! "); }]]

[[! This comment will not be added to the output. ]]

Write() will print any bool, string, char, decimal, double, float, int...
A Quadrillion is 1[[ for(int i=0; i<15; i++) Write("0"); ]]

This will also write bool, string, char, decimal, double, float, int...
Hello at [[=DateTime.Now]]!

[[ for(int i=1; i<4; i++){ ]]
[[="Hello " + i + " World"+ (i>1?"s!":"!") ]]
How are you? "[[=i]]" [[="\r\n"]]
[[ } ]]

The Intermediate Generated Code

The following is the behind-the-scenes temporary generated code that was created from the sample input. This will be executed in the next step to create the final output. The block below is generated code and the formatting is not clean.

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
  class T44Class { 
  static StringBuilder sb = new StringBuilder();
  public static string Execute() {
 Write(@"
"); for(int i=0; i<3; i++){
 Write(@"Hello World "); Write(i.ToString()+"! "); } Write(@"

"); Write(@"

Write() will print any bool, string, char, decimal, double, float, int...
    A Quadrillion is 1"); for(int i=0; i<15; i++) Write("0");  Write(@"

This will also write bool, string, char, decimal, double, float, int...
    Hello at "); sb.Append(DateTime.Now); Write(@"!

"); for(int i=1; i<4; i++){  Write(@"
    "); sb.Append("Hello " + i + " World"+ (i>1?"s!":"!") ); Write(@"
    How are you? """); sb.Append(i); Write(@"""  "); sb.Append("\r\n"); Write(@"
"); }  Write(@"
");  return sb.ToString();}
static void Write<T>(T val) { sb.Append(val);}
static void Format(string format, params object[] args) { sb.AppendFormat(format,args);}
static void WriteLine(string val) { sb.AppendLine(val);}
static void WriteLine() { sb.AppendLine();} 
static void Main(string[] args) { Execute(); Console.Write(sb.ToString()); } }

Sample Output

This first example will write Hello World three times:

Hello World 0! Hello World 1! Hello World 2!

Write() will print any bool, string, char, decimal, double, float, int...
A Quadrillion is 1000000000000000

This will also write bool, string, char, decimal, double, float, int...
Hello at 1/18/2015 8:48:13 AM!

Hello 1 World!
How are you? "1"

Hello 2 Worlds!
How are you? "2"

Hello 3 Worlds!
How are you? "3"

When Not to Use this Code

  • Security – Since the function compiles and runs commands (like a script), it has the potential to be abused. Be cautious of what or who might call this function and what permission levels the program is running in.
  • Not a replacement for T4 in Visual Studio. T4 is built into Visual Studio so use that. It is also more feature rich, more commonly known, and easier to debug in newer versions of Visual Studio.
  • Avoid using templates if possible. Be careful not to jump in and use text templates. They can be confusing for others and make code complicated. Make sure you need them first. If the structure of the text template is always the same, then just write the code. For example, don’t use template transformation to do Current time: [[=DateTime.Now ]] when "Current time:" + DateTime.Now.ToString() would suffice. Also, the performance is not that great.

Points of Interest

The most enjoyable part of the project was creating the language. The main goal was for it to be simple and easy to read. My first version used "#" for inline code, but it was not as clean as I wanted. After some experimentation, the [[...]] style won out. But after toying with [[...]] for a while, I noticed that it wreaked havoc with code completion and syntax highlighting engines. After additional experimentation, I had the idea of using the built-in opening/closing comments, with a twist to separate them from normal comments. Eventually, a style like /*: ... :*/ and //: ... prevailed. Since comments are ignored by code completion and syntax highlighting engines, the inline code is ignored as well. This worked well except when the editor needed some kind of filler code. Here is an example:

int myVar = /*: for(int i=1; i<4; i++) Write(i); :*/; shows as an error in the designer because the code sense sees "int myVar = ;"

but modifying it like this fixes the issue....

int myVar = /*: for(int i=1; i<4; i++) Write(i); :*/ /**/1/**/; works because the editor will see "int myVar = 1;"

Both versions work once the template function has been run on them, and both expand to "int myVar = 123;". The only difference is that the first one shows an error in the IDE.

Compatibility

  • no DLLs required
  • no using statements needed
  • works in both x64 and x86
  • directly runnable in .NET 3.5, 4.0, 4.0 Client Profile, 4.5, and 4.5.1
  • Also works in .NET 2.0, 3.0, 3.5 (Client Profile) if "System.Core.dll" and "using System.Linq;" are removed.
  • Tested okay under Visual Studio 2010/2012/2013/2015

Performance

The included sample takes 81 ms (2nd-generation i7, 3.2 GHz, SSD). Release, debug, and release without the debugger all performed similarly.

Breakdown:

  • 1ms to generate code
  • 76ms for compile
  • 3ms execute program
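
The article does not say how these figures were measured. One common way to get a comparable per-stage breakdown is System.Diagnostics.Stopwatch; the sketch below assumes that approach, and StageTimer and TimeMs are hypothetical names.

```csharp
using System;
using System.Diagnostics;

static class StageTimer
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(Action stage)
    {
        Stopwatch sw = Stopwatch.StartNew();
        stage();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Each of the three steps in Expand() could be wrapped in a call such as StageTimer.TimeMs(() => { /* step code here */ }). A single run includes JIT and disk warm-up, so averaging several runs gives steadier numbers.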

History

  • December 2014: Started
  • 3rd January, 2015: Initial version
  • 17th January, 2015: Removed linq code (works better with pre .NET 3.5)
  • 19th January, 2015: Added missing "const"
  • 21st January, 2015: A few changes:
    • Function signature changed to:
      C#
      bool success = Expand(string input, out string output)
    • Also changed to [[=code]] instead of [=code]] for better clarity.
    • Added void main() so the intermediate stage can be dropped into VS for debugging

License

This article, along with any associated source code and files, is licensed under The MIT License


About the Author

Ryan S White
Help desk / Support
United States
Ryan White is an IT Coordinator, currently living in Pleasanton, California.

He earned his B.S. in Computer Science at California State University East Bay in 2012. Ryan has been writing lines of code since the age of 7 and continues to enjoy programming in his free time.

You can contact Ryan at s u n s e t q u e s t -A-T- h o t m a i l DOT com if you have any questions he can help out with.

Comments and Discussions

 
Question: Escaping rules?
ShawnVN, 9-May-21 16:49
I haven't tried this out yet, but does it apply any default escaping rules to the injected text?

Ime this is the biggest source of things like XSS vulnerabilities.. I've even seen SQL-injection attacks when template-processors are used to formulate SQL statements at runtime.

Probably not a concern for your particular use case of code-gen asm.. but it's a problem worth calling out in the article, for anyone who may want to serve HTML or compose SQL or JSON or etc with a library like this.

Big picture though, I really like the idea of code-gen for templating..