
Enhanced String Handling III

4 Jun 2013, CPOL license
Text documents can be enhanced by embedding constructs and supplying matching classes that "know" how to evaluate those constructs.

What is it all about?

This article describes a way to evaluate strings according to programmatic rules that you define. The rules use regular expressions and are written in C#. A configuration file, an XML document, or any other text document may contain replaceable constructs; an evaluation pass resolves them and yields the desired resulting document.

Testimonial

A recent project of mine required producing a large number of XML documents to be used as transmission messages, generated at least once a day. The documents could have been created from scratch, or they could have been created using the method described here. Starting from an XML template embedded with constructs, and evaluating those constructs for every new message, turned out to be the winning approach, especially in the face of changing requirements for the XML documents.

Introduction

An example is probably the best way to explain the essence of the string enhancement. Consider the appSettings section of a configuration file. If a directory repeats multiple times throughout the configuration, we can encapsulate it as {key::BaseDir} as follows:

XML
<add key="BaseDir" value="c:\somedirectory"/>
<add key="TestFile" value="{key::BaseDir}\FileName"/>

Instead of:

XML
<add key="TestFile" value="c:\somedirectory\FileName"/>

If the directory "c:\somedirectory" is used in multiple entries of the appSettings, then using {key::BaseDir} will be a clear win, especially if {key::BaseDir} corresponds to a value that changes from time to time.

The construct {key::BaseDir} is the one to be evaluated. For it we will introduce a class, ProcessKey, that “knows” how to do the evaluation. In this case “key” is literally the string “key”, but in general a construct may use any string as its key, and for each one we will introduce a class, ProcessXxx, that “knows” how to perform the evaluation.

The code accompanying the article will show quite a few examples of different keys and their equivalent ProcessXxx classes.
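
To make the idea concrete before looking at the library, here is a minimal, self-contained sketch of what evaluating {key::BaseDir} amounts to. It is not the library's ProcessKey class: the class name KeySubstitutionSketch, the Resolve method, and the dictionary standing in for appSettings are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative sketch only; not part of the EnhancedStringEvaluate DLL.
public static class KeySubstitutionSketch
{
    // Matches {key::Name}, ignoring case; Name is captured in the "name" group.
    private static readonly Regex KeyConstruct =
        new Regex(@"\{key::(?<name>[^{}:]+)\}", RegexOptions.IgnoreCase);

    // Replaces every {key::Name} in the input with settings[Name].
    // Names that are not found are left untouched.
    public static string Resolve(string input, IDictionary<string, string> settings)
    {
        return KeyConstruct.Replace(input, m =>
        {
            string value;
            return settings.TryGetValue(m.Groups["name"].Value, out value)
                ? value
                : m.Value;
        });
    }

    public static void Main()
    {
        // The dictionary stands in for the appSettings entries (an assumption).
        var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "BaseDir", @"c:\somedirectory" }
        };
        Console.WriteLine(Resolve(@"{key::BaseDir}\FileName", settings));
        // prints c:\somedirectory\FileName
    }
}
```

The case-insensitive comparer and the IgnoreCase option mirror the goal that keys and values may be entered in any case.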

Motivations and goals

  • Main Motivation: The ability to use a simple construct of the form {key::value}, where “key” and “value” are placeholders. The key is an ID that drives the evaluation, and the value holds the evaluation instructions.
  • Case Insensitive: Keys and values of constructs may be entered in a case-insensitive way.
  • Nesting / Sequential: Constructs may be nested within other constructs or placed side by side with other constructs.
  • Extensible: The system should be extendable to support more constructs as the need arises.

Nomenclature

A construct is assembled as follows:

  • Open Delimiter: In the {key::BaseDir} example it is an open brace “{”.
  • Close Delimiter: In the {key::BaseDir} example it is a close brace “}”.
  • Separator: Separates the various parts of the construct. In the {key::BaseDir} example it is the double colon “::”.
  • Key: The ID of the construct. In the {key::BaseDir} example it is the literal “key”.
  • Value: The value that the process will use to evaluate the construct. In the {key::BaseDir} example it is BaseDir. The “value” part may contain substructures separated with the Separator. For example {Add::2::3}. The "value" is "2::3".
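
The parts named above can be pulled out of a construct with a few string operations. The following is a sketch under stated assumptions: it hard-codes the default “{”, “}” and “::”, and the names ConstructPartsSketch and TrySplit are hypothetical; the library itself identifies constructs with regular expressions.

```csharp
using System;

// Illustrative sketch only; not part of the EnhancedStringEvaluate DLL.
public static class ConstructPartsSketch
{
    // Splits a construct such as "{Add::2::3}" into its key ("Add") and value ("2::3").
    // Returns false when the text is not wrapped in the delimiters or has no separator.
    public static bool TrySplit(string construct, out string key, out string value)
    {
        const string open = "{", close = "}", separator = "::";
        key = value = null;
        if (construct == null || construct.Length < open.Length + close.Length
            || !construct.StartsWith(open, StringComparison.Ordinal)
            || !construct.EndsWith(close, StringComparison.Ordinal))
            return false;

        // Strip the delimiters, then split on the FIRST separator only,
        // so that any further separators stay inside the value.
        string body = construct.Substring(open.Length,
            construct.Length - open.Length - close.Length);
        int sep = body.IndexOf(separator, StringComparison.Ordinal);
        if (sep <= 0) return false;   // no separator, or an empty key

        key = body.Substring(0, sep);
        value = body.Substring(sep + separator.Length);
        return true;
    }

    public static void Main()
    {
        string key, value;
        if (TrySplit("{Add::2::3}", out key, out value))
            Console.WriteLine(key + " | " + value);   // prints Add | 2::3
    }
}
```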

Windows or Web, C# or VB

The code example provided with the article is a Windows console application written in C#, but nothing prevents you from using the concept, as is, in a Windows Forms application, WPF application, in an ASP web environment with or without WPF, a Windows service, or using the EnhancedStringEvaluate DLL as is with VB or any other language client.

I have used the EnhancedStringEvaluate DLL in production code, within multiple Windows Forms applications, Windows Console applications, and multiple Windows service applications.

If all you want is to move on down the road

If understanding how things are done is secondary to solving the problem at hand, then:

  • Include the EnhancedStringEvaluate csproj as part of your solution and include it in your client’s References. A less attractive alternative is to include the EnhancedStringEvaluate DLL without the code as part of your project’s References.
  • Image 1

  • Modify or add to the repertoire of ProcessXxx classes, encircled in red below. You may elect to create a separate csproj that will house the set of ProcessXxx classes.
  • Image 2

From here on, you are on your way; see the EvaluateSampleTest.EnhancedStringEvaluateTest set of samples. In effect, you need to:

  1. Define the context to be passed to the EnhancedStringEval class constructor.
  2. New up the EnhancedStringEval, passing in the context just created.
  3. Use the EvaluateString(..) method of the EnhancedStringEval to evaluate a string.
C#
var context = new List<IProcessEvaluate> { new ProcessXxx() };
var eval = new EnhancedStringEval(context);
string evaluatedStr = eval.EvaluateString("{Key::Value}");

To evaluate the earlier {key::BaseDir} example, the client code becomes:

C#
var context = new List<IProcessEvaluate> { new ProcessKey() };
var eval = new EnhancedStringEval(context);
string directory = eval.EvaluateString("{Key::BaseDir}");

Creating a new ProcessXxx

Assuming that you have a (key, value) construct in mind then the first thing to do is define the regular expression needed to identify this construct. So, for example, if we need to define a construct that will yield a mathematical addition of two numbers then our construct will look like: “{Add::num1::num2}”. The intention is that “{Add::2::3}” will yield “5”. We expect our document to contain such a construct:

Document to process using our method (document contains one line):

The sum of 2 and 3 is {Add::2::3}

The construct could be invoked as follows:

C#
// Requires: using System.IO;
// documentPath is a placeholder for the path of the document to process.
IList<IProcessEvaluate> context = new List<IProcessEvaluate> { new ProcessAdd() };
var eval = new EnhancedStringEval(context);
foreach (string line in File.ReadLines(documentPath))
{
    string res = eval.EvaluateString(line);
    // will yield "The sum of 2 and 3 is 5"
}

A quick scan through the repertoire of ProcessXxx classes provided with the accompanying code shows that there is no ProcessAdd class, so we need to write one.

The new ProcessAdd class needs to:

  • Inherit from the ProcessEvaluateBase abstract class.
  • Provide a class constructor that initializes the regular expression identifying the {Add::num1::num2} construct.
  • Override the RePattern property.
  • Override the PatternReplace method, which evaluates the pattern.

By convention, the class name is prefixed with “Process”. For the sake of simplicity we will allow num1 and num2 to be whole numbers only.

The desired regular expression matching the above add-construct is:

{Add::(?<num1>\d+)::(?<num2>\d+)}
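
As a quick check of the pattern on its own, the sketch below applies it to the sample line with plain .NET regular expressions, outside the library. The braces are escaped for clarity, and the names AddPatternSketch and Evaluate are assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Illustrative sketch only; not part of the EnhancedStringEvaluate DLL.
public static class AddPatternSketch
{
    // The add-construct pattern, with the braces escaped and case ignored.
    private static readonly Regex AddConstruct = new Regex(
        @"\{Add::(?<num1>\d+)::(?<num2>\d+)\}", RegexOptions.IgnoreCase);

    // Replaces each {Add::a::b} in the line with the sum a + b.
    public static string Evaluate(string line)
    {
        return AddConstruct.Replace(line, m =>
        {
            // The named groups carry the two operands as digit strings.
            int a = int.Parse(m.Groups["num1"].Value, CultureInfo.InvariantCulture);
            int b = int.Parse(m.Groups["num2"].Value, CultureInfo.InvariantCulture);
            return (a + b).ToString(CultureInfo.InvariantCulture);
        });
    }

    public static void Main()
    {
        Console.WriteLine(Evaluate("The sum of 2 and 3 is {Add::2::3}"));
        // prints The sum of 2 and 3 is 5
    }
}
```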

The delimiters and separators reaching ProcessXxx need to use equivalents, OpenDelimEquivalent, CloseDelimEquivalent, and SeparatorEquivalent, as defined in the DelimitersAndSeparator class, making the regular expression like so:

C#
RePattern = new Regex(string.Format(@"{0}Add{2}(?<num1>\d+){2}(?<num2>\d+){1}",
        delim.OpenDelimEquivalent, delim.CloseDelimEquivalent,
        delim.SeparatorEquivalent), RegexOptions.IgnoreCase);

The precise definition of the equivalents is not important; what matters is that you use them.

The last part, PatternReplace, is where we evaluate the construct and return the evaluated string. In our case it is a simple matter of adding two numbers. We can be confident that num1 and num2 are strings of digits, because otherwise the regular expression would not have matched the construct.

Note: The PatternReplace method has two parameters: Match m and EnhancedStringEventArgs ea. The first, Match m, carries the matched expression. If the line is “The sum of 2 and 3 is {Add::2::3}”, then only “{Add::2::3}” matches the regular expression RePattern. The second, EnhancedStringEventArgs ea, is provided in case an exception needs to be thrown, in which case you should throw an EnhancedStringException. One constructor overload of the EnhancedStringException class accepts three parameters: string key, EnhancedStrPairElement elem, and string message. The key is the one defined in the construct; in the “{Add::num1::num2}” example it is “Add”. For elem, pass ea.EhancedPairElem. For message, pass your own text. For example, if one of the numbers passed in is too large, the exception could be thrown as follows:

C#
throw new EnhancedStringException("Add", ea.EhancedPairElem, "Number too big");
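
One way to detect the "too big" case is sketched below, assuming the operands and the sum must fit in an int. The sketch is independent of the library, so a standard OverflowException stands in for EnhancedStringException; the names AddOverflowSketch and Add are hypothetical.

```csharp
using System;
using System.Globalization;

// Illustrative sketch only; not part of the EnhancedStringEvaluate DLL.
public static class AddOverflowSketch
{
    // Adds two digit strings; throws when an operand or the sum does not fit in an int.
    // OverflowException stands in for the library's EnhancedStringException here.
    public static string Add(string sNum1, string sNum2)
    {
        int num1, num2;
        // NumberStyles.None accepts digits only. The inputs are assumed to be
        // digit strings already, so a failed parse means the value is too large.
        if (!int.TryParse(sNum1, NumberStyles.None, CultureInfo.InvariantCulture, out num1) ||
            !int.TryParse(sNum2, NumberStyles.None, CultureInfo.InvariantCulture, out num2))
            throw new OverflowException("Number too big");

        // checked makes the addition throw OverflowException instead of wrapping around.
        int sum = checked(num1 + num2);
        return sum.ToString(CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(Add("2", "3"));   // prints 5
    }
}
```

Inside a real PatternReplace, the same two checks would be the natural places to throw the EnhancedStringException shown above.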

The class as a whole is:

C#
using EnhancedStringEvaluate;
using System;
using System.Globalization;   // needed for NumberStyles
using System.Text.RegularExpressions;

namespace TestEvaluation.ProcessEvaluate
{
    /// <summary>
    /// Purpose:
    /// Add two integral numbers: {Add::number1::number2}
    /// </summary>
    public sealed class ProcessAdd : ProcessEvaluateBase
    {
        public ProcessAdd() : this(DelimitersAndSeparator.DefaultDelimitersAndSeparator) { }
        public ProcessAdd(IDelimitersAndSeparator delim) : base(delim)
        {
            if (delim == null) throw new ArgumentException("Delim may not be null", "delim");
            const RegexOptions reo = RegexOptions.Singleline | RegexOptions.IgnoreCase 
                | RegexOptions.Compiled;
            string pattern = string.Format(@"({0})Add{2}(?<Num1>\d+){2}(?<Num2>\d+)({1})",
            delim.OpenDelimEquivalent, delim.CloseDelimEquivalent, delim.SeparatorEquivalent);
            RePattern = new Regex(pattern, reo);
        }

        protected override Regex RePattern { get; set; }

        protected override string PatternReplace(Match m, EnhancedStringEventArgs ea)
        {
            string sNum1 = m.Groups["Num1"].Value;
            string sNum2 = m.Groups["Num2"].Value;
            int num1 = int.Parse(sNum1, NumberStyles.Any);
            int num2 = int.Parse(sNum2, NumberStyles.Any);
            int res = num1 + num2;
            return res.ToString();
        }
    }
}

At this point we are done. From here on, I will concentrate on explaining how the EnhancedStringEvaluate DLL does what it does.

Architecture of classes

Image 3

Review of EnhancedStringEval

The EnhancedStringEval class is the engine that drives the evaluation. It delegates the actual work to a string-replace method, PatternReplace(..), in the ProcessXxx classes. The process follows the strategy pattern.

You do not need to know the strategy pattern to understand this article. If you do know it, that knowledge will help you understand the code in the EnhancedStringEvaluate class.

Let’s discuss the code from a high level point of view anchoring our thinking with the following code:

C#
IList<IProcessEvaluate> context = new List<IProcessEvaluate> { 
    new ProcessXxx1(..), 
    new ProcessXxx2(..), 
    . . . 
}; 
var eval = new EnhancedStringEval(context); 
. . . 
string res = eval.EvaluateString(line);

The above code is not part of the “Architecture of classes” depiction above; it is part of your client code that calls to evaluate a line. The following discussion uses the terms context and line, among others, to refer to the above code.

As you can see in the above code, the entry point of the EnhancedStringEval class is the EvaluateString method, which delegates the heavy lifting to the EvaluateStringPure method (see the code for EvaluateString). EvaluateStringPure calls upon the ProcessXxx classes in the context to see whether the line contains a construct that can be evaluated.

If no construct is available for evaluation then we move on and leave the input line as it was. If, on the other hand, the input line does contain a construct that can be evaluated, then the appropriate ProcessXxx will do its duty.
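
The delegation described above can be sketched in a few lines. This is not the library's implementation: ConstructRule and EvaluatorSketch are hypothetical stand-ins for a ProcessXxx class and for EnhancedStringEval, and the repeat-until-nothing-changes loop is an assumption about how nested constructs might get resolved from the inside out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a ProcessXxx class: one pattern plus one replacement rule.
public sealed class ConstructRule
{
    private readonly Regex _pattern;
    private readonly MatchEvaluator _replace;

    public ConstructRule(string pattern, MatchEvaluator replace)
    {
        _pattern = new Regex(pattern, RegexOptions.IgnoreCase);
        _replace = replace;
    }

    public string Apply(string text) { return _pattern.Replace(text, _replace); }
}

// Hypothetical stand-in for the engine: it knows nothing about individual constructs.
public sealed class EvaluatorSketch
{
    private readonly IList<ConstructRule> _rules;
    public EvaluatorSketch(IList<ConstructRule> rules) { _rules = rules; }

    public string Evaluate(string line)
    {
        // The number of passes is bounded so that a rule which keeps
        // producing new constructs cannot loop forever.
        for (int pass = 0; pass < 100; pass++)
        {
            string before = line;
            foreach (ConstructRule rule in _rules) line = rule.Apply(line);
            if (line == before) break;   // nothing left to evaluate
        }
        return line;
    }
}

public static class StrategySketchDemo
{
    public static void Main()
    {
        var rules = new List<ConstructRule>
        {
            new ConstructRule(@"\{Add::(?<a>\d+)::(?<b>\d+)\}", m =>
                (int.Parse(m.Groups["a"].Value, CultureInfo.InvariantCulture) +
                 int.Parse(m.Groups["b"].Value, CultureInfo.InvariantCulture))
                .ToString(CultureInfo.InvariantCulture)),
            new ConstructRule(@"\{Upper::(?<text>[^{}]*)\}",
                m => m.Groups["text"].Value.ToUpperInvariant())
        };
        var eval = new EvaluatorSketch(rules);
        Console.WriteLine(eval.Evaluate("{Add::1::{Add::2::3}} and {upper::done}"));
        // prints 6 and DONE
    }
}
```

A line without any construct passes through unchanged after a single pass, which matches the behavior described above.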

In closing

I thank you for reading the article and hope that you find it useful.

Tools used

VS 2012 with .NET 4.5 on a Windows 8 platform was used for the coding example. "Visual Paradigm for UML Community Edition" was used to draw the UML diagrams.

Enjoy!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States
avifarah@gmail.com
