I'm building an error parser for a middle-tier utility. The utility accepts some XML, manipulates it depending on its content, and then passes it on to another service. The new error parser will perform schema and data validation against the incoming XML. Many of the errors we receive are caused by missing elements, with messages like:

Line: 1, Position 2: "Could not find schema information for the element 'HearingDocExtract'."
Line: 1, Position 3288: "The 'DocumentCopy' element is not declared."
Line: 1, Position 3301: "Could not find schema information for the attribute 'DocumentID'."

Depending on which error occurred and which element caused it, different individuals need to be informed. Is there some way to access the text that the wildcard matched in each error, so I can just grab the element or attribute name from there?

What I have tried:

Currently I use a regex check to determine which error has occurred and then substring out the element or attribute name. This works, but I have to store the fixed parts of every error message to work out how much to trim from each side of each string, which is not ideal. I also cut off the line and position part of each error before running the regex check, so there is only one wildcard per check.
Updated 9-May-18 6:43am

Here is a nice extension class that will make regex searches a lot simpler; it comes from the article "I don't like Regex...".
You can search using a wildcard like * and the result will contain the found string.

I modified it a little so * also includes spaces in the result and users don't need to use %:
C#
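using System.Text.RegularExpressions;

// Translates a simple wildcard pattern into a case-insensitive Regex.
// Wildcards: # = one digit, ? = any single character,
//            * or % = any run of characters (spaces included).
// Note: ( ) ^ and | are not escaped here, so they keep their regex meaning.
static Regex GetRegex(string searchPattern)
{
    return new Regex(searchPattern
            // Escape characters that are special in a regex (backslash first).
            .Replace("\\", "\\\\")
            .Replace(".", "\\.")
            .Replace("{", "\\{")
            .Replace("}", "\\}")
            .Replace("[", "\\[")
            .Replace("]", "\\]")
            .Replace("+", "\\+")
            .Replace("$", "\\$")
            .Replace(" ", "\\s")        // a space matches any single whitespace character
            // Translate the wildcards.
            .Replace("#", "[0-9]")
            .Replace("?", ".")
            //.Replace("*", "\\w*")     // word only, no whitespace
            .Replace("*", ".*")
            .Replace("%", ".*")
            , RegexOptions.IgnoreCase);
}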
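
The question asks for the text that each wildcard matched. One way to get it, sketched here as an assumption and not taken from the linked article, is to turn every * into a capture group and read the groups back from the match. The names WildcardCapture, ToRegex and TryMatch are hypothetical, and Regex.Escape stands in for the hand-written Replace chain above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardCapture
{
    // Hypothetical helper: escapes the whole pattern, then turns each
    // escaped "*" into a lazy capture group so its text can be read back.
    public static Regex ToRegex(string wildcardPattern)
    {
        string escaped = Regex.Escape(wildcardPattern);   // "*" becomes "\*"
        string body = escaped.Replace("\\*", "(.*?)");
        return new Regex("^" + body + "$",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    // Returns true when the input matches; captures then holds the text
    // matched by each "*", in the order the wildcards appear.
    public static bool TryMatch(string wildcardPattern, string input, out string[] captures)
    {
        Match m = ToRegex(wildcardPattern).Match(input);
        if (!m.Success)
        {
            captures = Array.Empty<string>();
            return false;
        }

        captures = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
        {
            captures[i - 1] = m.Groups[i].Value;          // group 0 is the whole match
        }
        return true;
    }
}
```

Called with the pattern The '*' element is not declared. and the matching message (line and position already trimmed off), captures holds the single element name. Because every * gets its own group, the line and position could also stay in the message and be matched by wildcards of their own.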
 
Comments
Malindor 9-May-18 12:56pm    
Thanks for the function and the link to the article! I wish I had this earlier.
BillWoodruff 10-May-18 17:46pm    
+5 Appreciated ... I dislike RegEx as a way of avoiding feeling inferior to other programmers here who use it so masterfully :) Of course, I assume you don't like it for a much more logical reason !
RickZeeland 11-May-18 10:12am    
Your appreciation is appreciated :) I don't dislike regex, but the problem for me is that I always forget the syntax and have to invest precious time to learn it again ...
If you mean HearingDocExtract, DocumentCopy, and DocumentID, then this will do it:
(['])(?:(?=(\\?))\2.)*?\1

C#
using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Wed, May 9, 2018, 05:43:42 PM
///  Using Expresso Version: 3.0.4750, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  [1]: A numbered capture group. [[']]
///      Any character in this class: [']
///  Match expression but don't capture it. [(?=(\\?))\2.], any number of repetitions, as few as possible
///      (?=(\\?))\2.
///          Match a suffix but exclude it from the capture. [(\\?)]
///              [2]: A numbered capture group. [\\?]
///                  Literal \, zero or one repetitions
///          Backreference to capture number: 2
///          Any character
///  Backreference to capture number: 1
///  
///
/// </summary>
public static Regex regex = new Regex(
      "(['])(?:(?=(\\\\?))\\2.)*?\\1",
    RegexOptions.Multiline
    | RegexOptions.Singleline
    | RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );
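// Capture all matches in the input text (InputText is the error message string).
// Each Match.Value includes the surrounding single quotes, e.g. 'DocumentCopy'.
MatchCollection ms = regex.Matches(InputText);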

If you want to use regexes, then get a copy of Expresso - it's free, and it examines and generates regular expressions.
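
If the line, the position and the name are all needed, named capture groups avoid the trimming step altogether. The following is only a sketch: it assumes every message has the shape of the three samples in the question, and ValidationErrorParser and TryParse are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ValidationErrorParser
{
    // Assumed shape: Line: 1, Position 3288: "The 'DocumentCopy' element is not declared."
    // The name group takes the first single-quoted token, without the quotes.
    private static readonly Regex ErrorPattern = new Regex(
        @"Line:\s*(?<line>[0-9]+),\s*Position\s*(?<pos>[0-9]+):.*?'(?<name>[^']+)'",
        RegexOptions.CultureInvariant);

    public static bool TryParse(string error, out int line, out int position, out string name)
    {
        line = 0;
        position = 0;
        name = string.Empty;

        Match m = ErrorPattern.Match(error);
        if (!m.Success)
        {
            return false;
        }

        // TryParse guards against digit runs too long to fit in an int.
        if (!int.TryParse(m.Groups["line"].Value, out line) ||
            !int.TryParse(m.Groups["pos"].Value, out position))
        {
            return false;
        }

        name = m.Groups["name"].Value;
        return true;
    }
}
```

The returned name can then be used to decide who should be informed. As noted for the quote-based regex, this will not cover messages that contain no quoted name.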
 
Comments
Malindor 9-May-18 12:55pm    
Didn't even think of hitting on the single quotes. I've been looking at this for too long I guess. It may not work in all instances but should work for most of them. Thanks for the heads up on the Expresso as well!!
OriginalGriff 9-May-18 13:56pm    
You're welcome!
BillWoodruff 10-May-18 17:47pm    
+5 and may your cat bite you !
OriginalGriff 10-May-18 17:59pm    
Don't give him any ideas! He already slashes at my feet on a regular basis.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


