Using C# regular expressions, is there a way to get the text that matched the wildcard?

I'm building an error parser for a middle-tier utility. The utility accepts some XML, manipulates it depending on its content, and then passes it on to another service. The new error parser will perform schema and data validation against the incoming XML. Many of the errors we receive are about missing elements, with messages like:

Line: 1, Position 2: "Could not find schema information for the element 'HearingDocExtract'."
Line: 1, Position 3288: "The 'DocumentCopy' element is not declared."
Line: 1, Position 3301: "Could not find schema information for the attribute 'DocumentID'."

Depending on which error and which element is causing the problem, different individuals need to be informed. I was wondering if there was some way to access the text that the wild card matched in each error and just grab the text from there.

What I have tried:

Currently I have a regex check to determine which error has occurred, and then I substring out the element or attribute name. This works, but I have to store the different parts of every error message to work out how much to trim from each side of each string, which is not ideal. I also cut off the line and position part of each error before I perform the regex check so that there is only one wild card per check.

Posted 9-May-18 6:32am

Malindor

Updated 9-May-18 6:43am

2 solutions

Solution 1

Here is a nice extension class that will make regex searches a lot simpler: "I don't like Regex..."
You can search using a wildcard like * and the result will contain the found string.

I modified it a little so * also includes spaces in the result and users don't need to use %:

C#

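using System.Text.RegularExpressions;

// Converts a simple wildcard pattern into a case-insensitive Regex.
// Wildcards: # = one digit, ? = any single character, * or % = any run of characters.
static Regex GetRegex(string searchPattern)
{
    return new Regex(searchPattern
            // Escape regex metacharacters first so they match literally.
            .Replace("\\", "\\\\")
            .Replace(".", "\\.")
            .Replace("{", "\\{")
            .Replace("}", "\\}")
            .Replace("[", "\\[")
            .Replace("]", "\\]")
            .Replace("(", "\\(")
            .Replace(")", "\\)")
            .Replace("^", "\\^")
            .Replace("|", "\\|")
            .Replace("+", "\\+")
            .Replace("$", "\\$")
            .Replace(" ", "\\s")
            // Then translate the wildcards into their regex equivalents.
            .Replace("#", "[0-9]")
            .Replace("?", ".")
            //.Replace("*", "\\w*")     // word only, no whitespace
            .Replace("*", ".*")
            .Replace("%", ".*")
            , RegexOptions.IgnoreCase);
}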
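
The function above produces a regex whose match covers the whole pattern, not just the part the wildcard stood for. One way to get only the wildcard text is to translate `*` into a capturing group and read it back from `Match.Groups`. The following is a minimal sketch of that idea, not code from the linked article: the class `WildcardCapture` and its methods are hypothetical names, and it uses `Regex.Escape` for the metacharacters instead of the chain of `Replace` calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardCapture
{
    // Hypothetical helper: escapes the pattern, then turns each '*' into a lazy capturing group.
    public static Regex ToRegex(string wildcardPattern)
    {
        // Regex.Escape turns '*' into "\*", so swap that escaped form for "(.*?)".
        string body = Regex.Escape(wildcardPattern).Replace("\\*", "(.*?)");
        // Anchor both ends so the pattern has to cover the whole input.
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
    }

    // Returns the text matched by the first '*', or null when the input does not match.
    public static string FirstWildcard(string wildcardPattern, string input)
    {
        Match m = ToRegex(wildcardPattern).Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For example, `WildcardCapture.FirstWildcard("The '*' element is not declared.", "The 'DocumentCopy' element is not declared.")` would yield the element name on its own, assuming the line and position prefix has already been stripped as described in the question.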

Posted 9-May-18 6:45am

RickZeeland

Updated 9-May-18 6:48am

Comments

Malindor 9-May-18 12:56pm

Thanks for the function and the link to the article! I wish I had this earlier.

BillWoodruff 10-May-18 17:46pm

+5 Appreciated ... I dislike RegEx as a way of avoiding feeling inferior to other programmers here who use it so masterfully :) Of course, I assume you don't like it for a much more logical reason !

RickZeeland 11-May-18 10:12am

Your appreciation is appreciated :) I don't dislike regex, but the problem for me is that I always forget the syntax and have to invest precious time to learn it again ...

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Accepted Answer · 2018-05-09T06:45:00

If you mean HearingDocExtract, DocumentCopy, and DocumentID, then this will do it:

(['])(?:(?=(\\?))\2.)*?\1

C#

using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Wed, May 9, 2018, 05:43:42 PM
///  Using Expresso Version: 3.0.4750, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  [1]: A numbered capture group. [[']]
///      Any character in this class: [']
///  Match expression but don't capture it. [(?=(\\?))\2.], any number of repetitions, as few as possible
///      (?=(\\?))\2.
///          Match a suffix but exclude it from the capture. [(\\?)]
///              [2]: A numbered capture group. [\\?]
///                  Literal \, zero or one repetitions
///          Backreference to capture number: 2
///          Any character
///  Backreference to capture number: 1
///  
///
/// </summary>
public static Regex regex = new Regex(
      "(['])(?:(?=(\\\\?))\\2.)*?\\1",
    RegexOptions.Multiline
    | RegexOptions.Singleline
    | RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );
// Capture all Matches in the InputText
MatchCollection ms = regex.Matches(InputText);
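
Note that each match from the pattern above includes the surrounding single quotes. If the line number, position, and name are all wanted in one pass, named capture groups are another option. The sketch below assumes the error format shown in the question (`Line: <n>, Position <n>: "<message>"` with the name in single quotes); the class `SchemaErrorParser` and the group names are illustrative, not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SchemaErrorParser
{
    // Assumed format: Line: <n>, Position <n>: "<message containing 'Name'>"
    private static readonly Regex ErrorPattern = new Regex(
        @"Line:\s*(?<line>\d+),\s*Position:?\s*(?<pos>\d+):\s*""(?<message>[^""]*'(?<name>[^']+)'[^""]*)""",
        RegexOptions.CultureInvariant);

    // Extracts the line, position, and quoted element or attribute name from one error string.
    public static bool TryParse(string error, out int line, out int position, out string name)
    {
        line = 0;
        position = 0;
        name = null;

        Match m = ErrorPattern.Match(error);
        if (!m.Success) return false;

        line = int.Parse(m.Groups["line"].Value, CultureInfo.InvariantCulture);
        position = int.Parse(m.Groups["pos"].Value, CultureInfo.InvariantCulture);
        name = m.Groups["name"].Value; // the name without its quotes
        return true;
    }
}
```

With this approach there is no need to strip the line and position prefix first, and the full message text remains available through the `message` group if it is needed to decide who should be informed.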

If you want to use regexes, get a copy of Expresso: it's free, and it examines and generates regular expressions.