C#
string text = "animal: leather, bacon, eagle. vegetable: timber, potato. mineral: granite, gem stones, waterfall.";
MatchCollection matches = Regex.Matches(text, @"(?'1'(animal:)|(mineral:)|(vegetable:).*?)", RegexOptions.ExplicitCapture);


I was expecting this code to return three matches:

"animal: leather, bacon, eagle. ",
"vegetable: timber, potato. ", and
"mineral: granite, gem stones, waterfall."

Instead, the three matches returned contain the keywords only. Why is it not capturing the ".*?" that follows those keywords? If I make the quantifier greedy (".*"), I get only one match, which is the full string (as I would expect).

Cheers,

Mick
Comments
Midi_Mick 4-Jan-16 7:37am    
Something interesting that could do with explanation, if you know: I tried putting a non-capturing look ahead at the end (to tell the non-greedy capture where to stop), as below:

@"(?'1'(animal:)|(mineral:)|(vegetable:).*?(?=(animal:)|(mineral:)|(vegetable:)|$))"

I get three matches once again, with only the second one being as expected. The first and the last matches are just the keywords ("animal:" and "mineral:"), while the second is the expected "vegetable: timber, potato. ". Scratching my head here.

Got it. I needed to add an extra group around the keyword alternatives (as well as keeping the look-ahead described in the comments on the question). The regex that gives the expected answer is:

@"(?'1'((animal:)|(mineral:)|(vegetable:)).*?(?=(animal:)|(mineral:)|(vegetable:)|$))"

The clue was that "vegetable:" was giving the correct result. Because "|" has the lowest precedence, the ".*?" belonged only to the last alternative, so the pattern was being read as
(animal:)|(mineral:)|(vegetable:.*?)

rather than
((animal:)|(mineral:)|(vegetable:)).*?
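
To see the difference side by side, here is a minimal sketch that runs both forms of the pattern against the sample text from the question. The class and method names (KeywordSections, Sections) are made up for illustration, and I have swapped the named group and ExplicitCapture for plain non-capturing groups "(?:...)", which should group the same way here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordSections
{
    // Without the extra group, '|' splits the whole pattern, so ".*?" belongs
    // only to the last alternative (and, being lazy, it matches nothing).
    public const string Ungrouped = @"animal:|mineral:|vegetable:.*?";

    // Grouping the keywords makes ".*?" apply to whichever keyword matched.
    // The look-ahead stops the lazy match at the next keyword or at the end.
    public const string Grouped =
        @"(?:animal:|mineral:|vegetable:).*?(?=animal:|mineral:|vegetable:|$)";

    // Returns the text of every match of the pattern, in order.
    public static string[] Sections(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string text = "animal: leather, bacon, eagle. vegetable: timber, potato. mineral: granite, gem stones, waterfall.";

        Console.WriteLine("Ungrouped:");
        foreach (string s in Sections(text, Ungrouped))
            Console.WriteLine("  [" + s + "]");

        Console.WriteLine("Grouped:");
        foreach (string s in Sections(text, Grouped))
            Console.WriteLine("  [" + s + "]");
    }
}
```

The ungrouped pattern should print just the three keywords, while the grouped one prints the three full sections, each running up to (but not including) the next keyword.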
 
Comments
Midi_Mick 4-Jan-16 8:20am    
Weird: I can't get the solution to display the asterisk in the ".*?". Take my word for it, it's there (not just ".?").
BillWoodruff 4-Jan-16 10:18am    
If you can't fix this, consider posting this as a bug on the Site Suggs&Buggs forum.
Midi_Mick 4-Jan-16 11:10am    
Will do
Try this and see if it works. I just added a space before the *?

C#
MatchCollection matches = Regex.Matches(text, @"(?'1'(animal:)|(mineral:)|(vegetable:). *?)", RegexOptions.ExplicitCapture);
 
 
Comments
Midi_Mick 4-Jan-16 7:13am    
Nope - no good. Wouldn't have expected it to work, but as I am fairly inexperienced with regular expressions, it was worth a shot. With a space, the ". *?" would be a non-greedy match for any character followed by 0 or more spaces.
FYI, here is how you could do this using Linq to produce a Dictionary<string, List<string>>:
C#
// Needs: using System; using System.Collections.Generic; using System.Linq;
char[] splitchar1 = new[] {'.'};   // separates the sections
char[] splitchar2 = new[] {':'};   // separates a keyword from its items
char[] splitchar3 = new[] {','};   // separates the items

string text = "animal: leather, bacon, eagle. vegetable: timber, potato. mineral: granite, gem stones, waterfall.";

private void SomeButton_Click(object sender, EventArgs e)
{
    Dictionary<string, List<string>> test = ParseText(text);
}

private Dictionary<string, List<string>> ParseText(string text)
{
    // Trim() drops the space left behind after each '.' and ','
    return text.Split(splitchar1, StringSplitOptions.RemoveEmptyEntries)
            .Select(x => x.Split(splitchar2, StringSplitOptions.RemoveEmptyEntries))
                .ToDictionary(
                    key => key[0].Trim(),
                    val => val[1].Split(splitchar3, StringSplitOptions.RemoveEmptyEntries)
                                 .Select(item => item.Trim())
                                 .ToList());
}
Some of us are, you might say, RegEx "challenged." But I do think RegEx is a very good thing, and I wish I'd had a reason, back when, to study it in depth. My opinion is that, in early development, it's better to write the code out in fuller form than RegEx, and then, in later development, to optimize with RegEx in the places where you have identified it can be much more performant. Many people I respect do not share that opinion.

You could make a valid argument, I think, that for someone who doesn't know Linq, the Linq shown here is as "opaque" as RegEx is to someone who doesn't know RegEx :)
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


