My boss is trying to split a comma-delimited string with Regex. He wants to split on a comma followed by a blank space, but return any string surrounded by single or double quotes whole, ignoring any commas between the quotes.

Here’s the regex:
C#
string sToken = @"(?:,\s+)|(['""].+['""])(?:,\s+)";

Here’s a sample string:
C#
var s = "1.3#, 2.99, 3\t, 4#2/2/1019#, 5, asd,, 'Howdy, Howdy, Howdy', a;sdlkf";

Results:
1.3#
2.99
3\t
4#2/2/1019#
5
asd,

'Howdy, Howdy, Howdy'
a;sdlkf

The blank line between "asd," and "'Howdy, Howdy, Howdy'" is the issue. I believe I understand why it's showing up, but I don't know what regex magic I need to prevent it. I believe the regex processor finds the ", " after "asd," and splits "asd," out. It then finds another match (the "'Howdy, Howdy, Howdy'") and splits out everything between the ", " and the "'Howdy, Howdy, Howdy'" (an empty string). Note that the two commas after "asd" are not the problem: removing one of them gives the same results, except that "asd," becomes "asd" (as expected). Putting a space between the commas gives us "asd" followed by two blank lines instead of one.
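To make the behaviour described above easy to reproduce, here is a minimal sketch written as a small console program (the variable names are only for illustration). It relies on the documented behaviour that Regex.Split adds the text of capturing groups to its result, which is why the quoted string survives and why the empty text in front of it shows up as its own entry.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern and sample string copied from the question.
string pattern = @"(?:,\s+)|(['""].+['""])(?:,\s+)";
string sample = "1.3#, 2.99, 3\t, 4#2/2/1019#, 5, asd,, 'Howdy, Howdy, Howdy', a;sdlkf";

// Regex.Split includes captured group text in the result array, so the
// second alternative yields the quoted string, preceded by the (empty)
// text between the previous ", " match and the opening quote.
string[] parts = Regex.Split(sample, pattern);

for (int i = 0; i < parts.Length; i++)
{
    // Brackets make the empty entry visible in the console output.
    Console.WriteLine($"{i}: [{parts[i]}]");
}
```

The entry printed as `[]` is the blank line from the results above.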

I'm a rank amateur in the world of regular expressions, but my boss came to me because he had been told that I might know something about them (the squealer who told him this has been suitably punished). :)

Anyway, I would appreciate any assistance in resolving this.

To clarify, what we're looking for as output would be the following:
Desired Results:
1.3#
2.99
3\t
4#2/2/1019#
5
asd,
'Howdy, Howdy, Howdy'
a;sdlkf


The "'Howdy, Howdy, Howdy'" would be a single entry, without the blank line before it. If, however, our sample looked like this (note the space between the two commas after "asd"):
C#
var s = "1.3#, 2.99, 3\t, 4#2/2/1019#, 5, asd, , 'Howdy, Howdy, Howdy', a;sdlkf";

Desired Results:
1.3#
2.99
3\t
4#2/2/1019#
5
asd

'Howdy, Howdy, Howdy'
a;sdlkf

(note the blank line representing the empty value between the two commas, and the lack of a comma at the end of "asd")
Updated 17-Dec-17 20:54pm
Comments
Richard C Bishop 10-May-13 11:30am    
Do you have to use a REG EX?
Marc A. Brown 10-May-13 11:44am    
Well, it's not my project, so I don't know whether there's a requirement to use regex or not. If you've got an alternative suggestion, please feel free to offer it up. That said, even if the boss chooses to go another route, I hope someone can point out a regex solution to improve my understanding. :)
RaisKazi 10-May-13 14:18pm    
The question seems a little incomplete to me. Do you want it as 'Howdy, Howdy, Howdy' or
'Howdy
Howdy
Howdy'? Also, do you want it as 3\t or only 3?

Marc A. Brown 10-May-13 14:46pm    
Thanks for responding. You're correct -- I should have provided the desired results. I have updated the question.
RaisKazi 10-May-13 14:47pm    
Deleted my solution, as I hadn't understood the question before. Let me see if I can come up with a better solution. :)

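Try LINQ together with Regex, for example:
C#
// Requires: using System.Linq; using System.Text.RegularExpressions;
string s = @"1.3#, 2.99, 3\t, 4#2/2/1019#, 5, asd,, 'Howdy, Howdy, Howdy', a;sdlkf";
// Split, then drop every null or empty entry from the result.
string[] myValues = Regex.Split(s, @"(?:,\s+)|(['""].+['""])(?:,\s+)").Where(s2 => !string.IsNullOrEmpty(s2)).ToArray();
foreach (string s1 in myValues)
    MessageBox.Show(s1);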
 
Comments
Marc A. Brown 10-May-13 13:14pm    
A good answer, but it would eliminate empty values *too* aggressively. For instance, if there was a space between the two commas following "asd", the regex as it currently stands would give us a blank line that we would want to keep, in addition to the one that's not supposed to be there. I love the suggestion though. Thanks!
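The point about over-aggressive filtering can be checked with a short sketch, assuming the second sample string from the question (a space between the two commas after "asd"). The names `raw` and `filtered` are only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Original pattern from the question.
string pattern = @"(?:,\s+)|(['""].+['""])(?:,\s+)";
// Second sample from the question: note the space between the two commas.
string sample = "1.3#, 2.99, 3\t, 4#2/2/1019#, 5, asd, , 'Howdy, Howdy, Howdy', a;sdlkf";

string[] raw = Regex.Split(sample, pattern);
// The filter removes every empty entry, wanted or not.
string[] filtered = raw.Where(p => !string.IsNullOrEmpty(p)).ToArray();

Console.WriteLine($"empty entries before filtering: {raw.Count(p => p.Length == 0)}");
Console.WriteLine($"empty entries after filtering:  {filtered.Count(p => p.Length == 0)}");
```

Before filtering there are two empty entries: the one that represents the empty value between the commas and the unwanted one in front of the quoted string. After filtering both are gone, so the legitimate empty value is lost as well.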
I found a similar question on Stack Overflow and used its answer on your problem.
The idea is to split only on commas that are followed by an even number of single quotes (or none at all).

Using this expression I got the result you are looking for, without the additional blank line.
,\s+(?=(?:(?:[^']*'){2})*[^']*$)

I used Expresso to verify that all the required commas are selected in the string.
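As a rough sketch of how the expression might be used from C#, the snippet below runs it through Regex.Split on both sample strings from the question. The variable names are illustrative only. Note that the lookahead counts single quotes only, so double-quoted values, or a stray apostrophe inside a value, would need a different pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Split on a comma plus whitespace only when an even number of single
// quotes (or none) follows, i.e. the comma is outside a quoted value.
string pattern = @",\s+(?=(?:(?:[^']*'){2})*[^']*$)";

// Both sample strings from the question.
string first = "1.3#, 2.99, 3\t, 4#2/2/1019#, 5, asd,, 'Howdy, Howdy, Howdy', a;sdlkf";
string second = "1.3#, 2.99, 3\t, 4#2/2/1019#, 5, asd, , 'Howdy, Howdy, Howdy', a;sdlkf";

string[] firstParts = Regex.Split(first, pattern);
string[] secondParts = Regex.Split(second, pattern);

// Brackets make empty entries visible.
foreach (string part in firstParts) Console.WriteLine($"[{part}]");
Console.WriteLine("----");
foreach (string part in secondParts) Console.WriteLine($"[{part}]");
```

The first sample produces no empty entry, while the second keeps exactly one, for the empty value between the two commas.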
 
Comments
Marc A. Brown 10-May-13 15:16pm    
Brilliant! Thanks so much!
Beniamin Iorga 27-Dec-22 3:46am    
Absolutely brilliant, this is exactly what I was looking for

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


