Help with regular expressions in C#

Question

Hi guys!
I have some text in a string:

string m_TextToValidate;

Please help me with a regular expression that finds all the words in the text, including one-character words, while excluding all line breaks and whitespace.
Example:

"Lazy fox jumps over some                   stone      and I want to cheat that         fox"

  Lazy
  fox
  jumps
  over
  some
  stone
  and
  I
  want
  to
  cheat
  that
  fox

13 words here ...

Posted 24-Mar-11 8:33am

Nick Reshetinsky

Updated 24-Mar-11 8:42am

v2

5 solutions

Solution 4

Just in case you consider all non-whitespace characters valid word characters, here is a regular expression that will handle that:

((?!\s).)+

Posted 24-Mar-11 9:40am

AspDotNetDev
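
A minimal sketch of how the pattern from Solution 4 could be applied, assuming a short test string of my own; the class and method names (`NonWhitespaceWords`, `Find`) are hypothetical. It also compares the result with the shorthand `\S+`, which should match the same runs because every character rejected by `(?!\s).` is whitespace.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NonWhitespaceWords
{
    // Returns the text of every match of the given pattern in the input.
    public static string[] Find(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Hypothetical sample containing spaces, a tab and a line break.
        string text = "Lazy fox,  jumps\tover\r\nsome   stone";

        string[] viaLookahead = Find(text, @"((?!\s).)+");
        string[] viaShorthand = Find(text, @"\S+");

        // Punctuation stays attached to the word, e.g. "fox,".
        Console.WriteLine(string.Join("|", viaLookahead));
        Console.WriteLine(viaLookahead.SequenceEqual(viaShorthand));
    }
}
```

Note that this definition of a word keeps punctuation attached, which may or may not be what the question wants.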


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Accepted Answer · 2011-03-24T09:30:00

Solution 2

SA has it pretty spot on, with a minor tweak - replace the '*' with a '+':

C#

//  using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Thu, Mar 24, 2011, 07:26:23 PM
///  Using Expresso Version: 3.0.3634, http://www.ultrapico.com
///
///  A description of the regular expression:
///
///  \b\w+\b
///      First or last character in a word
///      Alphanumeric, one or more repetitions
///      First or last character in a word
///
///
/// </summary>
public static Regex regex = new Regex(
      "\\b\\w+\\b",
    RegexOptions.Multiline
    | RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );


// This is the replacement string
public static string regexReplace =
      "<Hello>";


//// Replace the matched text in the InputText using the replacement pattern
// string result = regex.Replace(InputText,regexReplace);

//// Split the InputText wherever the regex matches
// string[] results = regex.Split(InputText);

//// Capture the first Match, if any, in the InputText
// Match m = regex.Match(InputText);

//// Capture all Matches in the InputText
// MatchCollection ms = regex.Matches(InputText);

//// Test to see if there is a match in the InputText
// bool IsMatch = regex.IsMatch(InputText);

//// Get the names of all the named and numbered capture groups
// string[] GroupNames = regex.GetGroupNames();

//// Get the numbers of all the named and numbered capture groups
// int[] GroupNumbers = regex.GetGroupNumbers();

Get yourself a copy of Expresso: it generated the code for me, and also explains and helps design regexes.
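
To see why the '+' matters, here is a small sketch comparing the two quantifiers, assuming a two-word test string; the names `StarVersusPlus` and `Find` are hypothetical. With '*', the pattern can also succeed with zero word characters at a word boundary, so empty matches appear alongside the real words.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StarVersusPlus
{
    // Returns the text of every match of the given pattern in the input.
    public static string[] Find(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string text = "Lazy fox";

        string[] withStar = Find(text, @"\b\w*\b"); // zero or more word characters
        string[] withPlus = Find(text, @"\b\w+\b"); // one or more word characters

        int emptyCount = withStar.Count(s => s.Length == 0);
        Console.WriteLine("star: " + withStar.Length + " matches, " + emptyCount + " of them empty");
        Console.WriteLine("plus: " + string.Join(", ", withPlus));
    }
}
```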

Wendelius · Accepted Answer · 2011-03-24T09:36:00

Solution 3

I may be on the wrong track, but could you simply use:

\w+

Posted 24-Mar-11 9:36am

Wendelius
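
A quick sketch to check that, for ordinary text, `\w+` and `\b\w+\b` return the same words; the sample sentence and the names `BoundaryCheck` and `Find` are assumptions for illustration. Because `\w+` is greedy, each match already starts and ends at a word boundary, so the explicit `\b` anchors add nothing here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryCheck
{
    // Returns the text of every match of the given pattern in the input.
    public static string[] Find(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Hypothetical sample with punctuation and an apostrophe.
        string text = "want to cheat that fox, don't I?";

        string[] plain = Find(text, @"\w+");
        string[] bounded = Find(text, @"\b\w+\b");

        Console.WriteLine(string.Join("|", plain));
        Console.WriteLine(plain.SequenceEqual(bounded));
    }
}
```

One caveat worth knowing: `\w` does not include the apostrophe, so "don't" is reported as two words, "don" and "t".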

Comments

AspDotNetDev 24-Mar-11 15:49pm

5. Not quite sure why SAK included word boundaries in his answer.

Wendelius 24-Mar-11 15:52pm

Thanks. I have to admit that I'm really poor with regexes so I'm definitely the wrong person to ask :)

Sergey Alexandrovich Kryukov 24-Mar-11 15:59pm

You're right, apparently. Thank you for the note. My 5 for your Answer.
--SA

Wendelius 24-Mar-11 16:20pm

Thanks :)

Toli Cuturicu · Accepted Answer · 2011-03-24T10:14:00

Solution 5

The above answers are, of course, correct.
But... Regexes may be slow to compile / run and you don't know how to use them!

So, my answer would be... Don't use Regexes, then:

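// Characters treated as word separators; add more if you want to.
static readonly char[] WhiteSpace = new char[] {' ', '\t', '\r', '\n'};
// RemoveEmptyEntries drops the empty strings produced by runs of consecutive separators.
string[] words = initialString.Split(WhiteSpace, StringSplitOptions.RemoveEmptyEntries);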
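
A self-contained sketch of the Split approach next to the `\w+` regex, assuming a sample sentence with punctuation; the names `SplitVersusRegex`, `SplitWords` and `RegexWords` are hypothetical. It is meant to show one practical difference: Split only cuts at the listed separators, so punctuation stays attached to the neighbouring word.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitVersusRegex
{
    private static readonly char[] WhiteSpace = { ' ', '\t', '\r', '\n' };

    // Splits on whitespace only; runs of separators produce no empty entries.
    public static string[] SplitWords(string input)
    {
        return input.Split(WhiteSpace, StringSplitOptions.RemoveEmptyEntries);
    }

    // Collects runs of word characters (letters, digits, underscore).
    public static string[] RegexWords(string input)
    {
        return Regex.Matches(input, @"\w+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string text = "I want to cheat that      fox, really!";

        Console.WriteLine(string.Join("|", SplitWords(text)));
        Console.WriteLine(string.Join("|", RegexWords(text)));
    }
}
```

Which behaviour is preferable depends on whether "fox," should count as the word "fox"; for plain whitespace-separated input the two approaches give the same result.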

Posted 24-Mar-11 10:14am

Toli Cuturicu

Comments

Sergey Alexandrovich Kryukov 24-Mar-11 16:17pm

Absolutely acceptable answer. My 5.
--SA

Wendelius 24-Mar-11 16:20pm

Good answer, 5.

AspDotNetDev 24-Mar-11 16:20pm

5. Good observation.

Sergey Alexandrovich Kryukov · Accepted Answer · 2011-03-24T08:42:00

Solution 1

Use this Regex pattern:

\b\w+\b

Use System.Text.RegularExpressions.Regex.Matches(string input) to get all matches. That's it.

Next time, refer to a regular expression reference, such as this one: http://www.regular-expressions.info/reference.html.
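
A minimal sketch of that call applied to the sample sentence from the question; the class and method names (`WordCounter`, `FindWords`) are illustrative and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Collects every run of word characters found in the input.
    public static List<string> FindWords(string input)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\b\w+\b"))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        string text = "Lazy fox jumps over some                   stone      and I want to cheat that         fox";

        List<string> words = FindWords(text);
        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
        Console.WriteLine(words.Count + " words"); // prints "13 words"
    }
}
```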

[EDIT] The pattern fixed, thanks to Griff.

—SA

Posted 24-Mar-11 8:42am

Sergey Alexandrovich Kryukov

Updated 24-Mar-11 9:53am

v2

Comments

Nick Reshetinsky 24-Mar-11 14:53pm

no it's not an option it still includes whitespaces

Sergey Alexandrovich Kryukov 24-Mar-11 15:01pm

No. You say "excluding" white spaces.
Oh, I see! You don't know what \b is. It is NOT a white space! It is not a character at all!
This pattern *excludes* delimiters and white spaces.
Learn Regex or at least test before arguing. You simply did not get it.
--SA

OriginalGriff 24-Mar-11 15:30pm

No, you didn't. You tried it once and decided it didn't work because it gave you empty matches as well as the info you wanted.

You didn't bother to look at it, and think about what SA had given you: the solution, but for a minor tweak.

Sergey Alexandrovich Kryukov 24-Mar-11 16:03pm

Thanks a lot, Griff. Just "\w+" works correctly, too. I simply posted the first pattern that worked (I tested with the engine in my own text editor, which is specialized and not perfectly standard; it works correctly in most cases).
--SA

OriginalGriff 24-Mar-11 15:32pm

Hi SA! Ungrateful isn't he? Fortunately it's a simple patch to your regex (I put it in a separate answer so I could use the formatting and syntax colouriser) - feel free to include it in yours and delete my answer if you want!

You get my 5, anyway!

Sergey Alexandrovich Kryukov 24-Mar-11 15:54pm

Thanks a lot, Griff, I applied your fix.
I used a different engine for my test; my fault.
--SA

Nick Reshetinsky 24-Mar-11 15:36pm

sorry..... :(

Wendelius 24-Mar-11 15:58pm

Good answer, 5.

Sergey Alexandrovich Kryukov 24-Mar-11 16:16pm

Thank you, Mika.
--SA

Toli Cuturicu 24-Mar-11 16:16pm

Have my 5, but... maybe not Regex at all... See my answer.

Sergey Alexandrovich Kryukov 24-Mar-11 16:18pm

Thank you, Toli.
Of course. Your solution is quite acceptable. I already voted 5.
--SA
