Hi guys!
I have text in some string ...

string m_TextToValidate;


Please help me with a regular expression that finds every word in the text, including one-character words,
and excludes all line breaks and whitespace.
For example, the string
"Lazy fox jumps over some                   stone      and I want to cheat that         fox"


  Lazy
  fox
  jumps
  over
  some
  stone
  and
  I
  want
  to
  cheat
  that
  fox


13 words here ...
Updated 24-Mar-11 8:42am

SA has it pretty spot on, with a minor tweak - replace the '*' with a '+':
C#
//  using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Thu, Mar 24, 2011, 07:26:23 PM
///  Using Expresso Version: 3.0.3634, http://www.ultrapico.com
///
///  A description of the regular expression:
///
///  \b\w+\b
///      First or last character in a word
///      Alphanumeric, one or more repetitions
///      First or last character in a word
///
///
/// </summary>
public static Regex regex = new Regex(
      "\\b\\w+\\b",
    RegexOptions.Multiline
    | RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );


// This is the replacement string
public static string regexReplace =
      "<Hello>";


//// Replace the matched text in the InputText using the replacement pattern
// string result = regex.Replace(InputText,regexReplace);

//// Split the InputText wherever the regex matches
// string[] results = regex.Split(InputText);

//// Capture the first Match, if any, in the InputText
// Match m = regex.Match(InputText);

//// Capture all Matches in the InputText
// MatchCollection ms = regex.Matches(InputText);

//// Test to see if there is a match in the InputText
// bool IsMatch = regex.IsMatch(InputText);

//// Get the names of all the named and numbered capture groups
// string[] GroupNames = regex.GetGroupNames();

//// Get the numbers of all the named and numbered capture groups
// int[] GroupNumbers = regex.GetGroupNumbers();


Get yourself a copy of Expresso: it generated the code for me, and also explains and helps design regexes.
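As a minimal sketch of how the pattern above might be used on the sentence from the question, the following program collects every match into an array. The class and method names (`WordFinder`, `FindWords`) are made up for illustration and are not part of the Expresso output.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFinder
{
    // Same pattern as in the answer; a verbatim string avoids doubling the backslashes.
    private static readonly Regex WordRegex =
        new Regex(@"\b\w+\b", RegexOptions.CultureInvariant);

    // Returns every word found in the text, in order of appearance.
    // FindWords is a hypothetical helper name, not a framework method.
    public static string[] FindWords(string text)
    {
        return WordRegex.Matches(text)
                        .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                        .Select(m => m.Value)
                        .ToArray();
    }

    public static void Main()
    {
        string input = "Lazy fox jumps over some                   stone      and I want to cheat that         fox";
        foreach (string word in FindWords(input))
        {
            Console.WriteLine(word);
        }
    }
}
```

Run against the question's sentence, this should list the 13 words one per line, with no empty entries for the runs of spaces.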
 
Comments
Nick Reshetinsky 24-Mar-11 15:34pm    
THANKS A LOT!!!!!!
Sergey Alexandrovich Kryukov 24-Mar-11 15:52pm    
Why would I ever remove it? Great answer, my 5.
--SA
OriginalGriff 24-Mar-11 16:08pm    
You did all the hard work!
Sergey Alexandrovich Kryukov 24-Mar-11 16:19pm    
No, you did. Thank you anyway.
--SA
Maybe on the wrong track but could you use simply
\w+
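For ordinary text the two patterns should agree: `\w+` is greedy, so each match already spans a whole run of word characters and therefore starts and ends at a word boundary. The sketch below compares them on a sample string; `BoundaryCheck` and `MatchAll` are illustrative names only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryCheck
{
    // Hypothetical helper: returns the text of every match of pattern in text.
    public static string[] MatchAll(string pattern, string text)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "Lazy fox jumps over some     stone and I want to cheat";

        string[] plain = MatchAll(@"\w+", input);
        string[] bounded = MatchAll(@"\b\w+\b", input);

        // Both arrays hold the same words in the same order for this input.
        Console.WriteLine(plain.SequenceEqual(bounded)); // prints True
    }
}
```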
 
Comments
AspDotNetDev 24-Mar-11 15:49pm    
5. Not quite sure why SAK included word boundaries in his answer.
Wendelius 24-Mar-11 15:52pm    
Thanks. I have to admit that I'm really poor with regex's so I'm definitely the wrong person to ask :)
Sergey Alexandrovich Kryukov 24-Mar-11 15:59pm    
You're right, apparently. Thank you for the note. My 5 for your Answer.
--SA
Wendelius 24-Mar-11 16:20pm    
Thanks :)
Just in case you consider all non-whitespace characters valid word characters, here is a regular expression that will handle that:
((?!\s).)+
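To illustrate where this differs from `\w+`, here is a small sketch using a made-up sentence with an apostrophe and a hyphen. `TokenFinder` and `FindTokens` are hypothetical names. Since the lookahead only rejects whitespace, the shorter pattern `\S+` should give the same results.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenFinder
{
    // Hypothetical helper: returns every run of non-whitespace characters.
    public static string[] FindTokens(string text)
    {
        return Regex.Matches(text, @"((?!\s).)+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "I don't like semi-colons   much";

        // Punctuation stays inside the token: "don't" and "semi-colons" are kept whole,
        // whereas \w+ would break them at the apostrophe and the hyphen.
        foreach (string token in FindTokens(input))
        {
            Console.WriteLine(token);
        }
    }
}
```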
 
The above answers are, of course, correct.
But... Regexes may be slow to compile and run, and you don't know how to use them!

So, my answer would be... Don't use Regexes, then:
static readonly char[] WhiteSpace = new char[] {' ', '\t', '\r', '\n', /* Add more if you want to */ };
string[] words = initialString.Split(WhiteSpace, StringSplitOptions.RemoveEmptyEntries);
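A self-contained sketch of the same idea follows, with `Splitter` and `SplitWords` as illustrative names. One difference from the regex answers is worth noting: splitting on whitespace leaves punctuation attached to the neighbouring word.

```csharp
using System;

public static class Splitter
{
    private static readonly char[] WhiteSpace = { ' ', '\t', '\r', '\n' };

    // Hypothetical helper: splits on the listed characters and drops the empty
    // entries that runs of consecutive separators would otherwise produce.
    public static string[] SplitWords(string text)
    {
        return text.Split(WhiteSpace, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string input = "Lazy fox,\tjumps\r\nover";

        // The comma is not a separator here, so "fox," keeps it.
        foreach (string word in SplitWords(input))
        {
            Console.WriteLine(word);
        }
    }
}
```

If every Unicode whitespace character should count as a separator, passing `(char[])null` instead of an explicit array makes `Split` use whitespace as the delimiter set.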
 
Comments
Sergey Alexandrovich Kryukov 24-Mar-11 16:17pm    
Absolutely acceptable answer. My 5.
--SA
Wendelius 24-Mar-11 16:20pm    
Good answer, 5.
AspDotNetDev 24-Mar-11 16:20pm    
5. Good observation.
Use this Regex pattern:
\b\w+\b

Use System.Text.RegularExpressions.Regex.Matches(string input) to get all matches. That's it.

Next time, refer to a regular expression reference, such as this one: http://www.regular-expressions.info/reference.html.

[EDIT] The pattern is fixed, thanks to Griff.

—SA
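The change from '*' to '+' matters because `\w*` can match zero characters, so the pattern also succeeds at positions where there is no word. The pre-edit pattern is not shown in this thread; the sketch below assumes it was `\b\w*\b`, and `StarVersusPlus` and `MatchAll` are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StarVersusPlus
{
    // Hypothetical helper: returns the text of every match of pattern in text.
    public static string[] MatchAll(string pattern, string text)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "Lazy fox";

        string[] star = MatchAll(@"\b\w*\b", input); // assumed original pattern
        string[] plus = MatchAll(@"\b\w+\b", input); // corrected pattern

        // With '*', some matches are empty strings; with '+', none are.
        Console.WriteLine("'*' empty matches: " + star.Count(s => s.Length == 0));
        Console.WriteLine("'+' empty matches: " + plus.Count(s => s.Length == 0));
    }
}
```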
 
Comments
Nick Reshetinsky 24-Mar-11 14:53pm    
No, that's not an option; it still includes whitespace.
Sergey Alexandrovich Kryukov 24-Mar-11 15:01pm    
No. You say "excluding" white spaces.
Oh, I see! You don't know what \b is. It is NOT a white space! It is not a character at all!
This pattern *excludes* delimiters and white spaces.
Learn Regex or at least test before arguing. You simply did not get it.
--SA
OriginalGriff 24-Mar-11 15:30pm    
No, you didn't. You tried it once and decided it didn't work because it gave you empty matches as well as the info you wanted.

You didn't bother to look at it, and think about what SA had given you: the solution, but for a minor tweak.
Sergey Alexandrovich Kryukov 24-Mar-11 16:03pm    
Thanks a lot, Griff. And just "\w+" works correctly. I simply posted the first pattern that worked. (I have a regex engine in my own text editor, but it is specialized and not perfectly standard; it works correctly in most cases.)
--SA
OriginalGriff 24-Mar-11 15:32pm    
Hi SA! Ungrateful, isn't he? Fortunately it's a simple patch to your regex (I put it in a separate answer so I could use the formatting and syntax colouriser). Feel free to include it in yours and delete my answer if you want!

You get my 5, anyway!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


