Don't count spaces when counting words.

Pete O'Hanlon

Mar 17, 2010

CPOL

Over the last couple of days I've seen numerous posts about how to count words in a sentence. Disturbingly, these posts recommend counting the number of spaces in the sentence and using that as the basis of a word count. You may be asking why this is a problem. Well,...

The total number of words    \t     in this sentence,is 10.

As you can see, simply counting spaces isn't going to work. There are special characters (the \t) to take care of, multiple spaces, and words separated by a comma without a space. So, if counting spaces doesn't work, what does? The answer is to use a regular expression, and you are going to love how simple it is. A single pattern matches words and takes care of all the guff demonstrated above: \w+, which matches one or more consecutive word characters (letters, digits, and the underscore). Here's a quick sample:

// Requires: using System.Text.RegularExpressions;
Regex regex = new Regex(@"\w+", RegexOptions.CultureInvariant);
string input = "The total number of words       \t        in this sentence,is 10.";
MatchCollection matches = regex.Matches(input);
int wordCount = matches.Count; // each match is one word: 10 here
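
To make the difference concrete, here is a minimal sketch that runs both approaches over the sample sentence. The class and method names (WordCountDemo, CountBySpaces, CountByRegex) are made up for illustration, and "spaces plus one" is only an assumed version of the naive approach, not code taken from any particular post.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class WordCountDemo
{
    // Naive approach (assumed form): treat every space as a word boundary.
    public static int CountBySpaces(string text)
    {
        return text.Count(c => c == ' ') + 1;
    }

    // Regex approach: each run of word characters counts as one word.
    public static int CountByRegex(string text)
    {
        return Regex.Matches(text, @"\w+").Count;
    }

    public static void Main()
    {
        string input = "The total number of words       \t        in this sentence,is 10.";
        Console.WriteLine("Space-based count: " + CountBySpaces(input));
        Console.WriteLine("Regex count: " + CountByRegex(input));
    }
}
```

On the sample sentence the regex version reports 10, while the space-based count comes out higher because every extra space is treated as another word boundary.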