Click here to Skip to main content
16,021,172 members
Articles / Programming Languages / C#
Tip/Trick

Counting Lines in a String

Rate me:
Please Sign up or sign in to vote.
4.87/5 (30 votes)
18 Jun 2012CPOL3 min read 175.3K   273   34   31
It seems like an obvious requirement, but the .NET framework will not count occurrences of a character in a string. It's easy to do, but which way is the quickest?

Introduction

I needed to know, so I wrote a quick, brute-force-and-ignorance (henceforth known as "BF+I") method to do it:

C#
static long LinesCount(string s)
    {
    long count = 0;
    int position = 0;
    while ((position = s.IndexOf('\n', position)) != -1)
        {
        count++;
        position++;         // Skip this occurrence!
        }
    return count;
    }

A quick test, and it works fine.

But then, I thought - isn't that a bit...well...rubbish? Isn't there another way to do it?

And sure, there is: I can think of two off the top of my head: Regex and Linq. So, I knocked up more methods.
And then I got a lot of feedback suggesting other ways, so I have re-worked this tip to reflect those.
A quick-n-dirty test rig to check them:

C#
            string s = File.ReadAllText(@"D:\Temp\MyLargeTextFile.txt");
            long index = LinesCountIndexOf(s);
            long regex = LinesCountRegex(s);
            long linq= LinesCountLinq(s);
            Console.WriteLine("{0}:{1}:{2}", index, regex, linq);
            Stopwatch s0 = new Stopwatch();
... repeated, one per test
            Stopwatch sn = new Stopwatch();
            s0.Start();
            for (int i = 0; i < 100; i++)
                {
                index = LinesCountIndexOf(s);
                }
            s0.Stop();
            sb.AppendFormat("IndexOf     : {0,06}\n", s0.ElapsedMilliseconds);
            tbResults.Lines = sb.ToString().Split('\n');
            sl.Start();
... Identical blocks, different method names.

The input file is 1.64MB, and contains around 23K lines. Please note that this file is read once for the entire test run, and is just used to give a significant amount of data for the tests to run against. A small amount of data would take us below the timer resolution of 1 millisecond and distort the result. This is not a test of how fast I can count lines in a file!

The methods tried so far:

C#
static long LinesCountIndexOf(string s)
    {
    long count = 0;
    int position = 0;
    while ((position = s.IndexOf('\n', position)) != -1)
        {
        count++;
        position++;         // Skip this occurrence!
        }
    return count;
    }
static Regex r = new Regex("\n", RegexOptions.Multiline);
static long LinesCountRegex(string s)
    {
    MatchCollection mc = r.Matches(s);
    return mc.Count;
    }
static long LinesCountLinq(string s)
    {
    return (from ch in s
            where ch == '\n'
            select ch).Count();
    }
static long LinesCountSplit(string s)
    {
    return (s.Split(new char[] { '\n' })).Length;
    }

static long LinesCount2(string s)
    {
    long count = 0;
    int posMax = s.Length;

    for (int position = 0; position < posMax; )
        if (s[position++] == '\n')
            count++;

    return count;
    }

static long LinesCount3(string s)
    {
    long count = 0;
    int posMax = s.Length;
    char[] a = s.ToCharArray();

    for (int position = 0; position < posMax; )
        if (a[position++] == '\n')
            count++;

    return count;
    }

static long LinesCount4(string s)
    {
    long count = 0;
    int position = -1;
    while ((position = s.IndexOf('\n', position + 1)) != -1)
        {
        count++;
        }
    return count;
    }

static long LinesCount5(string s)
    {
    long count = 0;
    int posMax = s.Length;
    char[] a = s.ToCharArray();

    foreach (char c in a)
        if (c == '\n')
            count++;

    return count;
    }

static long LinesCount6(string s)
    {
    long count = 0;

    foreach (char c in s)
        if (c == '\n')
            count++;

    return count;
    }

static long LinesCount7(string s)
    {
    return s.Length - s.Replace("\n", "").Length;
    }
static long LinesCountAhoCorasick(string s)
    {
    IStringSearchAlgorithm searchAlg = new StringSearch();
    searchAlg.Keywords = new string[] { "\n" };
    StringSearchResult[] results = searchAlg.FindAll(s);
    return results.Length;
    }
static long LinesCountLinqCount(string s)
    {
    return s.Count(c => (c == '\n'));
    }
unsafe static long LinesCountUnsafePtr(string s)
    {
    long lineCount = 1;
    fixed (char* pchar = s)
        {
        char* p = pchar;
        for (; *p != '\0'; p++)
            {
            if (*p == '\n') lineCount++;
            }
        }
    return lineCount;
    }

The addition of a StreamReader version, which doesn't really fit, but people seem to be suggesting file based solutions:

C#
ssr.Start();
for (int i = 0; i < 100; i++)
    {
    StreamReader sr = new StreamReader(@"D:\Temp\MyLargeTextFile.txt");
    index = LinesCountStream(sr);
    sr.Dispose();
    }
ssr.Stop();

...

C#
static long LinesCountStream(StreamReader sr)
    {
    long count = 0;
    while (sr.ReadLine() != null)
        {
        count++;
        }
    return count;
    }

They all return the same number of lines, which is fine. But the timing tests were interesting:

Test    ElapsedMilliseconds
IndexOf     :    167
Index (2)   :    953
Index (3)   :   1026
Index (4)   :    167
Index (5)   :   1211
Index (6)   :   1544
Index (7)   :   1088
Split       :   1103
Regex       :   2517
Linq (Expl) :   3576
Linq Count  :   3531
Stream Read :    976
Unsafe Ptr  :    633

I had expected Regex to be much, much faster! The serious difference between Linq and Regex was a surprise, but I was really surprised when BF+I beat them both hands down! The Split method didn't surprise me by being slower than Index, as it means .NET has to allocate 23K strings each time. To finish it all off, I rewrote the BF+I version as an Extension Method, and made it assume that a string with no newlines was a single line. As an extension method, it was within a millisecond of the original times, as you would expect.

C#
static class ExtensionMethods
    {
    /// <summary>
    /// Returns the number of lines in a string
    /// </summary>
    /// <param name="s"></param>
    /// <returns></returns>
    public static long Lines(this string s)
        {
        long count = 1;
        int position = 0;
        while ((position = s.IndexOf('\n', position)) != -1)
            {
            count++;
            position++;         // Skip this occurrence!
            }
        return count;
        }
    }

BTW: Adding RegexOptions.Compiled to the regular expression brought it down to:

Test    ElapsedMilliseconds
BF+I     170
Split   1089
Regex   2063
Linq    3583

So it is WELL worth adding it to all your Regexes if you can.

Revision History

  • [Edit]Split method suggested by Grasshopper.iics[^] added -OriginalGriff[/edit]
  • [Edit2]I forgot to add the credit for the split method D'Oh! | <img src= " /> - OriginalGriff[/edit2]
  • [Edit3]Deeksha Shenoy[/Edit3]
  • [Edit4]Inclusion of more variants as suggested by: unruled boy[^] (Aho-Corasick, awaiting method) Grasshopper.iics[^] (Split) akemper[^] (Variations on BF+I, Replace, Split) George Swan[^] (LinqCount) Thanks to all these people (and others if I haven't updated this yet)! -OriginalGriff [/Edit4]
  • [Edit5]I forgot to introduce the three letter acronym "BF+I" when I first defined it as "Brute-Force-and-Ignorance". My thanks to fuzz_ball for pointing this out. Sorry for any confusion caused... - OriginalGriff[/Edit5]
  • [Edit6]Jon Bellamy[^] suggested a StreamReader, which I have included despite it being file related, rather than string. Please note that the test is different, as StreamReader does not support Seek or Rewind so has to be instantiated new each time. (Removing the Dispose only took 10 milliseconds off the time) Roman Shero[^] suggested the unsafe version Extra text added to the file read to make it clear why I load a file at all - OriginalGriff[/Edit6]
  • [Edit7]Added download for test rig (it's not pretty!), fixed formatting of code section (don't know what happened to it there), minor spelling mistake fixed, and Revision history reformatted - OriginalGriff[/Edit7]

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
CEO
Wales Wales
Born at an early age, he grew older. At the same time, his hair grew longer, and was tied up behind his head.
Has problems spelling the word "the".
Invented the portable cat-flap.
Currently, has not died yet. Or has he?

Comments and Discussions

 
GeneralMy vote of 5 Pin
S. M. Ahasan Habib18-Feb-13 17:08
professionalS. M. Ahasan Habib18-Feb-13 17:08 
GeneralMy vote of 5 Pin
VadimAlk9-Jul-12 18:07
VadimAlk9-Jul-12 18:07 
GeneralCombination of IndexOf an Parallel.ForEach Pin
DavorinP8-Jul-12 10:55
DavorinP8-Jul-12 10:55 
AnswerLast minute Pin
YvesDaoust25-Jun-12 0:34
YvesDaoust25-Jun-12 0:34 
QuestionVery sad ! Pin
YvesDaoust24-Jun-12 23:53
YvesDaoust24-Jun-12 23:53 
AnswerRe: Very sad ! Pin
OriginalGriff25-Jun-12 0:20
mveOriginalGriff25-Jun-12 0:20 
QuestionRegex comparison to precompiled would be nice Pin
Steav18-Jun-12 1:43
Steav18-Jun-12 1:43 
AnswerRe: Regex comparison to precompiled would be nice Pin
OriginalGriff18-Jun-12 2:25
mveOriginalGriff18-Jun-12 2:25 
GeneralRe: Thanks. Pin
millerize27-Feb-12 1:25
millerize27-Feb-12 1:25 
GeneralRe: If you're going to save off the Regex and use it more than o... Pin
Andrew Rissing9-Jan-12 4:56
Andrew Rissing9-Jan-12 4:56 
GeneralRe: In my case, I was writing a massively parallel find routine,... Pin
John Brett9-Jan-12 4:26
John Brett9-Jan-12 4:26 
GeneralReason for my vote of 5 Nice one. Pin
Nikhil_S26-Feb-12 17:43
professionalNikhil_S26-Feb-12 17:43 
GeneralCan anyone please explain why LinesCount2 is so slow? I thou... Pin
millerize26-Feb-12 6:57
millerize26-Feb-12 6:57 
Can anyone please explain why LinesCount2 is so slow? I thought it should be as fast as the LinesCountIndexOf.
GeneralRe: If you check the implementaiton of the IndexOf() method in R... Pin
Andreas Gieriet26-Feb-12 11:41
professionalAndreas Gieriet26-Feb-12 11:41 
GeneralReason for my vote of 5 Great analysis! Pin
Andreas Gieriet26-Feb-12 3:53
professionalAndreas Gieriet26-Feb-12 3:53 
GeneralWhat does BF+I stand for? It doesn't appear related to the m... Pin
Qwertie23-Jan-12 20:58
Qwertie23-Jan-12 20:58 
GeneralRe: "brute-force-and-ignorance" - it's in the first line of the ... Pin
OriginalGriff23-Jan-12 21:35
mveOriginalGriff23-Jan-12 21:35 
GeneralReason for my vote of 5 nicely illustrated Pin
RaviGupta798617-Jan-12 8:22
RaviGupta798617-Jan-12 8:22 
GeneralLine terminations are found closer to the end of a string th... Pin
Bob Van Wagner17-Jan-12 2:54
Bob Van Wagner17-Jan-12 2:54 
GeneralReason for my vote of 5 Really excellent thorough research a... Pin
BillWoodruff16-Jan-12 22:42
professionalBillWoodruff16-Jan-12 22:42 
GeneralReason for my vote of 5 Wonderful! Pin
Alex Essilfie16-Jan-12 22:15
Alex Essilfie16-Jan-12 22:15 
GeneralJust a general writing comment, when you toss in a TLA it's ... Pin
fuzz_ball16-Jan-12 9:46
fuzz_ball16-Jan-12 9:46 
GeneralRe: :O Sorry - I forgot...fixed Pin
OriginalGriff16-Jan-12 21:19
mveOriginalGriff16-Jan-12 21:19 
GeneralIt looks like Linq is built for comfort, not speed. Pin
George Swan11-Jan-12 6:00
mveGeorge Swan11-Jan-12 6:00 
GeneralOK, so how about a less ignorant version that ignores newlin... Pin
PIEBALDconsult9-Jan-12 4:29
mvePIEBALDconsult9-Jan-12 4:29 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.