Improve Stream Reading Performance in C#

22 Jan 2016, MIT license, 2 min read
Improve reading performance of .NET streams using the Seek method instead of Position

Motivation

Because the code example below seems to cause some confusion, here is a short motivation. Let's assume we want to read properties of an MP3 file, for example to build a bitrate histogram of a file that is encoded with a variable bitrate. An MP3 file consists of a sequence of frames (on the order of 10,000 for a 4 or 5 minute song). Each frame has a 4-byte header (see Wikipedia article), which contains information such as the bitrate and what is needed to determine the length of the data block that follows the header. To build the histogram, we need a 4-byte buffer: we read the header and then skip to the next header.
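To make the "4-byte header" idea concrete, here is a minimal sketch of decoding the bitrate and frame length from such a header. It is not part of the benchmark below. It assumes MPEG-1 Layer III frames only (other MPEG versions and layers use different tables), and the names Mp3Header and TryParse are made up for this illustration.

```csharp
using System;

public static class Mp3Header
{
    // Bitrate table for MPEG-1 Layer III, indexed by the 4-bit bitrate index.
    // Index 0 means "free format", index 15 is invalid.
    private static readonly int[] BitratesKbps =
        { 0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, -1 };

    // Sample rates for MPEG-1, indexed by the 2-bit sample rate index (3 is reserved).
    private static readonly int[] SampleRatesHz = { 44100, 48000, 32000, -1 };

    // Hypothetical helper: returns false for anything that is not a plain
    // MPEG-1 Layer III header with a known bitrate and sample rate.
    public static bool TryParse(byte[] header, out int bitrateKbps, out int frameLength)
    {
        bitrateKbps = 0;
        frameLength = 0;

        if (header == null || header.Length < 4) return false;

        // The header starts with 11 set bits (frame sync).
        if (header[0] != 0xFF || (header[1] & 0xE0) != 0xE0) return false;

        // Version bits must be 11 (MPEG-1), layer bits must be 01 (Layer III).
        if (((header[1] >> 3) & 0x03) != 0x03) return false;
        if (((header[1] >> 1) & 0x03) != 0x01) return false;

        int bitrate = BitratesKbps[header[2] >> 4];
        int sampleRate = SampleRatesHz[(header[2] >> 2) & 0x03];
        int padding = (header[2] >> 1) & 0x01;

        if (bitrate <= 0 || sampleRate <= 0) return false;

        bitrateKbps = bitrate;

        // Frame length in bytes, including the 4 header bytes.
        frameLength = 144 * bitrate * 1000 / sampleRate + padding;
        return true;
    }
}
```

For the common header bytes FF FB 90 00 this yields 128 kbit/s and a frame length of 417 bytes, so a reader would process 4 bytes and then skip the remaining 413.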

Make sure to read the remarks section at the bottom of this tip. If you care about performance, you should probably avoid seeking at all. But since I've seen a lot of code that uses the Position property for seeking, I thought it was worth a tip ...

Performance Tests

The values of N and SKIP in the code below are chosen deliberately to illustrate the performance differences, even for small file sizes.

When implementing this "read some bytes, then skip a few" behavior, you might find yourself writing code like the following:

C#
// Requires: using System; using System.Diagnostics; using System.IO;

// Number of bytes to read.
private const int N = 4;

// Number of bytes to skip.
private const int SKIP = 3;

private static void Test1(string file)
{
    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;

        int count;

        byte[] buffer = new byte[N];

        s.Start();

        // Read a couple of bytes from the stream.
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);

            stream.Position += SKIP;
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

private static int Checksum(byte[] buffer, int offsetStart, int offsetEnd)
{
    int sum = 0;

    for (int i = offsetStart; i < offsetEnd; i++)
    {
        sum += buffer[i];
    }

    return sum;
}

This seems straightforward, but with just one change you can greatly improve performance. Let's see what happens if we use stream.Seek instead of stream.Position:

C#
private static void Test2(string file)
{
    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;
        long position = 0;

        byte[] buffer = new byte[N];

        int count;

        s.Start();

        // Read a couple of bytes from the stream.
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);
            position += (N + SKIP);

            stream.Seek(SKIP, SeekOrigin.Current);
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

In the sample run shown further down, this improves performance by roughly a factor of 4 (3184 ms instead of 12654 ms). That's impressive.

As a last test, let's see what happens if we don't seek from the current position, but from the beginning of the stream:

C#
private static void Test3(string file)
{
    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;
        long position = 0;

        byte[] buffer = new byte[N];

        int count;

        s.Start();

        // Read a couple of bytes from the stream.
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);
            position += (N + SKIP);

            stream.Seek(position, SeekOrigin.Begin);
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

This gives a further, smaller improvement (a factor of about 1.2).

Here's a sample output from calling the test functions on a 6.27 MB file, listed in the order Test1, Test2, Test3 (I made sure to call Test1 twice, the first call as a warm-up and to make sure the file gets cached):

File size: 6581961 bytes
Elapsed  : 12654 ms
Checksum : 100462446

File size: 6581961 bytes
Elapsed  : 3184 ms
Checksum : 100462446

File size: 6581961 bytes
Elapsed  : 2668 ms
Checksum : 100462446

Conclusion

In the above example, we get an overall speed-up factor of about 4.7 (12654 ms down to 2668 ms). Results may vary on different PCs, but here are some rules you should follow when reading from .NET streams:

  1. Avoid setting the Position property. Always prefer the Seek method
  2. Avoid reading properties in loops (like Position or Length)
  3. Prefer using SeekOrigin.Begin
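Since the three variants are supposed to differ only in speed, it can be worth checking that they really visit the same bytes. The following sketch does that on an arbitrary Stream (a MemoryStream is enough for a quick check). The SkipReadCheck class, its Mode enum and the Reference method are names invented for this illustration, and the SeekBegin variant assumes, like Test3, that every read except the last one returns N bytes.

```csharp
using System;
using System.IO;

public static class SkipReadCheck
{
    private const int N = 4;    // Number of bytes to read.
    private const int SKIP = 3; // Number of bytes to skip.

    public enum Mode { Position, SeekCurrent, SeekBegin }

    // Reads N bytes, skips SKIP bytes, and sums up the bytes that were read.
    public static long Sum(Stream stream, Mode mode)
    {
        var buffer = new byte[N];
        long hash = 0;
        long position = 0;
        int count;

        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            for (int i = 0; i < count; i++) hash += buffer[i];

            position += N + SKIP;

            switch (mode)
            {
                case Mode.Position:
                    stream.Position += SKIP;
                    break;
                case Mode.SeekCurrent:
                    stream.Seek(SKIP, SeekOrigin.Current);
                    break;
                default:
                    stream.Seek(position, SeekOrigin.Begin);
                    break;
            }
        }

        return hash;
    }

    // Computes the same sum directly on a byte array, without any stream.
    public static long Reference(byte[] data)
    {
        long hash = 0;

        for (int start = 0; start < data.Length; start += N + SKIP)
        {
            int end = Math.Min(start + N, data.Length);
            for (int i = start; i < end; i++) hash += data[i];
        }

        return hash;
    }
}
```

If all three modes return the same value as Reference for the same data, the timing differences come from how the stream is repositioned, not from reading different bytes.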

Remarks

Say you want to seek to a particular time offset in an audio file. That's obviously a valid use case for seeking in a stream, but there it makes no difference whether you use stream.Position or stream.Seek, since it is just a single call. On the other hand, seeking repeatedly in a tight loop, the way it is done in the tests above, will always degrade performance massively.

So I think my conclusion remains valid: if you do seek, prefer the Seek method. But as a result of the discussion with GravityPhazer (see comments), here is a solution that doesn't use seeking at all. It's a bit more involved, because a frame can straddle two successive buffer reads and the offset has to be carried over from one buffer to the next, but it pays off: the runtime drops to about 50 ms.

C#
private static void Test4(string file)
{
    const int SIZE = 1024;

    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;

        byte[] buffer = new byte[SIZE];

        int position = 0;
        int count, end;

        s.Start();

        // Fill the buffer.
        while ((count = stream.Read(buffer, 0, SIZE)) > 0)
        {
            if (position > SKIP)
            {
                // The previous frame overlapped with the current buffer.
                // Never read past the bytes that were actually returned.
                hash += Checksum(buffer, 0, Math.Min(position - SKIP, count));
            }

            // Process the buffer.
            while (position < count)
            {
                end = position + N;

                if (end > count) end = count;

                hash += Checksum(buffer, position, end);
                position += (N + SKIP);
            }

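            // Carry the offset over into the next buffer. Subtracting count
            // (rather than taking the remainder modulo SIZE) stays correct
            // even if Read returns fewer bytes than requested.
            position -= count;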
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}
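An alternative way to write the same idea, shown here only as a sketch and not as the code that produced the timing above, is to track where we are inside the repeating "N bytes of data, SKIP bytes of padding" cycle instead of tracking a buffer offset. The class name ChunkedChecksum and its parameters are invented for this example. The phase counter survives across buffer boundaries, so no special overlap handling is needed.

```csharp
using System;
using System.IO;

public static class ChunkedChecksum
{
    // Sums the first n bytes of every (n + skip)-byte cycle without seeking.
    public static long Sum(Stream stream, int n, int skip, int bufferSize)
    {
        if (n < 0 || skip < 0 || n + skip == 0)
            throw new ArgumentOutOfRangeException("n", "n + skip must be positive.");

        var buffer = new byte[bufferSize];
        long hash = 0;
        int phase = 0; // Position inside the current cycle, 0 .. n + skip - 1.
        int count;

        while ((count = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            int i = 0;

            while (i < count)
            {
                if (phase < n)
                {
                    // Inside the data part: consume what is available.
                    int take = Math.Min(n - phase, count - i);
                    for (int k = 0; k < take; k++) hash += buffer[i + k];
                    i += take;
                    phase += take;
                }
                else
                {
                    // Inside the part to skip: just advance.
                    int drop = Math.Min(n + skip - phase, count - i);
                    i += drop;
                    phase += drop;
                }

                if (phase == n + skip) phase = 0;
            }
        }

        return hash;
    }
}
```

Because the result does not depend on the buffer size, running it with a few different sizes (including one smaller than n) is a cheap way to test the boundary handling.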

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
Germany
Studied math and computer science at the university of Dortmund.

MCTS .NET Framework 4, Windows Applications

Comments and Discussions

 
Re: Skipping and buffer size
GravityPhazer, 22-Jan-16 10:36
Definitely! From this perspective it's something completely different ;)
You'd better do this, because this is really irritating.

Nonetheless, your code seems to be unbelievably slow, which might be because you're seeking a few hundred times. It's a lot faster if you just read and ignore the bytes:

C#
bool skip = false;
while ((read = fs.Read(buffer, 0, skip ? Skip : bufferSize)) > 0)
{
    totalRead += read;
    skip = !skip;
}


Here, bufferSize is 6 and Skip is 2.
I get a time of 53.7938 ms (0.05 s), compared to 2668 ms for your fastest algorithm.

With Seek I get a result of 922.4907 ms (0.92 s). I guess the difference to yours is system-related, and because you have calculations in your loop (position + N > length) which are totally redundant.

modified 22-Jan-16 16:49pm.
