Improve Stream Reading Performance in C#

22 Jan 2016 · MIT License · 2 min read
Improve the reading performance of .NET streams by using the Seek method instead of the Position property.

Motivation

Due to the confusion the code example below seems to cause, here is a short motivation. Let's assume we want to read properties of an MP3 file, for example to build a bitrate histogram of a file encoded with a variable bitrate. An MP3 file consists of a sequence of frames (on the order of 10,000 for a four- or five-minute song). Each frame has a 4-byte header (see the Wikipedia article on MP3), which contains information such as the bitrate and the length of the data block that follows the header. To build the histogram, we need a 4-byte buffer: read a header, then skip ahead to the next one.
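To make the header concrete, here is a minimal sketch of decoding such a 4-byte header. It assumes an MPEG-1 Layer III frame; the bitrate and sampling-rate tables and the frame-length formula below are specific to that case, and the parsing is illustrative, not production-grade:

```csharp
using System;

static class Mp3Header
{
    // Bitrate table (kbps) for MPEG-1 Layer III; index 0 means "free format".
    static readonly int[] Bitrates =
        { 0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 };

    // Sampling rates (Hz) for MPEG-1.
    static readonly int[] SampleRates = { 44100, 48000, 32000 };

    public static int Bitrate(byte[] header)
    {
        // The bitrate index is the high nibble of the third header byte.
        return Bitrates[(header[2] >> 4) & 0x0F];
    }

    // Frame length in bytes for an MPEG-1 Layer III frame.
    public static int FrameLength(byte[] header)
    {
        int bitrate = Bitrate(header) * 1000;
        int sampleRate = SampleRates[(header[2] >> 2) & 0x03];
        int padding = (header[2] >> 1) & 0x01;

        return 144 * bitrate / sampleRate + padding;
    }

    public static void Main()
    {
        // A typical header: 0xFF 0xFB marks frame sync + MPEG-1 Layer III,
        // 0x90 encodes bitrate index 9 (128 kbps) at 44100 Hz.
        byte[] header = { 0xFF, 0xFB, 0x90, 0x00 };

        Console.WriteLine("Bitrate     : {0} kbps", Bitrate(header));
        Console.WriteLine("Frame length: {0} bytes", FrameLength(header));
    }
}
```

For the synthetic header above this yields a bitrate of 128 kbps and a frame length of 417 bytes, which is exactly the distance we would skip to reach the next header.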

Make sure to read the remarks section at the bottom of this tip. If you care about performance, you should probably avoid seeking altogether. But since I've seen a lot of code that uses the Position property for seeking, I thought the comparison was worth a tip.

Performance Tests

The values of N and SKIP in the code below are chosen deliberately to illustrate the performance differences, even for small file sizes.

Implementing that "read some bytes, then skip a few" behavior, you might find yourself writing code like the following:

C#
// Number of bytes to read.
private const int N = 4;

// Number of bytes to skip.
private const int SKIP = 3;

private static void Test1(string file)
{
    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;

        int count;

        byte[] buffer = new byte[N];

        s.Start();

        // Read a couple of bytes from the stream.
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);

            stream.Position += SKIP;
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

private static int Checksum(byte[] buffer, int offsetStart, int offsetEnd)
{
    int sum = 0;

    for (int i = offsetStart; i < offsetEnd; i++)
    {
        sum += buffer[i];
    }

    return sum;
}

This seems straightforward, but with just one change you can greatly improve performance. Let's see what happens if we use stream.Seek instead of setting stream.Position:

C#
private static void Test2(string file)
{
    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;
        byte[] buffer = new byte[N];

        int count;

        s.Start();

        // Read a couple of bytes from the stream.
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);

            stream.Seek(SKIP, SeekOrigin.Current);
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

We improved performance by a factor of 4. That's impressive.

As a last test, let's see what happens if we don't seek from the current position, but from the beginning of the stream:

C#
private static void Test3(string file)
{
    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;
        long position = 0;

        byte[] buffer = new byte[N];

        int count;

        s.Start();

        // Read a couple of bytes from the stream.
        while ((count = stream.Read(buffer, 0, N)) > 0)
        {
            hash += Checksum(buffer, 0, count);
            position += (N + SKIP);

            stream.Seek(position, SeekOrigin.Begin);
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

Again, a further small improvement, this time by a factor of about 1.2.

Here's a sample output from calling the test functions on a 6.27 MB file (I made sure to call Test1 twice; the first call serves as a warm-up and ensures the file is cached):

File size: 6581961 bytes
Elapsed  : 12654 ms
Checksum : 100462446

File size: 6581961 bytes
Elapsed  : 3184 ms
Checksum : 100462446

File size: 6581961 bytes
Elapsed  : 2668 ms
Checksum : 100462446

Conclusion

In the above example, we get an overall speed-up factor of 4.8. Results will vary between machines, but here are some rules to follow when reading from .NET streams:

  1. Avoid setting the Position property; prefer the Seek method.
  2. Avoid reading properties such as Position or Length inside loops.
  3. Prefer SeekOrigin.Begin when seeking.
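Rule 2 in practice: a property such as Length can translate to an OS call on every access, so read it once before the loop and track the position locally. A minimal sketch (a MemoryStream stands in for a file here; the principle is the same for FileStream):

```csharp
using System;
using System.IO;

static class Example
{
    // Reads the whole stream in chunks. stream.Length is read once,
    // before the loop, instead of on every iteration.
    public static long ReadAll(Stream stream)
    {
        long length = stream.Length;  // one property read, not one per pass
        long position = 0;

        var buffer = new byte[1024];
        int count;

        while (position < length &&
               (count = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            position += count;
        }

        return position;
    }

    public static void Main()
    {
        var stream = new MemoryStream(new byte[10000]);
        Console.WriteLine("Read {0} bytes", ReadAll(stream));
    }
}
```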

Remarks

Say you want to seek to a particular time offset in an audio file. That is obviously a valid use case for seeking in a stream, but there it makes no difference whether you use stream.Position or stream.Seek, since it is just a single call. Seeking once per frame, as implemented above, on the other hand, will always degrade performance massively.
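For that single-seek use case, the target byte offset in a constant-bitrate file can be computed directly from the time offset. A sketch under the assumption of a plain CBR stream; a real MP3 would also need the length of any leading ID3 tag added to the result:

```csharp
using System;
using System.IO;

static class SeekOffset
{
    // Byte offset of a time position in a constant-bitrate stream,
    // ignoring any leading metadata. Assumption: CBR, no tags.
    public static long ByteOffset(int bitrateKbps, double seconds)
    {
        // 128 kbps corresponds to 16,000 bytes of audio data per second.
        long bytesPerSecond = bitrateKbps * 1000L / 8;
        return (long)(seconds * bytesPerSecond);
    }

    public static void Main()
    {
        long offset = ByteOffset(128, 30.0); // 30 s into a 128 kbps stream
        Console.WriteLine("Offset: {0} bytes", offset);

        // A single seek like this is fine with either API:
        // stream.Seek(offset, SeekOrigin.Begin);
    }
}
```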

So I think my conclusion stays valid: if you seek at all, prefer the Seek method. But as a result of the discussion with GravityPhazer (see comments), here is a solution that doesn't use seeking at all. It's a bit more involved, because two successive buffer reads have to be kept in sync, but it pays off: runtime 50 ms.

C#
private static void Test4(string file)
{
    const int SIZE = 1024;

    var s = new Stopwatch();

    // Open the file for reading.
    using (var stream = File.OpenRead(file))
    {
        long hash = 0;

        byte[] buffer = new byte[SIZE];

        int position = 0;
        int count, end;

        s.Start();

        // Fill the buffer.
        while ((count = stream.Read(buffer, 0, SIZE)) > 0)
        {
            if (position > SKIP)
            {
                // The previous chunk ended inside a frame; checksum the
                // rest of that frame at the start of this chunk.
                hash += Checksum(buffer, 0, position - SKIP);
            }

            // Process the buffer.
            while (position < count)
            {
                end = position + N;

                if (end > count) end = count;

                hash += Checksum(buffer, position, end);
                position += (N + SKIP);
            }

            // Carry the offset of the next frame over into the next chunk.
            position = position % SIZE;
        }

        s.Stop();

        Console.WriteLine("File size: {0} bytes", stream.Length);
        Console.WriteLine("Elapsed  : {0} ms", s.ElapsedMilliseconds);
        Console.WriteLine("Checksum : {0}", hash);
    }
}

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
Christian Woltering (Germany)
Studied math and computer science at the University of Dortmund.

MCTS .NET Framework 4, Windows Applications

Comments and Discussions

 
Re: Why this choice (GravityPhazer, 22-Jan-16):
As you described it, it is not necessarily much more difficult. It's okay that you mentioned it, but after all this is bad practice at its best. As my recent comment showed, it slows down from ~50 ms to ~920 ms, which is an enormous difference.

You say the header contains the frame length and so on. Headers are constant-sized by convention, so you don't have to care about the header size. You simply read chunk by chunk (ideally 1024+ bytes, as my benchmark in the same comment showed) and skip as many bytes as the current header tells you. This is not necessarily difficult to implement, and the performance impact is so massive that you should always consider it. It may not make any difference if you're processing a single file; everyone can wait 1-4 seconds. But if you process a few thousand files, you probably want each file to take only 50-200 ms instead of 1+ seconds.

I get the point of your article, but I personally think you're missing your actual point a little. I mean, I had to ask: it's very specific. Skipping is something you do relatively rarely; most of the time you read the whole file. It may even be faster to read the file at once and process it in memory afterwards. The next thing is that you show that certain implementations are worse or better, but in the end none of them is really good. After all, you seem to suggest that the best thing you could do is seek from the beginning all the time. I suggest that you add another solution without seeking, else I can't approve this as a very good tip.

I'm sorry for that, no offense meant!