
FileDiff Contest Entry

12 Aug 2009, CPOL
Text Difference between two files

Introduction

This is a contest entry: a small console program that reports the text differences between two files.

Using the Code

The application is basic: it reads both files one byte at a time through FileStream objects.

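```csharp
using System;
using System.IO;
using System.Text;

// These statements form the body of Main(string[] args):
// args[0] is file A, args[1] is file B.
ASCIIEncoding encode = new ASCIIEncoding();   // turns differing bytes back into text
FileStream fileA = File.OpenRead(args[0]);
FileStream fileB = File.OpenRead(args[1]);

int b = 0;   // last byte read from file A
int l = 0;   // length of the current run of changed bytes in file B

// Newly opened streams already start at 0; set explicitly for clarity.
fileA.Position = 0;
fileB.Position = 0;
```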

We start by opening both files and setting each stream's position to 0. The variable b holds the last byte read from file A, and l holds the length of the current run of changed bytes.

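```csharp
// Walk file A one byte at a time, comparing against the next byte of file B.
while (fileA.Position < fileA.Length)
{
    b = fileA.ReadByte();

    if (fileB.Position < fileB.Length)
    {
        if (b != fileB.ReadByte())
        {
            long start = fileB.Position - 1;   // offset of the first differing byte in B
            l = 1;

            // Skip ahead in B until the current byte of A appears again (or B ends).
            // A matching byte is consumed but not counted in l.
            while (fileB.Position < fileB.Length &&
                   fileB.ReadByte() != b)
            {
                l += 1;
            }

            long resume = fileB.Position;      // where the comparison continues in B

            // Re-read the differing run from its recorded start, then return to resume.
            byte[] s = new byte[l];
            fileB.Seek(start, SeekOrigin.Begin);
            fileB.Read(s, 0, l);
            fileB.Seek(resume, SeekOrigin.Begin);

            // Pos is fileA.Position, i.e. just past byte b in file A.
            Console.WriteLine("FileDiff Pos:{0}, Len:{1}, Str:{2}",
                              fileA.Position,
                              l,
                              encode.GetString(s));
        }
    }
}

fileA.Close();
fileB.Close();
```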

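This is the main loop. It steps through file A one byte at a time, comparing each byte with the next byte of file B. When the two bytes differ, it scans forward in file B for the next byte equal to the current byte of A (or to the end of B), then prints the position, the length of the skipped run, and that run as ASCII text.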
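To try the same greedy scan without files on disk, here is a minimal sketch that runs it over two byte arrays and collects one line per difference instead of printing. The class name GreedyDiff and the method Diff are hypothetical; the output format mirrors the Console.WriteLine call above without the "FileDiff" prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical in-memory version of the FileDiff loop (same greedy scan).
public static class GreedyDiff
{
    // Returns one "Pos:..., Len:..., Str:..." line per difference found.
    public static List<string> Diff(byte[] a, byte[] b)
    {
        List<string> results = new List<string>();
        int ia = 0; // next byte to read from a (plays the role of fileA.Position)
        int ib = 0; // next byte to read from b (plays the role of fileB.Position)

        while (ia < a.Length)
        {
            byte current = a[ia++];
            if (ib >= b.Length)
                continue;               // b exhausted: nothing left to compare

            if (current != b[ib++])
            {
                int start = ib - 1;     // first differing byte in b
                int len = 1;

                // Skip forward in b until current appears again (or b runs out).
                // A matching byte is consumed but not counted in len.
                while (ib < b.Length && b[ib++] != current)
                    len++;

                string text = Encoding.ASCII.GetString(b, start, len);
                results.Add(string.Format("Pos:{0}, Len:{1}, Str:{2}", ia, len, text));
            }
        }
        return results;
    }
}
```

For example, comparing "abcd" (A) with "abXcd" (B) yields the single line Pos:3, Len:1, Str:X. Swapping the roles shows the asymmetry raised in the comments: "Xabc" against "abc" reports all of "abc" (Len:3), while "abc" against "Xabc" reports only "X" (Len:1).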

Points of Interest

This contest entry is written in C# for .NET 2.0. The program is 73 lines, including blank lines, comments, formatting, and the timing code; 28 of those lines are neither overhead, blank, nor comments. It uses 4,384K of memory (private working set), and the EXE is 5.5K.

Run time is roughly 30 milliseconds. For each difference, the output is the position in file A, the length of the difference, and its text.
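The article's own timing code is not shown. One way to take such a measurement is System.Diagnostics.Stopwatch, which has been available since .NET 2.0. This is a minimal sketch under that assumption; RunTimer and the Work delegate are hypothetical names, and Work is declared because the parameterless System.Action only arrived in later framework versions.

```csharp
using System;
using System.Diagnostics;

// Hypothetical parameterless delegate, usable with C# 2.0 anonymous methods.
public delegate void Work();

public static class RunTimer
{
    // Runs the work once and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMilliseconds(Work work)
    {
        Stopwatch watch = Stopwatch.StartNew();
        work();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

The diff loop could be passed in as an anonymous method, for example `RunTimer.TimeMilliseconds(delegate { /* diff loop */ })`. A single run of a few tens of milliseconds can vary noticeably between runs (disk cache, JIT warm-up), so averaging several runs gives a steadier figure.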

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Web Developer
United States
I started programming for fun when I was about 10 on a Franklin Ace 1000.

I still do it just for fun, but it has gotten me a few jobs over the years, which is more than I can say for my Microsoft Certifications. :)

I learned by example; now it's time to give back to the next generation of coders.



Comments and Discussions

 
A file diff should be symmetric.
TRK3, 18-Aug-09 7:50
The biggest problem with this is that the algorithm doesn't produce symmetric results:

Diff A B should produce results similar to Diff B A.

However, if A and B are identical files except that A has a unique character inserted at the beginning that doesn't appear anywhere in B, then Diff A B will give you the entire file B as a difference, whereas Diff B A would give you just the one character.

While not stated in the contest rules, I'd think something approximating symmetry should be expected.

The code is small, but the 4K memory usage is huge for the functionality provided; I'm guessing the 4K memory must be the default file buffers.

My vote of 2
pocheptsov, 18-Aug-09 7:03

My vote of 1
crayzeecoder, 14-Aug-09 22:40

But what... is it good for?
PIEBALDconsult, 12-Aug-09 18:39

Re: But what... is it good for?
Matthew Hazlett, 12-Aug-09 19:06
