Hi, I have the following program working, which demonstrates MD5 collisions for various files containing slightly different data. However, when I create the files using the known collision data from Wikipedia (http://en.wikipedia.org/wiki/Md5), I get hashes that do not match each other, and neither of them matches the correct hash of 79054025255fb1a26e4bc422aef54eb4. Yet with any other file (typically containing binary data) I am able to demonstrate collisions.

The two inputs from the files are:
d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70


and

d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70


The program I'm using is below. Is there any way to alter this program to handle these hex files, or do I need to write a completely new one? I have seen solutions that look extremely complicated, so I'm wondering whether the program below can be adapted. Thanks.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Security.Cryptography;
using System.IO;

namespace MD53
{
    class Program
    {
        static void Main(string[] args)
        {
            byte[] file1Hash = GetMd5HashOfFile("FileA.txt");
            byte[] file2Hash = GetMd5HashOfFile("FileB.txt");
            ConvertToString(file1Hash);
            ConvertToString(file2Hash);

            byte[] file3Hash = GetSHAOfFile("M13A.txt");
            byte[] file4Hash = GetSHAOfFile("M13B.txt");
            ConvertToString(file3Hash);
            ConvertToString(file4Hash);
            
        }

        static byte[] GetMd5HashOfFile(string filepath)
        {
            using (MD5CryptoServiceProvider hashProvider = new MD5CryptoServiceProvider())
            {
                using (FileStream sr = new FileStream(filepath, FileMode.Open, FileAccess.Read))
                {
                    return hashProvider.ComputeHash(sr);
                }
            }
        }
        static byte[] GetSHAOfFile(string filepath)
        {
            using (SHA1CryptoServiceProvider hashProvider = new SHA1CryptoServiceProvider())
            {
                using (FileStream sr = new FileStream(filepath, FileMode.Open, FileAccess.Read))
                {
                    return hashProvider.ComputeHash(sr);
                }
            }
        }
        static void ConvertToString(byte[] fileHash)
        {
            StringBuilder hex1 = new StringBuilder(fileHash.Length);
            foreach (byte b in fileHash)
                hex1.AppendFormat("{0:x2}", b);
            string output = hex1.ToString();
            Console.WriteLine(output);
            return;
        }
        
    }
}
Updated 13-Sep-12 5:32am
Comments
sjelen 13-Sep-12 13:03pm    
You're treating text files as binary; you made the same mistake here:
www.codeproject.com/Answers/444145/Trying-to-Port-a-C-MD5-Hash-Collision-Program-to-C
BrianHamilton 14-Sep-12 6:36am    
Yes, and I think it was you who provided a fully working version (shown below). I assume the MD5 example on the wiki is the exception, as all other files work fine with the other program. I just found the solution below a little hard to understand. I've usually done encrypting/decrypting with no problems, whether with strings or files, but I get that Wikipedia was showing a representation of a binary file that probably exists somewhere.

<pre>
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Each file holds hex digits as text; convert them to the bytes they represent.
        byte[] myBytes1 = ReadHexFile("FileA.txt");
        byte[] myBytes2 = ReadHexFile("FileB.txt");

        using (MD5 md5Hash = MD5.Create())
        {
            string hash1 = GetMd5Hash(md5Hash, myBytes1);
            string hash2 = GetMd5Hash(md5Hash, myBytes2);
            Console.WriteLine("The MD5 hash of " + string.Join("", myBytes1.Select(b => b.ToString("x2"))) + " is: " + hash1 + ".");
            // Print the second input here (the earlier version printed myBytes1 twice).
            Console.WriteLine("The MD5 hash of " + string.Join("", myBytes2.Select(b => b.ToString("x2"))) + " is: " + hash2 + ".");
        }
        Console.ReadLine();
    }

    static string GetMd5Hash(MD5 md5Hash, byte[] input)
    {
        // Hash the raw bytes and format the digest as lowercase hex.
        byte[] data = md5Hash.ComputeHash(input);
        return string.Join("", data.Select(b => b.ToString("x2")));
    }

    static byte[] ReadHexFile(string path)
    {
        // File.ReadAllText opens, reads, and closes the file.
        string hexString = File.ReadAllText(path);
        // Strip whitespace, line breaks, and other non-word characters.
        hexString = Regex.Replace(hexString, @"\W", "", RegexOptions.Singleline);

        byte[] myBytes1 = new byte[hexString.Length / 2];
        // Parse each pair of hex digits into one byte.
        for (int ix = 0; ix + 1 < hexString.Length; ix += 2)
        {
            string hexByte = hexString.Substring(ix, 2);
            myBytes1[ix / 2] = byte.Parse(hexByte, System.Globalization.NumberStyles.HexNumber);
        }

        return myBytes1;
    }
}
</pre>
sjelen 17-Sep-12 6:21am    
Which part don't you understand?
ReadHexFile() reads a text file that contains a hexadecimal representation of binary data into a byte[].
The regex is used to eliminate all whitespace and other non-data characters.
Then, in the loop, the string containing the hex values is split into 2-character substrings,
and each one is parsed to a byte and put in the array.

1 solution

Your problem is that you are treating the data as text, and it isn't. It is hex data, i.e. 128 bytes of binary data displayed on the wiki page as hex digits. You are holding it (and reading it) as 256 text characters (with or without line breaks), which will produce completely different hash values from the ones you expect.
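
To illustrate the point, here is a minimal sketch (not the original poster's code) that decodes the hex text into the bytes it represents before hashing. The class name `HexCollisionDemo` and the helpers `HexToBytes` and `Md5Hex` are hypothetical names chosen for this example; the two inputs are the ones quoted in the question. If the data was copied correctly, both hashes should come out as the 79054025255fb1a26e4bc422aef54eb4 value mentioned in the question.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class HexCollisionDemo
{
    // Hypothetical helper: converts hex text into raw bytes,
    // ignoring spaces, line breaks, and any other non-hex characters.
    public static byte[] HexToBytes(string hexText)
    {
        string digits = new string(hexText.Where(c => Uri.IsHexDigit(c)).ToArray());
        if (digits.Length % 2 != 0)
            throw new FormatException("Hex text must contain an even number of digits.");

        byte[] bytes = new byte[digits.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(digits.Substring(i * 2, 2), 16);
        return bytes;
    }

    // Hypothetical helper: MD5 digest of the given bytes as lowercase hex.
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            StringBuilder sb = new StringBuilder();
            foreach (byte b in md5.ComputeHash(data))
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    public static void Main()
    {
        // The two inputs exactly as quoted in the question.
        string hexA =
            "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89" +
            "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b" +
            "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0" +
            "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70";
        string hexB =
            "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89" +
            "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b" +
            "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0" +
            "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70";

        byte[] bytesA = HexToBytes(hexA);
        byte[] bytesB = HexToBytes(hexB);

        Console.WriteLine("Input lengths: " + bytesA.Length + " and " + bytesB.Length + " bytes");
        Console.WriteLine("Inputs identical: " + bytesA.SequenceEqual(bytesB));
        Console.WriteLine("MD5 of A: " + Md5Hex(bytesA));
        Console.WriteLine("MD5 of B: " + Md5Hex(bytesB));
    }
}
```

Alternatively, writing the decoded bytes out once with `File.WriteAllBytes` (for example to a hypothetical `FileA.bin`) would give you a genuine binary file that the original `GetMd5HashOfFile` could hash without any changes.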
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


