Click here to Skip to main content
15,885,182 members
Please Sign up or sign in to vote.
4.00/5 (1 vote)
See more:
hi i have an question about huffman algorithm

we read a file and convert it to a string then we build the huffman tree and codes,then change the string to encoded string like:

input string : 12345
1 code:00
2 code:111
3 code:10
4 code:110
5 code:01
encoded string : 001111011001

and we write this string to a file

so how this algorithm decrease the size of file ?

i did this and my file was 4 KB after using this algorithm it change to 17 KB:|

whats wrong what should i do?

please help me
Posted

In the input string you, tipically, use eight bit (1 byte) for each symbol, hence the string "12345" is 5 bytes (40 bits) long.
The output string is coded using a variable number of bits for each symbol (for instance, in your example, the symbol '1' is coded using just two bits (namely 00) and the whole encoded ouput string is only 11 bits long (hence you have a compression factor of 40:11).
 
Share this answer
 
Comments
[no name] 30-Dec-11 17:30pm    
Great answer and correct. But maybe hard to see the details for a starter. Anyway a 5. Regards.
CPallini 31-Dec-11 7:47am    
Thank you.
DominoBoy 31-Dec-11 3:07am    
thx
CPallini 31-Dec-11 7:47am    
You are welcome.
First
Forget about thinking in strings while try to coding something with Huffman. Why?: Also ‘\0’ is a something one has to code. So better think about a stream of bytes you need to code.

Second
On very short information, Huffman coding can result in bigger results then the information was. Why: The dictionary takes some space, and if you try to code 8 bytes the chance is very high that the dictionary needs more bytes then the original information.

Third
Make a file with nodepad containing one character. Check the size in explorer. Depending on your hardisk format it shows in windows explorer 1KB in properties “Space” 1 Byte“ in “Space on volume” 4096 Bytes.

Conclusion
Check carefully the information.

And by the way Solution 1 is the theoretically correct explanation.

And congratulation that you try to figure it out by yourself and not "just" using a library!

Regards.
 
Share this answer
 
v4
Comments
DominoBoy 31-Dec-11 3:09am    
thx for help
[no name] 31-Dec-11 8:14am    
You are very welcome. Thank you for accepting; even I did not give the really useful solution.
You are on a good way, to try the things by yourself. Keep on going like this.
Regards
CPallini 31-Dec-11 7:48am    
My 5.
[no name] 31-Dec-11 8:01am    
Thank you very much.
Regards
It sounds like you are writing the string "001111011001" to the file as a string.

A string is a sequence of characters. Depending on how you are encoding the characters, each character will take 8 or 16 bits in the file. If you convert the single character "2" into the sequence of 3 characters "111" you've just trippled the size of your file.

Instead of a sequence of characters, what you are supposed to be creating is a sequence of bits.

Not sure how you do it in Java, but in C you could do it like this:

C++
char *encoded_string = "0011111000110000111100001110001100";

int len = strlen(encoded_string);

char byte_value = 0;
for (int ix = 0; ix < len; ix += 8) {
  byte_value = byte_value << 1;
  if (encoded_string[ix] == '1')
     byte_value = byte_value |= 1;

  if ((ix % 8) == 7) {
     fprintf(outputfile, "%c", byte_value);
     byte_value = 0;
  }
}

int remainder = ix % 8;
if (remainder) {
  byte_value = byte_value << remainder;
  fprintf(outputfile, "%c", byte_value);
}


There may be a Java library function that does that conversion for you, but I gave it to you in this format so you could see what actually needs to happen and you could understand the difference.

This code example assumes you want to encode with the most significant bit first. It gets around byte ordering issues by outputing a byte at a time.

The code pads out the end of the file with zero bits -- that may be an issue for your huffman encoding scheme. It probably ought to append an end of file code before padding out with zero bits.
 
Share this answer
 
Comments
DominoBoy 31-Dec-11 3:09am    
thx for help
CPallini 31-Dec-11 7:48am    
My 5.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900