a question about huffman algorithm

Question

4.00/5 (1 vote)

See more:

hi i have an question about huffman algorithm

we read a file and convert it to a string then we build the huffman tree and codes,then change the string to encoded string like:

input string : 12345
1 code:00
2 code:111
3 code:10
4 code:110
5 code:01
encoded string : 001111011001

and we write this string to a file

so how this algorithm decrease the size of file ?

i did this and my file was 4 KB after using this algorithm it change to 17 KB:|

whats wrong what should i do?

please help me

Posted 30-Dec-11 10:55am

DominoBoy

Add a Solution

3 solutions

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CPallini · Accepted Answer · 2011-12-30T11:05:00

Solution 1

In the input string you, tipically, use eight bit (1 byte) for each symbol, hence the string "12345" is 5 bytes (40 bits) long.
The output string is coded using a variable number of bits for each symbol (for instance, in your example, the symbol '1' is coded using just two bits (namely 00) and the whole encoded ouput string is only 11 bits long (hence you have a compression factor of 40:11).

Posted 30-Dec-11 11:05am

CPallini

Comments

[no name] 30-Dec-11 17:30pm

Great answer and correct. But maybe hard to see the details for a starter. Anyway a 5. Regards.

CPallini 31-Dec-11 7:47am

Thank you.

DominoBoy 31-Dec-11 3:07am

thx

CPallini 31-Dec-11 7:47am

You are welcome.

idle63 · Accepted Answer · 2011-12-30T11:21:00

First
Forget about thinking in strings while try to coding something with Huffman. Why?: Also ‘\0’ is a something one has to code. So better think about a stream of bytes you need to code.

Second
On very short information, Huffman coding can result in bigger results then the information was. Why: The dictionary takes some space, and if you try to code 8 bytes the chance is very high that the dictionary needs more bytes then the original information.

Third
Make a file with nodepad containing one character. Check the size in explorer. Depending on your hardisk format it shows in windows explorer 1KB in properties “Space” 1 Byte“ in “Space on volume” 4096 Bytes.

Conclusion
Check carefully the information.

And by the way Solution 1 is the theoretically correct explanation.

And congratulation that you try to figure it out by yourself and not "just" using a library!

Regards.

TRK3 · Accepted Answer · 2011-12-30T11:25:00

It sounds like you are writing the string "001111011001" to the file as a string.

A string is a sequence of characters. Depending on how you are encoding the characters, each character will take 8 or 16 bits in the file. If you convert the single character "2" into the sequence of 3 characters "111" you've just trippled the size of your file.

Instead of a sequence of characters, what you are supposed to be creating is a sequence of bits.

Not sure how you do it in Java, but in C you could do it like this:

C++

char *encoded_string = "0011111000110000111100001110001100";

int len = strlen(encoded_string);

char byte_value = 0;
for (int ix = 0; ix < len; ix += 8) {
  byte_value = byte_value << 1;
  if (encoded_string[ix] == '1')
     byte_value = byte_value |= 1;

  if ((ix % 8) == 7) {
     fprintf(outputfile, "%c", byte_value);
     byte_value = 0;
  }
}

int remainder = ix % 8;
if (remainder) {
  byte_value = byte_value << remainder;
  fprintf(outputfile, "%c", byte_value);
}

There may be a Java library function that does that conversion for you, but I gave it to you in this format so you could see what actually needs to happen and you could understand the difference.

This code example assumes you want to encode with the most significant bit first. It gets around byte ordering issues by outputing a byte at a time.

The code pads out the end of the file with zero bits -- that may be an issue for your huffman encoding scheme. It probably ought to append an end of file code before padding out with zero bits.

a question about huffman algorithm

3 solutions

Solution 1

Solution 2

Solution 3

Add your solution here

Preview 0