Blog on Huffman Algo.

Aadarsh Mishra
4 min readApr 7, 2022

HUFFMAN ENCODING’S BLOG

In this blog I will try to explain you the Huffman coding. Huffman coding uses Greedy technique for its implementation, Greedy algorithms are those that select the best option for themselves at all times. We’ll see the working of the Huffman coding in multiple programing languages. We use Huffman coding for compressing data, in this algo we reduce the size without losing data.

Huffman coding was developed by David Huffman and it is based on inequality of data usage. Over the years multiple better algorithms were made but they are directly or indirectly connected with the Huffman encoding or decoding algo. We use Huffman coding in compressing data that consists of frequently occurring data for example, 21 a’s, 31 b’s, 12c’s and 24 d’s. We will see the working of the Huffman coding in detail in the video also. Each of the characters are of 8 bits. For example, you have 20 characters in your data then, the number of bits in your data is 20*8 I.e., 160 bits.

Videos and Images use lossy compression, we can lose a bit of detail here not a train smash. But in case of text data, we cannot manage to lose any kind of data because the meaning of the text will be fully changed.

Let’s take an example of this algorithm

In the above figure there are 15 characters and each character is of 8 bits so the so total size(in bits) of the string is 15*8 I.e., 120 bits

  1. From the above example calculate the frequencies of each character and make a table.

2.After calculating the frequencies sort the characters based on their frequencies.

3.Now we will make start making the tree according to the priority queue, we will make an empty node ‘x’. Now, we will assign the lowest occurring frequency at the left of the x node and second lowest occurring at the right of the x node as shown below in this figure. The value of the x node will be the sum of left and the right node. x node will be added to the priority queue. Remove the two minimum frequencies from the priority queue and add the value of x node into the list. As shown in the figure.

4.Now check whether the priority queue is sorted or not, if not then sort it and then proceed. (in the above fig. It is sorted so proceed.)

5. Now as the new node value in above example is 9, we will put of c node at the left of new node(sorted) and proceed.

6.Repeat the steps until the huffman tree is not formed

7. Now add 0’s to the left branch of the huffman tree and 1’s to the right branch of the huffman tree.

8. After that, for getting the code of the character just combine the 0’s and 1’s which is written at the branches of the Huffman’s tree. Check the table below:

Without encoding total size of the string (in bits) was 120 bits and after doing the encoding the total size (in bits) of the string is 32+15+28 I.e., 75.

TIME COMPLEXITY OF HUFFMAN’S ALGORITHM

The time complexity for encoding each character based on the frequency is O(nlog n)

CODE OF HUFFMAN’S CODING:

(NOTE: The code is written in python language)

Now let’s see the code of the huffman coding

Code for Creating tree nodes:

Code for Calculating the frequencies:

APPLICATIONS OF HUFFMAN’S CODING:

  1. This algorithm is used in conventional compression formats in PKZIP, GZIP etc.
  2. For text and fax transmissions.

--

--

Aadarsh Mishra
0 Followers

I am Aadarsh Mishra from Bennett University