Commit bd39339

Merge pull request #217 from abhisheks008/main
Huffman Coding using Greedy Approach
2 parents 5e963f2 + e4ca730 commit bd39339

5 files changed: +336 -0 lines changed
Greedy/Huffman Coding/README.md

Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
# Huffman Coding using Greedy Approach
🔴 Language used : **Python 3**

## 🎯 Aim
The aim of this script is to find the Huffman code of each character in a given list, where the example input lists the characters in ascending order of frequency.

## 👉 Purpose
The main purpose of this script is to show how the Greedy Approach is used to find the Huffman code of each character in the list.

## 📄 Description
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to the input characters, where the length of each code depends on the frequency of the corresponding character. The most frequent character gets the shortest code and the least frequent character gets the longest code.

The variable-length codes assigned to the input characters are prefix codes: the codes (bit sequences) are assigned in such a way that the code of one character is never a prefix of the code of any other character. This is how Huffman coding makes sure that there is no ambiguity when decoding the generated bitstream.

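To make the prefix property concrete, here is a minimal sketch that checks it and decodes a bitstream. The code table is the one from this README's worked example; the helper names `is_prefix_free` and `decode` are illustrative assumptions and are not part of the contributed script.

```python
# Code table taken from the worked example in this README.
CODES = {"f": "0", "c": "100", "d": "101", "a": "1100", "b": "1101", "e": "111"}


def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    return not any(x != y and y.startswith(x) for x in words for y in words)


def decode(bits, codes):
    """Decode a bit string by reading it left to right (hypothetical helper)."""
    lookup = {word: symbol for symbol, word in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # a complete code word has been read
            decoded.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(decoded)


encoded = "".join(CODES[ch] for ch in "face")
print(is_prefix_free(CODES))   # True
print(encoded)                 # 01100100111
print(decode(encoded, CODES))  # face
```

Because no code word is a prefix of another, the decoder can emit a character as soon as the bits read so far match a code word, without looking ahead.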
🔴 Examples:

```
Constraints:
chars[] -> an array of characters.
freq[]  -> an array of frequencies of the respective characters.

Input:
character   Frequency
    a           5
    b           9
    c           12
    d           13
    e           16
    f           45

The algorithm generates a Huffman code for each of the characters,
which are listed here in ascending order of frequency.

The output will look like this:
character   Huffman Code
    f           0
    c           100
    d           101
    a           1100
    b           1101
    e           111
```

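As a rough illustration of what these codes buy, the sketch below compares the total number of bits needed with the Huffman codes from the example above against a fixed-length code. The variable names are illustrative, and the 3-bit fixed-length baseline is an assumption made for comparison; it is not part of the contributed script.

```python
# Frequencies and codes copied from the example above.
freq = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
codes = {"f": "0", "c": "100", "d": "101", "a": "1100", "b": "1101", "e": "111"}

# Total bits with Huffman codes: sum of frequency * code length.
huffman_bits = sum(freq[ch] * len(codes[ch]) for ch in freq)

# Assumed baseline: 6 distinct symbols need 3 bits each with a fixed-length code.
fixed_width = (len(freq) - 1).bit_length()
fixed_bits = sum(freq.values()) * fixed_width

print(huffman_bits, fixed_bits)  # 224 300
```

For this input the variable-length codes need 224 bits for 100 characters, compared with 300 bits for the 3-bit fixed-length encoding.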
## 🧮 Workflow & Algorithm
Let's walk through the workflow and the algorithm using the example above.
- Build a min heap that contains 6 nodes, where each node is the root of a tree with a single node.
- Extract the two minimum-frequency nodes from the min heap. Add a new internal node with frequency `5 + 9 = 14`.
```
          14
         /  \
        /    \
    a -> 5   b -> 9

Now the min heap contains 5 nodes: 4 nodes are roots of trees with a single element each,
and one heap node is the root of a tree with 3 elements.

The heap now contains:
character           Frequency
    c                   12
    d                   13
Internal Node           14
    e                   16
    f                   45
```
- Extract the two minimum-frequency nodes from the heap. Add a new internal node with frequency `12 + 13 = 25`.
```
          25
         /  \
        /    \
    c -> 12  d -> 13

Now the min heap contains 4 nodes: 2 nodes are roots of trees with a single element each,
and two heap nodes are roots of trees with more than one node.

character           Frequency
Internal Node           14
    e                   16
Internal Node           25
    f                   45
```
- Extract the two minimum-frequency nodes. Add a new internal node with frequency `14 + 16 = 30`.
```
           30
          /  \
        14    e -> 16
       /  \
      /    \
  a -> 5   b -> 9

Now the min heap contains 3 nodes.

character           Frequency
Internal Node           25
Internal Node           30
    f                   45
```
- Extract the two minimum-frequency nodes. Add a new internal node with frequency `25 + 30 = 55`.
```
            55
          /    \
        /        30
      /         /  \
    25        14    e -> 16
   /  \      /  \
  c    d    a    b
 12    13   5    9

Now the min heap contains 2 nodes.

character           Frequency
    f                   45
Internal Node           55
```
- Extract the two minimum-frequency nodes. Add a new internal node with frequency `45 + 55 = 100`.
```
          100
         /   \
    f->45     \
                55
              /    \
            /        30
          /         /  \
        25        14    e -> 16
       /  \      /  \
      c    d    a    b
     12    13   5    9

Now the min heap contains only one node.

character           Frequency
Internal Node           100
```
- Since the heap contains only one node, the algorithm stops here.
- **Steps to print codes from Huffman Tree:** Traverse the tree starting from the root while maintaining an auxiliary array. When moving to the left child, write 0 to the array. When moving to the right child, write 1 to the array. Print the array when a leaf node is encountered.
```
     (0)  100  (1)
         /   \
    f->45     \
                55
              /    \ (1)
        (0) /        30
          /     (0) /  \ (1)
        25        14    e -> 16
       /  \      /  \
      c    d    a    b
     12    13   5    9
     (0)  (1)  (0)  (1)
```
- The codes are as follows:
```
character   code-word
    f          0
    c          100
    d          101
    a          1100
    b          1101
    e          111
```

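The walkthrough above can be sketched as a small function that returns the codes instead of printing them. This is a minimal sketch under stated assumptions, not the contributed script: the name `huffman_codes`, the tuple-based tree representation, and the `itertools.count` tie-breaker are illustrative choices, and the input is assumed to be non-empty.

```python
import heapq
from itertools import count


def huffman_codes(chars, freq):
    """Return ({symbol: code}, [merged frequencies]) for a non-empty input."""
    tie = count()  # tie-breaker so the heap never has to compare subtrees
    # A tree is either a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(tie), ch) for ch, f in zip(chars, freq)]
    heapq.heapify(heap)

    merged = []
    while len(heap) > 1:
        # Greedy step: always merge the two lowest-frequency trees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged.append(f1 + f2)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                # leaf: record the accumulated path
            codes[tree] = prefix or "0"      # a lone symbol still gets one bit

    walk(heap[0][2], "")
    return codes, merged


codes, merged = huffman_codes(["a", "b", "c", "d", "e", "f"], [5, 9, 12, 13, 16, 45])
print(merged)  # [14, 25, 30, 55, 100]
print(codes)
```

The `merged` list reproduces the internal-node frequencies from the steps above, and the returned codes match the code-word table.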
## 💻 Input and Output
- **Test Case 1 :**
```python
# Input given:
chars = ['a', 'b', 'c', 'd', 'e', 'f']
freq = [ 5, 9, 12, 13, 16, 45]
```

![](https://github.com/abhisheks008/PyAlgo-Tree/blob/main/Greedy/Huffman%20Coding/Images/hc-1.png)

- **Test Case 2 :**
```python
# Input given:
chars = ['a', 'b', 'c', 'd']
freq = [ 5, 1, 6, 3]
```
![](https://github.com/abhisheks008/PyAlgo-Tree/blob/main/Greedy/Huffman%20Coding/Images/hc-2.png)

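## ⏰ Time and Space complexity
- **Time Complexity :** `O(n*log n)`, where `n` is the number of unique characters: there are `n - 1` merge steps, and each heap operation costs `O(log n)`.
- **Space Complexity :** `O(n)`: the heap never holds more than `n` nodes, and the final tree has `2n - 1` nodes.
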
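To see where the time bound comes from, the following sketch counts the heap operations performed while merging. It is an illustration only: `count_heap_operations` is a hypothetical helper that tracks frequencies rather than building the tree.

```python
import heapq


def count_heap_operations(frequencies):
    """Run the Huffman merge loop on bare frequencies; return (pops, pushes)."""
    heap = list(frequencies)
    heapq.heapify(heap)
    pops = pushes = 0
    while len(heap) > 1:
        first = heapq.heappop(heap)           # O(log n)
        second = heapq.heappop(heap)          # O(log n)
        heapq.heappush(heap, first + second)  # O(log n)
        pops += 2
        pushes += 1
    return pops, pushes


print(count_heap_operations([5, 9, 12, 13, 16, 45]))  # (10, 5)
```

With `n` input frequencies the loop runs `n - 1` times, so it performs `2(n - 1)` pops and `n - 1` pushes, each costing `O(log n)`.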
---------------------------------------------------------------
## 🖋️ Author
**Code contributed by, _Abhishek Sharma_, 2022 [@abhisheks008](https://github.com/abhisheks008)**

[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# Problem name: Huffman Coding
# Approach: Greedy Method

# -----------------------------------------------------------------------------------------------

# Problem Statement: Huffman coding is a lossless data compression algorithm.
#                    The idea is to assign variable-length codes to input
#                    characters; the lengths of the assigned codes are based on the
#                    frequencies of the corresponding characters. The most frequent
#                    character gets the shortest code and the least frequent
#                    character gets the longest code.

# -----------------------------------------------------------------------------------------------

# Constraints:
# chars[] -> set of characters/array of characters.
# freq[]  -> frequency of each of the characters in the given set.

# -----------------------------------------------------------------------------------------------

# heapq provides the min heap used to build the huffman tree.
import heapq

# a single node of the huffman tree
class node:
    def __init__(self, freq, symbol, left=None, right=None):
        # frequency of symbol
        self.freq = freq

        # symbol name (character)
        self.symbol = symbol

        # node left of current node
        self.left = left

        # node right of current node
        self.right = right

        # tree direction (0/1)
        self.huff = ''

    # nodes are ordered by frequency so that heapq can compare them
    def __lt__(self, nxt):
        return self.freq < nxt.freq


# utility function to print huffman
# codes for all symbols in the newly
# created Huffman tree
def printNodes(node, val=''):

    # huffman code for current node
    newVal = val + str(node.huff)

    # if node is not a leaf node
    # then traverse inside it
    if node.left:
        printNodes(node.left, newVal)
    if node.right:
        printNodes(node.right, newVal)

    # if node is a leaf node then
    # display its huffman code
    if not node.left and not node.right:
        print(" {0} -> {1}".format(node.symbol, newVal))


# characters for huffman tree
chars = ['a', 'b', 'c', 'd', 'e', 'f']

# frequency of characters
freq = [ 5, 9, 12, 13, 16, 45]

print("-- Huffman Coding using Greedy Method --")
print()
print("Provided input for implementing the Huffman Tree...")
print("Characters Frequency")
print("---------------------------")
for k in range(len(chars)):
    print(" {0} -> {1}".format(chars[k], freq[k]))
print()


# min heap containing the nodes not yet merged
nodes = []

# converting characters and frequencies
# into huffman tree nodes
for x in range(len(chars)):
    heapq.heappush(nodes, node(freq[x], chars[x]))

while len(nodes) > 1:

    # pop the two nodes with the lowest
    # frequency from the min heap
    left = heapq.heappop(nodes)
    right = heapq.heappop(nodes)

    # assign directional value to these nodes
    left.huff = 0
    right.huff = 1

    # combine the 2 smallest nodes to create
    # new node as their parent
    newNode = node(left.freq+right.freq, left.symbol+right.symbol, left, right)

    heapq.heappush(nodes, newNode)

# Huffman Tree is ready!
print("Creating Huffman Tree...\n")
print("Your Huffman Tree is ready! Here you go...")
print()
print("Characters Huffman Code")
print("-----------------------------")
printNodes(nodes[0])


# -----------------------------------------------------------------------------------------------

# Output:
# -- Huffman Coding using Greedy Method --

# Provided input for implementing the Huffman Tree...
# Characters Frequency
# ---------------------------
# a -> 5
# b -> 9
# c -> 12
# d -> 13
# e -> 16
# f -> 45

# Creating Huffman Tree...

# Your Huffman Tree is ready! Here you go...

# Characters Huffman Code
# -----------------------------
# f -> 0
# c -> 100
# d -> 101
# a -> 1100
# b -> 1101
# e -> 111

# -----------------------------------------------------------------------------------------------

# Code contributed by, Abhishek Sharma, 2022

# -----------------------------------------------------------------------------------------------

Greedy/README.md

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@
 
 - [**Activity Selection Problem**](https://github.com/abhisheks008/PyAlgo-Tree/tree/main/Greedy/Activity%20Selection%20Problem)
 - [**Job Sequencing Problem**](https://github.com/abhisheks008/PyAlgo-Tree/tree/main/Greedy/Job%20Sequencing%20Problem)
+- [**Huffman Coding**](https://github.com/abhisheks008/PyAlgo-Tree/tree/main/Greedy/Huffman%20Coding)
