Commit 62df6ea

Albert Hu authored
Merge pull request #30 from alberthu16/day41
Day 41: Redo topKFrequent using a heap for efficiency
2 parents cfb249f + 50adeb8 commit 62df6ea

2 files changed, +148 -0 lines changed

day41/README.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
[Today's challenge is actually a follow-up on day37's challenge -- using a heap instead of radix sort]

Question of the day: https://leetcode.com/problems/top-k-frequent-elements/#/description

Given a non-empty array of integers, return the k most frequent elements.

For example,
Given `[1,1,1,2,2,3]` and `k = 2`, return `[1,2]`.

Note:
* You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
* Your algorithm's time complexity must be better than O(n log n), where
n is the array's size.
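
As a quick sanity check on the example above, the standard library's `collections.Counter` can produce a reference answer. This is only a cross-check sketch, not the solution for this challenge, and `reference_top_k` is a made-up helper name.

```python
from collections import Counter

def reference_top_k(nums, k):
    # hypothetical helper: count occurrences, then take the k highest counts
    return [value for value, _count in Counter(nums).most_common(k)]

print(reference_top_k([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```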

## Ideas

Can't do a normal sort, since that alone will take `O(nlogn)` runtime.
The input array isn't sorted, so we need to keep track of a count and organize
that count somehow as we iterate through the integers in the array. There
don't seem to be any constraints on the values of the integers in the array,
so I'll assume that possible elements in the array range from -maxInt to maxInt.
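
One way to "organize that count" without a comparison sort is to bucket values by how often they occur, since no frequency can exceed the array length. The sketch below only illustrates that idea and is not the radix sort approach from Day 37; `top_k_by_buckets` and its variable names are made up for this example.

```python
from collections import Counter

def top_k_by_buckets(nums, k):
    # hypothetical helper, not the Day 37 implementation
    counts = Counter(nums)
    # buckets[f] holds every value that appears exactly f times (0 <= f <= len(nums))
    buckets = [[] for _ in range(len(nums) + 1)]
    for value, freq in counts.items():
        buckets[freq].append(value)
    result = []
    # walk from the highest possible frequency down, collecting values
    for freq in range(len(nums), 0, -1):
        for value in buckets[freq]:
            if len(result) == k:
                return result
            result.append(value)
    return result
```

Both passes are linear in the array length, and the extra space is `O(n)` no matter how large the values themselves are.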

I think I can actually use radix sort again. Same idea as the challenge from
[Day 36](../day36).

## Code
[Day 37 - Python](../day37/topKFrequent.py)

## Follow-up

Over the past few days ([38](../day38), [39](../day39), [40](../day40)), I got a
little more familiar with the heap data structure and finally understand why
heapifying an unsorted array can be done in linear time. A node at height `h`
needs at most `O(h)` swaps to sink into place, and the number of nodes at each
height roughly halves as `h` grows, so the total work sums to `O(n)`. Anyway, I
can use this `O(n)` time to heapify an unsorted array of frequencies, and then
pop off the top `k` frequencies in `O(klogn)` time. The overall runtime is then
`O(n + klogn)`, which reaches `O(nlogn)` only when `k` == `n`. However, the real
savings are in the `O(n)` space for storing all the elements of the heap.
Much better than the `O(max value of the input array)` I had before.
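
The heapify-then-pop plan can also be sketched with the standard library's `heapq` module, separately from the hand-written `MaxHeap` in this commit. `heapq` only provides a min-heap, so this sketch negates the counts; `top_k_with_heapq` is a made-up name for illustration.

```python
import heapq
from collections import Counter

def top_k_with_heapq(nums, k):
    # hypothetical helper; negate counts so the min-heap behaves like a max-heap
    heap = [(-freq, value) for value, freq in Counter(nums).items()]
    heapq.heapify(heap)  # linear-time heap construction
    # each pop costs O(log m), where m is the number of unique values
    return [heapq.heappop(heap)[1] for _ in range(k)]

print(top_k_with_heapq([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```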

## Code
[Day 41 - Python](./topKFrequent.py)

day41/topKFrequent.py

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
## A sample heap data structure ##

class MaxHeap:
    # Holds (value, frequency) pairs ordered by frequency. A plain list backs
    # the heap because deque indexing is O(n) away from the two ends.
    def __init__(self, arr=None):
        self.heap = []
        self.size = 0
        if arr is not None and len(arr) > 0:
            self.heapify(arr)

    # runtime: O(logn) aka the height of the heap
    def getMax(self):
        if self.size > 0:
            ret = self.heap[0]
            last = self.heap.pop()
            self.size -= 1
            if self.size > 0:
                # move the last leaf to the root, then restore heap order
                self.heap[0] = last
                self.bubbleDown(0)
            return ret

    # runtime: O(1)
    def peek(self):
        if self.size > 0:
            return self.heap[0]

    # runtime: O(logn) aka the height of the heap
    def push(self, val):
        self.size += 1
        self.heap.append(val)
        self.bubbleUp()

    # runtime: O(n), since most nodes sit near the bottom and sink only a few levels
    def heapify(self, arr):
        self.heap = list(arr)
        self.size = len(self.heap)
        # leaves are already valid heaps, so start at the last parent
        for i in range(self.size // 2 - 1, -1, -1):
            self.bubbleDown(i)

    # runtime: O(1)
    def isEmpty(self):
        return self.size == 0

    # runtime: O(logn) aka the height of the heap
    def bubbleDown(self, index):
        i = index
        h = self.heap
        while True:
            left, right = 2*i + 1, 2*i + 2
            largest = i
            # elements are (value, frequency) pairs, so compare on index 1
            if left < self.size and h[left][1] > h[largest][1]:
                largest = left
            if right < self.size and h[right][1] > h[largest][1]:
                largest = right
            if largest == i:
                break
            h[i], h[largest] = h[largest], h[i]
            i = largest

    # runtime: O(logn) aka the height of the heap
    def bubbleUp(self):
        h = self.heap
        i = self.size - 1
        # the parent of node i in a 0-indexed array heap is (i - 1) // 2;
        # compare on the frequency at index 1, matching bubbleDown
        while i > 0 and h[i][1] > h[(i - 1) // 2][1]:
            parent = (i - 1) // 2
            h[parent], h[i] = h[i], h[parent]
            i = parent

from collections import Counter

def topKFrequent(nums, k):
    # Counter maps each value to its frequency; items() yields (value, count) pairs
    freqs = Counter(nums)
    h = MaxHeap(freqs.items())
    ret = list()
    while k > 0:
        ret.append(h.getMax()[0])
        k -= 1
    return ret

def testTopKFrequent():
    assert set(topKFrequent([], 0)) == set([])
    assert set(topKFrequent([1], 1)) == set([1])
    assert set(topKFrequent([-1, -1], 1)) == set([-1])
    assert set(topKFrequent([1,1,1,2,2,3], 2)) == set([1, 2])
    assert set(topKFrequent([-1,-1,-1,2,2,3], 2)) == set([-1, 2])
    assert set(topKFrequent([1,1,1,2,2,3], 3)) == set([1, 2, 3])
    assert set(topKFrequent([1,1,1,2,2,2,3,3,3], 3)) == set([1, 2, 3])
    assert set(topKFrequent([4,1,-1,2,-1,2,3], 2)) == set([-1, 2])
    assert set(topKFrequent([3,2,3,1,2,4,5,5,6,7,7,8,2,3,1,1,1,10,11,5,6,2,4,7,8,5,6], 10)) == set([1,2,5,3,7,6,4,8,10,11])

def main():
    testTopKFrequent()

if __name__ == "__main__":
    main()
