Saturday, September 16, 2017

[Algorithms] Keeping operations on a Hash table constant

In this article, I will discuss how to keep search operation's run time constant in a hash data structure.

Let the table T has m slots. If h is a universal hash function then, for key x and y where x != y
probability(x) = h(y)) = 1 / m
if the number of keys = n, on average every slot will have n / m keys. this is also known as load factor, alpha. So, alpha = n / m.
so each slot will have on average alpha number of keys. So expected number of searches would be O(1 + alpha).
To keep searching constant, we need to keep alpha constant.

As number of keys, n grows alpha starts increasing. If we want to keep alpha ~ 1, we need to increase size of our table. If we can handle
resizing of the table in constant time, we will be able to do operations in hash table in constant time.

Idea is If alpha hits a certain threshold, then we double size of T. This way insertion will be in amortized O(1) time.

Example: Lets say initially T has size m = 1. Insertion has cost O(1) for a hash table as all it does is insert the element at the head.
When we double the table, the cost of insertion goes to O(n) from 1. Fortunately, this does not happen very often, which makes average insertion operation a O(n) time operation. Below I have shown the key number and the order of the operation. When there is only one key, the insertion time is 1. If we add another key, we need to increase size of our table and then copy all the data from old table to new table. This cause O(n) time.

Key#     cost
1             1
2             2
3             1
4             4
5             1
6             1
7             1
8             8
9             1
10           1
11           1
12           1
13           1
14           1
15           1
-----------------
total cost for n insertion  =  (1+1+1..) + (2+4+8) = O(n)  # as the bigger value gets scarce as you keep doubling the table
amortized cost = avg cost per insertion = O(n/n) = O(1)

for n = 1000, the following Haskel code snippet counts average cost of insert operation.
a = sum([2 ^ x | x <- [0..100], 2 ^ x < 1000] ++ take 1000 (repeat 1)) `div` 1000
print a # prints 2

Deletion
Table size = n, number of keys = k. Half the table size when k = n / 4

References:
0. MIT OCW 6.006 Introduction to Algortihms
1. MIT OCW 6.046J Introduction to Algorithms

No comments:

Post a Comment