Search and You will Find!: [Algorithms] Perfect Hashing: Guarantying constant time operation for hash function

For any hash function, it is possible to find a set of keys that will result O(n) time in search. In this article, I will talk about a scenario when we already know the keys, and we want to do the look up the keys in O(1) time even for the worst case.

Perfect Hashing Scheme:
We need two level hashing. In first level, we have n slots in hash table T where n = number of keys.
At level1, we use one hash function h_l1. Each slot of T contains 3 things
[number of items, hash function, pointer to array holding keys at level 2]

for any slot in T, size of key array in level 2 = (number of keys in level 2) ^ 2.

Index Level-1 Level-2
0 [2 | 40 | ->] [- | 40 | 37 | 27 ]
1 [1 | 01 | ->] [- | 3 ]
2 [1 | 23 | ->] [- | 25 ]
3 [0 | 19 | ->] [ ]
4 [1 | 44 | ->] [ 127 ]

at 0 index, level-1 slot has [2 | 40 | ->] this means, level-2 has 2 elements. So level-2 hash table will have size 2 ^ 2 = 4. the value 40 means universal hash function number 40, h_40 will be used to hash in level 2.

In level-1 there will be collision. We need to guarantee that in level-2 we can find hash functions that won't collide for m_i number of keys.

Proof:
x is a random variable that represents number of collisions in level-2 for a slot i. in i, level-2 has m_i number of keys. then hash table has size (m_i) ^ 2 slots. since, a universal hash function will be used probability of any two keys collide would be 1 / (m_i) ^ 2.

E[x] = sum (1/(m_i ^ 2)) for all keys x and y for given set of keys in level-2 at slot i.
we can make n Choose 2 combinations, so
E[x] = (m_i C 2) * (1/(m_i ^ 2)) < 1 / 2
E[x] <= 1/2

from Markov Inequality, P{x >= t} <= E[x] / t
P{x >= 1} <= E[x] / 2 <= 1/2 [0]
Equation [0], basically says, if we use universal hashing, at least half the cases there will be no collision. So randomly picking some hash functions from universal hashing should quickly yield a good hash function that does not do any collision for the given keys.

Space:
let x is a random variable denoting total space in level-2, then x = sum(n_i ^ 2), where n_0 + n_1 + ... + n_m = n [n keys and m slots]
E[space] = n + E[x] = O(n)

Reference:
0. MIT OCW 6.046 : Introduction to Algorithms

Search and You will Find!

Pages

Sunday, September 17, 2017

[Algorithms] Perfect Hashing: Guarantying constant time operation for hash function

No comments:

Post a Comment