Construction of Universal Hash function
Let m is a prime.
k is the key and decomposed in m-base number system with (r + 1) digits,
k = <k_0, k_1, k_2, ..., k_r> where 0 <= k_i <= m-1
take a random number a with r digits, such that
a = <a_0, a_1, a_2, ..., a_r> where a_i is chosen randomly and 0 <= a_i <= m-1
a universal random function, h(k) = sum(k_i * a_i) mod m
size of the set of universal hash functions, |H| = (m ^ (r+1)) [each a_i can be chosen from m choices]
Theorem: for h element_of H and for x != y, probability of collision = 1 / m
let x = <x_0, x_1, ..., x_r>, y = <y_0, y_1, ..., y_r> and x != y
x and y differ at least at 1 digit. how many h element_of H collides for x and y?
let, a = <a_0, a_1, ..., a_r> for h. if for h, x and y collides then
h(x) = h(y)
sum(a_i * x_i) (=) sum(a_i * y_i) (mod m)
=> sum(a_i * (x_i - y_i)) (=) 0 (mod m)
=> a_0 * (x_0 - y_0) (=) -(sum(a_i * (x_i - y_i))) (mod m)
=> a_0 (=) -(sum(a_i * (x_i - y_i))) * inverse(x_0 - y_0) (mod m) [Eqn. 0]
the [Eqn. 0] says to collide, we can choose any a_1 to a_r but the choice of a_0 is fixed by [Eqn. 0]. for all other choices of a_0 there will be no collision.
total possible ways to construct H, |H| = m ^ (r+1)
we have 1 option for choice of a_0 and m choices for all other r digits. so, number of collision = 1 * m * m * ... * m = m ^ r = |H| / m
for x != y, probability of collision = (cases when collision occurs) / (total number of cases) = (m ^ r) / (m ^ (r+1)) = 1 / m
Theorem: if there are n keys, then for a key x, E[numboer of collision with x] <= n / m
let C_x be a random variable denoting total number of collisions in keys in Table T with x, and C_xy is the indicator random variable
where, C_xy = 0 if no collision for x and y, and C_xy = 1 if there is collision between x and y.
C_xy = sum(C_xy) for all y in T - {x}
so, E[C_xy] = E(sum(C_xy)) = sum E(C_xy) = sum (1/m) = (n - 1) / m
Summary: Property of Universal Hashing
Let, U is the universe of keys and let H be a finite collection of has functions mapping u to {0, 1, ...., m-1}.
H is universal if for all h element_of H, and for all x and y element_of U, where x != y,
Probability(h(x) == h(y)) = 1 / m.
total number of cases where collision happens, for h and for all pair of different keys = |H| / m
Reference:
[0] https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-8-universal-hashing-perfect-hashing/
Let m is a prime.
k is the key and decomposed in m-base number system with (r + 1) digits,
k = <k_0, k_1, k_2, ..., k_r> where 0 <= k_i <= m-1
take a random number a with r digits, such that
a = <a_0, a_1, a_2, ..., a_r> where a_i is chosen randomly and 0 <= a_i <= m-1
a universal random function, h(k) = sum(k_i * a_i) mod m
size of the set of universal hash functions, |H| = (m ^ (r+1)) [each a_i can be chosen from m choices]
Theorem: for h element_of H and for x != y, probability of collision = 1 / m
let x = <x_0, x_1, ..., x_r>, y = <y_0, y_1, ..., y_r> and x != y
x and y differ at least at 1 digit. how many h element_of H collides for x and y?
let, a = <a_0, a_1, ..., a_r> for h. if for h, x and y collides then
h(x) = h(y)
sum(a_i * x_i) (=) sum(a_i * y_i) (mod m)
=> sum(a_i * (x_i - y_i)) (=) 0 (mod m)
=> a_0 * (x_0 - y_0) (=) -(sum(a_i * (x_i - y_i))) (mod m)
=> a_0 (=) -(sum(a_i * (x_i - y_i))) * inverse(x_0 - y_0) (mod m) [Eqn. 0]
the [Eqn. 0] says to collide, we can choose any a_1 to a_r but the choice of a_0 is fixed by [Eqn. 0]. for all other choices of a_0 there will be no collision.
total possible ways to construct H, |H| = m ^ (r+1)
we have 1 option for choice of a_0 and m choices for all other r digits. so, number of collision = 1 * m * m * ... * m = m ^ r = |H| / m
for x != y, probability of collision = (cases when collision occurs) / (total number of cases) = (m ^ r) / (m ^ (r+1)) = 1 / m
Theorem: if there are n keys, then for a key x, E[numboer of collision with x] <= n / m
let C_x be a random variable denoting total number of collisions in keys in Table T with x, and C_xy is the indicator random variable
where, C_xy = 0 if no collision for x and y, and C_xy = 1 if there is collision between x and y.
C_xy = sum(C_xy) for all y in T - {x}
so, E[C_xy] = E(sum(C_xy)) = sum E(C_xy) = sum (1/m) = (n - 1) / m
Summary: Property of Universal Hashing
Let, U is the universe of keys and let H be a finite collection of has functions mapping u to {0, 1, ...., m-1}.
H is universal if for all h element_of H, and for all x and y element_of U, where x != y,
Probability(h(x) == h(y)) = 1 / m.
total number of cases where collision happens, for h and for all pair of different keys = |H| / m
Reference:
[0] https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-8-universal-hashing-perfect-hashing/
No comments:
Post a Comment