Friday, September 22, 2017

[Algorithms] Binary Search Tree (BST)


          parent
            / \
          /     \
        left    right

A binary search tree stores keys under an ordering constraint. Each node holds a value: the left subtree holds values less than or equal to the node's value, and the right subtree holds values greater than it.

Motivation:
Interval scheduling problems can be solved efficiently with binary search trees. We traverse left and right to see where a given interval should fit. If that place is already taken, or the interval overlaps an existing one, we discard it; otherwise we insert it.

BST operations
0. insert : insert a key into the tree if the key does not already exist.
1. search : check whether a key is present in the tree.
2. delete : delete a key if it exists in the tree.
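
The three routines below are written against a minimal node record; a sketch of what that record might look like (the field names are my own):

class Node:
  def __init__(self, key):
    self.key = key      # the stored key
    self.left = None    # subtree of smaller keys
    self.right = None   # subtree of larger keys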

0. Insert
def insert(node, key):
  if node is None:
    return Node(key)
  elif node.key == key:
    return node  # key already present; nothing to do
  elif node.key > key:
    node.left = insert(node.left, key)
  else:
    node.right = insert(node.right, key)
  return node  # hand the subtree root back to the caller

1. Search
def search(node, key):
  if node is None:
    return False
  if node.key == key:
    return True
  elif node.key > key:
    return search(node.left, key)
  else:
    return search(node.right, key)

2. delete:
def delete(node, key):
  if node is None:
    return node
  if node.key == key:
    if node.left is None:
      return node.right
    elif node.left.right is None:
      # the left child is the in-order predecessor; splice it in
      node.left.right = node.right
      return node.left
    else:
      # replace node with the right-most node of its left subtree
      new_node = get_right_most(node.left)
      delete_right_most(node.left)
      new_node.left = node.left
      new_node.right = node.right
      return new_node
  elif node.key > key:
    node.left = delete(node.left, key)
  else:
    node.right = delete(node.right, key)
  return node
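
get_right_most and delete_right_most are used above without definitions; a minimal sketch of the intended helpers (assuming, as in the else branch above, that delete_right_most is only called when node.right exists):

def get_right_most(node):
  # the largest key of a subtree sits at the end of the right spine
  while node.right is not None:
    node = node.right
  return node

def delete_right_most(node):
  # unlink the right-most node, reattaching its left subtree
  while node.right.right is not None:
    node = node.right
  node.right = node.right.left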

Augmented Binary Tree
When a binary tree holds extra information along with its keys, it is called an augmented binary tree. A common BST augmentation is storing the node count of each subtree.

We will discuss the following binary search trees:
0. AVL tree
1. red-black tree
2. B-tree

AVL Tree
A self-balancing binary search tree. It balances itself using the "rotate" operation; an operation can trigger O(log n) rotations.
This tree is an augmented data structure. Every node contains the following information:
  height of its subtree
  left child
  right child
 
property:
  at any node, let h_left and h_right be the heights of the left and right subtrees; then
    |h_left - h_right| <= 1
  Let N_h be the minimum number of nodes in an AVL tree of height h. The sparsest such tree has one subtree of height h-1 and the other of height h-2, so
    N_h = 1 + N_(h-1) + N_(h-2)
  Since N_(h-1) >= N_(h-2),
    N_h > 2 * N_(h-2)
  Unrolling gives N_h > 2 ^ (h/2), so n >= N_h > 2 ^ (h/2) and h = O(log n).
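
A quick numeric check of the recurrence (assuming base cases N_0 = 1 and N_1 = 2):

N = [1, 2]
for h in range(2, 12):
  N.append(1 + N[h - 1] + N[h - 2])
print(N)  # [1, 2, 4, 7, 12, 20, 33, 54, 88, 143, 232, 376], Fibonacci-like growth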

Rotation:
      x           <- left rotate                 y
    /   \         right rotate ->              /   \
   A     y                                    x     C
        / \                                  / \
       B   C                                A   B

An in-order traversal produces A x B y C for both trees, so rotation preserves the BST ordering.
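
A rotation only rewires a couple of pointers; a sketch (using the Node record from the BST post above):

def rotate_left(x):
  # promote x's right child y; subtree B crosses over to x
  y = x.right
  x.right = y.left
  y.left = x
  return y  # new root of this subtree

def rotate_right(y):
  # the exact inverse of rotate_left
  x = y.left
  y.left = x.right
  x.right = y
  return x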

insert:
  do a simple BST insert
  then move up from the insertion point, fixing the AVL property wherever it is broken.

During insertion, the following cases can occur in which the AVL property is broken at a node x.
0. Right child is right heavy.

         x (h)          
       /   \      
(h-3) A     y (h-1)                                  
           / \                                
   (h-3)  B   C (h-2)                              

A left rotate at x (promoting the right child y) fixes the property:

        y (h-1)
       / \
(h-2) x   C (h-2)
     / \
    A   B  [both A and B have h-3 height]

1. Right child is left heavy
                      x (h)          
                    /   \      
            (h-3) A     y (h-1)                                  
                       / \                                
               (h-2)  z   C (h-3)                              
                     / \
              (h-3) B   D (h-4)

do a right rotate at y (promoting z):

                  x (h)
                /   \
        (h-3)  A     z (h-1)
                    / \
            (h-3)  B   y (h-2)
                      / \
               (h-4) D   C (h-3)

        now we have the same situation as case 0. do as we did in case 0.

Mirror situation occurs for left heaviness.
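
Putting the cases together, a sketch of the fix-up run at each node on the way back up after an insert (assuming every node caches its height, as listed in the augmentation above, and the rotate sketches from earlier):

def height(node):
  return node.height if node is not None else -1

def update_height(node):
  node.height = 1 + max(height(node.left), height(node.right))

def rebalance(x):
  if height(x.right) - height(x.left) > 1:        # right heavy
    if height(x.right.left) > height(x.right.right):
      x.right = rotate_right(x.right)             # case 1 -> case 0
      update_height(x.right.right)
      update_height(x.right)
    x = rotate_left(x)                            # case 0
    update_height(x.left)
  elif height(x.left) - height(x.right) > 1:      # mirror cases
    if height(x.left.right) > height(x.left.left):
      x.left = rotate_left(x.left)
      update_height(x.left.left)
      update_height(x.left)
    x = rotate_right(x)
    update_height(x.right)
  update_height(x)
  return x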

[Algorithms] String search algorithms

String Matching Algorithms

problem: given two strings s and t, does s occur as a substring of t?

naive approach:

def substring_search(t, s):
  for i in range(len(t) - len(s) + 1):  # stay within the bounds of t
    matched = True
    for j in range(len(s)):
      if t[i + j] != s[j]:
        matched = False
        break
    if matched:
      return "String occurs"
  return "does not occur"

Runtime: O(m * n), where m = len(s) and n = len(t)

Rabin-Karp Algorithm:

def substring_search(t, s):
  len_s = len(s)
  pattern_hash = get_hash_val(s, 0, len_s)
  substr_hash = -1
  for i in range(len(t) - len_s + 1):
    substr_hash = get_hash_val(t, i, i + len_s, substr_hash)
    if substr_hash == pattern_hash:
      if t[i:i + len_s] == s:  # verify: equal hashes may be a collision
        return True
  return False

def get_hash_val(s, start, end, prev_hash=-1):
  # hash of s[start:end], treating the string as a base-B number
  # (in practice the value is kept modulo a large prime to bound its size)
  B = 256  # size of the alphabet, e.g. ASCII
  if prev_hash == -1:
    hash_val = 0
    for i in range(start, end):
      hash_val = hash_val * B + ord(s[i])
  else:
    # roll the window: prev_hash covers s[start-1:end-1]
    length = end - start
    hash_val = (prev_hash - ord(s[start - 1]) * B ** (length - 1)) * B + ord(s[end - 1])
  return hash_val

Analysis:
  if prev_hash == -1, hashing takes O(len(s)) time.
  inside the for loop, the rolling hash update happens in O(1) time.
  equal hash values do not guarantee equal strings, which is why a candidate match is verified character by character.
  With a good hash function collisions are rare, so the expected running time is O(n).
  With a bad hash function every position collides and the time degrades to O(m * n).

Sunday, September 17, 2017

[Algorithms] Perfect Hashing: Guaranteeing constant time operations for a hash function

For any fixed hash function, it is possible to find a set of keys that results in O(n) search time. In this article, I will talk about the scenario where we already know the keys and want to look them up in O(1) time even in the worst case.

Perfect Hashing Scheme:
We need two levels of hashing. In the first level, we have a hash table T with n slots, where n = number of keys.
At level 1, we use one hash function h_l1. Each slot of T contains 3 things:
[number of keys | hash function number | pointer to the key array at level 2]

For any slot in T, size of the key array at level 2 = (number of keys hashed to that slot) ^ 2.

Index     Level-1         Level-2
0         [2 | 40 | ->]   [ - | 40 | - | 27 ]
1         [1 | 01 | ->]   [ 3 ]
2         [1 | 23 | ->]   [ 25 ]
3         [0 | 19 | ->]   [ ]
4         [1 | 44 | ->]   [ 127 ]

At index 0, the level-1 slot [2 | 40 | ->] means level 2 holds 2 keys, so the level-2 table has size 2 ^ 2 = 4 (the dashes are empty slots). The value 40 means universal hash function number 40, h_40, is used to hash within level 2.

In level 1 there will be collisions. We need to guarantee that at level 2 we can find hash functions that won't collide for the m_i keys of a slot.

Proof:
Let X be a random variable for the number of collisions at level 2 for a slot i. Slot i holds m_i keys in a level-2 table of (m_i) ^ 2 slots. Since a universal hash function is used, any fixed pair of keys collides with probability 1 / (m_i) ^ 2.

E[X] = sum of 1 / (m_i ^ 2) over all pairs of distinct keys at slot i.
There are (m_i C 2) such pairs, so
E[X] = (m_i C 2) * (1 / (m_i ^ 2)) = (m_i - 1) / (2 * m_i) < 1/2

From Markov's inequality, P{X >= t} <= E[X] / t, so
P{X >= 1} <= E[X] < 1/2    [0]
Equation [0] says that if we use universal hashing, at least half of the hash functions produce no collision at all for the given keys. So randomly picking hash functions from the universal family quickly yields one with no collisions.
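
As a sketch of how that retry argument turns into code (using the multiply-add family h(k) = ((a * k + b) mod p) mod m, a standard universal family; PRIME and the integer key type are illustrative assumptions):

import random

PRIME = 2_147_483_647  # a prime larger than any key (assumption)

def build_level2(keys):
  # redraw a random hash function until it is collision-free on `keys`;
  # by [0] each draw succeeds with probability > 1/2, so fewer than
  # 2 draws are expected
  m = max(1, len(keys) ** 2)
  while True:
    a = random.randrange(1, PRIME)
    b = random.randrange(PRIME)
    slots = [None] * m
    for k in keys:
      i = ((a * k + b) % PRIME) % m
      if slots[i] is not None:
        break                # collision: reject this hash function
      slots[i] = k
    else:
      return (a, b), slots   # collision-free: keep it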

Space:
Let X be a random variable denoting the total space used by level 2. Then X = sum(n_i ^ 2), where n_0 + n_1 + ... + n_m = n [n keys, m slots]. With universal hashing at level 1, E[X] = O(n), so
E[space] = n + E[X] = O(n)

Reference:
0. MIT OCW 6.046 : Introduction to Algorithms

Saturday, September 16, 2017

[Algorithms] Keeping operations on a Hash table constant

In this article, I will discuss how to keep the search operation's run time constant in a hash table.

Let the table T have m slots. If h is a universal hash function then, for keys x and y where x != y,
Probability(h(x) == h(y)) = 1 / m
If the number of keys is n, on average every slot will have n / m keys. This ratio is known as the load factor alpha; alpha = n / m.
Since each slot holds alpha keys on average, the expected cost of a search is O(1 + alpha).
To keep searching constant, we need to keep alpha constant.

As the number of keys n grows, alpha starts increasing. If we want to keep alpha ~ 1, we need to increase the size of the table. If we can handle resizing of the table cheaply, we can keep hash table operations constant time.

The idea: if alpha hits a certain threshold, we double the size of T. This makes insertion amortized O(1).

Example: Let's say T initially has size m = 1. Insertion into a chaining hash table costs O(1), as all it does is insert the element at the head of a list.
When we double the table, the cost of that insertion goes from O(1) to O(n). Fortunately, this does not happen very often, which keeps insertion at amortized O(1). Below I show the key number and the cost of each insertion. When there is only one key, the insertion cost is 1. If we add another key, we need to double the table and copy all the data from the old table to the new one, which costs O(n).

Key#     cost
1             1
2             2
3             1
4             4
5             1
6             1
7             1
8             8
9             1
10           1
11           1
12           1
13           1
14           1
15           1
-----------------
total cost for n insertions = (1+1+1+...) + (2+4+8+...) = O(n)  # the expensive doubling steps get exponentially rarer
amortized cost = avg cost per insertion = O(n/n) = O(1)

For n = 1000, the following Haskell snippet computes the average cost of an insert operation:
a = sum([2 ^ x | x <- [0..100], 2 ^ x < 1000] ++ take 1000 (repeat 1)) `div` 1000
print a  -- prints 2
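
For completeness, a toy sketch of the doubling scheme as a chaining hash table (the names and the threshold alpha = 1 are my own choices):

class DoublingTable:
  def __init__(self):
    self.m = 1                                  # number of slots
    self.n = 0                                  # number of keys
    self.slots = [[] for _ in range(self.m)]

  def insert(self, key):
    if self.n + 1 > self.m:                     # alpha would exceed 1
      self._resize(2 * self.m)                  # O(n) copy, but rare
    self.slots[hash(key) % self.m].append(key)
    self.n += 1

  def search(self, key):
    return key in self.slots[hash(key) % self.m]  # expected O(1 + alpha)

  def _resize(self, new_m):
    old_slots = self.slots
    self.m = new_m
    self.slots = [[] for _ in range(new_m)]
    for chain in old_slots:                     # rehash every key
      for key in chain:
        self.slots[hash(key) % new_m].append(key)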

Deletion
Table size = m, number of keys = n. Halve the table size when n = m / 4. (Halving already at n = m / 2 would thrash: alternating inserts and deletes around the boundary would trigger a resize on every operation.)

References:
0. MIT OCW 6.006 Introduction to Algorithms
1. MIT OCW 6.046J Introduction to Algorithms

Thursday, September 14, 2017

Quick Haskell

I have been documenting the concepts and APIs of the Haskell programming language. This document has examples of various constructs of the language. It is a live document; until I complete the contents of [0], I will keep updating it.

function declaration
f x = x + 2   -- f(x) = x + 2

conditional
abs x = if x < 0 then (-1) * x else x  -- an if must always have an else

list
A list contains elements of the same type.
let myList = [1,2,3]
Strings are lists too: "abc" is equivalent to ['a', 'b', 'c'].

append at the end [slow if the first list is big]
myList ++ [5, 6]

prepend a single element
7 : myList
[1, 2, 3] is equivalent to 1:2:3:[]

access a list element
myList !! 2     -- element at index 2 (0-based), here 3

list comparison
rule: two lists can be compared if their elements can be compared. A nonempty list is greater than an empty list. Comparison proceeds element by element from the start until a mismatch is found or one/both lists end.
operators: <, <=, >=, >, ==
a = [1,2,3], b = [4,5,6]
a < b -- True, because 1 < 4

list of list
a = [[], [1,2,3], [4]]

list operations
[<head><......tail......>]
[<.......init.....><last>]
These operations have no side effects, meaning they don't modify the list.
head [1,2,3,4,5] -- 1
tail [1,2,3,4,5] -- [2,3,4,5]
init [1,2,3,4,5] -- [1,2,3,4]
last [1,2,3,4,5] -- 5
length [1,2,3] -- 3
null [] -- True
null [1] -- False
reverse [1,2,3] -- [3,2,1]
take 1 [7,2,3] -- [7]
drop 1 [7,2,3] -- [2,3]
maximum [2, 4, 5, 1] -- 5
minimum [2, 3, 1, 5] -- 1
sum [1, 2, 3] -- 6
product [1, 2, 3, 4] -- 24
elem 7 [1, 6, 7] -- True, 7 belongs to the list
[1..5] -- [1,2,3,4,5]
['a'..'d'] -- "abcd", i.e. ['a','b','c','d']
[5,4..1] -- [5,4,3,2,1]
cycle [1,2,3] -- infinite list [1,2,3,1,2,3,1,2,3,...]
repeat 5 -- [5,5,5,5,5,...] infinite
repeat [1] -- [[1],[1],[1],[1],...] infinite
replicate 4 2 -- [2, 2, 2, 2]

list comprehension
[x * 2 | x <- [0..5]]  -- all x * 2 such that x belongs to the set [0, 1, 2, 3, 4, 5]
[x * 2 | x <- [0..5], x > 2] -- with a predicate (filter): [6,8,10]
evenOdd xs = [ if x `mod` 2 == 0 then 0 else 1 | x <- xs ] -- evenOdd takes a list xs and returns a list of 0s and 1s
usage: evenOdd [0..10] -- [0,1,0,1,0,1,0,1,0,1,0]
[x + y | x <- [0..5], y <- [0..3]] -- for all x and y in the sets, a list of the sums x + y
length' ls = sum [1 | _ <- ls] -- generate a 1 for each element of ls, then sum up the 1s

Tuple
Stores heterogeneously typed elements. Its size is fixed.
(1, 1.1, 'a', "saif")
A tuple's type is determined by the number (and types) of its elements; a list can hold only same-typed elements. Thus,
[(1,2), (2, 3), (1,2,3)] causes an ERROR (different tuple sizes)
[(1, 2), ('a', 1)] also causes an ERROR (different component types)

Pairs
(1, 3)
fst (1,3) -- 1
snd (1,3) -- 3
zip ['a', 'b', 'c'] ['x', 'y', 'z'] -- [('a','x'),('b','y'),('c','z')]
zip "abc" [1..] -- [('a',1),('b',2),('c',3)]; note how only 3 elements are taken from the infinite list

## Type ##
A value's type is fixed.
The ghci command :t shows the type of a variable or function.
function type:
sum' :: Int -> Int -> Int
sum' a b = a + b

common types
Int : bounded by underlying system
Integer : unbounded
Float : real number with single precision
Double : real number with double precision
Bool : boolean type
Char : a Unicode character

Type Class
A type class encloses one or more types.
A type can be a member of one or more type classes.

ghci command: :t (==)
(==) :: (Eq a) => a -> a -> Bool
How to read it: (==) takes 2 arguments of some type a belonging to type class Eq and returns a Bool.

Eq
applicable function: ==, /= (not equal)

Ord
values of an Ord type can be ordered (and hence sorted).
functions to apply: >, >=, <, <=
the compare function returns GT, LT or EQ.

Show
the value can be presented as a string.
function to apply on this type: show    -- show 3 => "3"

Read
from a string, a value of another type can be formed.
applicable function: read
ex: (read "3") + 3  -- 6
The read function needs enough of a hint to know which type to produce. For example,
read "3" on its own does not work,
but read "3" + 3 works, because read can tell "3" should become a number.
The following works similarly to type casting (as I think of it):
read "3" :: Int -- explicitly specify that we need 3 as an Int

more examples:
read "[1, 2, 3]" :: [Int]  -- [1,2,3]
read "(2, 'a')" :: (Int , Char) -- (2,'a')
[read "True", True, False, True] -- the list context tells read to produce a Bool

Enum
sequentially enumerable values.
applicable functions: succ, pred

Bounded
has an upper and lower bound

Num
values act like numbers

Floating
applicable functions: sin cos
Enclosing type: Float, Double

Integral
enclosing type: Int, Integer

## Syntax in Functions ##
Pattern matching: choosing a function body based on the shape of the input. Pattern matching works very much like an if/else chain.
citizen :: String -> String
citizen "BD" = "Bangladeshi"
citizen "USA" = "American"
citizen x = "Unknown citizen"

The first line is the function's input and output type declaration. The second line says what to do if the input is "BD", and so on and so forth.

can be used to define base case.
fibonacci :: Int -> Int
fibonacci 0 = 0
fibonacci 1 = 1
fibonacci n = fibonacci(n-1) + fibonacci(n-2)

If the patterns do not cover all possible inputs, a non-matching input throws a runtime exception.

vector_add :: (Double, Double) -> (Double, Double) -> (Double, Double)
vector_add a b = (fst a + fst b, snd a + snd b)
another way to write
vector_add (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)

Usage of '_'
fst' :: (a, b, c) -> a
fst' (a, _, _) = a

head' :: [a] -> a
head' (a:_) = a

An as pattern breaks up the input into parts while also giving a name to the whole value:
shead :: [Int] -> Int
shead whole@(frst : rest) = frst

Guards
bmiTell :: Double -> String
bmiTell bmi
  | bmi <= 18.5 = "underweight"
  | bmi <= 25.0 = "normal weight"
  | bmi <= 30.0 = "fat"
  | otherwise = "unhealthy fat"

guard syntax | boolean-expression = expression-to-be-evaluated

where
clause. Provides variable-like functionality familiar from imperative programming languages.

bmiTell w h
  | bmi <= skinny = "you are fine"
  | bmi <= fat = "you are not fine"
  where bmi = w / h ^ 2
        skinny = 18.5
        fat = 23.5

forming tuple and accessing it
initials :: String -> String -> String
initials first last = [f] ++ "." ++ [l] ++ "."
  where (f:_) = first
        (l:_) = last
where can also pattern match:
desList :: [a] -> String
desList ls = "the list is " ++ check ls
        where check [] = "empty"
              check [x] = "singleton"
              check xs = "a longer list"

Let
let lets you bind names inside an expression that generates a value. let bindings are local in scope.
cylinder :: Double -> Double -> Double
cylinder r h =
  let sideArea = 2 * pi * r * h
      topArea = pi * r ^ 2
  in sideArea + 2 * topArea

pattern: let <variable binding> in <expression>
let square x = x * x in (square 2, square 3, square 4)

list comprehension
calcBMI xs = [bmi | (w, h) <- xs, let bmi = w / h ^ 2, bmi > 25.0]
the value bound through the let is visible in the output expression (before '|') and in anything after the let

Case expression
syntax:
case expression of pattern -> result
                   pattern -> result
                   pattern -> result
                   pattern -> result
example:
sum' :: [Int] -> Int
sum' xs = case xs of [] -> error "empty list"
                     _ -> sum xs

case can be used inside a function too.
describeList ::[a] -> String
describeList ls = "The list is " ++ case ls of [] -> "empty"
                                               [x] -> "singleton"
                                               xs -> "a longer list"

Reference:
[0] http://learnyouahaskell.com/

Tuesday, September 12, 2017

Construction and properties of Universal Hash function

Construction of Universal Hash function
Let m be a prime.
A key k is decomposed in the base-m number system with (r + 1) digits:
k = <k_0, k_1, k_2, ..., k_r> where 0 <= k_i <= m-1
Take a random number a with (r + 1) digits,
a = <a_0, a_1, a_2, ..., a_r>, where each a_i is chosen randomly and 0 <= a_i <= m-1

a hash function from the universal family: h(k) = sum(k_i * a_i) mod m

size of the set of universal hash functions, |H| = (m ^ (r+1)) [each a_i can be chosen from m choices]
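
A sketch of the construction in code (m, r, and the key decomposition are illustrative choices):

import random

m = 97   # number of slots; a prime
r = 3    # keys have r + 1 base-m digits

def digits(k):
  # decompose key k into its r + 1 base-m digits <k_0, ..., k_r>
  return [(k // m ** i) % m for i in range(r + 1)]

def make_hash():
  # drawing a = <a_0, ..., a_r> uniformly picks one h out of the m^(r+1)
  a = [random.randrange(m) for _ in range(r + 1)]
  return lambda k: sum(a_i * k_i for a_i, k_i in zip(a, digits(k))) % m

h = make_hash()
print(h(12345))  # some slot in [0, m)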

Theorem: for h chosen at random from H and for x != y, probability of collision = 1 / m
let x = <x_0, x_1, ..., x_r>, y = <y_0, y_1, ..., y_r> and x != y
x and y differ in at least 1 digit; say, WLOG, digit 0, so x_0 != y_0. How many h element_of H collide on x and y?
Let a = <a_0, a_1, ..., a_r> define h. If h collides on x and y, then
h(x) = h(y)
sum(a_i * x_i) (=) sum(a_i * y_i)    (mod m)
=> sum(a_i * (x_i - y_i)) (=) 0    (mod m)
=> a_0 * (x_0 - y_0) (=) -(sum over i >= 1 of a_i * (x_i - y_i))    (mod m)
=> a_0 (=) -(sum over i >= 1 of a_i * (x_i - y_i)) * inverse(x_0 - y_0)    (mod m)    [Eqn. 0]
(the inverse exists because m is prime and x_0 - y_0 != 0 mod m)

[Eqn. 0] says that to get a collision, we can choose a_1 to a_r freely, but the choice of a_0 is then forced to a single value; for every other choice of a_0 there is no collision.
total possible ways to construct H, |H| = m ^ (r+1)
we have 1 option for the choice of a_0 and m choices for each of the other r digits. So, the number of colliding functions = 1 * m * m * ... * m = m ^ r = |H| / m
for x != y, probability of collision = (cases when collision occurs) / (total number of cases) = (m ^ r) / (m ^ (r+1)) = 1 / m

Theorem: if there are n keys, then for a key x, E[number of collisions with x] <= n / m
Let C_x be a random variable denoting the total number of keys in table T that collide with x, and let C_xy be the indicator random variable
where C_xy = 0 if x and y do not collide, and C_xy = 1 if they do.

C_x = sum(C_xy) for all y in T - {x}
so, E[C_x] = E[sum(C_xy)] = sum(E[C_xy]) = sum(1/m) = (n - 1) / m

Summary: Property of Universal Hashing
Let U be the universe of keys and let H be a finite collection of hash functions mapping U to {0, 1, ..., m-1}.

H is universal if, for every pair x != y element_of U and a randomly chosen h element_of H,
Probability(h(x) == h(y)) = 1 / m.

Equivalently, for any fixed pair of distinct keys, the number of functions in H that collide on that pair = |H| / m.

Reference:
[0] https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-8-universal-hashing-perfect-hashing/


Sunday, September 10, 2017

Good Hash Functions

Let m = number of slots in the hash table, and k the key we want to hash. We will discuss some hashing mechanisms.

Division method
A well-known hash method is doing mod
h(k) = k mod m
We need to be careful with the choice of m.

m should not have small divisors. For example, if d = 2 is a divisor of m and the keys are all even,
then the hash function will only ever choose even slots and all the odd slots will stay empty.

We want h(k) to depend on all the bits of k, so m should not be a power of 2.
Say m = 2 ^ t; then the hash value depends only on the t low bits of the key k:

 001 100 111 mod 2 ^ 3 = 111 (the last 3 bits)
 in general, k mod (2 ^ t) = the t lower bits of k

Pick m such that it is a prime and not close to a power of 2 or 10.

Division method is simple, but division takes a lot of cycles.

Multiplication method
Let's assume the word size of the computer is w bits.
Let m = 2 ^ r.
h(k) = ((A * k) mod (2 ^ w)) >> (w - r), where A is odd, 2 ^ (w-1) < A < 2 ^ w, and A is not close to a power of 2.

How it works:
A has w bits, k has w bits, so A * k has 2w bits.
(A * k) mod (2 ^ w) keeps the lower w bits of A * k.
((A * k) mod (2 ^ w)) >> (w - r) leaves an r-bit slot index.

When the multiplication is done, there is a great mixing of the bits of k. A toy example (note this A = 5 violates the 2 ^ (w-1) < A guideline; it is just for illustration):
m = 2 ^ 3, so r = 3
k = 1001 (binary)
A = 5 = 0101 (binary)
w = 4

            1001   (k)
          x 0101   (A)
        --------
            1001
           0000
          1001
         0000
        --------
        00101101   (A * k = 45; the low w = 4 bits are 1101, and 1101 >> (w - r) = 110, the hash result)

Notice how the bits in the middle are nicely mixed with many other bits; this is one of the reasons the multiplication method works well.
The mod and shift operations are very fast, since they only discard bits. That is one of the reasons this method is faster than the division method.
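
The whole method is a one-liner; a sketch reusing the toy numbers above:

def mult_hash(k, A, w, r):
  # h(k) = ((A * k) mod 2^w) >> (w - r): keep the low w bits, then
  # take the top r of those
  return ((A * k) & ((1 << w) - 1)) >> (w - r)

print(bin(mult_hash(9, 5, 4, 3)))  # 0b110, matching the worked example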

Reference:
[0] https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-8-hashing-with-chaining/
[1] https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-7-hashing-hash-functions/

[Algorithms] Why operations on a HashMap are considered constant

Assume we will be hashing by chaining.
Number of slots = m, number of keys = n.
The assumption is that the hash function chooses every slot with equal probability, so the probability that two given keys collide is 1 / m.
Because of this property, all slots are expected to hold the same number of keys: avg_keys_per_slot = n / m. This is also known as the load factor alpha.
The expected cost of a search where the key is not present is then O(1 + alpha).
Expected search becomes constant when alpha = O(1), i.e., n = O(m).

Saturday, September 9, 2017

[Algorithms] Find the best pivot for quicksort or quickselect algorithm

The problem with quicksort is that in the worst case the input array is not divided into balanced parts:
T(n) = T(n-1) + O(n), which leads to O(n^2). Below is an algorithm that finds a good pivot in linear time and guarantees a balanced split, so the recurrence becomes
T(n) = 2T(n/2) + O(n), which leads to O(n log n).

The idea: arrange the input array A in 5 rows, so each column has at most 5 elements. Sorting one constant-sized column takes constant time, so sorting all the columns takes O(n) time. Then take the median of each column and form an array of medians; recursively find the median of that array and use it as the pivot.

Algorithm:
  def select_pivot(A):
    if len(A) <= 5:
      A.sort()
      return A[len(A) // 2]

    buckets = [[] for _ in range(5)]  # note: [[]] * 5 would alias one list
    # deal the elements into 5 rows
    for i in range(len(A)):
      buckets[i % 5].append(A[i])

    next_A = []
    columns = len(A) // 5
    for col in range(columns):
      a = [buckets[row][col] for row in range(5)]  # extract one column
      a.sort()
      next_A.append(a[2])  # median of the column
    return select_pivot(next_A)  # recurse on the medians


Analysis:
Dividing the elements into 5 buckets takes O(n) time.
Sorting a single column of 5 elements takes O(1) time, so sorting n/5 columns takes O(n) time.
The next recursion gets n/5 elements, contributing T(n/5).

T(n) = T(n/5) + O(n) = O(n)

Following the above algorithm we can guarantee, even in the worst case, a pivot that splits the input array into two reasonably balanced parts (each side gets at least roughly 3n/10 of the elements), which is enough for the O(n log n) bound. Once we know the pivot element, the quicksort partition procedure finds its index in O(n) time.

Friday, September 8, 2017

Sorting algorithms

O(n ^ 2) runtime

0. Bubble Sort
1. Insertion Sort

O(n logn) runtime
0. merge sort
1. quick sort
2. heap sort

O(n) run time

0. counting sort
Only works for (bounded, non-negative) integers; we need to know the largest integer L. Create a count array of size L + 1, traverse the sequence counting the occurrences of each number, and finally refill the input array from the counts.
Algorithm:
def count_sort(A):
  L = max(A)  # largest integer in the list
  count = [0] * (L + 1)

  for num in A:
    count[num] += 1

  index = 0
  for i in range(L + 1):
    for _ in range(count[i]):  # write the value i count[i] times
      A[index] = i
      index += 1
  return A

Order: O(num_elements + max_integer); this comes from the two for loops.

1. radix sort
k = floor(log_10(MAX_NUM_IN_SEQUENCE)) + 1 gives the number of digits required to represent a number in base 10.

import math

def radix_sort(A):
    max_digit = int(math.log10(max(A))) + 1

    for dig_index in range(max_digit):
        # create 10 buckets; [[]] * 10 would alias a single list
        buckets = [[] for _ in range(10)]
        divisor = 10 ** dig_index
        for number in A:
            digit = (number // divisor) % 10
            buckets[digit].append(number)
        A = [num for b in buckets for num in b]  # gather, keeping bucket order
    return A

A sorting algorithm is stable if it maintains the relative order of equal elements. Each pass of radix sort is stable (within a bucket it keeps the previous pass's order), and that is why radix sort works.

123, 153, 117
first round: 3, 3, 7
123
153  //order maintained
117

second round: 2, 5, 1
117
123
153

third round: 1, 1, 1
order maintained.

The last round shows an example of stability: because all the digits are the same, the algorithm does not reorder any numbers in the sequence.

Wednesday, September 6, 2017

The lower bound on comparison-based sorting is n log n

Problem: A = {a, b, c, d, ... n elements}
Find a permutation of the elements of A, {a', b', c', ...}, where a' < b' < c', and so on.

How many comparisons do we need to make, at least? The answer is the lower bound for comparison-based sorting.

Analysis:
We can think of a binary decision tree, each node comparing two elements of A. Each leaf corresponds to one permutation.

The leaves must include all n! permutations of the elements of A. Let the height of the tree be h; then h is the minimum number of comparisons needed to sort A in the worst case.

for binary trees,
2 ^ h >= number of leaves
2 ^ h >= n!
h >= log n!
h >= log ((n/e) ^ n)   # Stirling's formula
h >= n log(n/e)
h = Omega(n log n)

A sample decision tree for 3 elements can be seen in the following figure.

Sunday, September 3, 2017

[security-0] Steal and Pass information from a secured system using Covert Channel

Imagine a system holding sensitive data. The Operating System (OS) enforces security by making sure only processes with higher privileges can read that information. To prevent a higher-privileged process from passing the information to the outside world, the OS ensures a higher-privileged process cannot access the internet and cannot write data for a lower-privileged process. A lower-privileged process cannot read sensitive data, but it can access the internet if it needs to. Authorized employees use higher-privileged processes when they do something critical, such as money transactions; regular employees use lower-privileged processes to browse gmail, Instagram, Facebook, YouTube, and other websites.

To strengthen security, all the data are encrypted with a private key.

Such an Operating System prohibits write-down and read-up. This means a process with higher privilege cannot write information for a process with lower priority, and a process with lower priority cannot read data meant for a higher-priority process. If a malicious process somehow gains the right to run as a higher-privileged process, it will try to pass sensitive information to a lower-priority process, which can in turn pass it outside. Because write-down is prohibited, the higher-privileged process cannot hand the information over directly. One fact about the host OS: if a file exists with higher privilege and a lower-privileged program tries to read it, the OS tells the program it cannot read the file for lack of privilege; if the file does not exist, the OS says the file does not exist. This difference in behavior can be exploited to pass sensitive information.

We will call the higher-privileged malicious program high_process and the lower-privileged one low_process. Imagine the higher-privileged program gets access to the private key of the system. It is known that the system uses 4-bit keys; the specific secret key is "1001". Here is how the 4 bits can be passed.

0. high_process creates a file demon.txt in the root directory.
1. low_process, at some agreed time, finds that demon.txt exists; low_process now knows that high_process has access to the secret key.
2. At an agreed time, say 10 pm, high_process creates a file evil.txt in the root directory.
3. At 10:00:01 pm, low_process tries to read evil.txt and the OS says the file exists but cannot be read because of insufficient privilege. low_process now knows the first bit is 1.
4. The second key bit is 0, so after 30 seconds high_process deletes the evil.txt file.
5. 31 seconds later, low_process tries to read evil.txt and the OS says the file does not exist. low_process now knows the second bit of the key is 0.
This keeps going, and soon enough low_process has the user's private key and can safely pass it outside.
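
A toy sketch of the two loops (the file name, timing, and error handling are illustrative; the receiver distinguishes "permission denied" from "no such file" exactly as described above):

import os
import time

KEY_BITS = "1001"     # the 4-bit secret from the story
CHANNEL = "evil.txt"  # file whose existence encodes the current bit
PERIOD = 30           # seconds per bit

def high_process():
  # sender: high privilege; it never writes data down, it only
  # creates or deletes a file the low process cannot read
  for bit in KEY_BITS:
    if bit == "1":
      open(CHANNEL, "w").close()
    elif os.path.exists(CHANNEL):
      os.remove(CHANNEL)
    time.sleep(PERIOD)

def low_process():
  # receiver: probes just after each sender update
  time.sleep(1)
  bits = ""
  for _ in KEY_BITS:
    try:
      open(CHANNEL).close()
      bits += "?"               # unexpected: we lack read privilege
    except PermissionError:
      bits += "1"               # file exists but is unreadable
    except FileNotFoundError:
      bits += "0"               # file does not exist
    time.sleep(PERIOD)
  return bits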

Reference:
0. https://www.amazon.com/Network-Security-Private-Communication-Public/dp/0130460192

[living-0] Street smartness: Rules of thumb when buying a technology book


We have a saying in Bangla (my native language): "No one goes bankrupt buying books." I buy a lot of books. Many times I have found I am paying 10-15$ more for a book than I needed to, and that is not good. Many times I have found that a book I bought became a great teacher and companion; other times I have found I don't enjoy reading the book much. Here is what I have learned from trial and error about buying books.

I buy if
  1. The technology I am about to learn is time tested. Examples: algorithms, cryptography, mathematics, compilers, physics.
  2. On Quora I hear a lot of good things about the book, or on Amazon I see a lot of positive reviews; that is an indication the book is a good one.
  3. The book is about (or by) a person I am greatly interested in. I admire Buddha, Richard Feynman, Donald Knuth, Randy Pausch (and so many other of my heroes) very much, and would not hesitate to buy a book by them. (For Buddha, I would get a book of His quotes.)
Before I buy
  1. I verify the book's price across several book-selling websites. I used to buy books only from Amazon, but later I learned about other online booksellers such as AbeBooks, ThriftBooks, and Alibris. I won't buy from a website that asks for my bank information and does not accept credit cards; I prefer a website that accepts PayPal or at least credit cards.
I don't buy if
  1. There is an active website maintaining that technology. An active documentation site means lightning-fast error corrections, updates, and improved explanations. Examples: Hadoop, HBase, Kafka, Git. As soon as the technology gets an update, your book may become obsolete.


[Algorithms-0] Divide and Conquer more examples

Situation:
There is a problem P, input size n, solve it. (I am being silly here :)

Requirement:
The problem P should be divisible into pieces significantly smaller than the original problem. The technique works best when the pieces are a constant fraction of the input, like n/2.
For example, to find the n-th fibonacci number we could use the recursion
f_n = f_(n-1) + f_(n-2)
This divides a problem of size n into sizes n-1 and n-2, which are barely smaller, and it ends up being an exponential algorithm.

Dividing into too many pieces may not help either. Take multiplication of two n x n matrices, A and B. The naive algorithm is O(n^3). We can divide A into four n/2 x n/2 matrices and do the same with B:

        | a  b |        | e  f |                    | i  j |
So, A = |      |,   B = |      |,  C = A x B =      |      |
        | c  d |        | g  h |                    | k  l |

where a, b, c, ... are all n/2 x n/2 matrices, and

i = a * e + b * g
j = a * f + b * h
so on and so forth.
To compute i, we need two n/2 x n/2 matrix multiplications and an addition. A matrix has O(n^2) elements, so the addition costs O(n^2).
That means the order of C's computation is T(n) = 8 T(n/2) + O(n^2). By the master method this ends up being T(n) = O(n^3). The problem is that we divided into too many parts (8), and it did us no good: we still have O(n^3).

Steps
Divide and conquer has 3 steps.
0. Divide : Break the input in parts.
1. Conquer : Compute on each divided parts and find result for those smaller parts.
2. Combine : Find solution for the bigger problem by combining the smaller part's solutions.

Examples
0. Binary search
Problem: a sequence of numbers A = a, b, c, d, ..., where a < b < c < d < ... Check whether an element x is present in the sequence.
Divide:
Divide A into two halves B and C, each of size n/2.
Conquer: If x == A[n/2], return true. If x < A[n/2], look in B, else look in C.
Combine: nothing to combine.
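
As code, a minimal iterative sketch:

def binary_search(A, x):
  lo, hi = 0, len(A) - 1
  while lo <= hi:
    mid = (lo + hi) // 2
    if A[mid] == x:
      return True
    elif x < A[mid]:
      hi = mid - 1   # keep searching the left half, B
    else:
      lo = mid + 1   # keep searching the right half, C
  return False

print(binary_search([1, 3, 5, 8, 13], 8))  # True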

1. Peak finding
In a sequence ..., a, p, b, ..., the element p is a peak iff p >= a and p >= b.
Problem: given a sequence of numbers A = a, b, c, d, ... (not sorted), find a peak in this sequence.
Divide: Divide the sequence A into two equal halves, B and C.
Conquer: if A[n/2] is a peak, return it. Else, recurse into the half containing the neighbor that prevented A[n/2] from being a peak.
Combine: nothing to combine.
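
A sketch of the 1D version:

def find_peak(A, lo, hi):
  mid = (lo + hi) // 2
  if mid > 0 and A[mid - 1] > A[mid]:
    return find_peak(A, lo, mid - 1)    # the left neighbor blocks mid
  if mid < len(A) - 1 and A[mid + 1] > A[mid]:
    return find_peak(A, mid + 1, hi)    # the right neighbor blocks mid
  return mid                            # neither neighbor is bigger: a peak

print(find_peak([1, 3, 2, 5, 4], 0, 4))  # 1 (the value 3 is a peak)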

2D version
A is an n x n matrix.
Divide: Divide A into two n x (n/2) halves.
Conquer: take mid, the middle column of A (an n x 1 matrix). With a linear search, find the global maximum of mid.
If it is a peak, return it.
Else take the half of A containing the element that prevented the global max of mid from being a peak.


2. Merge sort
Problem: an array A = [a, b, c, d, ... n elements], not sorted. Find a permutation of the elements of A, [a', b', c', ...], such that a' < b' < c' < ..., and so on.
Divide: Divide A into two halves B and C, each with n/2 elements.
Conquer: recursively sort B and C, producing B' and C'.
Combine: merge the two sorted arrays B' and C' and return the result.

3. Finding power x ^ n
problem: given x and n find x * x * x ..... * x (n times)
naive algorithm: O(n)

Divide:
x ^ n = x ^ (n/2) * x ^ (n/2)
Divide computation to compute x ^ (n/2)
Conquer:
Combine: we have the result a = x ^ (n/2); multiply it with itself (i.e., a * a) and return the result.

T(n) = T(n/2) + 1 = O(log n)
code:
def compute_power(x, n):
  if n == 0:
    return 1
  if n % 2 == 0:
    a = compute_power(x, n // 2)
    return a * a
  else:
    a = compute_power(x, (n - 1) // 2)
    return a * a * x

4. Finding nth Fibonacci number
The naive recursion at the beginning of this article is exponential. Dynamic programming computes fibonacci numbers in O(n). Here we discuss how to find the n-th fibonacci number in O(log n) time: it can be read off the n-th power of a special 2 x 2 matrix.

| f_(n+1)  f_n     |     | 1  1 | n
|                  |  =  |      |        (theorem 1)
| f_n      f_(n-1) |     | 1  0 |

Exponentiation of a constant-sized matrix can be done in O(log n) time by repeated squaring; theorem 1 is easily proved by induction.
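
A sketch of the O(log n) computation by repeated squaring of that matrix:

def mat_mult(X, Y):
  # 2 x 2 matrix product
  return [[X[0][0] * Y[0][0] + X[0][1] * Y[1][0],
           X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
          [X[1][0] * Y[0][0] + X[1][1] * Y[1][0],
           X[1][0] * Y[0][1] + X[1][1] * Y[1][1]]]

def fib(n):
  result = [[1, 0], [0, 1]]   # identity matrix
  base = [[1, 1], [1, 0]]
  while n:                    # O(log n) squarings
    if n & 1:
      result = mat_mult(result, base)
    base = mat_mult(base, base)
    n >>= 1
  return result[0][1]         # f_n, per theorem 1

print(fib(10))  # 55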

5. Matrix multiplication
At the beginning of this article we saw how dividing a problem into too many subproblems kept us from a better solution. Strassen developed an algorithm [2] that does 7 sub-multiplications instead of 8; then T(n) = 7 T(n/2) + O(n^2) = O(n^log2(7)) ~ O(n^2.81).

6. VLSI layout
Connect n grid nodes without overlapping wires, keeping the area of the circuit board as small as possible. Figure 1 shows the nodes connected in a binary tree fashion. Let height = h and width = w; from the figure,
h(n) = h(n/2) + 1 = O(log n)
w(n) = 2 * w(n/2) + 1 = O(n)
area = h * w = O(n log n)
We can do better. The smallest possible area is O(n), since we must place the n nodes anyway.
If we can make h ~ O(sqrt(n)) and w ~ O(sqrt(n)), the area is O(n).
From the master method we know a recurrence of the form T(n) = 2 T(n/4) + O(n ^ (1/2 - epsilon)) solves to O(sqrt(n)). So we divide the n nodes into n/4-sized groups, as shown in figure 2. Then,
h(n) = 2 * h(n/4) + 1 = O(sqrt(n))
w(n) = 2 * w(n/4) + 1 = O(sqrt(n))
area = O(sqrt(n) * sqrt(n)) = O(n)

7. Quicksort
Divide: divide array A in two parts around a pivot.
A = |         <= pivot            | pivot |          > pivot |
Conquer: recursively divide
combine: nothing to do

algorithm:
def partition(A, p, q):  # partition A[p:q] around A[p]; returns the pivot's final index
  pivot = A[p]  # choose the first element as pivot
  i = p + 1     # elements in A[p+1:i] are <= pivot
  for k in range(p + 1, q):
    if A[k] <= pivot:  # element belongs on the left side
      A[i], A[k] = A[k], A[i]
      i += 1
  i -= 1
  A[p], A[i] = A[i], A[p]  # put the pivot between the two parts
  return i


def quicksort(A, p, q):  # sort A[p:q], q exclusive
  if q - p <= 1:
    return
  pivot_index = partition(A, p, q)
  quicksort(A, p, pivot_index)
  quicksort(A, pivot_index + 1, q)


A = [0, 2, 1, -1]
quicksort(A, 0, 4)
print(A)  # [-1, 0, 1, 2]

Analysis: worst case, T(n) = T(n-1) + n = O(n^2)
Average case, T(n) = T(n/d) + T(n(d-1)/d) + O(n) = O(n log n) for any constant split ratio d

If the pivot is chosen randomly, the expected running time is O(n log n).

8. Order statistics
Problem: There is an array A, find the kth order element of the array
if k = 1, smallest element
if k = n, max element

Divide:
def partition(A, p, q):  # p start index, q end index
  pivot = A[p]
  i = p + 1
  for k in range(i, q):
    if A[k] <= pivot:
      A[i], A[k] = A[k], A[i]
      i += 1
  i -= 1
  A[p], A[i] = A[i], A[p]
  return i

def search(A, p, q, k):  # find the k-th order element (0-indexed) within A[p:q]
  if q - p <= 1:
    return A[p]
  pivot_index = partition(A, p, q)
  if pivot_index == k:
    return A[k]
  if pivot_index > k:
    return search(A, p, pivot_index, k)
  else:
    return search(A, pivot_index + 1, q, k)

On average, T(n) = T(n/2) + O(n) = O(n).
If the pivot is chosen randomly, the expected time is O(n) on every input; with the median-of-medians pivot (see the earlier post), the algorithm is O(n) even in the worst case.


Another example of divide and conquer, computing square roots, can be found here: http://www.saifulabu.me/2017/08/benefit-of-divide-and-conquer.html

Reference:

0. MIT OCW 6.006 Introduction to Algorithms
1. MIT OCW 6.046J lecture on divide and conquer
2. http://www.geeksforgeeks.org/strassens-matrix-multiplication/