Friday, September 22, 2017

[Algorithms] String search algorithms

String Matching Algorithms

problem: given two strings, s and t. does s occurs as a substring of t?

naive approach:

def substringSearch(t, s):
  for i in range(0..len(t)):
    matched = true
    for j in range(0..len(s)):
      if t[i+j] != s[j]:
        matched = false
        break
    if matched:
      return "String occurs"
  return "does not occur"

Runtime: O(m * n)

Rabin-Karp Algorithm:

def subStringSearch(t, s):
  len_s = len(s)
  pattern_hash = get_hash_val(s, 0, len_s)
  substr_hash = -1
  for i in range(0..len(t)-len(s)):
    substr_hash = get_hash(t, i, i+len_s, substr_hash)
    if substr_hash == pattern_hash:
      if t[i:i+len_s].matches(s):
        return true
  return false

def get_hash_val(s, start, end, prev_hash=-1):
  size_of_alphabet = SIZE_OF_ASCII_CHARS
  lowest_ascii_char = 'a'
  hash_val = 0
  if prev_hash == -1:
    for i in range(start..end):
      hash_val *= size_of_alphabet
      hash_val += (s[i] - lowest_ascii_char)
  else:
      start_char = s[start]
      hash_val = prev_hash * size_of_alphabet + (s[end] - lowest_ascii_char) - start_char * (size_of_alphabet ^ (end-start+1))

Analysis:
  if prev_hash = -1, hashing takes O(s)
  inside the for loop, hashing happens in O(1) time
  the hash function is designed in such a way that if two hash value equals then they are the same string.
  So for good hash function, O(n)
  for bad hash function O(m * n)

No comments:

Post a Comment