Monday, October 8, 2018

Regular Expression Note in Ruby

What does a regular expression do?
Given a string and a pattern, it checks whether the string contains a substring that matches the pattern (and, if so, where).

Ruby's syntax
/cat/ => matches string "cat"
/t a b/ => matches "hit a ball"
# matching: =~ returns the index of the first match, or nil if there is none
/cat/ =~ "cat and dog"  # => 0
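A minimal sketch of three ways to test for a match (the variable names are only illustrative). `=~` gives the match position, `match` returns a `MatchData` object, and `match?` (Ruby 2.4 and later) only answers true or false.

```ruby
# =~ returns the index where the first match starts, or nil
pos = /cat/ =~ "cat and dog"
none = /cow/ =~ "cat and dog"

# match returns a MatchData object describing the match (or nil)
md = "hit a ball".match(/t a b/)
puts md[0]          # the matched text: "t a b"
puts md.pre_match   # the text before the match: "hi"

# match? answers yes/no and does not set $~
found = "cat and dog".match?(/dog/)
```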

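Changing Strings
"my name is saif".sub(/saif/, "Joseph")  # => "my name is Joseph" (sub replaces only the first match)

gsub replaces all the matches
gsub! modifies the original string in place (and returns nil if nothing was replaced)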
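A small sketch contrasting sub, gsub and gsub! on the same string (the sample strings are made up). Note that the bang version changes the receiver and returns nil when it has nothing to replace.

```ruby
name = "my name is saif".sub(/saif/, "Joseph")

s = "a-b-c".dup                 # .dup gives a mutable copy even with frozen string literals
first_only = s.sub(/-/, "+")    # "a+b-c": only the first match is replaced
all = s.gsub(/-/, "+")          # "a+b+c": every match is replaced
# sub and gsub return new strings; s is still "a-b-c" here

changed = s.gsub!(/-/, "+")     # modifies s itself and returns s
unchanged = s.gsub!(/-/, "+")   # nothing left to replace, so this returns nil
```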

Regex options
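i => case insensitive
o => perform #{} interpolation only once
m => multiline: . also matches a newline
x => extended: whitespace in the pattern is ignored and # starts a comment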
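A sketch of each option in action (the method `pattern_for` is a hypothetical helper made up for this example). In Ruby, `m` is what other engines call "dotall"; `^` and `$` already match at every line without it.

```ruby
# i: case insensitive
ci = "Hello World".match?(/hello/i)

# m: lets . match a newline as well
without_m = "a\nb".match?(/a.b/)
with_m = "a\nb".match?(/a.b/m)

# x: spaces are ignored and # starts a comment, so the pattern can be laid out
date = /
  (\d{4}) -  # year
  (\d{2}) -  # month
  (\d{2})    # day
/x
md = date.match("2018-10-08")

# o: the #{} interpolation happens only the first time this literal is evaluated
def pattern_for(word)
  /#{word}/o
end
first = pattern_for("cat")
second = pattern_for("dog")   # still /cat/: the first interpolation is reused
```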

Special characters
. + \ ^ * $ ? | ( ) { } [ ]
Escape them with a backslash (for example \. or \+) to match them literally.
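A quick sketch of escaping special characters by hand and with `Regexp.escape` (sample strings are made up). `Regexp.escape` is handy when the pattern comes from user input.

```ruby
plain = "1+1=2".match?(/1+1/)     # false: + means "one or more 1s", so this needs "11"
escaped = "1+1=2".match?(/1\+1/)  # true: \+ is a literal plus sign

# Regexp.escape backslash-escapes every special character in a string
pat = Regexp.escape("a.b*c")
literal = Regexp.new(pat)
hit = literal.match?("xa.b*cx")   # true
miss = literal.match?("aXbbc")    # false: . and * are no longer special
```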

Anchors
^ : start of a line. /^abc/ => matches "abc def ghe"
$ : end of a line
\A : matches beginning of a string
\z : matches end of a string
\Z : matches end of a string, or just before a final \n
\b : word boundary
\B : non-word boundary
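A sketch showing how the line anchors differ from the string anchors on a two-line string, plus word boundaries (the sample text is made up).

```ruby
text = "first line\nsecond line\n"

# ^ matches at the start of every line ...
line_starts = text.scan(/^\w+/)         # ["first", "second"]
# ... while \A matches only at the start of the whole string
string_start = text.scan(/\A\w+/)       # ["first"]

ends_z = text.match?(/line\z/)          # false: the string really ends with "\n"
ends_Z = text.match?(/line\Z/)          # true: \Z may match just before a final "\n"

# \b is a word boundary, \B a position that is not one
words = "cat concat cats".scan(/\bcat\b/)   # only the standalone "cat"
inner = "cat concat cats".scan(/\Bcat/)     # only the "cat" inside "concat"
```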

character class
/[aeiou]/ => matches a vowel character
/[0-9]/ => matches any digit from 0 through 9
/[^0-9]/ => matches anything other than a digit
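A short sketch applying the three character classes above with `scan` and `gsub` (the sample strings are made up).

```ruby
vowels = "regular expression".scan(/[aeiou]/).join   # every vowel, in order
digits = "Room 101, floor 3".scan(/[0-9]/).join     # every digit, in order
only_digits = "a1b2".gsub(/[^0-9]/, "")             # removes every non-digit
```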

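Character set options
/(?d)\w+/ => default: \w, \d and \s match only ASCII, while POSIX brackets such as [[:alpha:]] follow Unicode
/(?u)\w+/ => Unicode: \w, \d, \s and POSIX brackets all match Unicode characters
/(?a)\w+/ => ASCII: \w, \d, \s and POSIX brackets match only ASCII characters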

Posix character class
[[:digit:]] => matches a digit
[[:^digit:]] => anything except a digit
/\p{Alnum}/ => matches an alphanumeric Unicode character
a period outside [] matches any character except a newline (unless the m option is set)
/c.s/ => matches cos
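A sketch comparing `\w` with a POSIX bracket and a `\p{...}` property on a non-ASCII word, assuming the source file is UTF-8 (Ruby's default). With the default options `\w` stops at "é" while the other two do not.

```ruby
s = "café 42"
ascii_word = s[/\w+/]              # \w is ASCII-only by default
posix_word = s[/[[:alpha:]]+/]     # POSIX brackets follow Unicode
prop_word = s[/\p{Alnum}+/]        # so do \p{...} properties

non_digits = "a1b2".scan(/[[:^digit:]]/)

# outside [], . matches any single character except a newline
c_s = "cos cats".scan(/c.s/)
```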

Repetition
r* => zero or more occurrences of r
r+ => one or more
r? => zero or one
r{m,n} => at least m, at most n
r{m,} => at least m
r{,n} => at most n
r{m} => exactly m
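A sketch of each quantifier from the list above, using `String#[]` to grab the first match (sample strings are made up).

```ruby
exact = "aaa"[/a{2}/]                       # exactly 2
ranged = "aaaaa"[/a{2,3}/]                  # 2 to 3, takes as many as allowed
at_most = "aaaaa"[/a{,2}/]                  # at most 2
optional = "color colour".scan(/colou?r/)   # u is optional
one_or_more = "ab abb abbb".scan(/ab+/)     # one or more b
zero_ok = "b"[/a*/]                         # zero occurrences still count as a match
```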

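greedy repetition reg_exp+ : matches as much as possible, then backtracks if needed
lazy repetition reg_exp+? : matches as little as possible
possessive repetition reg_exp++ : matches as much as possible and never backtracks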
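A sketch of the three kinds of repetition on a small HTML-like string (the string is only an example). The possessive form swallows the closing ">" and refuses to give it back, so it never matches.

```ruby
html = "<b>bold</b>"
greedy = html[/<.+>/]        # .+ grabs as much as it can
lazy = html[/<.+?>/]         # .+? stops as early as it can
possessive = html[/<.++>/]   # .++ keeps everything, so the final > cannot match
```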


Alternation
'|' has very low precedence: /ab|cd/ means "ab" or "cd", not a(b|c)d
/d|e/ => matches d or e
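A sketch of why the low precedence matters: an anchor written before an ungrouped alternation only applies to the first alternative (sample strings are made up).

```ruby
either = "xcd"[/ab|cd/]             # "ab" or "cd"
loose = "xcat"[/^dog|cat/]          # means (^dog) or (cat), so "cat" matches mid-string
grouped = "xcat"[/^(?:dog|cat)/]    # ^ now applies to both alternatives
```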

grouping
/red (ball|angry) sky/
$1 accesses the first group from outside of the regular expression, \1 from inside the regular expression.
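A sketch of reading a group with $1 after the match and reusing it with \1 inside the pattern (sample strings are made up).

```ruby
sky = "a red angry sky"
word = nil
if /red (ball|angry) sky/ =~ sky
  word = $1   # group 1, read from outside the pattern
end

# \1 inside the pattern must match the same text as group 1 again
doubled = "hello hello world"[/(\w+) \1/]
```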

named groups
/(?<first>\w+)\k<first>/ => \k<first> matches the same text again, e.g. "hellohello"
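A sketch of reading named groups through `MatchData` and, when a regexp literal sits on the left of `=~`, as local variables (sample strings are made up).

```ruby
md = /(?<first>\w+)\k<first>/.match("hellohello")
whole = md[0]
first = md[:first]

# a regexp literal (without interpolation) on the left of =~ creates locals
if /(?<year>\d+)-(?<month>\d+)/ =~ "2018-10"
  puts year, month
end
```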

lookahead (?=reg_exp), negated version (?!reg_exp)
str = "red, white, and blue"
str.scan(/[a-z]+(?=,)/) # => ["red", "white"]

lookbehind (?<=reg_exp), negated version (?<!reg_exp)
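A sketch of the negated lookahead and both lookbehinds (the price string is made up). The `\b` in the last pattern keeps it from matching the "0" at the end of "30".

```ruby
str = "red, white, and blue"
before_comma = str.scan(/[a-z]+(?=,)/)        # words followed by a comma
not_before_comma = str.scan(/[a-z]+\b(?!,)/)  # whole words not followed by a comma

prices = "cost: $30, tax: $5, count: 7"
dollar_amounts = prices.scan(/(?<=\$)\d+/)    # numbers preceded by $
bare_numbers = prices.scan(/(?<!\$)\b\d+/)    # numbers not preceded by $
```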

Controlling backtracking
inhibit backtracking (atomic group): (?>reg_exp)
/((?>X+))(?!O)/ => matches XXXY but not XXO
possessive repetition can be used as well
/(X++)(?!O)/
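A sketch comparing an ordinary group with the atomic and possessive versions on "XXO". The ordinary X+ gives back one X so the lookahead can succeed; the other two refuse to.

```ruby
plain = /(X+)(?!O)/
atomic = /((?>X+))(?!O)/
possessive = /(X++)(?!O)/

plain_xxo = plain.match("XXO")[1]   # backtracks from "XX" to "X"
atomic_xxo = atomic.match("XXO")    # no match
poss_xxo = possessive.match("XXO")  # no match
atomic_xxxy = atomic.match("XXXY")[1]
```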

Subexpression calls with named groups
declaration: (?<name>reg_exp)
call: \g<name> runs the group's pattern again (unlike the back reference \k<name>, which matches the same text); \g can be used recursively
re = /
  \A
    (?<brace_expr>
      \{
        (?:
          [^{}]             # any character except a brace
        |
          \g<brace_expr>    # or a nested brace expression (recursive call)
        )*
      \}
    )
  \z
/x
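A compact one-line sketch of the same recursive idea (the group name `b` is arbitrary), with a few strings it should accept and reject.

```ruby
balanced = /\A(?<b>\{(?:[^{}]|\g<b>)*\})\z/

nested_ok = balanced.match?("{a{b}c}")   # properly nested
unclosed = balanced.match?("{a{b}c")     # missing the last }
two_groups = balanced.match?("{}{}")     # two groups, not one
```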

conditional group
declaration (?<name>reg_exp)
usage (?(<name>)true_pattern|false_pattern)

alternatives in conditions
(?(group_id)true_pattern|false_pattern) => uses true_pattern if the group matched, otherwise false_pattern

/(?:(red)|blue) ball and (?(1)blue|red) bucket/  # matches "blue ball and red bucket" and "red ball and blue bucket"
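A sketch of the conditional with a numbered group and with a named group (the name `warm` is made up). No `x` flag here, because the spaces in the pattern must stay literal.

```ruby
numbered = /(?:(red)|blue) ball and (?(1)blue|red) bucket/
named = /(?:(?<warm>red)|blue) ball and (?(<warm>)blue|red) bucket/

a = numbered.match?("blue ball and red bucket")
b = numbered.match?("red ball and blue bucket")
c = numbered.match?("red ball and red bucket")   # group 1 matched, so "blue" was required
d = named.match?("red ball and blue bucket")
```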

named sub-routine (define a group with {0}, then call it with \g<name>)
sentence = %r{
  (?<subject> cat | dog ){0}
  (?<verb> eats | drinks ){0}
  The\s\g<subject>\s\g<verb>
}x
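Another sketch of the same `{0}` subroutine idiom: a hypothetical `octet` group is defined once and called four times. It only checks the dotted shape of an IPv4 address, not the 0 to 255 range.

```ruby
ipv4 = /
  (?<octet> \d{1,3} ){0}                  # defined here, matched zero times
  \A \g<octet> (?: \. \g<octet> ){3} \z   # called four times
/x

full = ipv4.match?("192.168.0.1")
short = ipv4.match?("192.168.0")
long = ipv4.match?("1.2.3.4.5")
```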
