===========================================================================
CSC 236              Lecture Summary for Week 10              Winter 2008
===========================================================================

Examples of regular expressions:
  - L(a+b) = {a,b}
  - L(ab) = {ab}
  - L((a+b)a) = {aa,ba} = L(aa+ba)
  - L(a*) = {\epsilon,a,aa,aaa,...}
    (zero or more repetitions of "a")
  - L(aa*) = {a,aa,aaa,...} = L(a*a)
    (one or more repetitions of "a")
  - L((ab)*) = {\epsilon,ab,abab,ababab,...}
    (zero or more repetitions of "ab")
  - L((a+b)*) = {\epsilon,a,b,aa,ab,ba,bb,aaa,aab,aba,abb,baa,bab,bba,...}
    (zero or more repetitions of a's or b's, i.e., every string of a's
    and b's)
  - L(a*+b*) = {\epsilon,a,b,aa,bb,aaa,bbb,aaaa,bbbb,...}
    (every string consisting entirely of a's or entirely of b's)
  - L((a+b)(a+b)*) = {a,b,aa,ab,ba,bb,...}
    (every nonempty string of a's and b's)
  - L(a(ba+c)*) = {a,aba,ac,ababa,abac,acba,acc,...}
  - All strings of a's and b's that have the same first and last symbol:
        \epsilon + a + b + a(a+b)*a + b(a+b)*b
  - L = {all strings of a's and b's that contain at least one a}:
        (a+b)*a(a+b)*   or   b*a(a+b)*   or   (a+b)*ab*

Regexps R and S are equivalent (denoted R == S) iff they represent the same
language (L(R) = L(S)), e.g., b*a(a+b)* == (a+b)*ab*.

General regexp equivalences:
  - R+S == S+R                            [commutativity]
  - (R+S)+T == R+(S+T)                    [associativity]
  - (RS)T == R(ST)                        [associativity]
  - R(S+T) == RS+RT                       [left distributivity]
  - (S+T)R == SR+TR                       [right distributivity]
  - R+{} == R                             [identity]
  - \epsilon R == R == R\epsilon          [identity]
  - {}R == {} == R{}                      [annihilator]
  - R** == R*                             [idempotence]

Example: We prove that L(b*a(a+b)*) = L = {all strings of a's and b's that
contain at least one a}, by showing double inclusion (standard technique
for proving set equality).

  . L(b*a(a+b)*) subset of L:
    Let s be an arbitrary string in L(b*a(a+b)*). This means that s=t.u.v
    for some strings t (- L(b*), u (- L(a), and v (- L((a+b)*).
    Since there is only one string "a" (- L(a), u=a so s=t.a.v and s is a
    string that contains at least one "a", so s (- L.

  . L subset of L(b*a(a+b)*):
    Let s be an arbitrary string in L. This means that s contains at least
    one "a", so it contains a first occurrence of a and can be broken up
    into three substrings: s=r.a.t, where r is some string that contains
    no a (maybe empty), a is the first occurrence of a in s, and t is some
    string of a's and b's. But then, r (- L(b*), a (- L(a), and
    t (- L((a+b)*) so by definition, s=r.a.t is in L(b*a(a+b)*).

---------------------
Finite state automata
---------------------

- Simple models of computing devices used to analyze strings. A FSA has a
  fixed, finite set of "states", one of which is the "initial state" and
  some of which are "accepting" (or "final") states, as well as
  "transitions" from one state to another for each possible symbol of a
  string. The FSA starts in its initial state and processes a string one
  symbol at a time from left to right: for each symbol processed, the FSA
  switches states based on the latest input symbol and its current state,
  as specified by the transitions. Once the entire string has been
  processed, the FSA will either be in an "accepting" state (in which case
  the string is "accepted") or not (in which case the string is
  "rejected").

- Example: Simplified control mechanism for a vending machine that accepts
  only nickels (5c), dimes (10c) and quarters (25c), where everything costs
  exactly 30c and no change is ever given.
  Alphabet \Sigma = {n,d,q} (for "nickel", "dime", "quarter"), set of
  states = {0,5,10,15,20,25,30} (for amount of money put in so far; no need
  to keep track of excess since no change will be provided), and
  transitions are defined by the following table (state across the top,
  input symbol down the side), with the initial state being "0" and the
  only accepting state being "30" -- note that "30" represents
  amounts >= 30:

        |  0   5  10  15  20  25  30
     ---+----------------------------
      n |  5  10  15  20  25  30  30
      d | 10  15  20  25  30  30  30
      q | 25  30  30  30  30  30  30

  (How to read this table: current state specifies column, current input
  symbol specifies row, entry at that row and column is next state; e.g.,
  if current state is 15 and input symbol is d, next state is entry at row
  "d" and column "15", which is 25.)

  Computation of FSA on input such as "dndd" proceeds as follows:
      state 0 -> process 'd' -> state 10 -> process 'n' -> state 15
              -> process 'd' -> state 25 -> process 'd' -> state 30.
  Since the last state is accepting, the string "dndd" is accepted.

- Formal definition: A FSA is a quintuple (Q,\Sigma,q_0,F,\delta) where
      Q is a fixed, finite, non-empty set of states
      \Sigma is a fixed (finite, non-empty) alphabet
          (Q and \Sigma are disjoint: Q n \Sigma = {})
      q_0 (- Q is the initial state
      F (_ Q is the set of accepting ("final") states
      \delta : Q x \Sigma -> Q is a transition function
          (for each q (- Q, a (- \Sigma, \delta(q,a) is the next state of
          the FSA when processing symbol a from state q)

- Transition function gives new state for each state and single input
  symbol. Extended transition function \delta*(q,w) gives new state for FSA
  after processing string w starting from state q. It can be defined
  recursively, as follows:

                     { q                         if w = \epsilon (empty),
      \delta*(q,w) = {
                     { \delta(\delta*(q,w'),a)   if w = w'a for some
                     {                           w' (- \Sigma* and a (- \Sigma.
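
  The recursive definition of \delta* translates almost line for line into
  code. The following is a minimal sketch in Python, assuming the
  vending-machine FSA described earlier; the names COINS, STATES, DELTA,
  delta_star and accepts are illustrative choices, not notation from the
  course.

```python
# Hypothetical encoding of the vending-machine FSA (all names are illustrative).
COINS = {"n": 5, "d": 10, "q": 25}
STATES = [0, 5, 10, 15, 20, 25, 30]

# delta(q, a): add the coin's value, capping at 30 (state "30" means ">= 30").
# This reproduces the transition table entry by entry.
DELTA = {(q, a): min(q + v, 30) for q in STATES for a, v in COINS.items()}


def delta_star(q, w):
    """Extended transition function, following the recursive definition."""
    if w == "":
        return q                        # case w = epsilon
    w_prime, a = w[:-1], w[-1]          # case w = w'a
    return DELTA[(delta_star(q, w_prime), a)]


def accepts(w, q0=0, final=frozenset({30})):
    """w is accepted iff delta*(q0, w) is an accepting state."""
    return delta_star(q0, w) in final


print(delta_star(0, "dndd"))  # 30
print(accepts("dndd"))        # True
print(accepts("nd"))          # False (only 15c inserted)
```

  The recursion peels off the last symbol, exactly as in the case
  w = w'a; an equivalent loop over the symbols from left to right would
  avoid deep recursion on long inputs.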
  Example (from before):
      \delta*(5,ndn) = \delta(\delta*(5,nd),n)
                     = \delta(\delta(\delta*(5,n),d),n)
                     = \delta(\delta(\delta(\delta*(5,\epsilon),n),d),n)
                     = \delta(\delta(\delta(5,n),d),n)
                     = \delta(\delta(10,d),n)
                     = \delta(20,n)
                     = 25

- A string w is "accepted" by a FSA A iff \delta*(q_0,w) (- F; otherwise,
  w is "rejected". The language accepted by a FSA A is defined as
      L(A) = { w (- \Sigma* : A accepts w (i.e., \delta*(q_0,w) (- F) }.

- Example: Come up with FSA that accepts
      L = { w in {a,b}* : w contains an even number of a's }.
  Use states that represent information about string processed so far. In
  this case, only need to remember if number of a's seen so far is even or
  odd, so only need two states "even" and "odd". Initial state should be
  "even" (since before reading any symbol, number of a's processed so
  far = 0 is even) and set of accepting states is simply {"even"}.

  To represent transition function, transition diagrams are a useful
  notation. Each state represented by a node (labelled with state),
  transitions represented by directed edges labelled with input symbol
  (i.e., \delta(q,a) = q' represented by edge from q to q' labelled
  with a). Initial state has "dangling" in-edge, accepting states have
  double circles for nodes (in ASCII pictures, accepting states surrounded
  with parentheses).

                    a          _
                 ----->      |/ \
     --> (even)         odd  |   b
                 <-----       \_/
         /  |\      a
         \___/
           b

  Proof that L(A) = L. By induction on |w|, prove equation (1):

                                { "even"  if w contains even number of a's,
      (1) \delta*("even",w)  =  {
                                { "odd"   if w contains odd number of a's.

  BC: \delta*("even",\epsilon) = "even" and \epsilon contains an even
      number of a's (0 is even).

  IH: Suppose n (- N and equation (1) holds \-/ w (- {a,b}^n.

  IS: Suppose w (- {a,b}^{n+1}. Since |w| = n+1 >= 1, w = w'x for some
      w' (- {a,b}^n, x (- {a,b}, and by definition,
      \delta*("even",w) = \delta(\delta*("even",w'),x).

      Case 1: \delta*("even",w') = "even".
          Then by the IH, w' contains an even number of a's.
          Subcase A: x = a.
              Then w = w'a contains an odd number of a's and
              \delta*("even",w) = \delta("even",a) = "odd",
              so equation (1) holds.
          Subcase B: x = b.
              Then w = w'b contains an even number of a's and
              \delta*("even",w) = \delta("even",b) = "even",
              so equation (1) holds.
          In all subcases, equation (1) holds.

      Case 2: \delta*("even",w') = "odd".
          Then by the IH, w' contains an odd number of a's.
          Subcase A: x = a.
              Then w = w'a contains an even number of a's and
              \delta*("even",w) = \delta("odd",a) = "even",
              so equation (1) holds.
          Subcase B: x = b.
              Then w = w'b contains an odd number of a's and
              \delta*("even",w) = \delta("odd",b) = "odd",
              so equation (1) holds.
          In all subcases, equation (1) holds.

      In all cases, equation (1) holds.

  Concl: By induction, equation (1) holds for all strings w.

  Note: Each specific case/subcase is very easy (almost trivial) and simply
  verifies that one individual transition is correct. Proof is long simply
  because there must be one case for each possible state and one subcase
  for each possible alphabet symbol. Heart of proof lies in correctly
  identifying "meaning" of each state as in equation (1).

- Now, use this information to prove L(A) = L:

  L(A) subset of L:
      Let w (- L(A). By definition, this means \delta*("even",w) (- F,
      i.e., \delta*("even",w) = "even". By equation (1), this means w
      contains an even number of a's, i.e., w (- L.

  L subset of L(A):
      Let w (- L. Then, w contains an even number of a's so
      \delta*("even",w) = "even", by equation (1). This means A accepts w,
      so w (- L(A).

- See textbook for other detailed examples.
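
- Equation (1) can also be sanity-checked mechanically on short strings.
  The following is a minimal sketch in Python, assuming the two-state FSA A
  for "even number of a's" from the example above; the names DELTA,
  delta_star, expected_state and check_equation_1 are illustrative, not
  from the course. A finite check like this only tests strings up to a
  chosen length, so it complements the induction proof rather than
  replacing it.

```python
from itertools import product

# Transition function of the two-state FSA A (one entry per case/subcase
# of the induction step).
DELTA = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}


def delta_star(q, w):
    """Extended transition function, computed left to right."""
    for x in w:
        q = DELTA[(q, x)]
    return q


def expected_state(w):
    """Right-hand side of equation (1)."""
    return "even" if w.count("a") % 2 == 0 else "odd"


def check_equation_1(max_len):
    """True iff equation (1) holds for every w in {a,b}* with |w| <= max_len."""
    return all(
        delta_star("even", "".join(t)) == expected_state("".join(t))
        for n in range(max_len + 1)
        for t in product("ab", repeat=n)
    )


print(check_equation_1(8))  # True
```

  Each of the four entries in DELTA corresponds to one subcase of the
  induction step, which is why the proof needs exactly one subcase per
  (state, symbol) pair.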