===========================================================================
CSC 236                 Lecture Summary for Week 11               Fall 2007
===========================================================================

- Recall FSA A for language L = { w in {a,b}* : w contains an even number
  of a's }, with initial and accepting state "even":

                   a
                 ----->
     --> (even)          odd
                 <-----
          ^ \      a     ^ \
           \_/            \_/
            b              b

  Proof that L(A) = L.

  By induction on |w|, prove equation (1):

                      { "even"  if w contains an even number of a's,
  (1)  d*("even",w) = {
                      { "odd"   if w contains an odd number of a's.

  [Proof not covered in lecture: it is not difficult, but long and
  tedious, involving a large number of cases and subcases (one for each
  combination of current state and current symbol). See textbook for other
  detailed examples.]

- Now, use this information to prove L(A) = L:

  L(A) subset of L: Let w in L(A). By definition, this means
  d*("even",w) in F, i.e., d*("even",w) = "even". By equation (1), this
  means w contains an even number of a's, i.e., w in L.

  L subset of L(A): Let w in L. Then, w contains an even number of a's, so
  d*("even",w) = "even", by equation (1). This means A accepts w, so
  w in L(A).

- To simplify transition diagrams, additional conventions:
  . multiple transitions from one state to another labelled with different
    input symbols: combine into one edge with a "compound" label (symbols
    separated by commas) (e.g., for vending machine FSA, instead of having
    three edges from state 25 to state 30 -- one for each input symbol n,
    d, q -- have a single edge with label "n,d,q");
  . "dead" states (from which an accepting state can never be reached)
    are not drawn. Careful! With this additional convention, a "missing"
    transition in a diagram does _NOT_ mean the FSA stays in that state: it
    means the FSA goes to a dead state and rejects.

- Example: FSA to accept floating-point numbers of the form n or n.m, each
  optionally preceded by a sign + or -, where n and m are decimal integers
  (non-empty strings over the digits {0,1,2,3,4,5,6,7,8,9}).
  E.g., "+3.0", "2", "-0.01" are acceptable but "3.", "-.5", "4.2.3",
  "--1" are not.

  Picture, using conventions above, "0-9" for "0,1,2,3,4,5,6,7,8,9", and
  parentheses around accepting states (the lower edge q_0 --0-9--> q_2
  handles numbers without a sign, such as "2"):

                               0-9                   0-9
                               / \                   / \
              +,-        0-9   | v   .         0-9   | v
     --> q_0 -----> q_1 ----->(q_2)-----> q_3 ----->(q_4)
          |                     ^
          \_________0-9_________/

---------------
Non-determinism
---------------

In a non-deterministic finite state automaton (NFA), the transition
function gives a *set* of possible next states for each state and input
symbol. Formally, d : Q x S -> P(Q) (where P(Q) is the powerset of Q, the
set of all subsets of Q). Up until now, we have worked with DFAs
("deterministic" FSAs).

Example (labels "0,1" written beside a state denote loops on that state):

                  0
           q_1 ------> (q_3)  0,1
            ^
          0 |
            |
       --> q_0  0,1
            |
          1 |
            v
           q_2 ------> (q_4)  0,1
                  1

Notice that in q_0, reading 0 or 1, there are two possible next states.
Also, there is no next state reading 1 in q_1 or 0 in q_2: same as the
"dead state" convention above.

How do we tell if an NFA accepts or rejects an input string? Matter of
convention: input accepted iff there is some choice of transitions that
accepts. Intuitively, like saying the NFA "tries out" all possibilities in
parallel. If any possibility accepts, the NFA accepts; if all
possibilities reject, the NFA rejects.

Example NFA on string 01001, keeping track of the set of all possible
states we could be in after processing each symbol:

    input read     possible states
    ----------     ---------------
    e              q_0
    0              q_0, q_1
    01             q_0, q_2             (q_1 has no transition on 1)
    010            q_0, q_1             (q_2 has no transition on 0)
    0100           q_0, q_1, (q_3)
    01001          q_0, q_2, (q_3)      (q_1 has no transition on 1)

Possible last states are q_0, q_2, q_3; since these include accepting
state q_3, string 01001 is accepted.

Formally, define the extended transition function d* as follows: for all
strings w and states q,

            { {q}                             if w = e,
  d*(q,w) = {
            { U_{q' in d*(q,w')} d(q',a)      if w = w'a for a in S.

Language of NFA defined as before (set of strings accepted):
L(A) = { w in S* : d*(q_0,w) contains some accepting state }.

NOTE: In general, there is an important distinction between "there exists
some choice of transitions that leads to an accepting state" and "we can
find some choice of transitions that leads to an accepting state".
NFA's only require the first one. In that sense, NFA's cannot be
implemented "directly": any implementation must try out all
possibilities.

-----------------
Regular languages
-----------------

Definition: Language L is "regular" if L = L(R) for some regular
expression R.

Theorem: L is regular iff L = L(A) for some NFA A, iff L = L(A') for some
DFA A'.

Proof: Show how to construct DFA A' from any NFA A such that
L(A') = L(A). Then show how to construct RE R from any DFA A such that
L(R) = L(A). Then show how to construct NFA A from any RE R such that
L(A) = L(R).

NFA -> DFA: Idea (already given above): "subset construction" -- for NFA
A, construct DFA A' whose states represent sets of states of A. Initial
state of A' is { q_0 } (where q_0 is initial state of A); a state of A' is
accepting iff it contains some accepting state of A; the transition from
state { q_1, ..., q_k } in A' on symbol a is simply the union of the
transitions of A from each q_i on symbol a.

Example: For NFA above, start with initial state { q_0 }, add transitions
on 0 and on 1, creating new states as required. (Below, shorthand like
"q_012" represents {q_0,q_1,q_2}; accepting states are in parentheses.)

      state        on 0       on 1
      -----------  ---------  ---------
  --> q_0          q_01       q_02
      q_01         (q_013)    q_02
      q_02         q_01       (q_024)
      (q_013)      (q_013)    (q_023)
      (q_023)      (q_013)    (q_0234)
      (q_024)      (q_014)    (q_024)
      (q_014)      (q_0134)   (q_024)
      (q_0234)     (q_0134)   (q_0234)
      (q_0134)     (q_0134)   (q_0234)

This works for all NFA's, and the resulting DFA accepts the same language:
a string accepted by the NFA means some choice of transitions leads the
NFA to an accepting state, which means the DFA will end in a state that
includes an accepting state; a string accepted by the DFA means its end
state includes an accepting state, which means some choice of transitions
leads the NFA to an accepting state.

Note that all accepting states above are "traps", in the sense that once
the input read so far is accepted, no further input symbol can make the
DFA go back to a rejecting state (every accepting state above contains
q_3 or q_4, and both of these loop on 0 and 1).
So the DFA can be simplified by merging its six accepting states, all of
which are traps, into a single accepting state q:

                    0         0
          --> q_0 ---> q_01 ---> (q)  0,1
               |       ^  |       ^
             1 |     0 |  | 1     |
               |       |  v       |
               +-----> q_02 ------+
                  1           1

DFA -> RE: Idea: "state elimination" construction -- eliminate
intermediate states and replace transition labels with more general RE's.
More precisely, for each accepting state q, start from the DFA and
eliminate all states except q and the initial state to generate an RE R_q
(representing the pattern for all strings accepted at state q). The RE for
the DFA is the union of the RE's R_q for all accepting states q. As the
construction proceeds, labels on transitions become more elaborate RE's
(starting from single symbols in the original DFA).

- How to eliminate state q:
  . let s_1,...,s_m be all states (other than q itself) that have a
    transition into q, with labels S_1,...,S_m;
  . let t_1,...,t_n be all states (other than q itself) that have a
    transition from q, with labels T_1,...,T_n;
  . note: it is quite possible to have s_i = t_j for some i,j (then
    R_{i,j} below is the label of the loop on s_i);
  . for each 1 <= i <= m, 1 <= j <= n, let R_{i,j} be the label of the
    transition from s_i to t_j (R_{i,j} = {} if no transition);
  . let Q be the label of the loop on state q (Q = {} if no loop);
  . remove q from the FSA and, for each 1 <= i <= m, 1 <= j <= n, replace
    label R_{i,j} with R_{i,j} + S_i Q* T_j;
  . special cases:
    o if R_{i,j} = {}, add transition s_i --S_i Q* T_j--> t_j;
    o if Q = {}, S_i Q* T_j simplifies to S_i T_j (since {}* = e).

- How to write down R_q for accepting state q:
  . once all states except initial q_0 and final q are removed, only four
    possible transitions remain (R: loop on q_0; S: q_0 to q; T: q back to
    q_0; Q: loop on q; any of them may be {}):

                   S
       R  --> q_0 ---> (q)  Q
                  <---
                   T

  . R_q = R* S (Q + T R* S)*;
  . special case: if q = q_0 (q_0 accepting), only one state -->(q_0)
    remains, with loop Q, and R_{q_0} = Q*.

Example (simplified DFA from above, with q_01, q_02, q renamed q_1, q_2,
q_3):

                    0        0
          --> q_0 ---> q_1 ---> (q_3)  0,1
               |       ^ |        ^
             1 |     0 | | 1      |
               |       | v        |
               +-----> q_2 -------+
                  1           1

Eliminate q_1:

                      00
          --> q_0 ----------> (q_3)  0+1
               |                ^
          1+01 |                | 1+00
               v                |
         01   q_2 --------------+

Eliminate q_2:

                  00 + (1+01) (01)* (1+00)
          --> q_0 --------------------------> (q_3)  0+1

  R_{q_3} = ( 00 + (1+01) (01)* (1+00) ) (0+1)*

Here q_0 has no loop and there is no transition from q_3 back to q_0, so
R = T = {} and R* S (Q + T R* S)* reduces to S Q*.

Since q_3 was the only accepting state, this RE is equivalent to the FSA.
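Returning to the DFA for an even number of a's and the floating-point example: the extended transition function d* of a DFA can be computed by applying d one symbol at a time. The following is a minimal Python sketch, assuming transitions are stored in a dict keyed by (state, symbol). Following the "dead state" convention, a missing key means the FSA goes to an undrawn dead state, modelled here as `None`. The names `d_star`, `accepts`, `EVEN_A` and `FLOAT` are hypothetical. The floating-point DFA treats the sign as optional (a direct q_0 --0-9--> q_2 edge), since the notes list "2" as acceptable.

```python
from typing import Dict, Optional, Set, Tuple

# Transition function stored as a dict. Following the "dead state" convention,
# a missing (state, symbol) key means the FSA goes to an undrawn dead state.
Delta = Dict[Tuple[str, str], str]


def d_star(delta: Delta, q: str, w: str) -> Optional[str]:
    """Extended transition function d*(q, w); None stands for the dead state."""
    for a in w:
        if (q, a) not in delta:
            return None  # dead state: the input can no longer be accepted
        q = delta[(q, a)]
    return q


def accepts(delta: Delta, start: str, accepting: Set[str], w: str) -> bool:
    """w is accepted iff d*(start, w) is an accepting state."""
    return d_star(delta, start, w) in accepting


# DFA A for L = { w in {a,b}* : w contains an even number of a's }.
EVEN_A: Delta = {
    ("even", "a"): "odd", ("odd", "a"): "even",
    ("even", "b"): "even", ("odd", "b"): "odd",
}

# Floating-point DFA; compound labels such as "0-9" are expanded symbol by
# symbol. ASSUMPTION: the sign is optional (edge q_0 --0-9--> q_2).
FLOAT: Delta = {("q_0", "+"): "q_1", ("q_0", "-"): "q_1", ("q_2", "."): "q_3"}
for c in "0123456789":
    for p, r in [("q_0", "q_2"), ("q_1", "q_2"), ("q_2", "q_2"),
                 ("q_3", "q_4"), ("q_4", "q_4")]:
        FLOAT[(p, c)] = r
FLOAT_ACCEPTING = {"q_2", "q_4"}

if __name__ == "__main__":
    for w in ["", "ab", "aab", "bab"]:
        print(repr(w), d_star(EVEN_A, "even", w))
    for w in ["+3.0", "2", "-0.01", "3.", "-.5", "4.2.3", "--1"]:
        print(w, accepts(FLOAT, "q_0", FLOAT_ACCEPTING, w))
```

Returning `None` instead of adding an explicit dead state mirrors the diagram convention: "-.5" is rejected as soon as "." is read in q_1, because no transition is drawn there.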
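The "try out all possibilities in parallel" reading of NFA acceptance amounts to computing the set d*(q_0, w) one symbol at a time, exactly as in the trace of 01001. Below is a minimal Python sketch of the formal definition of d* for NFAs. It assumes the example NFA's transitions as read off the trace and the subset-construction table (q_0 loops on 0,1; q_0 --0--> q_1; q_0 --1--> q_2; q_1 --0--> q_3; q_2 --1--> q_4; q_3 and q_4 loop on 0,1). The names `nfa_d_star`, `trace` and `nfa_accepts` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# NFA transition function d : Q x S -> P(Q); a missing key means the empty set.
NFA: Dict[Tuple[str, str], Set[str]] = {
    ("q_0", "0"): {"q_0", "q_1"}, ("q_0", "1"): {"q_0", "q_2"},
    ("q_1", "0"): {"q_3"},        ("q_2", "1"): {"q_4"},
    ("q_3", "0"): {"q_3"},        ("q_3", "1"): {"q_3"},
    ("q_4", "0"): {"q_4"},        ("q_4", "1"): {"q_4"},
}
ACCEPTING = {"q_3", "q_4"}


def nfa_d_star(q: str, w: str) -> FrozenSet[str]:
    """d*(q, e) = {q}; d*(q, w'a) = union of d(q', a) over q' in d*(q, w')."""
    current: Set[str] = {q}
    for a in w:
        nxt: Set[str] = set()
        for p in current:
            nxt |= NFA.get((p, a), set())
        current = nxt
    return frozenset(current)


def trace(w: str) -> List[FrozenSet[str]]:
    """Set of possible states after each prefix of w, starting with e."""
    return [nfa_d_star("q_0", w[:i]) for i in range(len(w) + 1)]


def nfa_accepts(w: str) -> bool:
    # Accepted iff SOME choice of transitions ends in an accepting state.
    return bool(nfa_d_star("q_0", w) & ACCEPTING)


if __name__ == "__main__":
    for i, states in enumerate(trace("01001")):
        print(repr("01001"[:i]), sorted(states))
    print(nfa_accepts("01001"))
```

Keeping the whole set of current states is what makes the NFA implementable at all: the program never has to "find" the right choice, it simply follows every choice at once.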
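The subset construction (NFA -> DFA) can be sketched as a worklist algorithm that, as in the lecture example, only creates DFA states reachable from { q_0 }. This is a minimal Python sketch, assuming the same dict encoding of the example NFA. The names `subset_construction` and `name` are hypothetical, and `name` reproduces the "q_013" shorthand only for single-digit state indices. If the empty set ever arises, it plays the role of the dead state.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

NFATable = Dict[Tuple[str, str], Set[str]]
DState = FrozenSet[str]  # a DFA state = a set of NFA states

# Example NFA from the notes; a missing key means "no next state".
NFA: NFATable = {
    ("q_0", "0"): {"q_0", "q_1"}, ("q_0", "1"): {"q_0", "q_2"},
    ("q_1", "0"): {"q_3"},        ("q_2", "1"): {"q_4"},
    ("q_3", "0"): {"q_3"},        ("q_3", "1"): {"q_3"},
    ("q_4", "0"): {"q_4"},        ("q_4", "1"): {"q_4"},
}


def subset_construction(
    nfa: NFATable, start: str, accepting: Set[str], alphabet: str
) -> Tuple[Dict[Tuple[DState, str], DState], DState, Set[DState]]:
    """Return (delta, initial state, accepting states) of the DFA A'."""
    initial: DState = frozenset({start})
    delta: Dict[Tuple[DState, str], DState] = {}
    seen: Set[DState] = {initial}
    todo: List[DState] = [initial]
    while todo:
        current = todo.pop()
        for a in alphabet:
            # Transition on a = union of the NFA's transitions from each member.
            nxt: DState = frozenset().union(*(nfa.get((q, a), set()) for q in current))
            delta[(current, a)] = nxt
            if nxt not in seen:  # a new DFA state: explore it later
                seen.add(nxt)
                todo.append(nxt)
    # A DFA state is accepting iff it contains some accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return delta, initial, dfa_accepting


def name(s: DState) -> str:
    """Shorthand from the notes: {q_0, q_1, q_3} -> 'q_013'."""
    return "q_" + "".join(sorted(q[2:] for q in s))


if __name__ == "__main__":
    delta, initial, acc = subset_construction(NFA, "q_0", {"q_3", "q_4"}, "01")
    for (s, a), t in sorted(delta.items(), key=lambda kv: (len(kv[0][0]), name(kv[0][0]), kv[0][1])):
        mark = " (accepting)" if t in acc else ""
        print(f"{name(s)} --{a}--> {name(t)}{mark}")
```

Using `frozenset` for DFA states is a design choice: sets of NFA states must be hashable so they can serve as dict keys and members of `seen`.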
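The state-elimination construction (DFA -> RE) can be sketched by storing one RE label per ordered pair of states and applying the update R_{i,j} + S_i Q* T_j, then the formula R_q = R* S (Q + T R* S)*. This is a minimal Python sketch under stated assumptions: labels are Python `re` pattern strings, so union "+" is written "|"; the RE {} is represented by `None`; and no simplification is attempted, so the output is much longer than the hand-derived RE. The names `eliminate`, `dfa_to_re` and `EXAMPLE` are hypothetical. The example is the simplified DFA with states q_0, q_1, q_2, q_3, eliminating q_1 and then q_2 as in the notes.

```python
import re
from typing import Dict, List, Optional, Tuple

# An RE label is a Python `re` pattern string; None stands for the RE {}.
# ASSUMPTION: "+" (union) from the notes is written "|" here, and labels are
# never simplified.
Label = Optional[str]
Edges = Dict[Tuple[str, str], Label]  # (from, to) -> label of that transition


def union(x: Label, y: Label) -> Label:
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def concat(*parts: Label) -> Label:
    if any(p is None for p in parts):
        return None  # concatenation with {} is {}
    return "".join(f"(?:{p})" for p in parts)


def star(x: Label) -> str:
    return "" if x is None else f"(?:{x})*"  # {}* = e, written as ""


def eliminate(edges: Edges, q: str) -> Edges:
    """Remove q, replacing each R_ij with R_ij + S_i Q* T_j."""
    loop = edges.get((q, q))
    ins = [(s, lab) for (s, r), lab in edges.items() if r == q and s != q]
    outs = [(t, lab) for (p, t), lab in edges.items() if p == q and t != q]
    new = {k: v for k, v in edges.items() if q not in k}
    for s, s_lab in ins:
        for t, t_lab in outs:  # s == t is allowed: the result is a loop on s
            new[(s, t)] = union(new.get((s, t)), concat(s_lab, star(loop), t_lab))
    return new


def dfa_to_re(edges: Edges, states: List[str], start: str, accepting: List[str]) -> Label:
    """Union over accepting q of R_q (None if no string is accepted)."""
    result: Label = None
    for q in accepting:
        e = dict(edges)
        for p in states:
            if p not in (start, q):
                e = eliminate(e, p)
        if q == start:
            r_q: Label = star(e.get((q, q)))  # special case: R_{q_0} = Q*
        else:
            R, S = e.get((start, start)), e.get((start, q))
            Q, T = e.get((q, q)), e.get((q, start))
            # R_q = R* S (Q + T R* S)*
            r_q = concat(star(R), S, star(union(Q, concat(T, star(R), S))))
        result = union(result, r_q)
    return result


# Simplified example DFA (q_3 accepting; "0|1" is the compound loop label).
EXAMPLE: Edges = {
    ("q_0", "q_1"): "0", ("q_0", "q_2"): "1",
    ("q_1", "q_3"): "0", ("q_1", "q_2"): "1",
    ("q_2", "q_1"): "0", ("q_2", "q_3"): "1",
    ("q_3", "q_3"): "0|1",
}

if __name__ == "__main__":
    regex = dfa_to_re(EXAMPLE, ["q_0", "q_1", "q_2", "q_3"], "q_0", ["q_3"])
    print(regex)
    for w in ["", "0", "0101", "00", "0110", "10101"]:
        print(repr(w), re.fullmatch(regex, w) is not None)
```

Because the generated pattern is unsimplified, a convenient sanity check is to compare `re.fullmatch` against the DFA on all short strings rather than to compare RE text.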