===========================================================================
CSC 236               Lecture Summary for Week 11               Winter 2008
===========================================================================

Marker's office hours: Thu 27 Mar BA4290 -- A2: 2-3pm, T2: 3-4pm.

**Official course evaluations next week!**  Please show up!

FSA:

  - To simplify transition diagrams, additional conventions:
      . multiple transitions from one state to another labelled with
        different input symbols: combine into one edge with "compound"
        label (symbols separated by commas);
        (e.g., for vending machine FSA, instead of having three edges from
        state 25 to state 30 -- one for each input symbol n, d, q -- have
        single edge with label "n,d,q")
      . "dead" states (from which an accepting state can never be reached)
        are not drawn.
    Careful!  With this additional convention, a "missing" transition in a
    diagram does _NOT_ mean FSA stays in that state: it means FSA goes to
    dead state and rejects.

  - Example: FSA to accept floating-point numbers of the form +/-n or
    +/-n.m, where n and m are decimal integers (non-empty strings over the
    digits {0,1,2,3,4,5,6,7,8,9}).  E.g., "+3.0", "2", "-0.01" are
    acceptable but "3.", "-.5", "4.2.3", "--1" are not.
    Picture, using conventions above, "0-9" for "0,1,2,3,4,5,6,7,8,9", and
    parentheses around accepting states:

                             0-9                   0-9
                            /    \                /    \
             +,-        0-9  \| /   .         0-9  \| /
    --> q_0 -----> q_1 ----->(q_2)-----> q_3 ----->(q_4)

    Note: as drawn, a sign is required.  To accept unsigned inputs like "2"
    (listed as acceptable above), q_0 also needs a 0-9 transition directly
    to q_2.

---------------
Non-determinism
---------------

In a non-deterministic finite state automaton (NFA), transition function
gives a *set* of possible next states for each state and input symbol.
Formally, \delta : Q x S -> P(Q) (where P(Q) is the powerset of Q, the set
of all subsets of Q).  Up until now, worked with DFA ("deterministic" FSA).

Example:

                  q_1 --->(q_3) 0,1
               0/|      0
               /
    --> q_0 0,1
               \
               1\|      1
                  q_2 --->(q_4) 0,1

Notice in q_0 reading 0 or 1, there are two possible next states.  Also, no
next state reading 1 in q_1 or 0 in q_2: similar to "dead state" convention
above.

How do we tell if NFA accepts or rejects an input string?  Matter of
convention: input accepted iff there is *some* choice of transitions that
accepts.  Intuitively, like saying the NFA "tries out" all possibilities in
parallel.  If any possibility accepts, NFA accepts; if all possibilities
reject, NFA rejects.

Example NFA on string 01001, keeping track of set of all possible states we
could be in after processing each symbol:

             0        1        0        0        1
    --> q_0 ---> q_0 ---> q_0 ---> q_0 ---> q_0 ---> q_0
           \_       \_       \_       \_       \_
             \_> q_1  \_> q_2  \_> q_1  \_> q_1  \_> q_2
                    \_       \_       \_       \_
                      \_> { }  \_> { }  \_>(q_3) \_> { }
                                                \_
                                                  \_>(q_3)

Possible last states are {q_0,q_2,q_3}; since these include accepting state
q_3, string 01001 accepted.

Formally, define extended transition function \delta* as follows (writing
"(-" for "element of" and U for union): for all strings w, states q,

    \delta*(q,w) = { q }                                 if w = \epsilon,
    \delta*(q,w) = U_{q' (- \delta*(q,w')} \delta(q',a)  if w = w'a for a (- S.

Language of NFA defined as before (set of strings accepted), i.e., w is
accepted iff \delta*(q_0,w) contains some accepting state.

NOTE: In general, important distinction between "there exists some choice
of transitions that leads to accepting state" and "we can find some choice
of transitions that leads to accepting state".  NFA's only require first
one (existence).  In that sense, NFA's are conceptual machines that cannot
be implemented directly (any implementation must try out all possibilities,
but definition of NFA acceptance only requires that possibility exists,
independently of being able to actually find that possibility).

-----------------
Regular languages
-----------------

Definition: Language L is "regular" if L = L(R) for some reg. exp. R.

Theorem: L is regular iff L = L(A) for some NFA A, iff L = L(A') for some
DFA A'.

Proof: Show how to construct DFA A' from any NFA A such that L(A') = L(A).
Then show how to construct RE R from any DFA A such that L(R) = L(A).
Then show how to construct NFA A from any RE R such that L(A) = L(R).
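
Going back to the acceptance convention for NFA's: a minimal sketch in Python of the extended transition function \delta*, run on the example NFA (0 and 1 each lead from q_0 to two states; q_3 and q_4 accepting).  The names DELTA, ACCEPTING, delta_star and accepts are made up for this illustration, and the dictionary encoding of \delta is just one convenient assumption.

```python
# Transition function of the example NFA: (state, symbol) -> set of next states.
# A missing entry means "no next state" (same idea as the dead-state convention).
DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0", "q2"},
    ("q1", "0"): {"q3"},
    ("q2", "1"): {"q4"},
    ("q3", "0"): {"q3"}, ("q3", "1"): {"q3"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}
ACCEPTING = {"q3", "q4"}


def delta_star(q, w):
    """Set of all states the NFA could be in after reading w from state q."""
    current = {q}                      # base case: w = epsilon gives { q }
    for a in w:                        # inductive case: w = w'a
        # union of delta(q', a) over all q' reachable on w'
        current = set().union(*(DELTA.get((p, a), set()) for p in current))
    return current


def accepts(w):
    """w is accepted iff some possible last state is accepting."""
    return bool(delta_star("q0", w) & ACCEPTING)


print(sorted(delta_star("q0", "01001")))   # ['q0', 'q2', 'q3']
print(accepts("01001"), accepts("0101"))   # True False
```

Tracking the whole set of possible states is exactly the "try out all possibilities in parallel" reading of acceptance, and it is also the idea behind the subset construction.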
NFA -> DFA:

  Idea (already given above): "subset construction" -- for NFA A, construct
  DFA A' whose states represent sets of states of A.  Initial state of A' is
  { q_0 } (where q_0 is initial state of A); state of A' is accepting iff it
  contains some accepting state of A; transition from state { q_1, ..., q_k }
  in A' on symbol a is simply union of transitions of A from each q_i on
  symbol a.

  Example: For NFA above, start with initial state { q_0 }, add transitions
  on 0 and on 1, creating new states as required.  (In diagram, I use
  shorthand like "q_012" to represent {q_0,q_1,q_2}.)

                                -1->
                q_01 --->(q_013)    (q_023)--->(q_0234) 1
             0/| ||\  0     0   <-0-        1     ||\
            /    ||                               ||
    --> q_0     1||0                             0||1
            \    ||                               ||
             1\|\||   1     1   -0->        0    \||
                q_02 --->(q_024)    (q_014)--->(q_0134) 0
                                <-1-

  (Bare labels next to a state are loops: q_013 loops on 0, q_024 on 1,
  q_0234 on 1, q_0134 on 0.)

  This works for all NFA's and resulting DFA accepts same language: string
  accepted by NFA means some choice of transitions leads NFA to accepting
  state means DFA will end in state that includes accepting state; string
  accepted by DFA means end state includes accepting state means some choice
  of transitions leads NFA to accepting state.

  Note all accepting states above "trap" in the sense once string accepted,
  no input symbol can make DFA go back to rejecting state.  So DFA can be
  simplified as follows:

               q_01
            0/| ||\ \0
            /   ||   \|
    --> q_0    1||0    (q) 0,1
           \_   ||  _/|
            1\|\|| /1
               q_02

DFA -> RE:

  Idea: "state elimination" construction -- eliminate intermediary states
  and replace transition labels with more general RE's.  More precisely, for
  each accepting state q, eliminate all states except q and initial state to
  generate RE R_q (representing pattern for all strings accepted at state
  q).  RE for DFA consists of union of RE's for all accepting states.  As
  construction proceeds, labels on transitions become more elaborate RE's
  (starting from single symbols in original DFA).

  - How to eliminate state q:
      . let s_1,...,s_m be all states (other than q itself) that have a
        transition into q, with labels S_1,...,S_m;
      . let t_1,...,t_n be all states (other than q itself) that have a
        transition from q, with labels T_1,...,T_n;
      . note: it is quite possible to have s_i = t_j for some i,j;
      . for each 1 <= i <= m, 1 <= j <= n, let R_{i,j} be label of
        transition from s_i to t_j (R_{i,j} = {} if no transition);
      . let Q be label of loop on state q (Q = {} if no loop);
      . remove q from FSA and for each 1 <= i <= m, 1 <= j <= n, replace
        label R_{i,j} with R_{i,j} + S_i Q* T_j;
      . special cases:
          o if R_{i,j} = {}, add transition s_i --S_i Q* T_j--> t_j;
          o if Q = {}, S_i Q* T_j simplifies to S_i T_j.

  - How to write down R_q for accepting state q:
      . once all states removed except initial q_0 and final q, only four
        possible transitions remain:

                 R    -S->
            --> q_0         (q) Q
                      <-T-

      . R_q = R* S (Q + T R* S)*;
      . special case: q_0 accepting leaves only one state  -->(q_0) Q
        with R_{q_0} = Q*.

  Example (from above):

               q_1
            0/| ||\ \0
            /   ||   \|
    --> q_0    1||0    (q_3) 0,1
           \_   ||  _/|
            1\|\|| /1
               q_2

  Eliminate q_1:

                00
    --> q_0 --------->(q_3) 0+1
           \_      _/|
         1+01\|   /1+00
               q_2 01

  Eliminate q_2:

            00 + (1+01) (01)* (1+00)
    --> q_0 ------------------------->(q_3) 0+1

  R_{q_3} = ( 00 + (1+01) (01)* (1+00) ) (0+1)*

  Since q_3 was only accepting state, this RE equivalent to FSA.

RE -> NFA:

  Idea: start with FSA "-> q_0 --R-->(q)" then break up R into component
  pieces, adding states and transitions as necessary, until each transition
  is labelled by a single symbol.

  Will need two results about RE's:
    1. For each RE R, either R == {} or R == R' for some R' that does not
       contain symbol {}.
    2. For each RE R, either R == R' or R == R' + \epsilon for some R' that
       does not contain symbol \epsilon.
  Both proved by induction on number of operators in R (exercise).

  Let R be any RE.
  - If R == {}, A = "-> q_0" is equivalent.
  - If R == \epsilon, A = "->(q_0)" is equivalent.
  - Else, let R == R' or R == R' + \epsilon where R' does not contain
    symbol {} or symbol \epsilon.
  - If R == R', start with A = "-> q_0 -R'->(q)";
    if R == R' + \epsilon, start with A = "->(q_0)-R'->(q)".
  - Repeat until each transition labelled with one symbol:
      . transitions q -S+T-> q' become q -S,T-> q' (with convention this
        represents two transitions, one labelled S the other T) -- for
        loops, q' = q;
      . transitions q -ST-> q' become q -S-> n -T-> q', where n is new
        non-accepting state (for loops, q' = q);
      . loops q -S*-> q become q -S-> q;
      . transitions q -S*-> q' (for q' != q) become

              ------ -------   -T_1->
             /      /         /
        --> q -S-> n S      q'  ...
             \      \         \
              ------ -------   -T_k->

        where T_1,...,T_k are all transitions out of q' (including any
        loops), n is new state (in the picture, q and n each get a copy of
        every transition T_1,...,T_k), q, n are accepting if q' is
        accepting, and q' is removed if it has no more incoming
        transitions.
        If q -S*-> q' was only transition out of q (including loops), this
        can be simplified to

              -------    -T_1->
             /          /
        --> q S       q'  ...
             \          \
              -------    -T_k->

  NOTE: Examples not covered in lecture (for lack of time).  I will go over
  them *quickly* next week.

  - Example: R = (0+1)* (00+11) (0+1)*.
    Does not contain {} or \epsilon, so start with:

            (0+1)* (00+11) (0+1)*
    --> q_0 ---------------------->(q_1)

    Deal with top-level operators (concatenation) to get:

             (0+1)*       00+11       (0+1)*
    --> q_0 -------> q_2 ------> q_3 ------->(q_1)

    No transition out of q_1 so deal with last transition next:

             (0+1)*       00+11
    --> q_0 -------> q_2 ------>(q_3) 0+1     (q_1)

    Remove q_1 (no incoming transition) and deal with first transition:

                          00+11
    --> q_0 0+1      q_2 ------>(q_3) 0+1
           \__________________/|
                  00+11

    Remove q_2 (no incoming transition), replace '+' with multiple
    transitions:

              00
        0,1  ---->
    --> q_0         (q_3) 0,1
             ---->
              11

    Deal with both concatenations:

              0        0
        0,1  ---> q_1 --->
    --> q_0                 (q_3) 0,1
             ---> q_2 --->
              1        1

  - More detailed example (showing more features of dealing with *):
    R = 1 (00 + (1+00)*) (1+00) 1

            1 (00 + (1 + 00)*) (1 + 00) 1
    --> q_0 ------------------------------->(q_1)

    Deal with concatenations first:

             1        (00 + (1+00)*)        (1+00)        1
    --> q_0 ---> q_2 ----------------> q_3 --------> q_4 --->(q_1)

    Deal with top-level unions, breaking up concatenations at same time:

             1         (1 + 00)*          1          1
    --> q_0 ---> q_2 ------------> q_3 -------> q_4 --->(q_1)
                    \_          _/|   \       /|
                      0\|     /0       0\|   /0
                         q_5              q_6

    Now, apply construction to (1+00)*:

                      ___________1___________
                     /                       \
              1     / 0        0          1   \|     1
    --> q_0 ---> q_2 ---> q_5 ---> q_3 -------> q_4 --->(q_1)
                  | \_______0________ \0     /| |\
              1+00|/                 \|\|   /0  |
                 q_7 ---------0--------> q_6    |
                1+00 \____________1____________/

    The last step (not shown -- too hard to draw in ASCII :), would be to
    eliminate the unions and concatenation in the two transitions labelled
    "1+00" (this would introduce two new states).
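
To tie the three constructions together, here is a rough sketch in Python that runs the subset construction on the example NFA from the non-determinism section and then cross-checks, by brute force over short strings, that the resulting DFA agrees with both RE's from the examples: R_{q_3} from state elimination and (0+1)* (00+11) (0+1)*.  All names (DELTA, subset_construction, all_agree, ...) are invented for this illustration; Python's re module writes union as "|" instead of "+", and the check only covers strings up to a chosen length, so it is a sanity check and not a proof.

```python
import re
from itertools import product

ALPHABET = "01"
# Example NFA: (state, symbol) -> set of next states (missing entry = none).
DELTA = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0", "q2"},
    ("q1", "0"): {"q3"},       ("q2", "1"): {"q4"},
    ("q3", "0"): {"q3"},       ("q3", "1"): {"q3"},
    ("q4", "0"): {"q4"},       ("q4", "1"): {"q4"},
}
ACCEPTING = {"q3", "q4"}


def subset_construction(start):
    """Build the reachable DFA states (frozensets of NFA states)."""
    start_set = frozenset([start])
    dfa = {}                              # (frozenset, symbol) -> frozenset
    seen, todo = {start_set}, [start_set]
    while todo:
        S = todo.pop()
        for a in ALPHABET:
            # union of the NFA's transitions from each state in S on a
            T = frozenset(t for q in S for t in DELTA.get((q, a), ()))
            dfa[(S, a)] = T
            if T not in seen:             # create new DFA states as required
                seen.add(T)
                todo.append(T)
    return start_set, seen, dfa


def dfa_accepts(start_set, dfa, w):
    S = start_set
    for a in w:
        S = dfa[(S, a)]
    return bool(S & ACCEPTING)            # accepting iff it contains one


START, STATES, DFA = subset_construction("q0")

# The two RE's from the examples, in Python syntax.
RE_ELIM = re.compile(r"(00|(1|01)(01)*(1|00))(0|1)*")
RE_DIRECT = re.compile(r"(0|1)*(00|11)(0|1)*")


def all_agree(max_len):
    """True iff DFA and both RE's agree on every string up to max_len."""
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            w = "".join(chars)
            expected = dfa_accepts(START, DFA, w)
            if bool(RE_ELIM.fullmatch(w)) != expected:
                return False
            if bool(RE_DIRECT.fullmatch(w)) != expected:
                return False
    return True


print(len(STATES))     # 9 reachable subset states, as in the DFA diagram
print(all_agree(10))   # True
```

The nine reachable subsets are exactly the states q_0, q_01, q_02, q_013, q_024, q_023, q_014, q_0234, q_0134 from the NFA -> DFA example.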