===========================================================================
CSC 236              Lecture Summary for Week 12               Winter 2008
===========================================================================

Examples of construction to convert RE to NFA.

- Example: R = (0+1)* (00+11) (0+1)*.

  R does not contain {} or \epsilon, so start with:

                (0+1)* (00+11) (0+1)*
    --> q_0 -----------------------------> (q_1)

  Deal with the top-level operator (concatenation) to get:

              (0+1)*           00+11          (0+1)*
    --> q_0 ----------> q_2 ----------> q_3 ----------> (q_1)

  There is no transition out of q_1, so deal with the last transition
  next (q_3 gets a loop labelled 0+1 and becomes accepting):

                                         0+1
                                        +---+
              (0+1)*           00+11    |   v
    --> q_0 ----------> q_2 ----------> (q_3)           (q_1)

  Remove q_1 (it has no incoming transition) and deal with the first
  transition (q_0 gets a loop labelled 0+1, plus a copy of the transition
  out of q_2):

        0+1                              0+1
        +-+                             +---+
        | v            00+11            |   v
    --> q_0 --------------------------> (q_3)
                                          ^
                        q_2 --------------+
                                00+11

  Remove q_2 (it has no incoming transition), and replace each '+' with
  multiple transitions:

        0,1                              0,1
        +-+                             +---+
        | v             00              |   v
    --> q_0 --------------------------> (q_3)
         |                                ^
         +--------------------------------+
                        11

  Deal with both concatenations (q_1 and q_2 are new states):

        0,1                              0,1
        +-+                             +---+
        | v      0               0      |   v
    --> q_0 ----------> q_1 ----------> (q_3)
         |                                ^
         +------------> q_2 --------------+
                1                 1

- More detailed example (showing more features of dealing with *):
  R = 1 (00 + (1+00)*) (1+00) 1

               1 (00 + (1+00)*) (1+00) 1
    --> q_0 ------------------------------> (q_1)

  Deal with concatenations first:

             1         00 + (1+00)*           1+00          1
    --> q_0 ---> q_2 ----------------> q_3 -----------> q_4 ---> (q_1)

  Deal with top-level unions, breaking up concatenations at the same time:

             1           (1+00)*               1            1
    --> q_0 ---> q_2 ----------------> q_3 -----------> q_4 ---> (q_1)
                  |                   ^ |               ^
                  |  0           0    | |  0         0  |
                  +-----> q_5 --------+ +-----> q_6 ----+

  Now apply the construction to (1+00)*. In the result, q_2 gets copies
  of the transitions out of q_3 (covering the case where (1+00)* matches
  \epsilon), and a new state q_7, entered from q_2 on 1+00, has a loop
  labelled 1+00 and copies of the transitions out of q_3:

                  +--------------- 1 ----------------+
             1    |   0        0            1        v   1
    --> q_0 ---> q_2 ---> q_5 ---> q_3 -----------> q_4 ---> (q_1)
                 | |                |    0          ^ ^
                 | |                +--------+      | |
            1+00 | |          0              v   0  | |
                 | +----------------------> q_6 ----+ |
                 |                           ^        |
                 v              0            |        |
          +----> q_7 ------------------------+        |
          | 1+00 | |                                  |
          +------+ |                                  |
                   +---------------- 1 ---------------+

  The last step (not shown; too hard to draw in ASCII) would be to
  eliminate the unions and concatenations in the two transitions labelled
  "1+00" (from q_2 to q_7, and the loop on q_7); this would introduce two
  new states.
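As a sanity check on the first example, here is a minimal Python sketch that simulates the final NFA (states and transitions read off the last diagram of that example) by tracking the set of states it could be in, then compares it with Python's `re` module on every short 0/1 string. The dictionary encoding and the names `DELTA`, `nfa_accepts` and `agrees_with_regex` are illustrative choices, not part of the construction; the pattern `[01]*(00|11)[01]*` is R rewritten in Python regex syntax (`|` for union, `[01]` for 0+1). Agreement on short strings is evidence that the construction worked, not a proof.

```python
import itertools
import re

# Final NFA of the first example (hypothetical encoding):
# DELTA[(state, symbol)] is the set of possible next states; a missing
# key means "no transition".
DELTA = {
    ("q_0", "0"): {"q_0", "q_1"},
    ("q_0", "1"): {"q_0", "q_2"},
    ("q_1", "0"): {"q_3"},
    ("q_2", "1"): {"q_3"},
    ("q_3", "0"): {"q_3"},
    ("q_3", "1"): {"q_3"},
}
START = "q_0"
ACCEPTING = {"q_3"}


def nfa_accepts(s):
    """Run the NFA on s by tracking the set of all states it could be in."""
    current = {START}
    for symbol in s:
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & ACCEPTING)


def agrees_with_regex(max_len):
    """Compare the NFA with Python's re module on every 0/1 string of
    length <= max_len. (0+1)*(00+11)(0+1)* is written [01]*(00|11)[01]*."""
    pattern = re.compile(r"[01]*(00|11)[01]*")
    for n in range(max_len + 1):
        for chars in itertools.product("01", repeat=n):
            s = "".join(chars)
            if nfa_accepts(s) != (pattern.fullmatch(s) is not None):
                return False
    return True


print(nfa_accepts("0110"))    # True: contains 11
print(nfa_accepts("0101"))    # False: contains neither 00 nor 11
print(agrees_with_regex(10))  # True
```

Tracking a set of states is the standard way to run an NFA without backtracking; it is the same idea as the subset construction, applied on the fly to one input string.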
------------------
Closure Properties
------------------

Given regular languages L_1, L_2 over alphabet \Sigma, what language
operations yield a regular language?

L_1 and L_2 regular means there are RE's R_1, R_2 such that L(R_i) = L_i,
and also DFA A_1, A_2 such that L(A_i) = L_i.

Union, Concatenation, Star: Obvious from RE's:

    L(R_1 + R_2) = L(R_1) U L(R_2) = L_1 U L_2
    L(R_1 R_2)   = L(R_1).L(R_2)   = L_1.L_2
    L(R_1*)      = L(R_1)*         = L_1*

Complement: The complement of L_1 is ~L_1 = \Sigma* - L_1. Consider the DFA
A'_1 that is identical to A_1 except that the status of each state is
changed (non-accepting states become accepting; accepting states become
non-accepting). Formally, if A_1 = (Q,\Sigma,q_0,F,d), then
A'_1 = (Q,\Sigma,q_0,Q-F,d). Then A'_1 accepts exactly the strings not
accepted by A_1, i.e., L(A'_1) = ~L_1. (This relies on A_1 being a DFA:
swapping accepting and non-accepting states of an NFA does not, in
general, give an NFA for the complement.)

Intersection: L_1 intersect L_2 = ~(~L_1 U ~L_2); since union and
complement preserve regularity, so does intersection.

Others:
  - rev(L_1) = { s^R : s (- L_1 }, where s^R is the reversal of string s
    (s written "backward"), is regular.
  - prefix(L_1) = { s (- \Sigma* : s.t (- L_1 for some t (- \Sigma* } is
    regular.
  - suffix(L_1) = { s (- \Sigma* : t.s (- L_1 for some t (- \Sigma* } is
    regular.

---------------------
Non-regular languages
---------------------

Intuition: FSA have a fixed, finite memory (their states), so they cannot
"remember" an unlimited amount of information about their input strings.

Example: L = { 0^n 1^n : n (- N } = {\epsilon,01,0011,000111,...} is not
regular.

Proof: For a contradiction, suppose that L is regular. Then there is a DFA
A such that L(A) = L. Let k be the number of states in A (so A's states
are {q_0,q_1,...,q_{k-1}}), and consider the behaviour of A on the input
string 0^{k+1} 1^{k+1}:

    -> q_i_0 -0-> q_i_1 -0-> q_i_2 -0-> ... -0-> q_i_{k+1}
             -1-> q_i_{k+2} -1-> ... -1-> q_i_{2k+2}

where q_i_0,q_i_1,...,q_i_{2k+2} are states of A, and q_i_{2k+2} is
accepting because 0^{k+1} 1^{k+1} (- L. Since A contains only k states,
some state of A must be repeated among the k+2 states q_i_0,...,q_i_{k+1}:
there must be some a < b <= k+1 such that i_a = i_b.
Schematically, the sequence of states that A goes through on the string
0^{k+1} looks like this (recall that q_i_a and q_i_b are the same state):

                            +---0---> q_i_{a+1} -0-> ... -0-> q_i_{b-1} -+
                            |                                            |
                            | +-------------------0--------------------+
                            | v
    -> q_i_0 -0-> ... -0-> q_i_a -0-> q_i_{b+1} -0-> ... -0-> q_i_{k+1}

But then A behaves on the input string 0^{k+1+(b-a)} 1^{k+1} just as it
does on 0^{k+1} 1^{k+1}, except that it goes around the loop from q_i_a
back to q_i_b = q_i_a one extra time. It therefore ends in the same
accepting state q_i_{2k+2}, so A accepts 0^{k+1+(b-a)} 1^{k+1}, which is
not in L because b - a >= 1. This contradicts our assumption that
L(A) = L, so there can be no such DFA, i.e., L is not regular.

The idea behind this example can be used to prove what is known as the
"pumping lemma" (for regular languages):

    For all regular languages L, there is some k (- N such that for every
    string s (- L with |s| >= k, there are strings u,v,w such that
    s = u.v.w, v is not empty, |uv| <= k, and u.v^i.w (- L for all i (- N.

----------------------
Context-free languages
----------------------

We study two formalisms: context-free grammars (a generalization of
regular expressions) and pushdown automata (a generalization of FSA).

---------------------
Context-free grammars
---------------------

A more powerful way to describe sets of strings, based on the idea of
"productions" (replacing variables by strings of symbols). Used to
represent many natural-language constructs, and to describe the syntax of
modern programming languages.

Example:  S -> 0S1
          S -> \epsilon

  - "S -> 0S1" and "S -> \epsilon" are productions, of the form
    "variable" -> "string", where strings contain an arbitrary mixture of
    variables and "terminals" (i.e., input symbols), so called because
    they cannot be further substituted for.
  - Productions for the same variable are sometimes combined using "|" to
    indicate alternatives, e.g., S -> 0S1 | \epsilon.
  - "S" is the special "start variable".
  - A "derivation" consists of applying productions repeatedly, to one
    variable at a time, starting from some string made up of variables
    and/or terminals, e.g.,

        S => 0S1 => 0.0S1.1 => 00.0S1.11 => 000.\epsilon.111 = 000111

    In general, each step of a derivation replaces one variable in the
    current string according to one production for that variable;
    different choices of production yield different results. If t can be
    derived from s (in zero or more steps), we write s =>* t.
  - Language generated = set of strings consisting only of terminals that
    can be derived from the start variable = { w (- \Sigma* : S =>* w },
    where \Sigma = { terminals } and S = start variable. For the example,
    the language generated is { 0^n 1^n : n (- N }.

Example 2: G = S -> SS | (S) | \epsilon.  L(G) = ?
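Derivations can be carried out mechanically. The Python sketch below is one possible illustration, not a standard tool. It uses a hypothetical encoding: dictionary keys are variables, every other character is a terminal, and "" stands for \epsilon. The helpers `derive` and `generate` are likewise illustrative names. `derive` replays the derivation of 000111 above, and `generate` collects every terminal string of bounded length derivable from S, always expanding the leftmost variable (restricting to leftmost derivations does not change the language generated). The length bound only guarantees termination for grammars like S -> 0S1 | \epsilon, where every production either adds terminals or removes a variable. It would loop forever on Example 2, because S -> SS can grow without adding terminals.

```python
from collections import deque

# Hypothetical encoding of S -> 0S1 | \epsilon: each key is a variable,
# each value lists the right-hand sides of its productions, and ""
# stands for \epsilon. Any character that is not a key is a terminal.
GRAMMAR = {"S": ["0S1", ""]}


def leftmost_variable(form, grammar):
    """Index of the leftmost variable in a sentential form, or None."""
    return next((i for i, c in enumerate(form) if c in grammar), None)


def derive(grammar, start, choices):
    """Replay a derivation: at each step, rewrite the leftmost variable
    using the production with the given index. Returns every string seen."""
    forms = [start]
    for choice in choices:
        form = forms[-1]
        i = leftmost_variable(form, grammar)
        forms.append(form[:i] + grammar[form[i]][choice] + form[i + 1:])
    return forms


def generate(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start,
    found by breadth-first search over leftmost derivations. Sentential
    forms containing more than max_len terminals are pruned."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        i = leftmost_variable(form, grammar)
        if i is None:  # only terminals left: a string in the language
            results.add(form)
            continue
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            n_terminals = sum(1 for c in new if c not in grammar)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


print(" => ".join(derive(GRAMMAR, "S", [0, 0, 0, 1])))
# S => 0S1 => 00S11 => 000S111 => 000111
print(sorted(generate(GRAMMAR, "S", 6), key=len))
# ['', '01', '0011', '000111']
```

The output of `generate` matches the claim that the example grammar generates { 0^n 1^n : n (- N }, at least for the strings up to the chosen length.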