===========================================================================
CSC 363              Lecture Summary for Week 4              Winter 2008
===========================================================================

Recall A_DFA = { <B,w> : B is a DFA that accepts input w }

Notation "<B,w>" means "some reasonable encoding of a DFA B and input
string w, as a string", where "reasonable" means that the string can always
be parsed by a TM to decide whether or not it is in the proper encoding.

A_DFA is decidable, by the following TM:

    M = "On input <B,w>:
          1. Simulate B on w.
          2. Accept if B accepts; reject if B rejects."

Note 1: "On input <B,w>: ..." is shorthand for
    "On input x: (meaning: let x be the value on the tape when we start)
       1. Check that x is the string encoding of some DFA B and input
          string w, i.e., that x = <B,w>.
       2. If x does not have that form, reject; otherwise, ..."

Note 2: "Simulate B on w" means that M uses part of its tape to keep track
of B's current state and location on input w, and moves back and forth to
check the transition table of B and update its tape to reflect the
transitions executed by B on input w. All of the information about B is
contained on M's tape -- none of it is hard-coded into M's own transition
table.

By contrast, if we had a TM M containing the instruction "Run M' on w" (we
saw this in a previous exercise), it would mean something completely
different: some of the transitions of M are hard-coded to be the same as
the transitions of M' -- none of the information about the transitions of
M' is contained on the tape of M; it is all hard-coded directly into the
transition table of M.

---------------
Diagonalization -- section 4.2, pp.174-178
---------------

Which set is "larger": N or Z? Easy: Z contains N as a proper subset.

Which set has larger "size"? Not so easy: for finite sets, adding or
removing one element changes the size, but not so for infinite sets.

So what does "infinite size" mean? Are all infinite sets the same "size"?
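To see why removing one element need not change the "size" of an infinite
set, pair each natural number n with n + 1. This is a small worked example,
assuming N = {0, 1, 2, ...} (if N starts at 1, use N \ {1} and m - 1
instead). Every element m of N \ {0} is paired with exactly one n, namely
m - 1, so N and its proper subset N \ {0} can be paired completely:

```latex
f : \mathbb{N} \to \mathbb{N} \setminus \{0\}, \quad f(n) = n + 1,
\qquad
g : \mathbb{N} \setminus \{0\} \to \mathbb{N}, \quad g(m) = m - 1,
\qquad
g(f(n)) = n, \quad f(g(m)) = m .
```

Since g undoes f in both directions, f is one-to-one and onto, something
no finite set can have with one of its proper subsets.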
Idea: compare the sizes of sets without counting (we can't count "to
infinity"), by pairing up elements from each set. If both sets can be
paired completely, then the sets have the same size. Otherwise, one set
has "larger size".

This concept is formalized through the notion of a "correspondence" (see
textbook -- these are just bijections, i.e., one-to-one and onto
functions). Note: this agrees with the usual notion of "size" for finite
sets.

Defn: A set is "countable" if it is finite or if there is a correspondence
between the natural numbers and the set, i.e., if there is a systematic
way to list every element in the set exactly once.

  - {0,1}* is countable: list strings in order of length, and in
    lexicographic order within each length (plain dictionary order would
    never get past \epsilon, 0, 00, 000, ...), i.e., use correspondence
    \epsilon, 0, 1, 00, 01, 10, 11, 000, 001, ...
  - Z is countable: use correspondence 0, -1, 1, -2, 2, -3, 3, -4, 4, ...
  - Q is countable: "zig-zag" argument (see textbook for details --
    covered in lecture)
  - R is uncountable: proof by contradiction through diagonalization (see
    textbook for details -- covered in lecture)

What does this have to do with languages?

  - The number of TMs is countable, because we can list all strings over
    an appropriate alphabet and keep only the ones that describe valid TMs.
  - The number of languages is uncountable, because it is the same as the
    number of infinite binary strings (the same as the number of real
    numbers) -- see textbook for details, covered in lecture.
  - Hence there are "only" countably many recognizable languages (each one
    is recognized by at least one TM), i.e., _most_ languages are
    unrecognizable!

But are there any natural unrecognizable languages?

---------------------
The "Halting" problem
---------------------

More properly the "acceptance" problem/language for TMs,

    A_TM = { <M,w> | M is a TM that accepts input w },

is recognizable.

  - There is a TM U (the "universal TM") that takes a reasonable encoding
    of M and its input w, and that carries out M's computation on w.
    U accepts if M accepts w, U rejects if M rejects w, and U loops if M
    loops on w. More formally:

    U = "On input <M,w>:
          1. Simulate M on input w (use a portion of the tape to represent
             M's configuration -- its state and the content of its tape,
             including the head position -- and move back and forth
             between M's description and M's configuration for each
             simulation step).
          2. Accept if M accepts; reject if M rejects."

    Difference from A_DFA: U could get stuck in an infinite loop in stage 1
    (if M never halts on w, U simply keeps simulating), so U recognizes
    A_TM but U does not decide A_TM. This does not show that A_TM is
    undecidable, just that U is not a decider for A_TM.

  - Note: U is a "general-purpose computer": like all TMs, it is hard-wired
    to carry out exactly one task, but that task depends on the
    instructions provided in its input.

Theorem: A_TM is undecidable.
Proof: next week...

But what about unrecognizable languages?

Theorem 4.22 (pp.181-182): Language L is decidable iff both L and ~L are
recognizable (recall that ~L is the complement of L).
Proof: next week... (This is not a difficult theorem and would make a
great exercise to test your understanding of these concepts -- give it a
try before looking at the proof.)

Corollary: ~A_TM (the complement of A_TM) is unrecognizable.
(Because A_TM is recognizable but undecidable: if ~A_TM were also
recognizable, Theorem 4.22 would make A_TM decidable.)
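The contrast drawn above between deciding A_DFA and merely recognizing
A_TM can be sketched in Python. This is a minimal, hypothetical model, not
the course's encoding: a DFA is a tuple (delta, start, accepting) with
delta[(state, symbol)] = next state, and a single-tape TM is a tuple
(delta, start, accept, reject) with delta[(state, symbol)] = (next state,
symbol to write, "L" or "R"). The blank symbol "_", the rule that an
undefined transition means reject, and the max_steps cap are all
assumptions added for this sketch; the U in the notes has no step cap. The
names decide_a_dfa and simulate_tm are likewise hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encodings (assumptions for this sketch only).
# DFA: (delta, start, accepting), delta[(state, symbol)] = next_state
DFA = Tuple[Dict[Tuple[str, str], str], str, Set[str]]
# TM: (delta, start, accept, reject),
#     delta[(state, symbol)] = (next_state, symbol_to_write, "L" or "R")
TM = Tuple[Dict[Tuple[str, str], Tuple[str, str, str]], str, str, str]

BLANK = "_"  # assumed blank tape symbol


def decide_a_dfa(dfa: DFA, w: str) -> bool:
    """Decide A_DFA: the loop runs at most len(w) times, so it always halts."""
    delta, state, accepting = dfa
    for symbol in w:
        if (state, symbol) not in delta:
            return False  # symbol outside B's alphabet: malformed <B,w>, reject
        state = delta[(state, symbol)]
    return state in accepting


def simulate_tm(tm: TM, w: str, max_steps: int) -> Optional[str]:
    """Run M on w for at most max_steps steps (a step-capped stand-in for U).

    Returns "accept" or "reject" if M halts within the budget, else None.
    None does NOT mean "M loops": M might halt one step later.
    """
    delta, state, q_accept, q_reject = tm
    tape = dict(enumerate(w))  # cell index -> symbol; unset cells are blank
    head = 0
    for _ in range(max_steps):
        if state in (q_accept, q_reject):
            break
        symbol = tape.get(head, BLANK)
        if (state, symbol) not in delta:
            state = q_reject  # assumption: undefined transition means reject
            break
        state, tape[head], move = delta[(state, symbol)]
        # Moving left from the leftmost cell leaves the head where it is.
        head = head + 1 if move == "R" else max(0, head - 1)
    if state == q_accept:
        return "accept"
    if state == q_reject:
        return "reject"
    return None


if __name__ == "__main__":
    # DFA over {0,1} accepting strings that end in 1.
    ends_in_1: DFA = ({("a", "0"): "a", ("a", "1"): "b",
                       ("b", "0"): "a", ("b", "1"): "b"}, "a", {"b"})
    print(decide_a_dfa(ends_in_1, "0101"))  # True

    # TM that moves right forever on every symbol, so it never halts.
    runs_right: TM = ({("q0", s): ("q0", s, "R") for s in "01_"},
                      "q0", "qa", "qr")
    print(simulate_tm(runs_right, "01", 10_000))  # None
```

Raising max_steps only postpones the question: for runs_right, every
budget returns None, and from the outside that looks the same as a machine
that halts just after the budget runs out. That is why U, which removes
the cap, recognizes A_TM without deciding it.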