===========================================================================
CSC 363H              Lecture Summary for Week 4                  Fall 2007
===========================================================================

---------------
Diagonalization  -- section 4.2, pp.174-178
---------------

Which set is "larger": N or Z?  Easy: Z contains N as a proper subset.

Which set has larger "size"?  Not so easy: for finite sets, adding or
removing one element changes the size, but not so for infinite sets.  So
what does "infinite size" mean?  Are all infinite sets the same "size"?

Idea: compare the sizes of sets without counting (can't count "to
infinity"), by pairing up elements from each set.  If the two sets can be
paired up completely, then they have the same size.  Otherwise, one set has
"larger size".

This concept is formalized through the notion of "correspondence" (see
textbook -- these are just bijections, or one-to-one and onto mappings).
Note: this is equivalent to the usual notion of "size" for finite sets.

Defn: A set is "countable" if it is finite or if there is a correspondence
between the natural numbers and the set, i.e., if there is a systematic way
to list every element in the set exactly once.

  - {0,1}^* is countable: list strings in lexicographic order (shortest
    first, then in dictionary order within each length), i.e., use
    correspondence
        e, 0, 1, 00, 01, 10, 11, 000, 001, ...
    (where "e" represents the empty string)
  - Z is countable: use correspondence
        0  -1  1  -2  2  -3  3  -4  4  ...
  - Q is countable: "zig-zag" argument (see textbook for details --
    covered in lecture)
  - R is uncountable: proof by contradiction through diagonalization (see
    textbook for details -- covered in lecture)

What does this have to do with languages?

  - The number of TMs is countable because we can list all strings over an
    appropriate alphabet and keep only the ones that describe valid TMs.
  - The number of languages is uncountable because it is the same as the
    number of infinite binary strings (same as the number of real numbers):
    a language is determined by saying, for each string in the list of all
    strings, whether that string is in or out.
  - Hence, there are "only" countably many recognizable languages (each one
    is recognized by some TM), i.e., _most_ languages are unrecognizable!
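
Aside (illustration only, not from lecture): the listings above can be
sketched in Python.  The names binary_strings, integers and diagonal_escape
are made up for this sketch, and the diagonal step is shown for infinite
binary sequences (each modelled as a function from position to bit) rather
than for decimal expansions of reals.

```python
from itertools import count, islice, product


def binary_strings():
    """Yield every string over {0,1} exactly once: by length, then
    in dictionary order within each length ("" plays the role of e)."""
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def integers():
    """Yield every integer exactly once, in the order 0, -1, 1, -2, 2, ..."""
    yield 0
    for n in count(1):
        yield -n
        yield n


def diagonal_escape(rows, n):
    """rows[i] is the i-th sequence of a purported listing, given as a
    function from position j to a bit.  Return the first n bits of a
    sequence that differs from rows[i] at position i, for every i < n."""
    return [1 - rows[i](i) for i in range(n)]


if __name__ == "__main__":
    print(list(islice(binary_strings(), 9)))
    print(list(islice(integers(), 9)))
```

Whatever listing is supplied, the sequence built by diagonal_escape
disagrees with row i in position i, so it cannot appear anywhere in the
listing; this is the heart of the uncountability argument.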
But are there any natural unrecognizable languages?

---------------------
The "Halting" problem
---------------------

More properly, this is the "acceptance" problem/language for TMs:
    A_TM = { <M,w> | M is a TM that accepts input w }
is recognizable.

  - There is a TM U (the "universal TM") that takes a reasonable encoding
    <M,w> of M and its input w, and that carries out M's computation on w.
    U accepts if M accepts w, U rejects if M rejects w, and U loops if M
    loops on w.  More formally:

        U = "On input <M,w>:
              1. Simulate M on input w (use a portion of the tape to
                 represent M's configuration -- state and content of M's
                 tape, including head position -- and move back-and-forth
                 between M's description and M's configuration for each
                 simulation step).
              2. Accept if M accepts; reject if M rejects."

Difference from A_DFA: U could get stuck in an infinite loop in stage 1 (if
M never halts on w, U will simply keep simulating), so U recognizes A_TM
but U does not decide A_TM -- this does not show that A_TM is undecidable,
just that U is not a decider for A_TM.

Note: U is a "general-purpose computer": like all TMs, it is hard-wired to
carry out exactly one task, but that task depends on the instructions
provided in its input.

Theorem: A_TM is undecidable.

Proof:
  - For contradiction, assume A_TM is decidable, i.e., there is a TM S that
    accepts input <M,w> if M accepts w, and rejects <M,w> if M does not
    accept w (either because M rejects or loops).
  - Construct a TM D that includes S as a "subroutine".  On input <M>, D
    runs S's instructions on <M,<M>>; if S accepts, D rejects; if S
    rejects, D accepts.  In other words, on input <M>, D rejects if M
    accepts input <M> and D accepts if M does not accept input <M>.
  - What happens if we give <D> as input to D?  D should reject if D
    accepts input <D> and D should accept if D does not accept input <D>,
    i.e., D accepts input <D> iff D does not accept input <D>.
  - Contradiction!  Hence, D cannot exist, which means S cannot exist,
    i.e., no TM can decide A_TM -- by the Church-Turing thesis, this means
    there is no general algorithm for solving this problem!
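
Aside (illustration only): the diagonal step of the proof can be mimicked
with Python functions standing in for TMs.  Assumptions of this sketch: a
"machine" is a one-argument function returning True (accept) or False
(reject); a machine is passed directly instead of its encoding <M>; and S
is any total, deterministic candidate for a decider of A_TM.  The names
make_D and refutes are invented here.

```python
def make_D(S):
    """Build D from a candidate decider S, as in the proof:
    on input M, D does the opposite of what S reports for (M, M)."""
    def D(M):
        return not S(M, M)
    return D


def refutes(S):
    """Return True if S gives the wrong answer on the pair (D, D)."""
    D = make_D(S)
    claimed = S(D, D)  # what S says about "D accepts <D>"
    actual = D(D)      # what D really does on input <D>
    return claimed != actual


if __name__ == "__main__":
    # Two (obviously wrong) candidate deciders; each is refuted by its own D.
    print(refutes(lambda M, w: True))
    print(refutes(lambda M, w: False))
```

For every total S of this kind, D(D) is by construction the negation of
S(D, D), so refutes(S) holds: S misjudges D on input D.  The sketch only
mirrors the logic of the proof; it says nothing about candidates that fail
to halt.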

Corollary: ~A_TM (the complement of A_TM) is unrecognizable.
(Because A_TM is recognizable but undecidable, and because for all
languages L, L is decidable iff both L and ~L are recognizable.)

HALT_TM = { <M,w> | M is a TM that halts on input w } is undecidable.

Proof: For a contradiction, suppose HALT_TM has a decider H, and consider:

    A = "On input <M,w>:
          1. Run H on <M,w>.  Reject if H rejects.  Otherwise,
          2. Simulate M on w.  Accept if M accepts; reject if M rejects."

Stage 2 is reached only when H has accepted, i.e., when M halts on w, so the
simulation always terminates.  Then, A accepts <M,w> if M accepts w, and A
rejects <M,w> if M loops on w or if M rejects w, i.e., A decides A_TM, a
contradiction.  Hence, HALT_TM is undecidable.

HALT_TM is recognizable (on input <M,w>, simulate M on w and accept if M
accepts or rejects), so, by the same reasoning as in the corollary above,
~HALT_TM is unrecognizable.
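
Aside (illustration only): the shape of the reduction from A_TM to HALT_TM,
with Python functions standing in for TMs.  Everything named here is
hypothetical: toy_H is a stand-in for the decider H that is correct only
for the three hand-picked machines below (no such H exists in general,
which is the point of the proof), and make_A mirrors the machine A.

```python
def accept_all(w):
    """A machine that accepts every input."""
    return True


def reject_all(w):
    """A machine that rejects every input."""
    return False


def loop_forever(w):
    """A machine that never halts (it is never actually run below)."""
    while True:
        pass


def toy_H(M, w):
    """Stand-in for H: True iff M halts on w.
    Correct only for the three machines defined above."""
    return M is not loop_forever


def make_A(H):
    """Build A from H, following the two stages of the proof."""
    def A(M, w):
        if not H(M, w):   # stage 1: M loops on w, so M does not accept w
            return False
        return M(w)       # stage 2: safe to simulate, since M halts on w
    return A


if __name__ == "__main__":
    A = make_A(toy_H)
    for M in (accept_all, reject_all, loop_forever):
        print(M.__name__, A(M, "01"))
```

The check in stage 1 is what keeps A from looping: M is run only after H has
promised that the run will end.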