===========================================================================
CSC 363H                Lecture Summary for Week  4               Spring 2007
===========================================================================

 -  Relationship between decidability and recognizability? (in tutorial)
    Theorem 4.22 on pp.181-182 (1st ed: 4.16 on pp.167-168):  Language L is
    decidable iff both L and L^c are recognizable (L^c is complement of L).
    Proof: covered in tutorial.
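
    The "if" direction can be sketched in Python: run the two recognizers
    in alternation, one step at a time, until one of them halts.  This is
    only an illustration, not the tutorial's proof.  Recognizers are
    modelled (an assumption of this sketch) as generators that yield once
    per step and return True/False when they halt; the names decide,
    rec_even and rec_odd are hypothetical.

```python
def decide(rec_L, rec_Lc, w):
    """Decide membership of w in L by alternating steps of both recognizers."""
    runs = [(rec_L(w), True), (rec_Lc(w), False)]
    while True:
        for gen, in_L in runs:
            try:
                next(gen)                      # one more step of this recognizer
            except StopIteration as stop:      # this recognizer halted
                # Accepting settles the question one way, rejecting the other.
                return in_L if stop.value else not in_L


def rec_even(w):
    """Toy recognizer for even-length strings; loops on all other inputs."""
    if len(w) % 2 == 0:
        yield
        return True
    while True:
        yield


def rec_odd(w):
    """Toy recognizer for odd-length strings; loops on all other inputs."""
    if len(w) % 2 == 1:
        yield
        yield
        yield
        return True
    while True:
        yield
```

    Since every w is in exactly one of L and L^c, one of the two runs must
    accept after finitely many steps, so decide always halts even though
    each recognizer alone may loop.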

---------------
Diagonalization -- section 4.2, pp.174-178 (1st ed: pp.160-164)
---------------

In this class we learn how to count!

Which set is "larger": N or Z?  Easy: Z contains N as a proper subset.  Which
set has larger "size"?  Not so easy: for finite sets, adding or removing
one element changes the size, but not so for infinite sets.  So what does
"infinite size" mean?  Are all infinite sets the same "size"?

Idea: compare size of sets without counting (can't count "to infinity"), by
pairing up elements from each set.  If both sets can be paired completely,
then sets have same size.  Otherwise, one set has "larger size".  This
concept is formalized through notion of "correspondence", i.e., a function
that is one-to-one and onto (see textbook).
Note: this is equivalent to usual notion of "size" for finite sets.

Defn: A set is "countable" if it is finite or if there is a correspondence
between the natural numbers and the set, i.e., if there is a systematic way
to list every element in the set exactly once.

 -  {0,1}^* is countable: list strings in order of length, and in
    lexicographic order within each length, i.e., use correspondence
    e, 0, 1, 00, 01, 10, 11, 000, 001, ... (where "e" represents the
    empty string)

 -  Z is countable: use correspondence 0, -1, 1, -2, 2, -3, 3, -4, 4, ...

 -  Q is countable: "zig-zag" argument (see textbook for details -- covered
    in lecture)

 -  R is uncountable: proof by contradiction through diagonalization
    (see textbook for details -- covered in lecture)
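
The three countable examples above can be sketched as Python generators,
each listing every element exactly once.  This is a minimal illustration;
the function names are hypothetical, and positive_rationals is one possible
version of the zig-zag idea (walk the diagonals p+q = 2, 3, 4, ... and skip
fractions that are not in lowest terms), not necessarily the textbook's
exact ordering.  All of Q then follows by interleaving signs as for Z.

```python
from fractions import Fraction
from itertools import count, islice, product
from math import gcd


def binary_strings():
    """List {0,1}^* by length, lexicographically within each length."""
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def integers():
    """List Z as 0, -1, 1, -2, 2, ..."""
    yield 0
    for n in count(1):
        yield -n
        yield n


def positive_rationals():
    """List each positive rational once, diagonal by diagonal."""
    for s in count(2):                 # s = numerator + denominator
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:         # skip duplicates such as 2/2
                yield Fraction(p, q)


first_strings = list(islice(binary_strings(), 7))
first_integers = list(islice(integers(), 7))
first_rationals = list(islice(positive_rationals(), 5))
```

Here first_strings begins with the empty string, and first_integers is
[0, -1, 1, -2, 2, -3, 3].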

What does this have to do with languages?

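 -  The set of languages is uncountable: each language corresponds to an
    infinite binary string whose i-th bit says whether the i-th string of
    {0,1}^* is in the language, and the set of infinite binary strings is
    uncountable (same size as the set of real numbers).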

 -  Number of TMs countable because we can list all strings over
    appropriate alphabet and keep only the ones that describe valid TMs.

 -  Hence, "only" countably many recognizable languages, i.e., _most_
    languages are unrecognizable!
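
The diagonal step behind the uncountability of languages can be sketched as
follows.  As a simplifying assumption, strings are identified with their
index i in the listing of {0,1}^*, and a language is a Python predicate on
indices; the names diagonal and toy_enumeration are hypothetical.

```python
def diagonal(enumeration):
    """Build a language that differs from the i-th listed language at i."""
    return lambda i: not enumeration(i)(i)


def toy_enumeration(i):
    """Hypothetical listing: language number i = indices divisible by i+1."""
    return lambda j: j % (i + 1) == 0


missing = diagonal(toy_enumeration)
# For every i, 'missing' disagrees with language number i on string number i.
disagrees = all(missing(i) != toy_enumeration(i)(i) for i in range(100))
```

Whatever listing is supplied, the diagonal language cannot appear anywhere
in it, so no listing of all languages exists.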

But are there any natural unrecognizable languages?

-------------------
The Halting problem
-------------------

A_TM = { <M,w> | M is a TM that accepts input w } is recognizable.

 -  As described last week, there is a TM U (the "universal TM") that takes
    a reasonable encoding <M> of a TM M together with an input w, and that
    carries out M's computation on w.  U accepts if M accepts w, U rejects
    if M rejects w, and U loops if M loops on w.  Hence U recognizes (but
    does not decide) A_TM.
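
A minimal sketch of what U does, written as a Python simulator rather than
as a TM.  The encoding is an assumption of this sketch: a machine is a
dictionary mapping (state, symbol) to (new state, symbol to write, move),
missing transitions are treated as rejecting, and the names run_tm and
EVEN_ONES are hypothetical.

```python
def run_tm(delta, w, start="q0", accept="qA", reject="qR", blank="_"):
    """Simulate the machine given by delta on input w; may never return."""
    tape = dict(enumerate(w))          # sparse tape: position -> symbol
    state, head = start, 0
    while state not in (accept, reject):
        symbol = tape.get(head, blank)
        # Assumption: an undefined transition sends the machine to reject.
        state, write, move = delta.get((state, symbol), (reject, symbol, "R"))
        tape[head] = write
        # The head stays put when asked to move left from the leftmost cell.
        head = max(0, head + (1 if move == "R" else -1))
    return state == accept


# Toy machine accepting binary strings with an even number of 1s.
EVEN_ONES = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q1", "1", "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "1"): ("q0", "1", "R"),
    ("q0", "_"): ("qA", "_", "R"),
    ("q1", "_"): ("qR", "_", "R"),
}
```

Like U, run_tm returns only if the simulated machine halts; on a machine
that loops, the while loop never ends.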

Theorem: A_TM is undecidable.

Proof:
 -  For contradiction, assume A_TM decidable, i.e., there is a TM H that
    accepts input <M,w> if M accepts w, and rejects if M does not accept w
    (either because M rejects or loops).
 -  Construct TM D that includes H as a "subroutine".  On input <M>, D runs
    H's instructions on <M,<M>>; if H accepts, D rejects; if H rejects, D
    accepts.  In other words, on input <M>, D rejects if M accepts input
    <M> and D accepts if M does not accept input <M>.
 -  What happens if we give <D> as input to D?  D should reject if D
    accepts input <D> and D should accept if D does not accept input <D>,
    i.e., D accepts input <D> iff D does not accept input <D>.
 -  Contradiction!  Hence, D cannot exist, which means H cannot exist,
    i.e., no TM can decide A_TM -- by the Church-Turing thesis, this means
    there is no general algorithm for solving this problem!
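
The construction of D can be illustrated in Python.  In this sketch (an
analogy, not the proof itself) Python functions stand in for TMs and their
encodings, returning True means "accept", and H is any candidate function
that always returns an answer; make_D and refutes are hypothetical names.

```python
def make_D(H):
    """Build D from a candidate decider H, as in the proof."""
    def D(M):
        return not H(M, M)     # do the opposite of what H predicts for <M,<M>>
    return D


def refutes(H):
    """Check that H's prediction about D on <D> is wrong."""
    D = make_D(H)
    predicted = H(D, D)        # what H claims D does on input <D>
    actual = D(D)              # what D really does on input <D>
    return predicted != actual
```

For any H that always returns, refutes(H) is True, e.g. for the candidates
"lambda M, w: True" and "lambda M, w: False": no such H answers correctly
on the pair <D,<D>>.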

Corollary: A^c_TM (the complement of A_TM) is unrecognizable.
    (Because A_TM is recognizable but undecidable: if A^c_TM were also
    recognizable, then A_TM would be decidable by Theorem 4.22.)


Theorem: HALT_TM = { <M,w> | M is a TM that halts on input w } is undecidable.

Proof:
 -  For contradiction, assume HALT_TM is decidable and let S be a decider
    for it.
 -  Construct a TM T as follows:
      T = "On input <M,w>:
             run S on <M,w>; if S rejects, then REJECT;
             otherwise, run M on w and answer the same."
 -  T decides A_TM: if M loops on w, then S rejects and so T rejects; if M
    halts on w, then S accepts, the simulation of M on w terminates, and T
    accepts iff M accepts w.
 -  Contradiction, since A_TM is undecidable!  Hence, S cannot exist.
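
The shape of this reduction can be sketched in Python on toy machines.
Everything here is hypothetical: ToyMachine exposes tables of the inputs it
accepts and rejects and really does loop forever on all other inputs, and
toy_S can answer the halting question only because it peeks at those
tables.  No such S exists for real TMs; the point is only how T uses S.

```python
class ToyMachine:
    """Toy machine: accepts or rejects listed inputs, loops on the rest."""

    def __init__(self, accepts, rejects):
        self.accepts = set(accepts)
        self.rejects = set(rejects)

    def run(self, w):
        if w in self.accepts:
            return True
        if w in self.rejects:
            return False
        while True:            # genuinely never returns
            pass


def toy_S(M, w):
    """Stand-in for the assumed decider of HALT_TM (toy machines only)."""
    return w in M.accepts or w in M.rejects


def make_T(S):
    """Build T from S: ask S first, and simulate M only if M halts on w."""
    def T(M, w):
        if not S(M, w):
            return False       # M loops on w, so M does not accept w
        return M.run(w)        # safe: S guarantees this call returns
    return T


T = make_T(toy_S)
M = ToyMachine(accepts={"a"}, rejects={"b"})
```

T(M, "c") returns False without hanging, even though M.run("c") would never
return: T never starts a simulation that S says will not halt.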