===========================================================================
CSC 363H               Lecture Summary for Week 7               Spring 2007
===========================================================================

---------------
Closing remarks
---------------

Undecidable problems from other domains:
- Other models of computation.
- Post correspondence problem (section 5.2 in textbook).
- Conway's "game of life".
- Check out for many more.

---------------------------------------------------------------------------

Computational Complexity
========================

Outline (topics and textbook sections):
- BRIEF review of complexity analysis and asymptotic notation, complexity
  classes TIME(t(n)), robustness (7.1).
- The classes P and NP -- definitions and examples (7.2, 7.3).
- Polytime reducibility, self-reducibility; NP-completeness (7.4, 7.5).
- Space complexity; other complexity classes (8.2, 8.2, 9.1).
- Dealing with NP-completeness (10.1, 10.2).

-------------------
Complexity analysis
-------------------

Motivation: answer the question "what is efficient computation?" (by
analogy with the first part of the course, which considered "what is
computation?").

First, we must agree on how to measure efficiency. Two standard measures:
time and space. Both are easy to define on the TM model: time = number of
steps; space = number of tape cells used. Worst-case vs. average-case
measures as a function of input size are defined as usual, using asymptotic
notation (big-Oh, Omega, Theta).

Def: Let M be a deterministic TM that is a decider. The running time of M
is the function f : N -> N such that f(n) is the maximum number of steps M
uses when run on inputs of size n.

Note: We care about _worst case_ time.

Complexity class TIME(t(n)), for some function t : N -> N:
    TIME(t(n)) = { L | L is a language decided by some TM that runs in
                       worst-case time O(t(n)) }
For example, TIME(n^2) contains every language decided by some TM in
worst-case time O(n^2).

Example: A = { 0^n 1^n | n >= 0 }.
A can be decided in time O(n^2) by repeatedly scanning back-and-forth,
crossing off a single 0 and a single 1 during each pass. But it is possible
to do better by repeatedly crossing off half of the 0s and half of the 1s
until none are left, rejecting if at any point the numbers of remaining 0s
and 1s do not have the same parity (one even, the other odd) -- this takes
time O(n log n) only (sketched below). This cannot be improved: it is
possible to show that any language decided in time o(n log n) on a standard
single-tape TM is regular!
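
For intuition, here is a rough Python sketch (not from the notes) of the
"cross off half" idea, with the tape replaced by two counters; the name
decide_A is made up for illustration. It mirrors the parity argument: the
counts of 0s and 1s are equal iff their parities agree at every halving
step.

    # Illustrative sketch (not from the notes): "cross off half" with counters.
    def decide_A(w: str) -> bool:
        """Decide A = { 0^n 1^n | n >= 0 } using the halving/parity idea."""
        if '10' in w:                      # input must look like 0...01...1
            return False
        zeros, ones = w.count('0'), w.count('1')
        while zeros > 0 or ones > 0:
            if zeros % 2 != ones % 2:      # parities differ: counts unequal, reject
                return False
            zeros //= 2                    # "cross off" every other 0 ...
            ones //= 2                     # ... and every other 1
        return True

On a TM, each halving pass takes O(n) time and there are O(log n) passes,
which is where the O(n log n) bound comes from.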
With two tapes, A can be decided in time O(n): go over all of the 0s and
copy them to the second tape, then when the first head starts going over
the 1s, move backwards over the 0s on the second tape. If the end of the 1s
and the end of the 0s are reached at the same time, accept; otherwise,
reject.

So the specific model used changes the meaning of the complexity classes.

-----------------
Computation model
-----------------

Multitape TM:
Every multitape TM that runs in time t(n) >= n has an equivalent
single-tape TM that runs in time O((t(n))^2).
Convert multitape TM M to single-tape TM S. Each step of M requires two
passes over the entire tape contents of S. Since M runs in time t(n), each
tape of M contains at most O(t(n)) symbols, so the tape of S also contains
at most O(t(n)) symbols; hence, each step of M requires time O(t(n)) on S.
The entire computation requires time t(n) * O(t(n)) = O((t(n))^2).

Non-deterministic TM:
Every NTM that runs in time t(n) >= n has an equivalent single-tape
deterministic TM that runs in time 2^O(t(n)).
Convert NTM M to deterministic TM S. If M is a decider, then every branch
halts. Define M's running time to be the height of its computation tree
(the length of the longest computation branch). Since M runs in time t(n),
there are at most b^t(n) leaves in the computation tree (where b is the
maximum branching factor). Performing BFS on this tree requires time
O(t(n) * b^t(n)) = O(C^t(n)) (for some constant C >= b) = 2^O(t(n)).

Other variants: minor modifications of the basic model or of the ones above
don't affect time complexity significantly (e.g., a two-way infinite tape
can be simulated using 2 tapes with no loss of efficiency).

Consequence: TIME(t(n)) is sensitive to the particular model used. Not as
nice as for computability, where the model did not affect the outcome. Can
we do better?

-----------
The class P
-----------

Concentrate on a "coarse scale": ignore polynomial differences. Not that
they are unimportant, but larger differences must be addressed first.

"Tractability thesis": all reasonable deterministic models are polynomially
equivalent (i.e., they can simulate one another with at most a polynomial
factor loss of efficiency).

P = U_k TIME(n^k), i.e., a language L belongs to P iff L can be decided by
some deterministic algorithm (TM or otherwise) in worst-case polytime.

Importance:
- Robust: does not depend on details of the model, as long as it is
  deterministic.
- Useful: captures a rough notion of efficiency.

Examples: Almost all algorithms you've seen in other courses.

------------
The class NP
------------

We've seen that nondeterminism is not "practical". Why use it, then?
Because a large number of real-life problems have no known efficient
solution (i.e., are not known to belong to P), yet can be solved
efficiently using nondeterminism. So nondeterminism allows us to
characterize a large class of problems. Also, nondeterminism is an elegant
way to add (what seems to be) significant power to the model.

NTIME(t(n)) = { L : L is a language decided by an NTM in worst-case time
                    O(t(n)) }

NP = U_k NTIME(n^k)
   = { L : L is decided by some polytime NTM }
   = { L : L has a polytime verifier }

By the tractability thesis, NP is independent of the specific details of
the nondeterministic model (as long as it's nondeterministic).
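
To make the verifier characterization concrete, here is a minimal Python
sketch (not from the notes) of a polytime verifier for SUBSET-SUM, a
standard example of a language in NP; the name verify_subset_sum and the
list encoding of instances/certificates are made up for illustration. Given
an instance <numbers, target> and a certificate (a proposed
sub-collection), it checks the certificate in polynomial time -- it never
has to search for one.

    # Illustrative sketch (not from the notes); names and encoding are made up.
    def verify_subset_sum(numbers, target, certificate):
        """Accept iff 'certificate' is a sub-collection of 'numbers'
        whose elements sum to 'target'."""
        remaining = list(numbers)
        for x in certificate:
            if x not in remaining:
                return False               # certificate uses an unavailable value
            remaining.remove(x)            # each element usable at most once
        return sum(certificate) == target

    # <{4, 11, 16, 21, 27}, 25> is in SUBSET-SUM; {4, 21} is a valid certificate.
    print(verify_subset_sum([4, 11, 16, 21, 27], 25, [4, 21]))   # True
    print(verify_subset_sum([4, 11, 16, 21, 27], 25, [16, 11]))  # False

A nondeterministic machine can decide the same language in polytime by
guessing a certificate and then running this check; this is the sense in
which the two characterizations of NP above coincide.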