===========================================================================
CSC 363                Lecture Summary for Week 11               Winter 2008
===========================================================================

Marker's office hours: Thu 27 Mar BA4290 -- T2: 5-6pm, A2: 6-7pm.

**Official course evaluations next week!** Please show up!

-----------------
Self-reducibility
-----------------

Note: This material is not in the textbook, so this summary is slightly
more detailed than usual.

The problem of deciding a language A is sometimes called a "decision
problem": given input x, the solution is a yes/no answer. But many problems
are more naturally "search problems": given input x, find a solution y.
Examples:
  - Given a propositional formula F, find a satisfying assignment, if one
    exists.
  - Given a graph G and an integer k, find a clique of size k in G, if one
    exists.
  - Given a graph G, find a Ham. path in G, if one exists.
  - Given a set of numbers S and a target t, find a subset of S whose sum
    equals t, if one exists.
  - etc.

Many languages come from natural search problems. Clearly, an efficient
solution to a search problem would give an efficient solution to the
corresponding decision problem. So a proof that a decision problem is
NP-hard implies that the search problem is "hard" as well, and does not
have an efficient solution (unless P = NP).

But exactly how much more difficult are search problems? Perhaps
surprisingly, many (but not all) are only polynomially more difficult than
the corresponding decision problem, in the following sense: any efficient
solution to the decision problem can be used to solve the search problem
efficiently. This is called "self-reducibility".

Example 1: CLIQUE-SEARCH
  Input: Undirected graph G, positive integer k.
  Output: A clique of size k in G, if one exists (special value None if
          there is no such clique in G).
  - Idea: For each vertex in turn, remove it iff the resulting graph still
    contains a k-clique.
  - Details: Assume we have an algorithm CL(G,k) that returns true iff G
    contains a clique of size k.
    We construct an algorithm to solve CLIQUE-SEARCH as follows.

        CLS(G,k):
            if not CL(G,k):  return None  # no k-clique in G
            for each vertex v (- V:
                # remove v and its incident edges
                V' = V - {v};  E' = E - { (u,v) : u (- V }
                # check if there is still a k-clique
                if CL(G'=(V',E'),k):
                    # v not required for k-clique, leave it out
                    V = V';  E = E'
            return V

  - Correctness: CL(G=(V,E),k) remains true at every iteration, so at the
    end, V contains the vertices of some k-clique K of G. Every other vertex
    v gets taken out: when v was examined, the current graph still
    contained K (vertices are only ever removed), so removing v left a
    k-clique and CL returned true. Hence, the value returned is exactly a
    k-clique of G.
  - Runtime: Each vertex of G is examined once, with one call to CL for
    each one (plus the initial call), plus a linear amount of additional
    work (removing edges). The total is O((n+1)*t(n,m) + n*(n+m)), where
    t(n,m) is the runtime of CL on graphs with n vertices and m edges; this
    is polytime if t(n,m) is polytime.
  - What happens if G contains more than one k-clique? The algorithm keeps
    exactly one of them and removes every other vertex; which k-clique
    survives (the "last" one) depends on the order in which vertices are
    processed.

General technique to prove self-reducibility:
  - assume an algorithm that solves the decision problem,
  - write an algorithm that solves the search problem by making calls to
    the decision problem algorithm (possibly many calls on many different
    inputs),
  - argue that the search algorithm is correct,
  - make sure that the search algorithm runs in polytime if the decision
    problem algorithm does: argue that at most polynomially many calls to
    the subroutine are made and at most polytime is spent outside those
    calls.

Warning! The argument for self-reducibility is NOT that "any algorithm that
solves the decision problem must include a part that solves the search
problem", as this is in fact not always true.
For example, there is an algorithm that can determine whether or not an
integer has any nontrivial factors (i.e., whether or not it belongs to
COMPOSITES) without actually finding any of the factors (through some
fairly involved number theory about properties of prime numbers).

Example 2: HAMPATH-SEARCH
  Input: Graph G.
  Output: A Ham. path in G, if one exists (None otherwise).
  - Idea 1: For each vertex in turn, remove it iff the resulting graph still
    contains a Ham. path.
  - Problem: Every vertex must be in the path anyway, and this does not say
    where to put each vertex (which edges to use to travel through it).
  - Idea 2: For each edge in turn, remove it iff the resulting graph still
    contains a Ham. path. This is the same as for CLIQUE above, except that
    edges are considered one-by-one instead of vertices; at the end, the
    remaining edges form a Ham. path.

Example 3: VERTEX-COVER-SEARCH
  Input: Graph G, integer k.
  Output: A vertex cover of size k, if one exists (None otherwise).
  - Idea 1: Remove vertices one-by-one as long as the resulting graph still
    contains a vertex cover of size k.
  - Problem: If G contains a VC of size k, then G-v (remove v and all
    incident edges) also contains a VC of size k, whether or not v is in
    the cover (unless k = n, which is trivial to solve)!
  - Idea 2: Check whether G-v contains a VC of size k-1.
  - Algorithm:

        VCS(G,k):
            if not VC(G,k):  return None
            C = {}  # the vertices in a vertex cover of G
            for each vertex v (- V, and while k > 0:
                if VC(G-v, k-1):
                    C = C u {v};  G = G - v;  k = k - 1
            return C

  - Correctness: Loop invariant: G contains a VC of size k. At each
    iteration,
      . if G-v contains a VC of size k-1 (C), then G contains a VC of size k
        that includes v (C u {v});
      . if G contains a VC of size k that includes v (C), then G-v contains
        a VC of size k-1 (C - {v}); so by contrapositive, if G-v does not
        contain a VC of size k-1, then v does not belong to any VC of size k
        in G.
    At the end, k = 0: a vertex skipped earlier cannot belong to a VC of the
    current size k in the current G (adding back the vertices removed since
    then would give a VC of the earlier size containing it), so the loop
    cannot run out of vertices while k > 0. With k = 0, the remaining graph
    has no edges, so C (whose size is the original k) covers every edge of
    the original G.
  - Runtime: O((n+1)*t(n,m) + n*(n+m)): for each vertex v, we perform one
    call to VC in time t(n,m) and compute G-v in time O(n+m).
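A minimal Python sketch of the three search algorithms above, under assumptions made only for illustration: a graph is a pair (V, E), with V a set of sortable vertices and E a set of frozenset({u, v}) edges. The decision procedures has_clique, has_vertex_cover and has_ham_path are hypothetical stand-ins for CL, VC and a HAMPATH decider; they are brute force (exponential time), since only the structure of the reduction matters here: each search function makes polynomially many oracle calls and does polynomial work outside them.

```python
from itertools import combinations, permutations

# Assumed representation (not from the notes): V is a set of sortable
# vertices, E is a set of frozenset({u, v}) edges.

def remove_vertex(V, E, v):
    """Return G - v: delete v and every edge incident to it."""
    return V - {v}, {e for e in E if v not in e}

# --- Hypothetical brute-force decision oracles (exponential time) ---

def has_clique(V, E, k):
    """CL(G, k): does G contain a clique of size k?"""
    return any(all(frozenset(p) in E for p in combinations(S, 2))
               for S in combinations(V, k))

def has_vertex_cover(V, E, k):
    """VC(G, k): does G have a vertex cover with exactly k vertices?"""
    if k < 0:
        return False
    return any(all(e & set(S) for e in E) for S in combinations(V, k))

def has_ham_path(V, E):
    """Does G contain a Hamiltonian path (try every vertex ordering)?"""
    return any(all(frozenset(p[i:i + 2]) in E for i in range(len(p) - 1))
               for p in permutations(V))

# --- Search algorithms that only use the oracles ---

def clique_search(V, E, k):
    """CLS: drop each vertex that is not needed for some k-clique."""
    if not has_clique(V, E, k):
        return None
    for v in sorted(V):            # fixed processing order
        V2, E2 = remove_vertex(V, E, v)
        if has_clique(V2, E2, k):  # v not required: leave it out
            V, E = V2, E2
    return V

def ham_path_search(V, E):
    """Same idea as CLS, but removing edges instead of vertices."""
    if not has_ham_path(V, E):
        return None
    for e in list(E):              # snapshot of the original edges
        if has_ham_path(V, E - {e}):
            E = E - {e}            # e not required: leave it out
    return E                       # remaining edges form a Ham. path

def vertex_cover_search(V, E, k):
    """VCS: put v in the cover iff G - v has a VC of size k - 1."""
    if not has_vertex_cover(V, E, k):
        return None
    cover = set()
    for v in sorted(V):
        if k == 0:
            break
        V2, E2 = remove_vertex(V, E, v)
        if has_vertex_cover(V2, E2, k - 1):
            cover.add(v)
            V, E, k = V2, E2, k - 1
    return cover
```

For example, on V = {1, 2, 3, 4} with edges {1,2}, {1,3}, {2,3}, {3,4}, clique_search(V, E, 3) returns {1, 2, 3} and vertex_cover_search(V, E, 2) returns {1, 3} (vertices processed in increasing order), while ham_path_search returns the three edges of one of the two Ham. paths, depending on the order in which edges are examined.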
Optimization problems:
  - Some search problems with one or more numerical parameters naturally
    occur in practice in the form of optimization problems, e.g.,
      . MAX-CLIQUE
      . MIN-VERTEX-COVER
      . etc.
    As in the case of vertex cover above, optimization problems can also be
    self-reducible, by using the decision problem algorithm to find the
    optimal value of the relevant parameter(s).

Example: MAX-CLIQUE
  Idea: Given G, perform a binary search in the range [1,n] by making calls
  to CL(G,k) for various values of k, in order to find the maximum value k
  such that G contains a k-clique but no (k+1)-clique. (Binary search works
  because any graph that contains a (k+1)-clique also contains a k-clique.)
  Then, use the search algorithm to find a k-clique.
  (Note: A linear search for the value of k would also be OK in this case,
  because the search is in the range [1,n] and the input size is at least
  n.)

A similar idea works for MIN-VERTEX-COVER.

----------------
Space Complexity
----------------

SPACE(s(n)) = { L : L is a language that can be decided by a TM running in
                worst-case space O(s(n)), i.e., the TM never uses more than
                O(s(n)) tape cells on any input of length n }

NSPACE(s(n)) = { L : L is a language that can be decided by an NTM running
                 in worst-case space O(s(n)), i.e., the NTM never uses more
                 than O(s(n)) tape cells on any computation branch, on any
                 input of length n }

Fact: If language L is recognized by a TM M running in space O(s(n)), then
L is decidable (no matter what s(n) is equal to).

Proof: Main idea: The only way that M can loop is by repeating a
configuration exactly (because it is limited in how many cells it can use).
We can simulate M until it stops or repeats a configuration, thereby
deciding L(M).

Details: Since M uses <= c*s(n) tape cells on any input, there are at most
m^{c*s(n)} possible tape contents that M can enter during its computation
(where m = number of symbols in the tape alphabet): m symbols per cell for
c*s(n) cells. For each possible tape content, there are c*s(n) positions
that M's head can be in, and k = |Q| different states that M can be in.
So M can run through at most k*c*s(n)*m^{c*s(n)} different configurations
before it enters some configuration twice (meaning M is in an infinite
loop). Since k, c, m are constants with respect to the input size, it is
possible to decide L(M) by simulating M and rejecting if M ever runs for
more than k*c*s(n)*m^{c*s(n)} = 2^{O(s(n))} steps. This shows that
SPACE(s(n)) is a subset of TIME(2^{O(s(n))})!

Unlike time, space is much less affected by details of the model (e.g.,
using multiple tapes saves time but not space, since the information must
still be stored).

Surprising result: SAT (- SPACE(n). Keep track of the original formula, a
truth-value assignment to its variables, and the simplified formula, all in
linear space; simply evaluate the formula on each possible truth-value
assignment one by one, reusing space. So space seems more "powerful" than
time. Intuition: space can be reused; time cannot.

Surprising result (Savitch's Theorem): NSPACE(s(n)) is a subset of
SPACE((s(n))^2), for s(n) >= n.

Proof idea: An NTM running in space O(s(n)) runs in nondeterministic time
2^{O(s(n))}. Trying out all computation branches takes too much space.
Instead, use an algorithm that tests whether the NTM can get from its
initial configuration to an accepting configuration by recursively
breaking up the computation into two halves. Doing this properly (see
textbook) gets space usage down to O((s(n))^2): each level of recursion
stores a constant number of configurations, using O(s(n)) space, and the
recursion depth is log(2^{O(s(n))}) = O(s(n)).

---------
Polyspace
---------

PSPACE = U_{k >= 0} SPACE(n^k) = { all languages decided in polyspace }

Similarly, NPSPACE = U_{k >= 0} NSPACE(n^k). By Savitch's Theorem,
NSPACE(n^k) (_ SPACE(n^{2k}), so NPSPACE = PSPACE.

Clearly, P (_ PSPACE and NP (_ NPSPACE, so P (_ NP (_ NPSPACE = PSPACE.

What about coNP? coNP (_ coNPSPACE = coPSPACE = PSPACE (a deterministic
polyspace decider for L yields a deterministic polyspace decider for ~L by
simply swapping accept/reject).

Even more so than with NP, it seems "clear" that P != PSPACE (think of the
linear-space algorithm for SAT). However, the question is still open!
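The recursive halving from the proof idea of Savitch's Theorem can be sketched in Python on an explicit directed graph that stands in for the NTM's configuration graph. This is an assumption made for illustration: in the actual proof the configurations are never stored as a graph but enumerated one at a time from the machine's description, and the names can_yield and reachable are hypothetical. Each recursive call holds only a constant number of nodes (c1, c2 and the current midpoint), and the recursion depth is log2(t), which is the source of the O((s(n))^2) space bound; the running time, by contrast, is exponential.

```python
def can_yield(graph, c1, c2, t):
    """True iff c2 is reachable from c1 in at most t steps.

    graph maps each node to the set of its successors (an assumed explicit
    stand-in for the configuration graph); t must be a power of 2.
    """
    if t == 1:
        # zero steps (same node) or exactly one step
        return c1 == c2 or c2 in graph[c1]
    half = t // 2
    # try every node as the midpoint of the computation; both halves
    # are checked recursively, reusing the same space one after the other
    return any(can_yield(graph, c1, cm, half) and can_yield(graph, cm, c2, half)
               for cm in graph)

def reachable(graph, start, target):
    """Reachability via can_yield, with t = len(graph) rounded up to a power of 2."""
    # if target is reachable at all, it is reachable in at most
    # len(graph) - 1 steps, so this t is large enough
    t = 1
    while t < len(graph):
        t *= 2
    return can_yield(graph, start, target, t)
```

For instance, with g = {0: {1}, 1: {2}, 2: {3}, 3: set(), 4: {0}}, reachable(g, 4, 3) is True (a path of 4 steps, within t = 8), while reachable(g, 3, 0) is False.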