===========================================================================
CSC 363                 Lecture Summary for Week 8                Fall 2009
===========================================================================

More examples of languages in NP.

  - CLIQUE = { <G,k> : G is an undirected graph that contains a k-clique
        -- a subset of k vertices with all edges between them }

        a---b   For example, the graph pictured on the left contains a
        |\ /|   3-clique (there are sets of 3 vertices with all edges
        | c |   between them, e.g., {a,b,c}), but it does not contain a
        |/ \|   4-clique (every set of 4 vertices is missing at least one
        d---e   edge, e.g., {a,b,c,d} is missing (b,d)).

    CLIQUE (- P? Unknown (checking all possible subsets is not polytime
    because k is not fixed: it is part of the input).

    CLIQUE (- NP? Yes:
        Verifier = "On input <G,k,c>:
            1. Check c encodes a set of k vertices.
            2. Check every vertex in c belongs to G.
            3. Check G contains edges between all pairs of vertices in c.
            Accept if all checks pass, reject otherwise."
        Verifier runs in polytime (where n = |V|, m = |E|):
            1. checking encodings can be done in polytime,
            2. time O(kn),
            3. time O(k^2 m) (O(k^2) pairs in c, time O(m) for each one).
        If <G,k> (- CLIQUE, then verifier accepts when c = a k-clique of G;
        if verifier accepts <G,k,c> for some c, then <G,k> (- CLIQUE (c is
        a k-clique).

  - Contrast CLIQUE with TRIANGLE = { <G> : G contains a triangle }:
    TRIANGLE (- NP: On input <G,c>, check c encodes a triangle in G.
    But TRIANGLE (- P (as shown in tutorial).
    What's the difference? The same argument applied to CLIQUE takes time
    O(n^{k+1}) to decide, but k is part of the input (instead of being
    fixed), so this could be as bad as, e.g., O(n^{n/2}) -- definitely not
    polytime.
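
The three checks of the CLIQUE verifier can be sketched in Python. This is only an illustration under assumptions: the function name verify_clique and the representation of G as a vertex list plus an edge list are hypothetical choices, not part of the lecture. The graph used is the five-vertex example from the CLIQUE discussion.

```python
from itertools import combinations

def verify_clique(vertices, edges, k, c):
    """Return True iff certificate c is a k-clique of the graph (vertices, edges).

    Hypothetical encoding: vertices is a list of names, edges is a list of
    unordered pairs, c is a list of vertex names.
    """
    cert = set(c)
    # Step 1: c must encode a set of exactly k distinct vertices.
    if len(c) != k or len(cert) != k:
        return False
    # Step 2: every vertex in c must belong to G.
    if not cert <= set(vertices):
        return False
    # Step 3: G must contain an edge between every pair of vertices in c.
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in edge_set for u, v in combinations(cert, 2))

# The five-vertex example graph (a, b, c, d, e).
V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"),
     ("b", "e"), ("c", "d"), ("c", "e"), ("d", "e")]

print(verify_clique(V, E, 3, ["a", "b", "c"]))       # {a,b,c} is a 3-clique
print(verify_clique(V, E, 4, ["a", "b", "c", "d"]))  # edge (b,d) is missing
```

Storing the edges in a Python set makes each pair lookup expected constant time, so this sketch is faster than the O(k^2 m) bound in the notes; either way the verifier is polytime.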

Notes:

  - ~HAMPATH, ~CLIQUE, ~SUBSET-SUM (complements) don't appear to belong to
    NP: apparently, there is no way to give a short certificate of
    NON-membership in HAMPATH, CLIQUE, or SUBSET-SUM (a certificate would
    have to include information about all paths that are *not*
    Hamiltonian, or all subsets of vertices that are *not* cliques, or
    ..., and as there could be exponentially many of these, it would
    require exponential time to check).

  - On the other hand, ~COMPOSITES = PRIMES can be shown to belong to NP
    (using number theory). In fact, a recent research result (Agrawal,
    Kayal, Saxena 2002) showed that PRIMES is actually in P (for more
    details, see http://crypto.cs.mcgill.ca/~stiglic/PRIMES_P_FAQ.html).

Definition: coNP = { ~L : L (- NP } = { complements of languages in NP }.

Note: coNP != ~NP!
    L (- coNP iff ~L (- NP   but   L (- ~NP iff L !(- NP
    (for example, A_TM (- ~NP but A_TM !(- coNP).

[Picture: P (_ (NP n coNP) (_ DECIDABLE; analogy with computability:
DECIDABLE = RECOGNIZABLE n coRECOGNIZABLE.]

Open question: P ?= NP n coNP  (No strong consensus.)
Open question: NP ?= coNP      (Strongly believed to be NO.)
Open question: P ?= NP         (Strongly believed to be NO.)

Answering the last question is worth 1 million dollars! (P vs NP is one of
the "Millennium Problems" recognized by the Clay Mathematics Institute.)

---------------------
Polytime reducibility
---------------------

Defn: Language A is "polytime reducible" to language B (written A <=p B)
    if there is a polytime computable function f : \Sigma* -> \Sigma*
    such that for all w (- \Sigma*, w (- A iff f(w) (- B.

Almost identical to many-one reducibility, with the added constraint that
f can be computed in polytime. In fact, most (if not all) reductions we've
seen so far have been polytime.

Just like <=m, think of "<=p" as comparing the difficulty of deciding the
languages. So A <=p B intuitively says "A is no more difficult to solve
than B" or equivalently, "B is at least as hard to solve as A".

Theorem: A <=p B and B (- P (or NP) implies A (- P (or NP).
    Main proof idea: On input x (or <x,c>), compute f(x), in polytime,
    then run decider for B on f(x) (or verifier for B on <f(x),c>), in
    polytime.

Corollary: A <=p B and A !(- P (or NP) implies B !(- P (or NP).

Just like for decidability/recognizability, one example of a language not
in P could be used to prove more, using <=p.

Problem: the only known examples of such languages are outside NP... We
want to focus on NP because it contains a vast majority of problems from
"real life" applications.

Idea: try to identify "hardest" problems in NP.

Defn: Language A is "NP-complete" (NPc) if
      - A (- NP,
      - B <=p A for all B (- NP (A is "NP-hard").

Notes:
  - NP-hardness (the condition that B <=p A for all B (- NP) is different
    from A (- NP: it's possible for a language to be NP-hard *without*
    belonging to NP -- for instance, A_TM is NP-hard but not in NP.
  - This is a very "ambitious" definition: the condition of NP-hardness
    requires us to prove that B <=p A for ALL B (- NP, and this is a
    strong requirement -- it's not clear that there's any language A for
    which it's possible to do this.
  - However, the definition is useful, as shown in the next theorem.

Theorem: If A is NPc, then A (- P iff P = NP.
    Proof:
      - If P = NP, then A NPc -> A (- NP -> A (- P.
      - If A (- P, then A NPc -> for all B (- NP, B <=p A -> B (- P
        (because A (- P), i.e., NP (_ P so P = NP.

Corollary: If P != NP and A is NPc, then A !(- P.

So proving NP-completeness is "as good as" proving not in P.

---------------
NP-completeness
---------------

  - SAT = { <F> : F is a propositional formula that is satisfiable, i.e.,
        there is some assignment of truth-values to the variables of F
        that makes F true }
    For example, (p /\ q) -> (p \/ r) is satisfiable, but p /\ ~p is not.

  - Cook-Levin Theorem: SAT is NPc

    SAT (- NP: ("short form", which we will use from now on)
        It takes polytime to verify that a certificate encodes an
        assignment of values to the variables of F that makes it true.
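
The SAT verifier just described can be sketched as follows. The encoding of formulas as nested tuples and the names evaluate, variables, and verify_sat are assumptions made only for this illustration; the lecture does not fix an encoding. The two formulas are the examples given with the definition of SAT.

```python
def evaluate(formula, assignment):
    """Evaluate a formula (hypothetical nested-tuple encoding) under an assignment."""
    op = formula[0]
    if op == "var":
        return assignment[formula[1]]
    if op == "not":
        return not evaluate(formula[1], assignment)
    left = evaluate(formula[1], assignment)
    right = evaluate(formula[2], assignment)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "implies":
        return (not left) or right
    raise ValueError("unknown connective: %r" % (op,))

def variables(formula):
    """Collect the set of variable names that occur in the formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(variables(sub) for sub in formula[1:]))

def verify_sat(formula, c):
    """Accept iff certificate c assigns True/False to every variable and makes F true."""
    if not isinstance(c, dict):
        return False
    if any(v not in c or not isinstance(c[v], bool) for v in variables(formula)):
        return False
    return evaluate(formula, c)

p, q, r = ("var", "p"), ("var", "q"), ("var", "r")
F1 = ("implies", ("and", p, q), ("or", p, r))   # (p /\ q) -> (p \/ r)
F2 = ("and", p, ("not", p))                     # p /\ ~p

print(verify_sat(F1, {"p": True, "q": True, "r": False}))
print(verify_sat(F2, {"p": True}), verify_sat(F2, {"p": False}))
```

Each subformula is visited a constant number of times, so the check runs in time polynomial in the size of F. No certificate can make verify_sat accept F2, which matches F2 not belonging to SAT.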

    SAT is NP-hard: (high-level idea only)
        For an arbitrary language A (- NP, by definition, there is some
        NTM M_A that decides A in time <= C n^k for some constants C, k.
        Given input x, construct formula F_x that describes possible
        computation paths (of length at most C |x|^k) of M_A on x, such
        that F_x is satisfiable iff there is some computation path of M_A
        that accepts x.
        Intuitively, we are simulating the TM model of computation using
        propositional formulas, which are similar to digital circuits.
        Details are in the textbook and are needed to ensure that this can
        be done in polytime.

In general, to prove A NP-hard, it suffices to show B <=p A for some
NP-hard B: if B <=p A then for all L (- NP, L <=p B (by definition of
NP-hardness for B) so L <=p A (since <=p is transitive, something you can
prove as an easy exercise).

Template for proofs of NP-completeness: To show A is NPc, prove that

    A (- NP: Describe a polytime verifier for A:
            "Given <x,c>, check that c..."
        Argue that verifier runs in polytime and that x (- A iff verifier
        accepts <x,c> for some c.

        Note that all languages in NP we've seen so far have a similar
        structure to their definition: "the set of objects A for which
        there is some related object B such that some property holds
        about A and B" -- for example, CLIQUE = "the set of undirected
        graphs G and integers k for which there is a subset of vertices C
        such that C makes up a k-clique in G".
        For all such languages, the verifier will also have a common
        structure: "on input <A,c>, check that c encodes an object B and
        that A and B have the required property". Because of the way the
        languages are defined, this guarantees that <A,c> is accepted for
        some c iff <A> belongs to the language. All that remains is to
        ensure that checking the property of A,B can be done in polytime.

    A is NP-hard: Show B <=p A for some NP-hard B:
            "Given y, construct x_y as follows: ..."
        Argue that construction can be carried out in polytime and that
        y (- B iff x_y (- A.

        In more detail, this involves:
          . starting with arbitrary input y for B (i.e., without making
            any assumption about whether y (- B or y !(- B),
          . describing explicit construction of specific input x_y for A,
          . arguing construction can be carried out in polytime,
          . arguing if y (- B, then x_y (- A,
          . arguing if x_y (- A, then y (- B
            (or equivalently, if y !(- B, then x_y !(- A).
        Watch the last step! The argument starts from the x_y constructed
        earlier (not from an arbitrary input x for A), and relates it to
        the arbitrary y that x_y was constructed from.

Additional examples of NP-completeness
[not covered in lecture; to be done in tutorial]:

  - Definition: a propositional formula is in Conjunctive Normal Form
    (CNF) if it is written as a conjunction ("and") of one or more
    "clauses":
        C_1 /\ C_2 /\ ... /\ C_r
    where each clause is a disjunction ("or") of one or more "literals":
        C_j = (a_{j,1} \/ a_{j,2} \/ ... \/ a_{j,s_j})
    where each literal is either a propositional variable or the negation
    of a propositional variable.
    For example, (p \/ ~q) /\ q is in CNF, but ~(p /\ q) /\ (r \/ ~p) is
    NOT in CNF (it is equivalent to some CNF formula, but the property of
    being in CNF is all about the way the formula is written, *not* what
    it is equivalent to).

    The proof of the Cook-Levin Theorem actually shows that CNF-SAT is NPc
    (where CNF-SAT = { <F> : F is a formula in CNF that is satisfiable }),
    because it's possible to construct the formula F_x to be in CNF.

  - Definition: a propositional formula is in 3-CNF if it is in CNF where
    each clause contains exactly 3 literals, e.g.,
        (p \/ q \/ ~r) /\ (p \/ p \/ ~q)

    3SAT = { <F> : F is a formula in 3-CNF that is satisfiable } is NPc:

    3SAT (- NP because it's a special case of SAT (same verifier works).

    3SAT is NP-hard because CNF-SAT <=p 3SAT: reduction written up and
    posted along with summary of lecture notes.

    Note: Careful with directions!
    Trivially, 3SAT <=p CNF-SAT (3SAT is a special case of CNF-SAT).
    But we need the other direction to conclude that 3SAT is NP-hard:
    transforming instances of the general problem into instances of the
    restricted problem (trickier).
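
One standard way to transform CNF instances into 3-CNF instances is sketched below. It is not necessarily the construction in the posted write-up. Assumptions for illustration only: clauses are nonempty lists of nonzero integers (v for variable v, -v for its negation), and the names to_3cnf and satisfiable are hypothetical. Short clauses are padded by repeating a literal; long clauses are split into a chain linked by fresh variables.

```python
from itertools import product

def to_3cnf(clauses, num_vars):
    """Map a CNF formula to an equisatisfiable 3-CNF formula.

    Returns (new_clauses, new_num_vars). Assumes every clause is nonempty.
    """
    out = []
    next_var = num_vars
    for clause in clauses:
        lits = list(clause)
        if len(lits) <= 3:
            # Pad short clauses by repeating the last literal.
            while len(lits) < 3:
                lits.append(lits[-1])
            out.append(lits)
            continue
        # Split (a1 \/ ... \/ ak), k > 3, into a chain with fresh variables z:
        # (a1 \/ a2 \/ z1) /\ (~z1 \/ a3 \/ z2) /\ ... /\ (~z \/ a_{k-1} \/ a_k)
        next_var += 1
        out.append([lits[0], lits[1], next_var])
        for lit in lits[2:-2]:
            prev = next_var
            next_var += 1
            out.append([-prev, lit, next_var])
        out.append([-next_var, lits[-2], lits[-1]])
    return out, next_var

def satisfiable(clauses, num_vars):
    """Brute-force check (exponential time); only for testing tiny examples."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in cl) for cl in clauses):
            return True
    return False

# (p \/ ~q) /\ q  with p = 1, q = 2
small = [[1, -2], [2]]
# One long clause plus unit clauses forcing every literal in it to be false.
unsat = [[1, 2, 3, 4, 5], [-1], [-2], [-3], [-4], [-5]]

print(to_3cnf(small, 2))
print(satisfiable(*to_3cnf(unsat, 5)))
```

The output has at most k-2 clauses per input clause of k literals, so the construction is polytime. The brute-force check is there only to test that y (- CNF-SAT iff x_y (- 3SAT holds on small inputs; it is not part of the reduction.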