CLIQUE(G, k): Given a graph G = (V, E) and an integer k, does G have a clique of size at least k?

Theorem: CLIQUE is NP-complete.
===============

Proof (with allowable fudges on the sizes of the input and certificate strings s and t):

Claim 1: CLIQUE is in NP.
===============

Pf Claim 1: CLIQUE(G, k) is clearly a decision problem. Moreover, let G = (V, E) and k be specified by the input string s.

*MARKING POLICY*: In this course it is sufficient to ignore the difference between the input string s and the items that s specifies, in this case (G, k), where G = (V, E). We will take |s| to denote some reasonable measure of the size of (G, k), say |s| = |V| + |E|. For homework and exam problems, you need to say what you take s to specify (typically the input), and also state precisely how you define the size of s. (The text for the course doesn't take care with this.)

We will argue that a suitable certificate t for the CLIQUE(G, k) problem is a string t which specifies a subset W of V, where W forms a clique and has size |W| >= k.

*MARKING POLICY*: Again, it is sufficient in this course to fudge the difference between the string t and the items that t specifies (in this case W). Set |t| to be any reasonable measure of the size of the set that t specifies. For example, here we take |t| = |W|. (For homework and exam problems, you need to state exactly what t specifies, and also what you use for |t|.)

With these choices and fudges, |t| = |W| <= |V| <= |s|, so |t| <= p(|s|) for the polynomial p(x) = x. That is, the certificate t is of polynomial size in |s|.

We are left with showing that there is a poly-time certifier, that is, an algorithm C((G, k), W) such that, for any graph G and integer k:

    (*)  CLIQUE(G, k) is true iff there exists a W such that C((G, k), W) is true.

(Here (*) is simply a bookmark. We have omitted the constraints on the size of the certificate t from the definition of NP, which we have argued above are satisfied by W.)
Suppose C((G, k), W) is the straightforward algorithm which checks the following conditions:

    a) W is a subset of V,
    b) |W| >= k, and
    c) for every pair u, v in W, with u not equal to v, the edge (u, v) is in E.

Here C((G, k), W) returns true iff all these conditions are satisfied.

We need to confirm the logical equivalence at the bookmark (*) above. First, consider the forward implication: CLIQUE(G, k) implies there exists a W s.t. C((G, k), W) is true. Suppose CLIQUE(G, k) is true. Then, by the definition of CLIQUE, there must exist a clique of G with size >= k. Let W (a subset of V) be such a clique with |W| >= k. Then, by the definition of a clique, W satisfies each of the conditions checked by the algorithm C((G, k), W), and therefore C((G, k), W) returns true.

We also need to consider the reverse implication, namely that the existence of a W s.t. C((G, k), W) is true implies CLIQUE(G, k). This again follows since the algorithm C((G, k), W) explicitly checks that W satisfies the definition of a clique, and that |W| >= k. Therefore CLIQUE(G, k) must be true.

Finally, we need to show that this algorithm C((G, k), W) runs in polynomial time, as a function of the size of the input |s| + |t|. The brute-force implementation indicated above runs in time O(|V| + |V|^2 |E|), which is bounded by O(|s|^3) (since we defined |s| = |V| + |E|, and |t| = |W| <= |V|). Therefore C((G, k), W) is a poly-time certifier for CLIQUE(G, k). This completes the proof of Claim 1.
====================

At this point we have established that CLIQUE(G, k) is in NP. We next want to show that CLIQUE is NP-complete. To do that we show that there is a polynomial reduction from a problem X that is known to be NP-complete to CLIQUE, i.e., X <=_p CLIQUE. One convenient choice for X is VERTEX-COVER, another is INDEPENDENT-SET. Below we use 3-SAT for X.

Claim 2: 3-SAT <=_p CLIQUE (i.e., 3-SAT polynomially reduces to CLIQUE).
Since we know 3-SAT is NP-complete, it will follow from Claim 2 that CLIQUE(G, k) is also NP-complete, thereby proving the desired Theorem.
====================

Pf Claim 2: Consider any instance of the 3-SAT problem, say 3-SAT(s). Specifically, let PHI = C_1 And C_2 And ... And C_m be a 3-CNF formula, where each clause C_k is the disjunction of three literals (each literal being x_n or (not x_n) for some variable x_n). (Moreover, a variable cannot appear more than once in the same clause, although we don't use this constraint on PHI here. Note also that k is used below as a clause index, which is unrelated to the integer k in CLIQUE(G, k).)

We again need to measure the size of the input string s which specifies PHI. In particular, we need to argue that the reduction described below can be done in polynomial time (and with only polynomially many calls to CLIQUE), where these polynomials are in terms of the size of the input |s|. Here we take |s| = m, the number of clauses in PHI.

Given PHI, define a set of vertices V as follows. For each clause C_k define three vertices v(k, j) for j = 1, 2, 3. Here the vertex v(k, j) corresponds to the j-th literal (textually from the left) in the k-th clause. We denote this literal by L(k, j), so L(k, j) is either x_n or (not x_n) for some n. So V = {v(k, j) | 1 <= k <= m, 1 <= j <= 3}, and |V| = 3m, where m equals the number of clauses in PHI.

We define the set of edges E as follows. For every pair k, i of clauses, with k not equal to i, and every p, q in {1, 2, 3}, we include the edge (v(k, p), v(i, q)) in E iff v(k, p) and v(i, q) do not refer to a literal and its negation. That is, if v(k, p) refers to L(k, p) = (not x_n) and v(i, q) refers to L(i, q) = x_n, or vice versa, for some fixed n, then there is no edge between these vertices. Otherwise we include the edge (v(k, p), v(i, q)) in E. Note there are no edges between vertices associated with the same clause, so (v(k, p), v(k, q)) is not in E, for any k, p, q.
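The construction of G = (V, E) just described can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the proof. The function name build_clique_instance and the integer encoding of literals (n for x_n, -n for (not x_n)) are assumptions made here for convenience; a vertex v(k, j) is represented by the tuple (k, j), and an undirected edge by a frozenset of its two endpoints.

```python
from itertools import combinations

def build_clique_instance(clauses):
    """Map a 3-CNF formula to a CLIQUE instance (V, E, m).

    clauses: list of 3-tuples of nonzero ints, where n stands for x_n and
    -n stands for (not x_n). This encoding is an assumption of the sketch.
    """
    m = len(clauses)
    # Vertex (k, j) stands for v(k, j): the j-th literal of the k-th clause (1-based).
    V = [(k, j) for k in range(1, m + 1) for j in (1, 2, 3)]
    # lit[(k, j)] plays the role of L(k, j).
    lit = {(k, j): clauses[k - 1][j - 1] for (k, j) in V}
    E = set()
    for a, b in combinations(V, 2):
        different_clause = a[0] != b[0]
        complementary = lit[a] == -lit[b]   # a literal and its negation
        if different_clause and not complementary:
            E.add(frozenset((a, b)))
    return V, E, m

# A three-clause formula over x = 1, y = 2, z = 3:
# (x or not y or not z) and (not x or not y or z) and (x or y or not z)
phi = [(1, -2, -3), (-1, -2, 3), (1, 2, -3)]
V, E, m = build_clique_instance(phi)
```

The double loop over pairs of vertices visits (3m choose 2) pairs, which is consistent with an O(m^2) bound on the construction time when each literal comparison is treated as a constant-time step.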
Concretely, write out an example of this graph on the board for

    PHI = (x or (not y) or (not z)) and ((not x) or (not y) or z) and (x or y or (not z))

The literal L(k, j) attached to each vertex v(k, j) is:

              k = 1     k = 2     k = 3
    j = 1:    x         not x     x
    j = 2:    not y     not y     y
    j = 3:    not z     z         not z

Draw edges between all vertices in different columns, except when the endpoints correspond to a literal and its negation. So (v(1,1), v(3,1)), (v(1,1), v(2,2)), and (v(1,1), v(2,3)) are all in E. But (v(1,1), v(2,1)) and (v(1,2), v(3,2)) are not in E (since the former would link x with (not x), and the latter would link (not y) with y).

This graph can be constructed in O(m^2) time (note the number of distinct variables is at most 3m). Since we set |s| = m, this is polynomial in the size |s|.

With the above graph G = (V, E) constructed in poly-time, we claim that the solution of the original 3-SAT(PHI) problem is given by the result of one call to CLIQUE(G, m). That is, the proof of the reduction will be complete if we can show:

Claim 3: 3-SAT(PHI) is true iff CLIQUE(G, m) is true.

Note Claim 3 will show that the value of CLIQUE(G, m) is the correct answer for the original decision problem 3-SAT(PHI). Therefore the correctness of the polynomial reduction (and Claim 2) follows from the proof of Claim 3 given below.
=================

Proof of Claim 3: We first prove that CLIQUE(G, m) implies 3-SAT(PHI).

Suppose CLIQUE(G, m) is true, with G = (V, E) as constructed above. Then there must exist a clique W of size >= m (W a subset of V). Since there are no edges between vertices for the same clause in E, and there are m clauses, |W| must equal m. Why? The maximum clique size in G is <= m. To show this, use contradiction. Suppose there is a clique with size > m. Then, since there are only m clauses, there must be at least one clause k such that v(k, j) and v(k, i) are both in the clique, with i not equal to j. But there is no edge between vertices arising from the same clause, therefore (v(k, j), v(k, i)) is not in E.
But this contradicts the definition of a clique (every pair of vertices in the clique must be connected by an edge).

Moreover, W must contain exactly one vertex from each clause, i.e., for some choices of p_k, k = 1, ..., m, we have W = { v(k, p_k) for k = 1, ..., m }. Why? Otherwise, since |W| = m and there are exactly m clauses, one of the clauses would have to have more than one vertex in W. But this is impossible as there are no edges between such a pair of vertices and W is a clique.

For each such v(k, p_k) in W, we show we can set the corresponding literal L(k, p_k) (which is either x_n or (not x_n)) to be true. By the clique property, every pair of vertices in W is connected by an edge. And by construction of the edge set E, the two endpoints of an edge cannot be associated with x_n and (not x_n), respectively. Therefore, the set of all literals associated with vertices in W cannot include both x_n and (not x_n) for any n. Therefore we can consistently set all these literals to true (i.e., either x_n is true and (not x_n) is false, or vice versa). Any other variable x_p that appears in PHI (either in the literal x_p or (not x_p) or both) can simply be set to false. This then defines consistent settings for all the variables x_n in PHI.

Moreover, since at least one literal in each clause has been set to true (i.e., the literal associated with v(k, p_k) in clause k), each clause must be true. Therefore, these settings of the variables {x_n} satisfy PHI itself. This completes the argument showing that CLIQUE(G, m) implies 3-SAT(PHI).

To complete the proof of Claim 3 we also need to show that 3-SAT(PHI) implies CLIQUE(G, m).

Suppose 3-SAT(PHI) is true. Here PHI = C_1 And ... And C_m, and the associated graph G = (V, E), are as defined above. Since 3-SAT(PHI) is true, there exists a satisfying boolean assignment of all the variables x_n, n = 1, ..., N, in PHI. Moreover, for each clause C_k, there must be at least one literal that is true. (Otherwise PHI itself would be false.)
Select exactly one literal that is true in each clause, and form W from the vertices in G that are associated with these literals. In the construction, the literals appearing at particular positions in PHI are in a 1-1 relationship with vertices in V. (For example, the 2nd literal in the k-th clause is vertex v(k, 2) in V.) So this selection of literals, one per clause, uniquely corresponds to exactly m vertices in V. Let W be this set of m vertices.

We will show that W is a clique. Suppose a, b are any pair of vertices in W with b not equal to a. It will follow that W is a clique if we can show that there is necessarily an edge (a, b) in E.

First notice that, from the way W was constructed, a and b must be associated with different clauses. So a = v(k, p) and b = v(i, q) with i not equal to k. In addition, let L(k, p) and L(i, q) be the literals associated with v(k, p) and v(i, q), respectively (i.e., the p-th literal from the left in the k-th clause, and the q-th literal in the i-th clause). Given the manner in which the literals were chosen, both L(k, p) and L(i, q) must be true in the current boolean assignment. Thus it cannot be the case that L(k, p) is the negation of L(i, q). That is, one cannot be x_n while the other is (not x_n).

Since these literals cannot be negations of each other, and we have established that they appear in different clauses (i.e., k is not equal to i), it follows from the construction of E that (a, b) is an edge in E. Therefore we have shown that an arbitrary pair of vertices a, b in W is connected by an edge in E. Therefore W is a clique. We showed above that |W| = m, so CLIQUE(G, m) must be true. This proves 3-SAT(PHI) implies CLIQUE(G, m), and completes the proof of Claim 3.
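Both directions of Claim 3, together with the certifier C((G, k), W) from Claim 1, can be checked on a small instance with the following Python sketch. It is only an illustration under stated assumptions, not part of the proof: all function names are hypothetical, literals are encoded as nonzero integers (n for x_n, -n for (not x_n)), a vertex v(k, j) is the tuple (k, j), and a truth assignment is a dict from variable index to bool.

```python
from itertools import combinations

def build_graph(clauses):
    """Vertices (k, j) for literal j of clause k (1-based); lit plays the role of L."""
    V = [(k, j) for k in range(1, len(clauses) + 1) for j in (1, 2, 3)]
    lit = {(k, j): clauses[k - 1][j - 1] for (k, j) in V}
    E = {frozenset((a, b)) for a, b in combinations(V, 2)
         if a[0] != b[0] and lit[a] != -lit[b]}
    return V, E, lit

def certifier(V, E, k, W):
    """C((G, k), W): W is a subset of V, |W| >= k, and every pair in W is an edge."""
    W = set(W)
    return (W <= set(V) and len(W) >= k
            and all(frozenset(pair) in E for pair in combinations(W, 2)))

def clique_from_assignment(clauses, assignment):
    """3-SAT(PHI) => CLIQUE(G, m): pick one true literal per clause."""
    W = []
    for k, clause in enumerate(clauses, start=1):
        # next() raises StopIteration if some clause has no true literal,
        # i.e. if the assignment does not satisfy the formula.
        j = next(j for j, l in enumerate(clause, start=1)
                 if assignment[abs(l)] == (l > 0))
        W.append((k, j))
    return W

def assignment_from_clique(W, lit, variables):
    """CLIQUE(G, m) => 3-SAT(PHI): make the literals at W true, all else False."""
    assignment = {n: False for n in variables}
    for v in W:
        assignment[abs(lit[v])] = lit[v] > 0
    return assignment

def satisfies(clauses, assignment):
    """True iff every clause contains at least one true literal."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

# (x or not y or not z) and (not x or not y or z) and (x or y or not z), x=1, y=2, z=3
phi = [(1, -2, -3), (-1, -2, 3), (1, 2, -3)]
V, E, lit = build_graph(phi)
W = clique_from_assignment(phi, {1: True, 2: False, 3: False})
recovered = assignment_from_clique(W, lit, {1, 2, 3})
```

Here the assignment x = true, y = z = false yields a set W that the certifier accepts with k = m = 3, and the assignment recovered from W satisfies the formula again. A single instance like this illustrates the argument but of course does not replace it.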