=========================================================================== CSC 373H1 / L0101 Tutorial Notes -- Week 2 Winter 2012 =========================================================================== Problem: Suppose a small company has only one sales person. There are n different business customers. Each customer wants to have O(n) meetings with the sales person. For example, one customer would like to meet at 9-10am Mon, Jan 30, 4-5pm Wed Feb 1, 5-6pm Fri Feb 10. We wish to find a maximal set of customers without introducing double-bookings for the sales person. That is, we wish to make k, the number of customers, as big as possible. ====== Make sure the problem is clear. Draw a picture: Meeting times Tk for customer Ck is a set of time intervals. E.g. 3 customers, C = {C1, C2, C3}, let T = {Tj, j = 1,2,3} + denotes a meeting time, - denotes no meeting. T1 ---------++++---------+++++------- T2 ------+++++-------++-------------- T3 --++++++--------+++++------+++++++ Soln for k = 2. Assign customers C1 and C3 to the sales-person. In general, we will be given a set of customers C = {C_k, k=1, ... n} and, for each customer, meeting times T_j. We will set T = {T_j, j=1, ... n}. Any questions on the problem statement? The above is a search problem. ============ Decision problem: SALES(C, T, k): Does there exist a selection A of customers (i.e., A is a subset C) with |A| >= k such that the sales-person is never double booked. That is, for every pair of distinct customers a, b in A, then T_a intersect T_b is empty. =========== ASK: Is this problem in NP? What do we need to show? Let the string s specify a SALES instance problem, (C, T, k). We define |s| to be a measure of the size of (C, T, k). Since T can contain O(n^2) intervals, we take |s| = n^2. Let the string t specify a set of customers A contained in C. We define the size of t to be |t| = n (since |A| <= |C| = n). Certifier cert(s,t): The algorithm cert(s,t) checks whether the subset A specified by t is a subset of C of size k and that, for every pair of customers a and b in A, with a not equal to b, the times T_a and T_b do not intersect. If so, it returns true, otherwise it returns false. Since each T_c for c in C can contain O(n) time intervals, this certifier runs in time O(|A|^2 n^2) = O(n^4), where n = |C|. Therefore cert(s,t) is poly-time. Moreover, by the definition of s and t, it follows that there exists a certifier string t with |t| <= |s| and cert(s,t) evaluating to true iff SALES(C, T, k) is true. Therefore SALES is in NP. ========== Is SALES NP-complete? Try a reduction involving INDEP_SET (i.e., independent set). Remind them of independent set. G=(V,E). S subset V is an independent set iff for each e = (u,v) in E at most one end point u or v is in S. ASK: Do we show a) SALES <=p INDEP_SET OR b) INDEP_SET <=p SALES ?? ======= Show INDEP_SET poly-reduces to SALES(C, T, k). ASK: How to begin? A general instance of SALES or a general instance of INDEP_SET? ======= Polynomial Reduction: Given a general instance INDEP_SET, consisting of an undirected graph G = (V, E), and a non-negative integer k. Let n = |V|. (Note, |E| is O(n^2), and the number of edges incident to any vertex v in V is <= n-1.) We wish to construct a SALES problem which helps us solve this INDEP_SET problem. ========= Details of Poly Reduction: Given G = (V,E) and k, the input for INDEP_SET. We use the same k for a SALES problem we will construct. Associate each vertex v in V with a unique customer, say c_v. Define the set of customers to be C = {c_v | v in V}. Associate each edge e = (u, v) in E with a *distinct* non-overlapping time interval, s_e, in which a sales meeting is to be held. Customers c_u and customer c_v wish to schedule a meeting in the time interval s_e. (No other customer c_w wishes to have a meeting during s_e.) Set T_v to be the time intervals that customer c_v requests. That is, T_v = {s_e | e in E and v is an endpoint of e}. And set T = { T_v | v in V }. This construction of C, T, and k can be done in poly-time, say O(p(|V|)) for some polynomial p. Note the customers and timeslots are in 1-1 relation to the vertices and edges in the graph, respectively. Therefore we can refer to the unique customer c_v for v in V and the unique time slot s_e for each e in E. Finally note that the maximum number of time intervals per customer is equal to the maximum number of edges ending at a particular vertex v (i.e., the degree of the vertex), which is bounded by |V| - 1 = O(n). This agrees with the constraint in the statement of the SALES problem. Therefore we can construct the instance SALES(C, T, k) in polynomial time. Claim 1: For any graph G = (V, E), assume C = {c_v | v in V} and T = {T_v | v in V} are the customers and meeting times that are defined by the above construction. Then it follows that INDEP_SET(G, k) iff SALES(C, T, k). SKIP the proof of this in the tutorial (it is included below). Given Claim 1, the solution of INDEP_SET(G,k) is given by the decision problem SALES(V,T,k). This completes the polynomial reduction and the proof that INDEP_SET(G,k) <=p SALES(V,T,k) ============== NOTE: The correctness of this reduction depends on being able to reduce any independent set problem to a sales problem. In particular, any graph G = (V, E) must be reducable. If we had a constraint on the number of meetings per customer (e.g. <=5), or the total number of meetings (e.g., <= 40 hours, if all the meetings were to occur in one work week), then this reduction is NOT correct. (General graphs G=(V,E) can have more than 5 edges per vertex, and more than 40 edges.) In fact, later in this course we will show that if each customer has only one meeting time (i.e. one time interval), then the resulting problem is in P. ========= CAN OMIT: Proof of Claim 1: Let G = (V,E) and suppose C, T are constructed as described above. That is, C = {c_v | v in V} is the constructed set of customers, T = {T_v | v in V} denote the scheduled times, and where the individual meeting times for customer c_v are T_v = {s_e | e in E and v an endpoint of e}. Here s_e is a time interval and, for any e,f in E with e not equal to f, s_e does not intersect s_f. First suppose INDEP_SET(G, k) is true. We wish to show SALES(C,T,k) must be true. Since INDEP_SET(G, k) is assumed true, there exists an independent set subset S of V such that |S| >= k. We claim that A def= {c_v | v in S} is a set of >=k customers as required by SALES(C,T,k). We show this latter claim by contradiction. Suppose there exists a time slot s_e for which at least two customers c_u, c_v in A are booked. By the 1-1 relationship between edges e in E and time intervals s_e, then e=(u,v) is in E with endpoints u and v in S. But this contradicts S being an independent set on G. Therefore the sales-person cannot be double-booked. Next we suppose SALES(C, T, k) is true. We wish to show INDEP_SET(G, k) is true. Suppose SALES(C, T, k) is true. Then there must exist a set of >= k customers, say A, such that they can all be scheduled without double booking. By the 1-1 relationship between customers and vertices, we can define set S = {v | c_v in A}. Moreover |S| = |A| = k. We claim S is an independent set on G. We show this latter claim by contradiction. Suppose there exists an edge e=(u,v) in E with both u and v in S. By the construction, this edge relates to an unique time interval s_e. Moreover, both customers c_u and c_v wish to schedule a meeting during s_e. But u,v in S implies c_u and c_v are in A. But this contradicts the lack of double booking for customers in A. Therefore S must be an independent set. Therefore we have shown that INDEP_SET(G, k) iff SALES(C, T, k).