===========================================================================
CSC 263                 Lecture Summary for Week 11               Fall 2007
===========================================================================

[[Q:  denotes a question that you should think about and
      that will be answered during lecture.  ]]

---------------------
Minimum Spanning Tree [ chapter 23 ]
---------------------
  Input: connected undirected graph G=(V,E) with positive cost c(e) > 0
        for each edge e in E.
  Output: a spanning tree T subset of E such that cost(T) (sum of the
        costs of edges in T) is minimum.

  - Terminology:
      . "Spanning tree": acyclic connected subset of edges.
      . "Acyclic": does not contain any cycle.
      . "Connected": contains a path between any two vertices.

  [Note: the terms "cost" and "weight" mean the same thing,
  as do c(e) and w(e) for cost or weight of an edge.
  Different people use different terms, depending on the application.]

Let's look at some algorithms for solving the MST problem:

 A. Brute force: consider each possible subset of edges.
    Runtime?  Exponential, even if we limit search to spanning trees of G.
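
    The brute-force idea can be sketched as follows. This is only an
    illustration, not part of the lecture: the names brute_force_mst and
    is_spanning_tree, the vertex labels 0..n-1, and the (cost, u, v) edge
    format are all assumptions. It tries every subset of n-1 edges, so it
    is only usable on tiny graphs.

```python
from itertools import combinations


def is_spanning_tree(n, subset):
    """Return True if the n-1 edges in `subset` connect vertices 0..n-1."""
    if len(subset) != n - 1:
        return False
    adj = {v: [] for v in range(n)}
    for _, u, v in subset:
        adj[u].append(v)
        adj[v].append(u)
    # Depth-first search from vertex 0; n-1 edges that reach every
    # vertex are necessarily acyclic as well.
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == n


def brute_force_mst(n, edges):
    """Return (cost, tree) for a cheapest spanning tree, or None if none exists.

    edges: list of (cost, u, v) tuples (assumed format).
    """
    best = None
    for subset in combinations(edges, n - 1):
        if is_spanning_tree(n, subset):
            cost = sum(c for c, _, _ in subset)
            if best is None or cost < best[0]:
                best = (cost, list(subset))
    return best
```

    Even restricted to subsets of size n-1, the loop examines
    (m choose n-1) candidates.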

 B. Generalized MST algorithm:
	General greedy approach: build a spanning tree edge by edge,
	    including appropriate "small" edges and
	    excluding appropriate "large" edges.
        We can think of these algorithms as an edge-colouring process.
	    - initially, all edges of the graph are uncoloured
	    - one at a time colour edges either blue (accepted)
	        or red (rejected) to maintain a "colour invariant"

	Colour Invariant: there is a MST containing
	    all the blue edges and none of the red edges

	If we maintain this colour invariant and colour all the edges
	of the graph, the blue edges will form a MST!

        Terminology:
	  . "cut": a vertex partition (X, V-X)
	  . edge e "crosses" a cut if one end is in each side

	Rules for colouring edges:
	  . Blue Rule:
	        Select a cut that no blue edges cross.
	        Among the uncoloured edges crossing the cut,
		select one of minimum cost and colour it blue.
	  . Red Rule:
	        Select a simple cycle containing no red edges.
	  	Among the uncoloured edges in the cycle,
		select one of maximum cost and colour it red.
	Note the nondeterminism here: we can apply the rules at any time
	and in any order.
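
	A single application of the blue rule might look like the sketch
	below. The function name apply_blue_rule, the (cost, u, v) edge
	format and the colour dictionary are assumptions made for this
	illustration; the red rule is not shown.

```python
def apply_blue_rule(edges, colour, X):
    """Apply the blue rule to the cut (X, V-X).

    edges:  list of (cost, u, v) tuples (assumed format)
    colour: dict mapping an edge to 'blue' or 'red'; uncoloured edges are absent
    X:      set of vertices on one side of the cut
    Returns the edge that was coloured blue, or None if every edge
    crossing the cut is already coloured.
    """
    # An edge crosses the cut if exactly one of its ends is in X.
    crossing = [e for e in edges if (e[1] in X) != (e[2] in X)]
    if any(colour.get(e) == 'blue' for e in crossing):
        raise ValueError("a blue edge crosses this cut; the blue rule does not apply")
    uncoloured = [e for e in crossing if e not in colour]
    if not uncoloured:
        return None
    e = min(uncoloured)  # minimum cost; ties broken by the endpoints
    colour[e] = 'blue'
    return e
```

	Calling this repeatedly with X equal to the vertex set of one blue
	tree is one possible way to choose the cuts; the rule itself leaves
	the choice of cut open.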

    Correctness?  What do we have to prove?
        Theorem: The colour invariant is maintained by every application of
	a rule, and rules remain applicable until all the edges of a connected
	graph are coloured.

    To prove: The colour invariant is maintained.
        By induction on number of edges coloured.

	Initially, no edges are coloured, so any MST satisfies CI.

	Suppose CI true before blue rule is applied, colouring edge e blue.
	    Let T be a MST that satisfies CI before e is coloured.
	    If e in T, T still satisfies CI, done.
	    If e not in T, consider the cut (X, V-X) used in the blue rule.
	        There is a path in T joining the ends of e, and at least
		one edge e' on this path crosses the cut.
		No blue edge crosses the cut and, by CI, no edge of T is red,
		so e' is uncoloured; by the blue rule, c(e') >= c(e).
		Thus T - {e'} + {e} is a spanning tree of cost
		cost(T) - c(e') + c(e) <= cost(T), i.e., a MST, and it
		satisfies CI after e is coloured.

	Now suppose CI true before red rule is applied, colouring edge e red.
	    Let T be a MST that satisfies CI before e is coloured.
	    If e not in T, T still satisfies CI, done.
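	    If e in T, deleting e from T divides T into two trees T_1 and T_2
	        whose vertex sets partition V (thus (T_1,T_2) is a cut).
	        Consider the cycle including e used in the red rule.
		This cycle must have another edge e' crossing cut (T_1,T_2).
		e is the only edge of T crossing this cut, so e' is not in T
		and, by CI, e' is not blue. The cycle contains no red edges,
		so e' is uncoloured and, by the red rule, c(e') <= c(e).
		Thus T - {e} + {e'} is a spanning tree of cost
		cost(T) - c(e) + c(e') <= cost(T), i.e., a MST, and it
		satisfies CI after e is coloured.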

    To prove: All edges in the graph are coloured?
        Suppose this method "stops early"
	(i.e., there is an uncoloured edge e but no rule can be applied)
	By CI, the blue edges form a forest of blue trees (some trees might
	just be isolated vertices).
	If both ends of e are in the same blue tree,
	    the red rule applies to the cycle that would be formed
	    by adding e, contradiction.
	If the ends of e are in different blue trees, say T_1 and T_2,
	    the blue rule applies to the cut (T_1, V-T_1),
	    contradiction.
	Thus if any uncoloured edge remains, some rule must be applicable.


 C. Kruskal's algorithm (1956):
        // let m = |E| (# edges) and n = |V| (# vertices)
        sort edges by cost, i.e., c(e_1) <= c(e_2) <= ... <= c(e_m)
        T := {} // partial spanning tree
        for each v in V: MakeSet(v) // initialize disjoint sets
        for i := 1 to m:
            let (u,v) := e_i
            if FindSet(u) != FindSet(v): // u,v not already connected
                T := T U {e_i}
                Union(u,v)
        return T
    Runtime?  Theta(m log m) for sorting; main loop performs O(m) FindSet
        and at most n-1 Union operations on n elements, which takes
        O(m log n) (e.g., with union by size).
        Total is Theta(m log n) in the worst case, since log m is
        Theta(log n) when G is connected.
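
    The pseudocode above could be rendered in Python roughly as follows.
    This is a sketch under assumptions not stated in the lecture: vertices
    are labelled 0..n-1, edges are (cost, u, v) tuples, and the disjoint
    sets use union by size with path halving.

```python
def kruskal(n, edges):
    """Return the list of MST edges of a connected graph on vertices 0..n-1.

    edges: list of (cost, u, v) tuples (assumed format).
    """
    parent = list(range(n))  # MakeSet(v) for each v
    size = [1] * n

    def find_set(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find_set(x), find_set(y)
        if rx == ry:
            return
        if size[rx] < size[ry]:
            rx, ry = ry, rx
        parent[ry] = rx  # attach the smaller set under the larger one
        size[rx] += size[ry]

    tree = []  # partial spanning tree T
    for cost, u, v in sorted(edges):  # c(e_1) <= c(e_2) <= ... <= c(e_m)
        if find_set(u) != find_set(v):  # u, v not already connected
            tree.append((cost, u, v))
            union(u, v)
    return tree
```

    Sorting the tuples orders edges by cost first, so the loop visits
    them in the order e_1, ..., e_m used in the pseudocode.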

 D. Prim's algorithm (Jarnik 1930, Prim 1957, Dijkstra 1959):
        Idea: start with some vertex s in V (pick arbitrarily) and at each
        step, add the lowest-cost edge that connects a new vertex to the
        tree built so far.
        -> More in tutorial.
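
    One common way to realize this idea uses a binary heap of candidate
    edges. The sketch below is an assumption about how it might be done,
    not necessarily the version covered in tutorial; the name prim, the
    vertex labels 0..n-1 and the (cost, u, v) edge format are illustrative.

```python
import heapq


def prim(n, edges, s=0):
    """Return MST edges of a connected graph on vertices 0..n-1, grown from s.

    edges: list of (cost, u, v) tuples (assumed format).
    Returned edges are (cost, tree_end, new_vertex).
    """
    adj = [[] for _ in range(n)]
    for cost, u, v in edges:
        adj[u].append((cost, u, v))
        adj[v].append((cost, v, u))

    in_tree = [False] * n
    in_tree[s] = True
    heap = list(adj[s])  # candidate edges leaving the current tree
    heapq.heapify(heap)
    tree = []
    while heap and len(tree) < n - 1:
        cost, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue  # stale entry: both ends are already in the tree
        in_tree[v] = True
        tree.append((cost, u, v))
        for e in adj[v]:
            if not in_tree[e[2]]:
                heapq.heappush(heap, e)
    return tree
```

    Each edge is pushed at most twice, so this variant runs in
    O(m log m) = O(m log n) time.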

 E. Boruvka's algorithm (1926):
    Idea: do steps like Prim's algorithm in parallel
        initially n trees (the individual vertices)
	repeat
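	    for every tree T, select a minimum-cost edge with exactly one
	        end in T (break ties consistently, e.g., by edge index,
	        so that the selected edges never form a cycle)
	    add all selected edges to the MST (causing trees to merge)
	until only one tree remains
	return this tree T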
    Runtime?  Analysis similar to merge sort. Each pass reduces the number
        of trees by at least a factor of two, so O(log n) passes. Each pass
	takes O(m) time, so total is O(m log n).
    Correctness?  Special case of the red-blue algorithm (only the blue rule
        is used).
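
    A sequential simulation of the passes might look like the sketch
    below. The name boruvka, the vertex labels 0..n-1 and the
    (cost, u, v) edge format are assumptions; ties are broken by comparing
    whole tuples, which assumes no two edges are identical tuples.

```python
def boruvka(n, edges):
    """Return MST edges of a connected graph on vertices 0..n-1.

    edges: list of distinct (cost, u, v) tuples (assumed format).
    """
    parent = list(range(n))  # initially n trees (the individual vertices)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    num_trees = n
    while num_trees > 1:
        # For every tree, select a minimum-cost edge with one end in it.
        best = {}
        for e in edges:
            ru, rv = find(e[1]), find(e[2])
            if ru == rv:
                continue  # both ends in the same tree
            for r in (ru, rv):
                if r not in best or e < best[r]:
                    best[r] = e
        if not best:
            raise ValueError("graph is not connected")
        # Add all selected edges, merging trees as we go.
        for e in best.values():
            ru, rv = find(e[1]), find(e[2])
            if ru != rv:  # skip an edge already added by its other tree
                parent[ru] = rv
                tree.append(e)
                num_trees -= 1
    return tree
```

    Comparing tuples gives one fixed total order on the edges, which is
    the consistent tie-breaking that keeps the selected edges acyclic.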