Here is the outline for the parallelization research.
I'm particularly interested in finding the best way to parallelize
branch and bound codes that use Held and Karp's 1-tree relaxation.
These codes are of interest because this relaxation is the basis for the
most efficient sequential algorithms known for solving symmetric TSPs,
according to Balas and Toth's survey in The Traveling Salesman Problem,
edited by Lawler et al.
Branch and bound algorithms based on 1-trees are also notable because
the search trees they generate are very small. For instance,
Smith and Thompson report that on problems of size 42, 48, 57, and 100 their
algorithm generates 7, 7, 34, and 95 branch
and bound search nodes, respectively.
(Most of the time is spent computing many
minimum 1-trees per branch and bound node.)
Compare this with the thousands of nodes generated
on a problem of size 48 when using the LMSK algorithm reported by
Abdelrahman in 1994.
It therefore seems worthwhile to study the profitability of using many
processors per branch and bound node.
Abdelrahman calls this subproblem-level
parallelism.
I want to gain experience with exploiting parallelism at
this level, especially in the 1-tree framework, where it appears
most promising.
Held and Karp's 1-tree relaxation

So what is a 1-tree? Suppose we are given a graph G=(V,E) with n
vertices, i.e., |V|=n. Then a 1-tree for G is a spanning tree for
(V\{1},E) together with two edges of G incident upon vertex 1.

Notice that any tour of G qualifies as a 1-tree for G: remove vertex 1
and its two tour edges, and what remains is a Hamiltonian path on
V\{1}, which is a spanning tree.
It follows then that the length of a shortest 1-tree of G is
a lower bound on the length of a shortest tour of G.
A minimum cost 1-tree for G can be found by computing a minimum spanning
tree for (V\{1},E) and by scanning for the two shortest edges incident
upon vertex 1.
There are efficient algorithms for these two steps; a sketch is given below.
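As a concrete illustration, here is a minimal sketch of the two-step
computation in C. The helpers cost(i,j) (the length of edge (i,j)) and
mst_cost(n) (the cost of a minimum spanning tree on vertices 2..n) are
hypothetical names, not routines from my code:

    #include <float.h>

    extern double cost(int i, int j);   /* assumed: length of edge (i,j) */
    extern double mst_cost(int n);      /* assumed: MST cost on vertices 2..n */

    /* Cost of a minimum 1-tree on vertices 1..n: a minimum spanning
       tree on V\{1} plus the two shortest edges at vertex 1. */
    double min_one_tree_cost(int n)
    {
        double best1 = DBL_MAX, best2 = DBL_MAX;
        int j;

        for (j = 2; j <= n; j++) {      /* scan the edges at vertex 1 */
            double c = cost(1, j);
            if (c < best1) {
                best2 = best1;
                best1 = c;
            } else if (c < best2) {
                best2 = c;
            }
        }
        return mst_cost(n) + best1 + best2;
    }

With Prim's algorithm for the spanning tree step (see below), the whole
computation is O(n^2) on complete graphs.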
This is the basis for the use of the 1-tree relaxation in branch and
bound codes for the TSP.
Lagrangean cost function

When used in a branch and bound framework, it pays to have as tight
a lower bounding function as possible; Pearl has shown that a linear
increase in the error of the lower bounding function leads to an
exponential increase in the number of subproblems generated.
Unfortunately, the length of the minimum 1-tree of a graph is on
average only about 60% of the length of the shortest tour of that
graph. How can we improve on this? The trick is to use Lagrange
multipliers to incorporate the degree constraints: that is, to
penalize vertices that have too many or too few incident edges.
Instead of computing the minimum 1-tree using the plain cost function
c(i,j), use c(i,j)+l(i)+l(j), where l is some real-valued function on
the vertices. Every vertex of a tour has degree exactly 2, so the
modified cost of any tour is its plain cost plus 2 sum_{i in V} l(i);
a minimum 1-tree under the modified costs can cost no more than any
tour under them. Rearranging, for any choice of l,

    L(l) = (min 1-tree cost using c(i,j)+l(i)+l(j)) - 2 sum_{i in V} l(i)
        <= min tour cost.

(Sorry, but HTML isn't TeX.)
Given the right choice of l,
this Lagrangean lower bound consistently achieves 99.5% of the minimum
tour cost on symmetric instances, and is the basis for the most successful
algorithms for the symmetric TSP (up to 1985, according to Balas and Toth).
We use what is called a subgradient optimization method to compute the
n-vector l. This is basically a numerical hill-climbing
procedure in n dimensions. Some theory has been developed to
aid in the choice of step sizes, though some parameters must still
be chosen by hand; this remains a black art.
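To make the iteration concrete, here is a minimal sketch of one ascent
loop in C. The routine min_one_tree() is a hypothetical stand-in that
returns the minimum 1-tree cost under the modified costs and records
each vertex's degree in deg[]; the geometric step-size decay is just
one common choice, not the schedule I use:

    #include <stdlib.h>
    #include <float.h>

    extern double min_one_tree(int n, const double *l, int *deg);  /* assumed */

    /* Subgradient ascent on the multipliers l[1..n]: repeatedly
       compute a minimum 1-tree under the modified costs, then push
       each l[i] along the subgradient component 2 - deg[i]. */
    double subgradient_ascent(int n, double *l, int iters, double step)
    {
        int *deg = malloc((n + 1) * sizeof *deg);
        double best = -DBL_MAX;
        int k, i;

        for (k = 0; k < iters; k++) {
            double sum_l = 0.0, bound;
            for (i = 1; i <= n; i++)
                sum_l += l[i];
            bound = min_one_tree(n, l, deg) - 2.0 * sum_l;  /* L(l) */
            if (bound > best)
                best = bound;                 /* keep the best bound seen */
            for (i = 1; i <= n; i++)
                l[i] += step * (2 - deg[i]);  /* penalize degree violations */
            step *= 0.95;                     /* one common ad hoc schedule */
        }
        free(deg);
        return best;
    }

Each iteration raises l at vertices of degree 1 (too few edges) and
lowers it at vertices of degree 3 or more, nudging the minimum 1-tree
toward a tour.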
Why use Prim's algorithm?

The graphs that we compute MSTs for are complete or very nearly so.
In such cases, Prim's algorithm is asymptotically optimal because
it scans each edge exactly once and its storage requirements can
be made linear in the number of vertices.
See the MILES_SPAN program in Knuth's excellent
The Stanford GraphBase, especially sections 69 and 70.

As suggested by Quinn's parallel programming text,
I also implemented Sollin's algorithm.
It suffers from two drawbacks.
First, it is quite complex;
the parallel version would require much fine-grained locking and
synchronization.
Second, it turns out to be much slower
than Prim's algorithm on the (nearly) complete graphs that we encounter.
(See my research log book for timing comparisons.)
[Perhaps insert descriptions of Prim's and Sollin's algorithm.]
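Until then, here is a minimal sketch of the textbook serial form of
Prim's algorithm on the complete graph over V\{1}, using the
hypothetical cost(i,j) from above; this is not the KSR code itself:

    #include <stdlib.h>
    #include <float.h>

    extern double cost(int i, int j);   /* assumed: modified edge length */

    /* MST of the complete graph on vertices 2..n (vertex 1 is handled
       separately in the 1-tree step). dist[v] is the cheapest known
       edge from v to the growing tree; parent[v] is its other endpoint. */
    double prim_mst(int n, int *parent)
    {
        double *dist = malloc((n + 1) * sizeof *dist);
        char *done = calloc(n + 1, 1);
        double total = 0.0;
        int i, k, u;

        for (i = 2; i <= n; i++) {
            dist[i] = DBL_MAX;
            parent[i] = 2;
        }
        dist[2] = 0.0;
        for (k = 2; k <= n; k++) {
            u = 0;
            for (i = 2; i <= n; i++)    /* pick nearest unreached vertex */
                if (!done[i] && (u == 0 || dist[i] < dist[u]))
                    u = i;
            done[u] = 1;
            total += dist[u];
            for (i = 2; i <= n; i++)    /* relax edges out of u */
                if (!done[i]) {
                    double c = cost(u, i);
                    if (c < dist[i]) {
                        dist[i] = c;
                        parent[i] = u;
                    }
                }
        }
        free(dist);
        free(done);
        return total;                   /* MST cost on V\{1} */
    }

Note the two properties claimed above: each edge is examined at most
once, and the working storage (dist, done, parent) is linear in the
number of vertices.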
Computational experience

Minimum spanning trees

I have tested code for minimum spanning trees using Prim's algorithm,
targeted for KSR shared memory parallel machines.
I have modified the algorithm to allow the specification that certain
edges must appear in the 1-tree and that others must not.
These are called fixed in and fixed out edges,
respectively. One way to implement them is sketched below.
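One common way to implement fixed edges, sketched here for
concreteness (my code may do this differently), is to wrap the cost
function so that fixed-in edges look arbitrarily cheap and fixed-out
edges arbitrarily expensive. Both fixed_state() and the constant BIG
are hypothetical names:

    #define BIG 1e12   /* assumed larger than any real tour length */

    extern double cost(int i, int j);
    extern int fixed_state(int i, int j);   /* +1 fixed in, -1 fixed out, 0 free */

    /* Fixed-in edges become so cheap that Prim's algorithm always
       prefers them; fixed-out edges so expensive that they are never
       chosen while any free edge remains. */
    double constrained_cost(int i, int j)
    {
        switch (fixed_state(i, j)) {
        case +1: return -BIG;
        case -1: return +BIG;
        default: return cost(i, j);
        }
    }

A correction term must then remove the artificial -BIG contributions
from the reported 1-tree length before it is used as a bound.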
I have run experiments to determine the speedup for various
problem sizes with various numbers of processors.
For small problems (e.g., 51 cities), there is no speedup over the
sequential algorithm. For larger problems we get some speedup,
the best being 60% of optimal speedup on a 3038-city problem
with 10 processors, taking 8.6 seconds.
Instances beginning with rand have edge lengths drawn from a
uniform random distribution over [1,1000].
A comp.theory posting by, and email from, Jay Herder (jherder at yahoo dot com)
prompted an investigation into grid graphs: he was interested in approximate
solutions on graphs with embedded grids as part of a wiring problem.
Instance grid25 is a 5 by 5 grid of points spaced 10 units apart.
Arbitrary insertion found an optimal tour in this case, suggesting that
it is a good heuristic for these kinds of graphs.
All other instances were taken from the TSPLIB library.
The compiler is GCC 2.5.8, with no optimization enabled,
under Linux 1.0.8.
The machine is an Intel 486 DX2/66 with 256K of 20ns cache and 16MB of 70ns
main memory.
For comparison, this machine is about as fast as a single processor on a
KSR-1, and about 3 times faster (for this task) than a SPARC IPC.
The program requires little memory: the program and data on instance eil51.tsp
take up about 130 kilobytes on the 486, as reported by top(1)
under Linux. Note that this fits entirely within the 256K cache, even
though I exclusively use the double type for storage and manipulation
of the Lagrangean multipliers and edge lengths.
Note, however, that on Euclidean instances, I do not store
the entire cost matrix but instead recompute the cost function on the fly
from the stored coordinates. This is done to reduce the working set size,
especially for larger instances. I experimented with storing the edge
costs but found that this slowed down
the program by about 5% even on the
relatively small 51 city problem. That is, table lookup was slower than
recomputation.
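In effect, the Euclidean cost function is just the following (a
sketch; the array names x, y, and l are illustrative, with cities
indexed from 1 as above):

    #include <math.h>

    extern double x[], y[];   /* stored city coordinates */
    extern double l[];        /* Lagrangean multipliers */

    /* Modified edge length, recomputed on the fly: no O(n^2) cost
       matrix is kept, so the working set stays small. */
    double cost(int i, int j)
    {
        double dx = x[i] - x[j];
        double dy = y[i] - y[j];
        return sqrt(dx * dx + dy * dy) + l[i] + l[j];
    }

This trades a sqrt() per edge scan for the O(n^2) table, which, as
noted above, is a net win on this machine.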
Branch and bound code for the TSP itself
I have completed a sequential version of the entire branch and bound
algorithm. It solves instances to optimality in the following times:
             Number  Ascents    Task List Len     1-Trees   Time
Instance     Cities  Performed  Avg      Std Dev  Computed  (in seconds)
----------   ------  ---------  -------  -------  --------  ------------
grid25           25         65     16.4      8.6       675           3.7
dantzig42        42        376     67.9     42.5      3635          31.0
hk48             48         13      8.1      3.6       242           2.8
eil51            51        126     92.1     43.6      1706          29.5
st70             70       1854   1294.4    722.8     30412         982.5
eil76            76        373    328.2    189.9      6998         260.5
kroA100         100       9473   2836.1   2004.1    161288       10078.7
eil101          101       1009    291.3     94.7     10028         713.6
lin105          105         26     21.7     11.1       722          54.7
randA50          50         29      3.5      2.6       240           3.0
randB50          50         22      4.3      2.9       257           3.0
randC50          50         23     12.7      7.8       448           5.0
randD50          50         32      6.4      6.0       518           5.9
randA100        100         80     58.7     37.0      2164          89.8
randB100        100        225    172.5     94.6      4987         194.4
randC100        100         36     29.8     17.1       877          33.7
randD100        100         19     16.3      8.6       573          23.9
randA150        150         42     37.1     20.5      1269         134.8
randB150        150         96     76.1     49.0      2872         273.1
randC150        150         46     43.1     24.7      1690         158.1
randD150        150         21     18.7     10.9       726          70.0
randA200        200         26     25.5     14.5       930         159.7
randB200        200        200    184.3    106.7      9157        1540.9
randC200        200         44     36.4     23.2      1962         332.3
randD200        200         30     29.6     16.8      1396         237.3
randA300        300         57     54.3     30.7      2677        1040.9
randB300        300         86     82.5     47.3      6333        2435.4
randC300        300         40     39.2     22.8      1760         699.4
randD300        300        113    107.7     61.1      5611        2162.2
randA1000      1000         90     89.1     51.8      7204       32579.5
Things to do