Summary: Delayed Internet Routing Convergence

From: Andrew Miklas <agmiklas_at_cs.toronto.edu>
Date: Thu, 12 Oct 2006 01:48:05 -0400

This is paper describes the convergence efficiency of BGP. It analyzes the
time required for convergence, the number of messages exchanged when
converging, and the effects on network traffic until all routers have fully
converged on a new routing configuration. The paper discusses a measurement
study, simulation results, and also provides a light theoretical analysis.
They use these studies in order to derive a BGP convergence model.

The measurement component was done by injecting new routes for an IP range at
a major peering centre, and measuring how the new routing information
propagated through the border routers of large ISPs. They were also able to
analyze the effects of a route failover on the end-hosts by having machines
within the affected IP range run pings against a collection of sites (with
different ISPs) as the new routes were injected at the peering centre. This
allowed them to study how end-user visible network properties such as packet
loss and latency varied as the new routing information was propagated
throughout the border routers of the world's ISPs.

The model the authors create makes a number of simplifying assumptions. For
instance, they assume that the AS-graph is complete. They also assume that
the network connections between AS's cause BGP to operate in a "serialized"
fashion. Justification for both assumptions is provided, although I'm not
sure if they are reasonable. They analyze the performance of BGP under both
best-case and worst-case configurations.

Finally, they use their model to propose an adjustment to the loop-detection
process performed by BGP. It appears that they've created a BGP
implementation that will search for loops prior to sending out route
announcement messages. By catching the loops at the sender, rounds of BGP
updates can be avoiding and the convergence time improved. The study of
this improvement seemed almost to be an afterthought when compared with the
effort put into the rest of their modelling and measurement. Unfortunately,
without this improvement, the paper becomes strictly a "measurement and
analysis" work, and doesn't seem to offer any solutions.

Some interesting conclusions:

- Multihomed failovers average three minutes, and can trigger oscillations
that take 15 min to resolve. Compared to failure handing on the telephone
network, this is extremely large.

- Convergence delays will grow linearly with the addition of AS's in the best
case, and exponentially in the worst.

The paper ends by pointing out that in order to get BGP convergence
performance to be comparable to the failover times seen with the telephone
network, the complexity of the protocol would need to be greatly increased.
However, they explain that the growth and success of the Internet is rooted
in the use of simple protocols. This creates a conflict that the paper is
unfortunately not able to resolve.
Received on Thu Oct 12 2006 - 01:48:24 EDT

This archive was generated by hypermail 2.2.0 : Thu Oct 12 2006 - 07:26:38 EDT