Delayed Internet Routing Convergence

From: <nadeem.abji_at_utoronto.ca>
Date: Wed, 11 Oct 2006 02:07:31 -0400

Paper Review: Delayed Internet Routing Convergence

The growth in size and complexity of the Internet places strain on the
underlying infrastructure. The paper investigates latency in Internet
path failure, failover and repair due to the convergence properties of
inter-domain routing. The major contributor to this delay is caused
by the slow convergence of routing algorithms.

The Border Gateway Protocol (BGP), the inter-domain routing protocol
used in the Internet, uses a path vector with the entire path to the
destination. The path vector is incorrectly believed to provide
significantly improved convergence properties over traditional routing
protocols. The paper shows that the upper bound on complexity for BGP
convergence is factorial with respect to the number of autonomous
systems. The paper investigates the rate at which the inter-domain
repair and failure information propagates through the Internet.
Furthermore, they measure the impact of path changes on end-to-end
network performance.

In their experiment, they collected data over the course of two years
while injecting over 250,000 routing faults in 5 major ISPs. They
also built a model of the delayed BGP convergence process to provide
an upper and lower theoretical computational bound. The theoretical
upper bound is said to be unlikely to occur in practice as it is based
on a set of specific assumptions and worst-case scenarios. The
theoretical lower bound is closer to practise since timers are used to
handle synchronization issues not accounted for in the upper bound.
Finally they related their experimental findings to their theoretical
model.

Some key results of the study:
- Average delay in inter-domain path failovers was three minutes
- Theoretical upper bound on BGP convergence: O(n!) with n autonomous systems
- Lower bound on BGP convergence: Omega((n-3)*30) with n autonomous systems
- Delay of inter-domain route convergence almost entirely attributable
to unforeseen interaction of protocol timers.
- Path failover results in packet loss growth factor of 30 and latency
increase factor of 4 on end-to-end performance
- Changes to vendor BGP implementations would reduce lower bound on
inter-domain convergence to Omega(30)

One observation was that the convergence time varied across autonomous
systems and therefore the time is directly related to topological
factors.

One question that arises is if normally occurring Internet failures
occur on the average of once a month, do the delays caused by these
rare failures have a noticeable effect? Since their solution is
relatively straightforward and simple, it is worth making
modifications to reduce delays even if the failures are rare. The
effect of these failures on end-systems and applications is unclear.
Is the situation as dire as they make it out to be? Furthermore, the
delay is proportional to the number of autonomous systems. Currently,
at what rate is the number of autonomous systems increasing in the
Internet? If these questions are answered, there will be a better
indication of whether or not BGP needs a vast improvement, or perhaps
even an overhaul.

The paper is generally well-written. I find that it is organized well
and easy to follow. The use of both experiments and analytical
modelling provide support for their claims. The paper addresses the
issue of the effect of failures on inter-domain routing. This problem
is challenging since the Internet is a system of smaller systems
(autonomous systems) and thus routing between these systems in the
face of failure is not a trivial task. Also, the use of analytical
modelling for the Internet should only be considered a first step.
The authors of this paper took the next step of collecting real data
from the Internet (albeit based on their own failure injections) over
a time period of two years lending validity to their claims.

-- Nadeem Abji
Received on Wed Oct 11 2006 - 02:07:48 EDT

This archive was generated by hypermail 2.2.0 : Wed Oct 11 2006 - 13:33:19 EDT