(no subject)

From: Jin Jin <jinjin_at_eecg.toronto.edu>
Date: Wed, 11 Oct 2006 13:32:56 -0400

This paper examines the latency of Internet path failure, fail-over,
and repair caused by the convergence properties of inter-domain routing.

As national and economic infrastructure becomes increasingly
dependent on the global Internet, the availability and scalability of
IP-based networks will emerge among the most significant problems
facing the continued evolution of the Internet. The paper shows that
inter-domain routers in the packet-switched Internet may take tens of
minutes to reach a consistent view of network topology after a fault.
These delays stem from temporary routing table oscillations, during
which end-to-end Internet paths experience intermittent loss of
connectivity, as well as increased packet loss and latency.

The paper argues that the lack of inter-domain fail-over caused by
delayed BGP routing convergence will potentially become one of the
key factors contributing to the "gap" between the needs and
expectations of today's data networks. The authors describe
several unexpected properties of convergence and show that the
measured upper bound on Internet inter-domain routing convergence
delay is an order of magnitude slower than previously thought. They
demonstrate that multi-homed fail-over now averages three minutes
and may trigger oscillations lasting as long as fifteen minutes. The
paper also shows that these delays grow linearly with the
addition of new autonomous systems (ASes) to the Internet in the best
case, and exponentially in the worst. Finally, the analysis shows that
the theoretical upper bound on the number of router states and
control messages exchanged during BGP convergence is factorial in
the number of autonomous systems in the Internet.
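
One way to build intuition for a factorial bound is to count the candidate routes a router could explore. Assuming, for illustration only, a fully meshed topology of n ASes in which AS 0 originates a prefix, another AS can reach AS 0 through any ordered selection of the remaining n - 2 ASes. The number of loop-free paths is therefore sum over k = 0..n-2 of (n-2)!/(n-2-k)!, which equals floor(e * (n-2)!) for n >= 3. The sketch checks this closed form against brute-force enumeration; both function names are hypothetical.

```python
from itertools import permutations
from math import perm  # Python 3.8+


def loop_free_paths_formula(n: int) -> int:
    """Loop-free AS paths from one AS to origin AS 0 in a full mesh of n ASes (n >= 2)."""
    m = n - 2  # ASes other than the source and the origin that may appear mid-path
    return sum(perm(m, k) for k in range(m + 1))


def loop_free_paths_brute_force(n: int) -> int:
    """Enumerate every ordered choice of intermediate ASes between AS 1 and AS 0."""
    intermediates = range(2, n)
    return sum(
        1
        for k in range(len(intermediates) + 1)
        for _ in permutations(intermediates, k)
    )


for n in range(2, 9):
    print(n, loop_free_paths_formula(n), loop_free_paths_brute_force(n))
```

This counts paths only; it is not the paper's derivation, but it shows why the worst-case state a router may pass through grows factorially rather than polynomially with the number of ASes.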

The paper also suggests specific changes to vendor BGP
implementations that, if deployed, would significantly improve
Internet convergence latencies. Even so, BGP path changes would still
trigger temporary oscillations and take many seconds longer than
current PSTN restoration times. The authors also discuss improving
BGP convergence through the addition of synchronization, diffusing
updates, and additional state information, but all of these changes
come at the expense of a more complex protocol and increased router
overhead. The trade-off between the scalability of wide-area routing
protocols and the growing need for fault tolerance in the Internet
remains an active area of research.
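
The review does not say exactly which implementation changes the paper recommends. As a hedged illustration of the kind of vendor-level knob that shapes convergence, the sketch below models a per-peer advertisement rate limiter in the spirit of BGP's MinRouteAdvertisementInterval (MRAI), whose suggested eBGP default in RFC 4271 is 30 seconds. The class name, the simulated clock, the single-prefix scope, and the choice to rate-limit withdrawals (None) the same way as updates are simplifying assumptions, not the paper's proposal or any vendor's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, ...]


@dataclass
class MraiSender:
    """Hypothetical per-peer rate limiter for a single prefix (illustrative only)."""
    interval: float = 30.0
    last_sent: Dict[int, float] = field(default_factory=dict)
    queued: Dict[int, Optional[Path]] = field(default_factory=dict)
    sent: List[Tuple[float, int, Optional[Path]]] = field(default_factory=list)

    def advertise(self, now: float, peer: int, path: Optional[Path]) -> None:
        """Queue the newest path for a peer, overwriting any older queued path."""
        self.queued[peer] = path
        self.flush(now)

    def flush(self, now: float) -> None:
        """Send queued paths to every peer whose interval has elapsed."""
        for peer in list(self.queued):
            last = self.last_sent.get(peer)
            if last is None or now - last >= self.interval:
                path = self.queued.pop(peer)
                self.sent.append((now, peer, path))
                self.last_sent[peer] = now


s = MraiSender()
s.advertise(0.0, 7, (1, 2, 0))  # sent immediately: no earlier advertisement to peer 7
s.advertise(1.0, 7, (1, 3, 0))  # held: the interval for peer 7 is still running
s.advertise(2.0, 7, (1, 4, 0))  # overwrites the held (1, 3, 0)
s.flush(29.0)                   # still held
s.flush(30.0)                   # interval elapsed: only (1, 4, 0) goes out
print(s.sent)
```

Holding back intermediate paths means the peer never sees the transient (1, 3, 0), which suppresses some oscillation, at the cost of delaying even the final, correct route. That tension mirrors the scalability versus fault-tolerance trade-off described above.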

The paper is well written and clearly presented, and its analysis
appears solid. Its main contribution is describing several
unexpected properties of BGP convergence and demonstrating that much
of the observed convergence delay stems from specific router vendor
implementation decisions and ambiguity in the BGP specification.
The authors support this with extensive data analysis and simulation.
However, the work reads more as a measurement study than as an
original technical contribution. Although the authors propose some
improvements, the paper does not solve the underlying problem.
Received on Wed Oct 11 2006 - 13:33:16 EDT