Resilient Overlay Networks

From: <nadeem.abji_at_utoronto.ca>
Date: Fri, 13 Oct 2006 03:00:00 -0400

Paper Review: Resilient Overlay Networks

The paper presents the design and implementation of a Resilient
Overlay Network (RON), an application-layer overlay that aims to
improve routing by supplementing wide-area routing protocols. RON
nodes monitor the Internet paths between them and decide whether to
deliver packets directly over the Internet or through other RON nodes,
optimizing specific routing metrics. BGP provides wide-area
scalability at the cost of weak fault tolerance, and it ends up hiding
topological details as a result of policy enforcement. RON attempts to
improve on the performance provided by BGP inter-domain routes.
The nodes cooperate to forward data on behalf of any pair of
communicating nodes in the overlay. They detect problems quickly by
aggressively probing the paths between them, and they route around
failures through the overlay.
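
The probing-based failure detection described above can be sketched
roughly as follows. This is a minimal illustration, not the paper's
implementation: the class name PathMonitor and the loss_threshold
parameter are assumptions made for this example, and the rule used
here (declare an outage after a run of consecutive lost probes) is
only one plausible way to turn probe results into a decision.

```python
from dataclasses import dataclass


@dataclass
class PathMonitor:
    """Tracks probe results for one path between two overlay nodes.

    Hypothetical helper for illustration; loss_threshold is an
    assumed parameter, not a value taken from the paper.
    """
    loss_threshold: int = 4      # consecutive losses before declaring an outage
    consecutive_losses: int = 0
    down: bool = False

    def record_probe(self, reply_received: bool) -> bool:
        """Record one probe result and return True if the path is considered down."""
        if reply_received:
            # Any successful probe clears the outage state.
            self.consecutive_losses = 0
            self.down = False
        else:
            self.consecutive_losses += 1
            if self.consecutive_losses >= self.loss_threshold:
                self.down = True
        return self.down
```

With a short probe interval, a small threshold like this is what
would let an overlay react within seconds rather than waiting for
wide-area routing to converge.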

The nodes in the overlay exchange information and build routing tables
based on a variety of path metrics, including latency, loss rate, and
throughput. Metrics are collected through both active probing and
passive observation of transfers. Because the probing is aggressive,
the size of a RON should be limited (a maximum of 50 nodes is
suggested) to prevent excessive bandwidth overhead. The design of RON
can be described with three fundamental goals:

1) Failure detection and recovery in less than 20 seconds
2) Tighter integration of routing and path selection with the application
3) Expressive policy routing

Although RON supports routing through multiple nodes, in most
situations routing through a single intermediary node suffices.
One clever feature of the RON design is that routing information
messages are themselves sent over the RON forwarding mesh, so that
each router receives correct information even in the presence of
failures.
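
To make the single-intermediary idea concrete, here is a small sketch
of path selection that compares the direct Internet path against
every one-hop detour and keeps the cheapest. The function name
best_path, the latency_ms table, and the use of latency as the only
metric are assumptions for illustration; the paper's router supports
several metrics and is not reproduced here. A missing table entry is
treated as a path that is currently down.

```python
import math
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def best_path(src: str, dst: str, nodes: List[str],
              latency_ms: Dict[Link, float]) -> Tuple[List[str], float]:
    """Pick the direct path or the best one-intermediary detour by latency.

    Hypothetical sketch: links absent from latency_ms are treated as
    unusable (infinite cost). Returns the route and its total cost.
    """
    def cost(a: str, b: str) -> float:
        return latency_ms.get((a, b), math.inf)

    best_route, best_cost = [src, dst], cost(src, dst)
    for mid in nodes:
        if mid in (src, dst):
            continue
        detour = cost(src, mid) + cost(mid, dst)
        if detour < best_cost:
            best_route, best_cost = [src, mid, dst], detour
    return best_route, best_cost


# Example: the direct A->B path is slow, so the detour through C wins.
nodes = ["A", "B", "C"]
latency = {("A", "B"): 120.0, ("A", "C"): 30.0, ("C", "B"): 40.0}
print(best_path("A", "B", nodes, latency))  # (['A', 'C', 'B'], 70.0)
```

Swapping the cost function for loss rate or throughput would give
the application-specific metric choice that the paper emphasizes.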

In their experiments, the authors used a real-world deployment of RON
at several Internet sites. The experiments showed that RON
successfully routed around outages at varying packet-loss levels.
The overhead of maintaining the overlay is reported to be about 10%
of the bandwidth, assuming "today's broadband Internet links". 10% is
a reasonable value considering the quick recovery in the face of
failures, outages and degradation. Experiments also showed that the
RON paths were relatively stable while maintaining desirable response
times.
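
The reason overlay size has to stay small can be seen with some
back-of-the-envelope arithmetic. The sketch below assumes every node
probes every other node once per interval; the function names and
the probe size and interval parameters are placeholders chosen for
illustration, not figures from the paper, and the estimate ignores
probe replies and routing updates.

```python
def total_monitored_paths(num_nodes: int) -> int:
    """Number of directed paths probed in a fully meshed overlay."""
    return num_nodes * (num_nodes - 1)


def probe_overhead_bps(num_nodes: int, probe_bytes: int,
                       interval_s: float) -> float:
    """Outbound probe traffic per node, in bits per second.

    Assumes each node sends one probe of probe_bytes to every other
    node once per interval_s seconds (all values are hypothetical).
    """
    if num_nodes < 2:
        return 0.0
    return (num_nodes - 1) * probe_bytes * 8 / interval_s


# A 50-node overlay monitors 50 * 49 directed paths.
print(total_monitored_paths(50))  # 2450
```

Per-node traffic grows linearly with the number of nodes, while the
total number of monitored paths grows quadratically, which is why a
cap on overlay size is a natural consequence of the design.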

The paper claims to be the first wide-area network overlay system with
the ability to recover from outages within seconds. The paper
deserves credit for taking a novel, yet feasible, approach to solving
the failure/failover convergence problem of BGP. It is difficult to
imagine that routing through multiple end-to-end hops can provide
better performance than using the underlying infrastructure directly,
but that is the harsh reality of the Internet and the trade-off BGP
makes for scalability. Another benefit of RON is that it gives users
more control over routing policies. A user can specify a metric to
optimize in order to support a specific application, which may not
have been possible without overlay routing. One question that comes
to mind is how much stress the routing function puts on end systems.
Is there a noticeable lag on RON clients that are often chosen as
intermediary nodes?

Although the paper raises several privacy, trust, and fairness
issues, from a technological standpoint it makes a very significant
contribution. The paper is succinct while providing enough relevant
detail to convey the authors' idea in full. Many papers give very
high-level treatments and thus make it difficult to judge the
real-world implications of the proposed systems. This paper provides
enough information to give the reader a clear understanding of how
the system works in theory and in practice.

-- Nadeem Abji
Received on Fri Oct 13 2006 - 03:00:27 EDT