RON Review from Vladan D on 2006-10-16

From: Vladan D <vladandjeric_at_gmail.com>
Date: Mon, 16 Oct 2006 00:21:25 -0400

This paper presents Resilient Overlay Networks, a type of overlay network
designed to overcome the slowness of BGP's fault-recovery mechanisms as well
as other shortcomings. BGP hides topological details in the interests of
wide-area scalability and policy enforcement, has little information about
traffic conditions, and damps routing updates to prevent oscillations. This
can lead to delays on the order of tens of minutes before routes converge to
a consistent form.

The design of RON has three main goals:

1) The main goal is to enable a group of nodes to communicate despite
problems in the communication links joining them.
2) The second goal is to give distributed applications more control over
routing and path selection, for example by choosing appropriate metrics and
determining what is considered a fault.
3) The third goal is to allow for the creation of expressive routing
policies which determine the choice of paths in the network.

RON's basic assumption is that RON nodes can find paths between themselves
even when regular Internet routing cannot, thanks to underlying physical
path redundancy. In fact, the paper shows that in most cases it is possible
to route around a failure with only one intermediate hop through another
RON node. RON detects problems more quickly than BGP by using active probing
and passive observation of ongoing transfers to measure key metrics such as
latency, packet loss rate, and available throughput. This information is
exchanged frequently among the RON nodes, which are interconnected in a full
mesh. The routing information itself is carried over the RON. When a RON
node detects that the underlying Internet path is not the best one, it
forwards packets through other RON nodes. Each RON header carries a flow ID
identifying the flow the packet belongs to. This allows RON nodes to send
all the packets of a flow across the same nodes (thereby simplifying routing
decisions), and it also lets applications determine what constitutes a
flow. Additionally, when a packet enters the RON, it is classified and given
a policy tag that determines which of the routing tables is used to select
its path. Policies can specify, for example, whether commercial ISP packets
are allowed to travel over Internet2 links. When a packet arrives at a RON
node, the node inspects its policy tag to choose a routing preference table,
checks the preference flags (latency, loss, etc.) to choose the next table,
and finally looks up the next hop in that table based on the packet's
destination. Membership in a RON can be static or dynamic.
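
As a rough illustration of the two ideas above (routing around a failure
through at most one intermediate RON node, and the policy tag / preference /
destination lookup), here is a minimal Python sketch. It is not the paper's
implementation: the node names, latency numbers, the "commercial" policy tag
and the functions best_path, build_tables and next_hop are all hypothetical,
and the only metric modelled is a single probed latency value per link.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Link = Tuple[str, str]
# latency[a][b]: probed latency (ms) of the direct Internet path a -> b.
# None means the probes currently consider that path down. Values are made up.
Latency = Dict[str, Dict[str, Optional[float]]]


def best_path(latency: Latency, src: str, dst: str,
              banned: FrozenSet[Link] = frozenset()
              ) -> Optional[Tuple[List[str], float]]:
    """Pick the lowest-latency path using at most one intermediate node."""
    def cost(a: str, b: str) -> Optional[float]:
        # A link excluded by policy is treated like a link that is down.
        if (a, b) in banned:
            return None
        return latency[a].get(b)

    candidates = []
    direct = cost(src, dst)
    if direct is not None:
        candidates.append((direct, [src, dst]))
    for hop in latency:
        if hop in (src, dst):
            continue
        first, second = cost(src, hop), cost(hop, dst)
        if first is not None and second is not None:
            candidates.append((first + second, [src, hop, dst]))
    if not candidates:
        return None
    total, path = min(candidates)
    return path, total


def build_tables(latency: Latency, me: str,
                 policies: Dict[str, FrozenSet[Link]]):
    """Build policy tag -> preference -> destination -> next hop for node `me`."""
    tables = {}
    for tag, banned in policies.items():
        by_dst = {}
        for dst in latency:
            if dst == me:
                continue
            found = best_path(latency, me, dst, banned)
            if found is not None:
                by_dst[dst] = found[0][1]  # second node on the chosen path
        # Only a latency preference is modelled in this sketch.
        tables[tag] = {"latency": by_dst}
    return tables


def next_hop(tables, policy_tag: str, preference: str, dst: str) -> Optional[str]:
    """Three-stage lookup: policy tag, then preference, then destination."""
    return tables[policy_tag][preference].get(dst)


# Four-node full mesh in which the direct A <-> B Internet path has failed.
latency: Latency = {
    "A": {"B": None, "C": 20.0, "D": 30.0},
    "B": {"A": None, "C": 25.0, "D": 10.0},
    "C": {"A": 20.0, "B": 25.0, "D": 15.0},
    "D": {"A": 30.0, "B": 10.0, "C": 15.0},
}
# Hypothetical policy: "commercial" traffic may not use the D -> B link.
policies = {"default": frozenset(), "commercial": frozenset({("D", "B")})}
tables = build_tables(latency, "A", policies)

print(next_hop(tables, "default", "latency", "B"))     # D
print(next_hop(tables, "commercial", "latency", "B"))  # C
```

With the direct A to B path down, the default policy detours through D (30 +
10 ms), while the hypothetical "commercial" policy, which may not use the D
to B link, falls back to C (20 + 25 ms).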

To evaluate the design, a RON was deployed on the Internet, implemented as a
resilient IP forwarder. RON was able to route around between 60% and 100%
of all significant outages. On average, it took 18 seconds to detect and
route around a failure, even in the face of a DoS attack on the same
path. It also coped well with performance failures: the loss probability
decreased by 0.05 in 5% of samples, latency was reduced by 40 ms in 11% of
samples, and TCP throughput doubled in 5% of samples. The overhead of the
forwarder was 220 microseconds.

This paper is interesting, and the design may very well be useful for
applications that are sensitive to brief disruptions in service. The
unresolved problems are scalability (no more than 50 nodes), NATs,
interactions between many RONs, and misuse or violation of AUPs and BGP
transit policies.
Received on Mon Oct 16 2006 - 00:22:09 EDT
