Review: RON

From: Waqas ur Rehman <waqas_at_cs.toronto.edu>
Date: Tue, 17 Oct 2006 00:00:07 -0400

The Internet is organized into Autonomous Systems (ASes) to cope with
growing topological complexity and to allow hierarchical management.
BGP-4 is used to exchange inter-domain routing information between
these Autonomous Systems. The goal of BGP-4 is to let the administrator
of an AS apply arbitrarily complex policies within their own domain,
independent of other domains, and to hide the detailed routing
information of that AS, sharing only filtered information with other
ASes. But this comes at a cost, and that cost is delayed convergence.
Recent studies have shown that BGP-4 can take on the order of minutes
to recover from a link failure, which can cause some applications to
fail. To overcome such delays, the authors propose a Resilient Overlay
Network (RON) that allows applications to detect such failures and
recover within several seconds.

RON is an application-layer overlay network that runs on top of the
existing Internet routing substrate. To form a RON, a number of nodes
are deployed in different routing domains. Each RON node has a virtual
link to every other node; the nodes continuously monitor the paths
between them and exchange routing information. Applications hand their
packets to RON, which, depending on the routing metrics, either
forwards each packet to its destination through intermediate RON nodes
or simply uses the normal Internet path. The goal is to detect link
failures and forward data over alternative paths, which is not possible
in the current inter-AS infrastructure using BGP-4. RON also gives
applications the flexibility to express routing policies, thereby
integrating routing and path selection with the applications.
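
The forwarding decision described above can be illustrated with a small
sketch. This is not the authors' implementation: the function name
best_path, the latency table, and the restriction to a single
intermediate node are all assumptions made for illustration, and
latency stands in for whatever routing metric an application chooses.

```python
def best_path(src, dst, latency):
    """Pick the direct path or a detour via one intermediate overlay node.

    latency maps (a, b) -> measured latency in ms (hypothetical probe
    results). A missing entry or None means the virtual link is
    currently considered down.
    """
    inf = float("inf")

    def cost(a, b):
        value = latency.get((a, b))
        return inf if value is None else value

    # Every node that appears in the probe table is a candidate relay.
    nodes = {n for pair in latency for n in pair}

    # Start with the normal Internet path between src and dst.
    best = ([src, dst], cost(src, dst))

    # Consider each one-hop detour and keep the cheapest option.
    for mid in sorted(nodes - {src, dst}):
        c = cost(src, mid) + cost(mid, dst)
        if c < best[1]:
            best = ([src, mid, dst], c)
    return best
```

With a probe table in which the direct link A-B is marked down, the
function returns a detour through whichever intermediate node gives
the lowest total latency; when the direct link is up and cheapest, it
returns the direct path unchanged.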

To evaluate the efficiency and performance of RON, the authors used a
real-world deployment of the system. Two data sets were gathered, one
using 12 RON nodes and the other using 16. These nodes were deployed at
several Internet sites in America. Probe packets, throughput samples,
and traceroute results constituted the measurement data. Using this
data, the authors claim that RON was able to successfully detect and
recover from 60%-100% of all outages, and that it took 18 seconds on
average for RON to route around the failures.

The results seem impressive, but to me they are not as impressive as
they look. One reason is that RON has not been deployed and tested at
a very large scale. The testing was done on only 12 sites, which I
believe is not representative of the whole Internet population. This
architecture is an implementation of another network on top of the
Internet, and it can only have value if it really works for the entire
Internet, which seems difficult to me. The authors have not discussed
the behavior and fault tolerance of RON under adverse conditions, e.g.
congestion in which all the traffic is directed to a single node,
which is quite possible in the RON architecture. Also, each RON node
has a link to every other RON node and continuously exchanges probe
packets with all of them, which seems impossible to implement at the
scale of the entire Internet. So although the solution seems
impressive, nothing can be said until it is deployed and tested on a
larger scale.
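
To make the scaling concern concrete, here is a rough calculation of
how the number of probed links grows in a full mesh. The function name
probe_load and the default 10-second probe interval are assumptions
chosen for illustration, not figures from the paper.

```python
def probe_load(num_nodes, probe_interval_s=10.0):
    """Return (directed_links, probes_per_second) for a full-mesh overlay.

    Assumes every node probes every other node once per interval;
    the interval is a hypothetical parameter.
    """
    # Each of the N nodes probes the other N - 1 nodes.
    directed_links = num_nodes * (num_nodes - 1)
    probes_per_second = directed_links / probe_interval_s
    return directed_links, probes_per_second
```

For the two deployments in the paper this gives 132 directed links for
12 nodes and 240 for 16, which is manageable. For 1,000 nodes it is
already 999,000, since the count grows quadratically with the number
of nodes.
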
Received on Mon Oct 16 2006 - 23:59:21 EDT
