(no subject)

From: Jin Jin <jinjin_at_eecg.toronto.edu>
Date: Sun, 15 Oct 2006 11:16:09 -0400

This paper shows that a Resilient Overlay Network (RON) can greatly
improve the reliability of Internet packet delivery by detecting and
recovering from outages and path failures more quickly than current
inter-domain routing protocols.
Wide-area routing scalability comes at the cost of reduced fault-
tolerance of end-to-end communication between Internet hosts.
Moreover, BGP's fault recovery mechanisms sometimes take many minutes
before routes converge to a consistent form, and there are times when
path outages even lead to significant disruptions in communication
lasting tens of minutes or more. The authors want to find a remedy for
some of these problems. Although overlay networks are an old idea,
few overlay networks have been designed for efficient fault detection
and recovery. This paper proposes RON, in which distributed
applications layer a "resilient overlay network" over the underlying
Internet routing substrate.
There are three design goals: i) failure detection and recovery in
less than 20 seconds; ii) tighter integration of routing and path
selection with the application; and iii) expressive policy routing.
RON nodes, deployed at various locations on the Internet, form an
application-layer overlay to cooperatively route packets for each
other. Each RON node monitors the quality of the Internet paths
between it and the other nodes, and uses this information to
intelligently select paths for packets. Each Internet path between
two nodes is called a virtual link. Every node participates in a
routing protocol to exchange information about a variety of quality
metrics, including latency, packet loss rate, and available
throughput. Each RON node obtains the path metrics using a
combination of active probing experiments and passive observations of
on-going data transfers. RON also aims to provide a framework for the
implementation of expressive routing policies, which govern the
choice of paths in the network. It allows users or administrators to
define the types of traffic allowed on particular network links. It
separates policy routing into two components, classification and
routing table formation, and routes data based on its type. Most
of RON's design supports routing through multiple intermediate nodes,
but the results show that using at most one intermediate RON node is
sufficient most of the time.
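To make the path selection idea concrete, here is a minimal sketch in
Python of choosing between the direct virtual link and a detour through
at most one intermediate node. It is only an illustration under stated
assumptions, not the paper's implementation: the latency table, the
node names, and the functions link_latency and best_path are all
hypothetical, and latency is the only metric considered.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical latency table (an assumption for illustration):
# table[a][b] is the measured latency in ms of the virtual link a -> b,
# or None if probes over that link are currently failing.
LatencyTable = Dict[str, Dict[str, Optional[float]]]


def link_latency(table: LatencyTable, a: str, b: str) -> Optional[float]:
    """Latency of the virtual link a -> b, or None if unknown or down."""
    return table.get(a, {}).get(b)


def best_path(
    table: LatencyTable, src: str, dst: str
) -> Optional[Tuple[List[str], float]]:
    """Lowest-latency path from src to dst using zero or one
    intermediate node. Returns None if no such path is usable."""
    best: Optional[Tuple[List[str], float]] = None

    # Candidate 1: the direct Internet path.
    direct = link_latency(table, src, dst)
    if direct is not None:
        best = ([src, dst], direct)

    # Candidate 2: a detour through each possible intermediate node.
    for mid in table:
        if mid in (src, dst):
            continue
        first = link_latency(table, src, mid)
        second = link_latency(table, mid, dst)
        if first is None or second is None:
            continue  # one leg of the detour is down
        total = first + second
        if best is None or total < best[1]:
            best = ([src, mid, dst], total)

    return best


# Made-up example: the direct link A <-> B is down, but C reaches both.
example_table: LatencyTable = {
    "A": {"B": None, "C": 20.0},
    "B": {"A": None, "C": 30.0},
    "C": {"A": 20.0, "B": 30.0},
}
```

With this made-up table, best_path(example_table, "A", "B") routes
around the failed direct link by going through C. A fuller version
would also weigh loss rate and throughput, as the paper's metrics do.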
This paper also describes the design and implementation of RON, and
presents several experiments to evaluate it. RON is able to
successfully detect and recover from 100% (in RON1) and 60% (in RON2)
of all complete outages and all periods of sustained high loss rates
of 30% or more. The implementation takes 18 seconds, on average, to
detect and recover from a fault, significantly better than the
several minutes taken by BGP-4. RONs also overcome performance
failures, substantially improving the loss rate, latency, and TCP
throughput of badly performing Internet paths. These results suggest
that RON is a good platform on which a variety of resilient
distributed Internet applications may be developed.
This paper is well written, clear, and detailed, and its analysis and
evaluation are solid. The main contribution is that the authors use
overlay networks to solve the path failure detection and recovery
problem. It is original research. Based on the analysis, it seems
that this approach could solve path failure problems perfectly.
However, there are still some problems and weaknesses in this paper.
The authors do not analyze the cost and overhead of this overlay
network very clearly. Although it seems to solve big problems, the
whole scheme is complicated and costly to deploy. The entire overlay
network has to be built at the application layer, and based on the
"Implementation" section in the paper, that requires a lot of work.
Moreover, it includes a whole independent protocol, which adds
complexity, and even the RON packet header adds overhead to Internet
traffic. So, is it possible to deploy this network on the whole
Internet? Does this approach make sense?
Another weakness of this paper, I think, is that the authors do not
analyze the scheme theoretically, which would convince readers
better. Why can this overlay network improve path failure detection
and recovery? Could such a "huge" system work well? All of these
conclusions are drawn only from the analysis of some experiments or
simulations. Clearly, the paper lacks theoretical analysis.
Received on Sun Oct 15 2006 - 11:16:28 EDT
