Review: RON

From: Di Niu <dniu_at_eecg.toronto.edu>
Date: Mon, 16 Oct 2006 23:05:01 -0400

Reviewer: Di Niu

To combat "delayed convergence" of BGP's fault recovery mechanisms
and the long latency in path outage repairs, this paper proposes
Resilient Overlay Networks (RON). RON is interesting and original, as
it is an application-layer overlay on top of the existing Internet
routing substrate. The RON nodes monitor the functioning and quality
of the Internet paths among themselves, and use this information to
decide whether to route packets directly over the Internet or by way
of other RON nodes, optimizing application-specific routing metrics.
RON is effective, in that it can detect and recover from path outages
and periods of degraded performance within several seconds.

The design of RON aims to meet three goals. The main goal is
to enable failure detection and recovery in less than 20 seconds.
The second is to integrate routing and path selection with the
application. The third is to provide a framework for
implementing expressive routing policies, which govern the
choice of paths in the network.

RON adopts a form of flow tagging similar to the IPv6 flow ID. It
maintains information about multiple alternate routes and selects the
path that best suits the client according to latency, packet loss, or
throughput, depending on the specific application. For path
evaluation and selection, RON implements outage detection and three
routing metrics: the latency-minimizer, the loss-minimizer,
and the TCP throughput-optimizer. The latency-minimizer forwarding
table is built from an exponentially weighted moving average
of round-trip latency samples with parameter alpha. Loss rates are
estimated as the average over the last 100 probe samples.
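
The two estimators described above can be sketched in a few lines of
Python. This is only an illustration, not the RON implementation: the
class name PathEstimator, the method names, and the default alpha of
0.9 are assumptions made for the example; the review itself only says
"parameter alpha" and "the last 100 probe samples".

```python
from collections import deque


class PathEstimator:
    """Tracks smoothed latency and windowed loss rate for one overlay path.

    Hypothetical helper for illustration; not taken from the RON code.
    """

    def __init__(self, alpha=0.9, window=100):
        # alpha is the EWMA weight on the old estimate (0.9 is an assumed value).
        self.alpha = alpha
        self.latency = None  # smoothed round-trip latency, in seconds
        # 1 = probe lost, 0 = probe answered; only the last `window` are kept.
        self.outcomes = deque(maxlen=window)

    def record_probe(self, rtt=None):
        """Record one probe result; rtt=None means the probe was lost."""
        if rtt is None:
            self.outcomes.append(1)
            return
        self.outcomes.append(0)
        if self.latency is None:
            # The first sample seeds the average.
            self.latency = rtt
        else:
            self.latency = self.alpha * self.latency + (1 - self.alpha) * rtt

    def loss_rate(self):
        """Fraction of lost probes among the most recent `window` probes."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)
```

A bounded deque keeps only the most recent window of probe outcomes,
so old losses age out without any explicit bookkeeping.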

RON strives to avoid paths of low throughput when good alternatives
are available. Throughput optimization combines the latency and loss
metrics using a simplified version of the TCP throughput equation,
which provides an upper bound on TCP throughput. To prevent
oscillations from single packet losses, RON scores each path as
score = sqrt(1.5) / (RTT * sqrt(p)), where p is the loss probability.
To reduce computational complexity, RON only considers paths with a
single intermediate node when searching for throughput-optimized
paths. RON separates policy routing into two components:
classification and routing table formation. Its packet header is
inspired by the design of IPv6.
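
As a rough sketch of how the score and the single-intermediate
restriction fit together, the Python below compares the direct path
with every one-hop detour. It is not the RON code. The function names,
the format of the links table, the min_loss floor (added so that a
loss-free path does not get an infinite score), and the rule for
combining two hops (RTTs add, losses are treated as independent) are
all assumptions made for this example.

```python
import math


def throughput_score(rtt, loss, min_loss=0.02):
    """Simplified TCP throughput score: sqrt(1.5) / (rtt * sqrt(p))."""
    # min_loss is an assumed floor so that p = 0 does not give an infinite score.
    p = max(loss, min_loss)
    return math.sqrt(1.5) / (rtt * math.sqrt(p))


def combine(hop1, hop2):
    """Combine two (rtt, loss) hops into one (rtt, loss) pair."""
    rtt = hop1[0] + hop2[0]
    # Assumes losses on the two hops are independent.
    loss = 1 - (1 - hop1[1]) * (1 - hop2[1])
    return rtt, loss


def best_path(src, dst, links):
    """Return (score, path) over the direct and all one-intermediate paths.

    links maps (a, b) -> (rtt_seconds, loss_probability); this input
    format is hypothetical. Returns None if no path exists.
    """
    candidates = []
    if (src, dst) in links:
        candidates.append((throughput_score(*links[(src, dst)]), [src, dst]))
    nodes = {n for pair in links for n in pair}
    for mid in nodes - {src, dst}:
        if (src, mid) in links and (mid, dst) in links:
            rtt, loss = combine(links[(src, mid)], links[(mid, dst)])
            candidates.append((throughput_score(rtt, loss), [src, mid, dst]))
    return max(candidates, key=lambda c: c[0]) if candidates else None
```

With N nodes, each source-destination pair needs only N - 2 detour
evaluations, which is the saving the single-intermediate restriction
buys over a full shortest-path search.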

Two sets of measurements of a working RON deployed across the
Internet demonstrated its benefits. RON routed around between 60% and
100% of all significant outages. It took 18 seconds on average to
detect and route around a path failure, which meets the 20-second
goal. RON also recovered successfully from performance failures. In
one set of measurements, the loss probability improved by at least
0.05 in 5% of the samples, end-to-end communication latency fell by
40 ms in 11% of the samples, and TCP throughput doubled in 5% of all
samples. Another useful finding from the deployment is that, in most
cases, forwarding packets via at most one intermediate RON node is
sufficient both for recovering from failures and for improving
communication latency. The paper is very well-written and represents
very solid work.
Received on Mon Oct 16 2006 - 23:06:37 EDT
