Review One-Hop Source Routing

From: Vladan D <vladandjeric_at_gmail.com>
Date: Mon, 16 Oct 2006 05:18:47 -0400

One-hop source routing is another attempt to recover quickly from path
failures by routing packets through intermediaries, although in this case
the intermediaries are chosen at random from a list. Unlike RON, from the
previous paper, this system does not monitor path status, which frees it
from RON's scalability limits and from its overhead when there are no
failures.

The authors performed a study of availability and network failures to help
better define the problem being solved. They found that paths to popular
Web servers had 99.6% availability while paths to broadband hosts had
94.4% availability. Most paths had less than 15 minutes of downtime. The
majority of paths experienced failures, but many of the failures were close
to the destination and were therefore impossible to route around. It was
shown that 16% of failures of popular web servers and 6% of failures of
broadband hosts were last-hop or host failures.
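
To put the two availability figures in perspective, here is a rough
conversion to downtime. This is only a sketch: it assumes the percentages
can be read as a fraction of wall-clock time over a 365-day year, which the
paper's probe-based measurements do not strictly guarantee, and the function
name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year a path is down, given its availability in percent."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


# Figures quoted in the review above.
for label, pct in [("popular web servers", 99.6), ("broadband hosts", 94.4)]:
    print(f"{label}: about {downtime_hours_per_year(pct):.0f} hours/year")
```

Under that assumption, paths to broadband hosts are down roughly 14 times as
long as paths to popular web servers (5.6% versus 0.4% of the time).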

The SOSR system architecture is composed of source-nodes and
intermediary-nodes. The source-node retries failed communications by
choosing to pass packets through intermediaries acting as proxies. This
process is transparent to destinations and does not require a
destination-node. The source node uses netfilter to redirect packets to
user-level modules, which are in charge of encapsulation and tunnelling to
intermediaries. The intermediaries are implemented as user-level NAT
daemons.

According to the "random-4" algorithm, after a packet is lost, the packet
should be routed through both the regular path and 4 randomly chosen
intermediaries. After 4 attempts through each of the chosen intermediaries,
no further attempts are performed and the end systems can only wait for the
path to self-heal. SOSR experiments performed with PlanetLab show that
popular web servers recovered from 75.3% of errors after 4 tries, with
15.1% of these recoveries coming from self-healing of the paths. For
broadband hosts, the recovery rate is 29.4%. SOSR derives its benefit in
part from path diversity and in part from its more aggressive
retransmission. In real tests of the SOSR system, SOSR reduced recoverable
(network-level) failures by 56%, close to what was predicted by the
empirical data. Clearly, SOSR works best when recovering from "core"
failures.
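
The "random-4" policy as described in this review can be sketched roughly
as below. This is not the authors' implementation: `send_direct` and
`send_via` are hypothetical callables that return True when a packet gets
through, the function name is made up, and the attempts are made one after
another for simplicity, so concurrent sending is not modelled.

```python
import random
from typing import Callable, Optional, Sequence


def random_k_recover(
    intermediaries: Sequence[str],
    send_direct: Callable[[], bool],
    send_via: Callable[[str], bool],
    k: int = 4,
    max_attempts: int = 4,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try to recover after a lost packet.

    Returns "direct" if the regular path worked, the name of the
    intermediary that worked, or None if every attempt failed.
    """
    rng = rng or random.Random()
    # Pick k intermediaries uniformly at random; no path monitoring is used.
    chosen = rng.sample(list(intermediaries), min(k, len(intermediaries)))
    for _ in range(max_attempts):
        # Keep retrying the regular path as well, since it may self-heal.
        if send_direct():
            return "direct"
        for node in chosen:
            if send_via(node):
                return node
    # Give up; the end systems can only wait for the path to self-heal.
    return None
```

For example, calling `random_k_recover(nodes, send_direct, send_via)` with a
list of candidate node names makes at most 4 rounds, each trying the regular
path and the same 4 randomly chosen intermediaries, before giving up.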

This paper is well presented, with useful brief summaries at the end of
every section. I think the idea is interesting and the authors have
effectively created a slimmed-down version of RON. By keeping less state
and requiring less involvement from intermediaries, they were able to
increase performance. A theoretical study would improve this paper.
Received on Mon Oct 16 2006 - 05:19:03 EDT
