Summary: Resilient Overlay Networks

From: Andrew Miklas <agmiklas_at_cs.toronto.edu>
Date: Mon, 16 Oct 2006 23:39:34 -0400

This paper describes Resilient Overlay Networks (RON), which provide a
way to do application-aware routing between hosts residing in different
autonomous systems (ASes). This is interesting because the usual way to
do routing between hosts in different ASes is to use BGP-advertised
routes. Since BGP does not expose the internal layout of each AS,
routes can't be optimized to make use of intra-AS route information.
This relates back to BGP's primary design goal: scalability, even at
the expense of derived route quality.

RONs allow for the opposite tradeoff to be made. Conceptually, the
transport layer submits packets into the RON network layer. The
packets are then routed over a virtual data-link layer, where each
link corresponds to a path between two hosts on the "real" Internet.
Therefore, RONs not only allow the creation of routing algorithms with
design goals quite contrary to BGP, but they implement them on top of
networks actually running BGP.

The main design goals of the described system are:
- Failure detection and recovery in less than 20 seconds
- Tight integration of routing and path selection with the application
- Expressive policy routing

The authors accomplish these goals by running a piece of software on
the host that sits between the transport and network layers. The
inserted component forms a virtual network layer that applications can
link against in order to route using the RON. The RON routing
software is itself a client of the virtual network layer. One of its
key activities is to ping every other node in the
network every 12 seconds. By doing so, the routing software can
quickly detect the failure of a virtual link (i.e., a path) between two
clients, and adjust its routing tables accordingly. As a result, RON
detects link failures much faster than BGP does.
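
As a rough illustration of the probing idea (not the authors' code), the
sketch below marks a virtual link as down after a run of unanswered
probes. The class name LinkMonitor and the threshold of 4 consecutive
losses are assumptions made for this example; only the idea of
periodically probing every peer comes from the paper.

```python
from collections import defaultdict

LOSS_THRESHOLD = 4  # assumed: consecutive lost probes before a link is marked down


class LinkMonitor:
    """Tracks probe results per peer and reports whether the virtual link is up."""

    def __init__(self, loss_threshold=LOSS_THRESHOLD):
        self.loss_threshold = loss_threshold
        self.consecutive_losses = defaultdict(int)

    def record_probe(self, peer, answered):
        # Any answered probe resets the loss counter for that peer.
        if answered:
            self.consecutive_losses[peer] = 0
        else:
            self.consecutive_losses[peer] += 1

    def is_up(self, peer):
        return self.consecutive_losses[peer] < self.loss_threshold


# One answered probe followed by four unanswered ones.
monitor = LinkMonitor()
for answered in (True, False, False, False, False):
    monitor.record_probe("waterloo", answered)
```

A routing table built on top of this would stop using the link to
"waterloo" once is_up() returns False, and resume when a probe is
answered again.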

The routing application also exchanges information with each of the
other clients in the RON regarding the current performance metrics of
each link. Measured properties include: latency, packet loss rate,
and expected TCP throughput. Applications running on top of a RON are
able to specify their routing preferences in terms of these
properties. Thus, an application that indicates that it values low
latency most highly might be routed using one combination of links,
while an application that indicates that raw throughput is most
important might be routed using another path through the RON.
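
To make the per-metric choice concrete, here is a minimal sketch of
picking between the direct path and a path through one intermediate
node. It is not the paper's implementation: the function best_path, the
node names, and the numbers are made up, and the way losses are combined
(assuming independent losses on each link) is my own simplification.

```python
def best_path(links, src, dst, combine):
    """Pick the lowest-cost path from src to dst using at most one intermediate node.

    links maps a directed virtual link (a, b) to its cost (lower is better).
    combine merges the costs of two consecutive links into one path cost.
    Returns (cost, path), or None if no such path exists.
    """
    nodes = {n for pair in links for n in pair}
    candidates = []
    if (src, dst) in links:
        candidates.append((links[(src, dst)], [src, dst]))
    for mid in nodes - {src, dst}:
        if (src, mid) in links and (mid, dst) in links:
            cost = combine(links[(src, mid)], links[(mid, dst)])
            candidates.append((cost, [src, mid, dst]))
    return min(candidates) if candidates else None


def add_latency(a, b):
    # Latencies along a path add up.
    return a + b


def combine_loss(a, b):
    # Assumes losses on the two links are independent.
    return 1 - (1 - a) * (1 - b)


# Hypothetical measurements for three nodes A, B, C.
latency_ms = {("A", "B"): 80, ("A", "C"): 20, ("C", "B"): 30}
loss_rate = {("A", "B"): 0.01, ("A", "C"): 0.02, ("C", "B"): 0.03}

latency_choice = best_path(latency_ms, "A", "B", add_latency)
loss_choice = best_path(loss_rate, "A", "B", combine_loss)
```

With these numbers the latency-sensitive application is sent through C,
while the loss-sensitive one stays on the direct A-B path, which is the
kind of per-application divergence described above.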

Finally, the system allows routes between networks to be constructed
at a very fine-grained level of permission. For example, imagine that
the UToronto<->Internet link is broken; however, the
UToronto<->UWaterloo link stays active. Users at UT who have accounts
at UW can access the Internet, but only indirectly (i.e., by ssh'ing to
a Waterloo machine, and then going out onto the Internet). Ideally, machines
at UT that are owned by people who also have UW accounts would
automatically have their routing tables updated so that all
Internet-bound traffic is funnelled through UW. This can't
be accomplished with BGP, as it has no concept of a user. RONs,
however, make these sorts of routing arrangements possible.
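
The UT/UW example could be expressed as a per-user filter on which
intermediate sites a packet may transit. This is only a sketch of the
idea: the account set, the site names, and the function
allowed_next_hops are hypothetical and are not taken from the paper.

```python
# Hypothetical set of UT users who also hold UW accounts.
UW_ACCOUNT_HOLDERS = {"alice"}


def allowed_next_hops(user, candidates):
    """Return the candidate intermediate sites this user's traffic may transit.

    Traffic may be relayed through "uwaterloo" only for users with a UW account;
    all other sites are unrestricted in this toy policy.
    """
    return [
        hop
        for hop in candidates
        if hop != "uwaterloo" or user in UW_ACCOUNT_HOLDERS
    ]


alice_hops = allowed_next_hops("alice", ["uwaterloo", "mit"])
bob_hops = allowed_next_hops("bob", ["uwaterloo", "mit"])
```

Here alice may be relayed through either site, while bob's traffic is
never sent via Waterloo.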

The paper presents a pretty thorough evaluation section. The authors
created a RON over 12-16 nodes, with each located in a different AS.
Some of the host ASes did not have direct routes to each other, so a
total of 36-74 ASes were involved. Using this RON, they created (at
most) single-hop routes, i.e. routes through at most one intermediate
RON node, optimized for packet loss, latency, and TCP
throughput. They compared each RON route's respective property
against the direct, BGP-provided route. In many cases, their routes
performed better than the default routes for the metric in question.
However, although it wasn't discussed much in the text, some of their
graphs indicate that RONs occasionally did worse on the optimized
parameter. For example, Figure 11 seems to show that ~5% of the time,
packet loss was worse along the RON compared to the ordinary Internet.

The main drawbacks seem to have been brought up by the authors. For
example, this system could be used to subvert the inter-ISP routing
policies that are currently enforced using BGP. Some people, however,
would regard this as a benefit of the system. RONs could be used by
consumers and providers to push back against any attempt by the telecom
carriers to violate the "network neutrality" principle of the
Internet.

Another drawback is the single-hop-only routing process used in the
paper. While the authors note that in most cases, only one additional
hop is required to gain the benefits of a RON, they also present a few
cases where connectivity is hampered because the system can't support
more than one hop. Fortunately, this seems to simply be an
implementation limitation, but it would have been nice to see some
tests in configurations with longer paths. My concern here is that
the routing algorithms that they used would not be able to work well
in multi-hop networks.

In some sense, RONs are to networks what virtual machines are to
computers. The network stack of a system using a RON is:

- Transport
  - Virtual Network
  - Virtual Data Link
- Network
- Data Link
- Physical

Thus, RONs share the same sorts of benefits and drawbacks as VMs.
Benefits: ability to reinvent the system at a much deeper level than
would be practically possible, etc.
Drawbacks: wasted computational effort "redoing" certain processes at
higher levels, etc.