BASE: using abstraction to improve fault tolerance
Rodrigo Rodrigues, Miguel Castro, Barbara Liskov
Abstract
Software errors are a major cause of outages, and they are
increasingly exploited in malicious attacks. Byzantine fault
tolerance allows replicated systems to mask some software errors,
but it is expensive to deploy. This paper describes a replication
technique, BASE, which uses abstraction to reduce the cost of
Byzantine fault tolerance and to improve its ability to mask
software errors. BASE reduces cost because it enables reuse of
off-the-shelf service implementations. It improves availability
because each replica can be repaired periodically using an abstract
view of the state stored by correct replicas, and because each
replica can run distinct or non-deterministic service
implementations, which reduces the probability of common mode
failures. We built an NFS service where each replica can run a
different off-the-shelf file system implementation, and an
object-oriented database where the replicas ran the same,
non-deterministic implementation. These examples suggest that our
technique can be used in practice: in both cases, the
implementation required only a modest amount of new code, and our
performance results indicate that the replicated services perform
comparably to the implementations that they reuse.
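
As an illustration of the idea of an abstract view of state, the following is a minimal sketch and not the BASE library or its API. It assumes two hypothetical key-value store implementations (DictStore and LogStore) whose concrete states differ, and gives each an abstraction function (abstract) and an inverse (install), names chosen here for illustration. The helper agreed_state and the f + 1 matching rule are simplifying assumptions standing in for the agreement a Byzantine fault tolerant protocol would provide.

```python
from collections import Counter


class DictStore:
    """Hypothetical implementation A: keeps entries in a dict."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def abstract(self):
        # Abstraction function: concrete state -> common abstract state.
        return tuple(sorted(self.data.items()))

    def install(self, abstract_state):
        # Inverse abstraction function: rebuild concrete state.
        self.data = dict(abstract_state)


class LogStore:
    """Hypothetical implementation B: keeps an append-only log of writes."""

    def __init__(self):
        self.log = []

    def put(self, key, value):
        self.log.append((key, value))

    def abstract(self):
        # Replay the log; the last write to each key wins.
        latest = {}
        for key, value in self.log:
            latest[key] = value
        return tuple(sorted(latest.items()))

    def install(self, abstract_state):
        # Replace the log with one write per key.
        self.log = list(abstract_state)


def agreed_state(replicas, f):
    """Return an abstract state reported by at least f + 1 replicas, else None."""
    counts = Counter(r.abstract() for r in replicas)
    state, votes = counts.most_common(1)[0]
    return state if votes >= f + 1 else None


def demo():
    # Four replicas running two different implementations (assumes f = 1).
    replicas = [DictStore(), LogStore(), DictStore(), LogStore()]
    for r in replicas:
        r.put("a", 1)
        r.put("b", 2)
        r.put("a", 3)
    # Simulate a software error corrupting one replica's concrete state.
    replicas[1].log.append(("b", 99))
    good = agreed_state(replicas, f=1)
    # Repair the faulty replica from the agreed abstract state.
    replicas[1].install(good)
    return good, [r.abstract() == good for r in replicas]
```

Calling demo() corrupts one replica and then repairs it from the abstract state on which the other three agree. The point of the sketch is that the two implementations never need identical concrete state; they only need to map to the same abstract state.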