Tolerating Transient and Permanent Failures

Efthymios Anagnostou and Vassos Hadzilacos

Self-stabilizing protocols are resilient to transient failures affecting potentially all processors of a distributed system. In contrast, fault-tolerant protocols are resilient to permanent failures affecting some fraction of processors. We investigate the possibility of designing protocols that are both self-stabilizing and fault-tolerant, in the asynchronous model of distributed systems. We prove that no such protocols exist for a wide range of problems, and that this is so even if we allow randomization. The problems that our impossibility results cover include, among others: Determining (even approximately) the size of the distributed system; leader election; mutual exclusion; computing a spanning tree; and computing a maximal independent set of nodes. All of these problems are solvable in asynchronous systems using (randomised) protocols that are only fault-tolerant (but not self-stabilizing); or only self-stabilizing (but not fault-tolerant). We then focus on the problem of computing distinct names for the processors in a ring. We give three (randomised) protocols that solve this problem in three settings satisfying increasingly weak assumptions: When processors know the exact size $n$ of the ring; when they only know an upper bound on $n$; and when they have no information about $n$.