Stability, Conditioning and Machine Epsilon


Stability is a property of the algorithm alone. An algorithm is said to be stable if it introduces little error of its own (e.g. through truncation, roundoff, or cancellation). In particular, the (approximate) solution one obtains via a stable algorithm is the exact solution to a "nearby" problem in the following sense: |x - x_hat| is "small", where x_hat is such that f(x_hat) = f_approx(x) (here f is the function we're trying to compute and f_approx is some algorithm that computes it, possibly inexactly). This is equivalent to saying that an algorithm is stable when the backward error is "small", which gives us a way to quantify stability. But how do we compute the value of x_hat? Given some algorithm f_approx (think of it as a "black box" whose internal workings are unimportant), we get y_hat = f_approx(x) as our approximation to y = f(x), the true value of f for input x. We can then think of y_hat, which is really the approximate solution for input x, as the exact solution for some perturbed input x_hat. In other words, we set f(x_hat) = y_hat, which implicitly determines x_hat. Assuming we're able to invert f, this yields x_hat = f^{-1}(y_hat), which we can actually calculate.
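To make the recipe concrete, here is a minimal Python sketch (the helper name rel_backward_error and the assumption that we can call f^{-1} directly as f_inv are mine, purely for illustration):

    def rel_backward_error(f_inv, f_approx, x):
        # Run the (possibly inexact) algorithm, then pull its output back
        # through the exact inverse to find the input it solves exactly.
        y_hat = f_approx(x)             # approximate answer for input x
        x_hat = f_inv(y_hat)            # input for which y_hat is the exact answer
        return abs(x - x_hat) / abs(x)  # relative backward error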

For example, suppose f(x) = cos(x). The Taylor series expansion of f (about 0) is given by f(x) = 1 - x^2/2 + x^4/4! - ... . We can truncate the series after the first three terms to obtain f_approx(x) = 1 - x^2/2 + x^4/4!. Now suppose that x = 1. Then y_hat = f_approx(x) = 1 - 1/2 + 1/24 = 0.54166..., so x_hat satisfies the equation cos(x_hat) = y_hat ~= 0.5417 (where '~=' stands for "approximately equal"). Solving for x_hat, we get x_hat = cos^{-1}(0.5417) = arccos(0.5417) ~= 0.998. Since the relative backward error |x - x_hat|/x = |1 - 0.998|/1 = 0.002 or 0.2% is "small", we can reasonably conclude that f_approx is a stable algorithm.
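Here is a quick numerical check of that example in Python (math.acos plays the role of cos^{-1}; the exact printed digits depend on rounding):

    import math

    x = 1.0
    y_hat = 1 - x**2/2 + x**4/24      # truncated Taylor series for cos(x)
    x_hat = math.acos(y_hat)          # input whose exact cosine is y_hat
    backward_err = abs(x - x_hat) / abs(x)

    print(y_hat)         # 0.5416666...
    print(x_hat)         # ~0.998
    print(backward_err)  # ~0.0016, i.e. the roughly 0.2% figure above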

What about conditioning? The crucial point is that conditioning is a property of the problem alone. A problem is said to be well-conditioned if |x1 - x2| "small" => |f(x1) - f(x2)| "small", i.e. minor perturbations of the input result in minor perturbations of the output. Intuitively, this means that f is a "nice" function. Conditioning is quantified using the notion of a condition number, defined as the ratio of the relative change in the output to the relative change in the input: c = cond(f,x) = |(f(x) - f(x_hat))/f(x)| / |(x - x_hat)/x|, where x_hat is a point "near" x (i.e. |x - x_hat| is "small"). If c >> 1 (where '>>' stands for "much greater than"), the problem is said to be ill-conditioned. Note that cond is a function of x, i.e. conditioning depends on the particular input one has in mind -- a problem can be well-conditioned for most inputs yet extremely ill-conditioned for certain "bad inputs".
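To make this concrete, here is a small Python sketch that estimates cond(f,x) by actually perturbing the input (the helper name, the perturbation size h, and the test points are my choices for illustration, not part of the original discussion):

    import math

    def cond(f, x, h=1e-6):
        # Ratio of the relative change in the output to the relative
        # change in the input, for a small relative perturbation h of x.
        x_hat = x * (1 + h)
        rel_output_change = abs((f(x) - f(x_hat)) / f(x))
        rel_input_change = abs((x - x_hat) / x)
        return rel_output_change / rel_input_change

    # cos is well-conditioned near 0 but ill-conditioned near pi/2, where
    # cos(x) is nearly zero and tiny input changes cause huge *relative*
    # changes in the output.
    print(cond(math.cos, 0.1))                # ~0.01
    print(cond(math.cos, math.pi/2 - 1e-4))   # ~1.6e4

This also illustrates the remark about "bad inputs": the same function cos is perfectly well-behaved at x = 0.1 and badly conditioned just below pi/2.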

As the book observes, combining a well-conditioned problem with a stable algorithm always gives good results, since the algorithm will return the exact solution to a nearby problem (by stability), which in turn is a reasonable approximation to the exact solution of the original problem (by conditioning). If we have an unstable algorithm for a well-conditioned problem, it might be (and usually is) possible to come up with a different, stable algorithm. However, if the problem is ill-conditioned, we're out of luck -- even if our algorithm is stable, the exact solution to a nearby problem is probably a terrible approximation to the exact solution of the original problem.

Finally, a few words about machine epsilon (a.k.a. machine precision or unit roundoff). This small positive number is an upper bound on the relative roundoff error (which is a consequence of discretizing the real line), i.e. |(fl(x) - x)/x| <= machine_epsilon (here 'fl(x)' stands for "the floating point representation of x"). There are several different definitions of this quantity, and unfortunately they aren't exactly equivalent. Here's a nice explicit formulation which holds if rounding to nearest is used: machine_epsilon = 0.5*beta^(1-p) (here beta is the base, e.g. 2, and p is the number of digits in the mantissa). Note that, although both are small numbers, machine epsilon shouldn't be confused with the underflow level UFL = beta^L (here 'L' is the smallest exponent value), which is the smallest positive normalized machine number (I think I may have said UFL is negative, which is clearly wrong). In particular, machine epsilon depends on p but not on L, whereas UFL depends on L but not on p.
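As a quick sanity check, here is what these formulas give for IEEE double precision (beta = 2, p = 53, L = -1022 are the standard double-precision parameters; the comparison against Python's sys.float_info is just for illustration):

    import sys

    beta, p, L = 2, 53, -1022      # IEEE double-precision parameters

    eps = 0.5 * beta ** (1 - p)    # machine epsilon under the round-to-nearest definition
    ufl = beta ** L                # underflow level: smallest positive normalized number

    print(eps)                     # 1.1102230246251565e-16
    print(ufl)                     # 2.2250738585072014e-308
    print(sys.float_info.min)      # same as ufl

    # sys.float_info.epsilon uses the other common convention, beta^(1-p),
    # which is exactly twice the round-to-nearest value computed above.
    print(sys.float_info.epsilon)  # 2.220446049250313e-16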