

Roundoff

Even if your computer had no other source of error in calculating $ e^x$, it would still need to fit the result into some finite memory location. In what follows we'll assume (to make our calculations easier) that our computer represents floating point numbers using scientific notation with 5 decimal (base 10) digits. So, for example, $ 11\frac{1}{3}$ would be represented as:

$\displaystyle 1.1333 \times 10^1.$

This is no different in principle from how floats are usually represented. If we used 5 binary digits instead, we'd represent $ 11\frac{1}{3}$ as:

$\displaystyle 1.0110 \times 2^3$

This is basically the IEEE representation, except the latter uses 24 binary digits. In any case, roundoff error afflicts base 2, base 10, and any other conceivable finite representation. Here's how it works.
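To make the toy 5-digit machine concrete, here is a short Python sketch (a rough model only, using the standard decimal module to mimic 5-digit base-10 arithmetic with rounding):

    from decimal import Decimal, getcontext, ROUND_HALF_UP

    getcontext().prec = 5              # keep only 5 significant decimal digits
    getcontext().rounding = ROUND_HALF_UP

    x = Decimal(34) / Decimal(3)       # 11 1/3, computed in 5-digit arithmetic
    print(x)                           # 11.333, i.e. 1.1333 x 10^1

    print((34 / 3).hex())              # the same value as an IEEE double,
                                       # shown in hex notation: 1.0110101... x 2^3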

The decimal (aka base 10) expansion of $ e^1$ is 2.718281828459045... If we insist on squashing it into 5 digits, we have two reasonable choices: 2.7182 (truncation) or 2.7183 (rounding). Truncation is computationally easier, since for rounding we must examine the first omitted digit and round up if it is 5 or greater, down otherwise. How bad are our 5-digit approximations? A reasonable measure is to take the difference between our approximation and the true value, and see what proportion of the true value the discrepancy is. This is called the relative error:

   relative error$\displaystyle = \frac{\text{approximate value} - \text{true value}}{\text{true value}}.$

In our 5-digit $ e^1$, the truncated version gives us a relative error of about -0.003%, whereas the rounded version gives a better relative error of about +0.00067%. Rounding seems more reliable, since it errs high about half the time and low about half the time, whereas truncation always errs low. So, we'll stick with rounding (as most floating point arithmetic units do).
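These percentages are easy to check with a few lines of Python (the 5-digit values are typed in by hand):

    import math

    true_value = math.e                # 2.718281828...
    truncated  = 2.7182                # first 5 digits, the rest discarded
    rounded    = 2.7183                # rounded to 5 digits

    def rel_err(approx):
        return (approx - true_value) / true_value

    print(f"truncation: {rel_err(truncated):+.5%}")   # about -0.00301%
    print(f"rounding:   {rel_err(rounded):+.5%}")     # about +0.00067%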

So, what's the worst relative error we can expect from rounding? Well, if the first digit that we omit is a 5 followed by zeros, we round up and our total error is half the distance between two consecutive numbers. For example, if the true value were 5.43215, to keep only 5 digits we'd round up to 5.4322, and our total error would be 0.00005. Notice that the total error changes if we approximate 5.43215 $ \times$ $ 10^3$ by 5.4322 $ \times$ $ 10^3$ (the total error gets a thousand times bigger). What we really want is the relative error.
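A quick sketch of the same point, using the numbers from the paragraph above (a thousand-fold change in the total error, no change in the relative error):

    pairs = [(5.43215, 5.4322), (5.43215e3, 5.4322e3)]

    for true_value, approx in pairs:
        total = approx - true_value           # total (absolute) error
        relative = total / true_value         # relative error
        print(f"total: {total:.2e}   relative: {relative:.2e}")

    # total error: about 5.00e-05, then 5.00e-02 (a factor of 1000)
    # relative error: about 9.20e-06 in both cases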

Rewrite 5 as 10/2, since what's important is that 5 is half of the base (10). The worst absolute error is then half a unit in the last (fifth) digit, that is, $ (10/2) \times 10^{-5} \times 10^k$ for a number with exponent $ k$. Also, since we're trying to find the maximum relative error, make the denominator as small as possible by taking mantissa 1.0000. Then, no matter what exponent $ k$ you raise 10 to, the relative error is at most:

$\displaystyle \frac{(10/2) \times 10^{-5} \times 10^k}{1.0000 \times 10^k}$

The $ 10^k$ cancels out, and you get $ \frac{10^{1-5}}{2}$. The same reasoning works for bases other than 10 (just replace 10 by the symbol $ b$ for the base in the above expression). The reasoning also works if you have a different number of digits than five (just replace 5 by $ t$ in the above expression). Then the maximum relative error due to roundoff in a base $ b$ floating point representation with $ t$ digits is:

$\displaystyle \frac{b^{1-t}}{2}$

The number $ b^{1-t}$ is called the ``machine epsilon.'' It is the smallest number that can be added to 1.0 to produce a different floating point number.
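For example, for IEEE double precision (base $ b = 2$ with $ t = 53$ significand bits; treating Python's float as such a double is an assumption about the host), the formula matches the epsilon Python reports:

    import sys

    b, t = 2, 53                        # IEEE 754 double: base 2, 53-bit significand
    eps = b ** (1 - t)                  # machine epsilon, b^(1-t)

    print(eps)                          # 2.220446049250313e-16
    print(sys.float_info.epsilon)       # the same value, as reported by Python
    print(eps / 2)                      # worst-case relative roundoff error, b^(1-t)/2
    print(1.0 + eps > 1.0)              # True: adding eps to 1.0 changes it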


Danny Heap 2002-12-16