================================= WEEK 4 ====================================

=============================================================================
Numerical Methods
=============================================================================

Errors
------

* Numerical computations are prone to "errors" (a discrepancy between the
  exact solution and the computed solution). We discuss the sources of these
  errors, their types, and how to compute them.

Significant Figures
-------------------

* Whenever we employ a number that comes from measurement, there will be
  some uncertainty associated with the number (e.g., when reading your
  weight off of an ordinary bathroom scale, the answer you get is precise
  only to the nearest 1/10th or 1/5th of a pound / kilogram, depending on
  the graduations of the scale).

* This will not be a serious concern for the examples of numerical
  computations that we will see, but it must be kept in mind whenever a
  computation depends on numbers that come from measurements.

Accuracy vs. Precision
----------------------

* Suppose that we repeat some numerical computation many times and record a
  list of computed values. The errors between these computed values and the
  exact value can be characterized in two ways:

  - The "accuracy" of the computed values is a measure of the distance
    between the computed values and the exact value.

  - The "precision" of the computed values is a measure of the distance
    between the different computed values only (how close they are to one
    another, regardless of the exact value).

Error Definitions
-----------------

* Significant digits

  - Significant digits are the digits of a number which can be used with
    confidence.

  - They have an impact on how much of a computed result can be trusted.

* There are two major kinds of errors possible with numerical computations:

  - "round-off" errors result from the representation of exact numbers
    using only a limited number of significant digits (e.g., the value of
    the number "pi" might be represented as 3.14159265);

  - "truncation" errors result from using an approximate computation
    instead of an exact formula (e.g., the value of e^x (e raised to the
    power x) might be computed using only the first few terms of the
    infinite series 1 + x + x^2/2! + ...).

* In both cases, we can define the absolute error ("true error") as follows:

      Let x be the true value, and x* the approximate value

      E_t = |x - x*|

  A problem with the true error is that it must always be considered in
  relation to the value itself, so on its own it is often not a good
  indication of accuracy. For example, an error of 10cm in the measure of
  the distance from the Earth to the Moon is not at all the same as an
  error of 10cm in the measure of the height of a door!

* The absolute relative error (for short "relative error") is defined as
  follows:

      e_t = |x - x*| / |x|,  provided x is not 0
          = E_t / |x|

  (the relative error is often expressed as a percentage). It represents
  the error on a value as a fraction of that value, which is more useful.

* Often, we will perform numerical computations where we do not know the
  exact value, and where we are generating a sequence of approximations. In
  those cases, we can define the "approximate error" as follows:

      E_a = |current approximation - previous approximation|

  As before, we can scale this to define the relative approximate error:

      e_a = E_a / |current approximation|,
            provided the current approximation is not 0

Number Representations
----------------------

* In a computer's memory, numbers must be represented using sequences of
  bits, each one of which can only hold the value 0 or 1.

* Representing Integers: "binary" notation is used.
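
  As a small illustration of binary notation, here is a minimal Python
  sketch that converts a non-negative integer to its bit string by repeated
  division by 2. The helper name to_binary and the default 8-bit width are
  assumptions made for this example only; they do not come from the notes.

```python
def to_binary(n, width=8):
    """Return the binary digits of a non-negative integer n as a string,
    padded on the left with zeros to at least `width` characters."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next lowest-order bit
        n //= 2                    # integer division drops that bit
    # Bits were collected lowest-order first, so reverse them.
    return "".join(reversed(digits)).rjust(width, "0")


print(to_binary(13))  # 13 = 8 + 4 + 1, so this prints 00001101
```

  Python's built-in int(s, 2) performs the reverse conversion, which gives
  an easy way to check the result.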

* We also need real numbers for many types of computations such as image
  processing, weather forecasting, modelling, etc. However, their
  representation presents some challenges:

  - computers are discrete, while real numbers are continuous;

  - since computers are discrete, we cannot represent every number on the
    real number line. This means LOTS of rounding: a given real number x is
    rounded up or down to the nearest representable floating-point number;
    call this number fl(x).

Floating-Point Number Representation
------------------------------------

* To represent fractional numbers, computers use a "floating-point"
  representation, similar to scientific notation. For example, the number
  175.4325 would be represented as 0.1754325 x 10^3 (which we will write as
  0.1754325e+3); it could also be represented as 17.54325 x 10^1 (which we
  will write as 17.54325e+1); etc.

* Generally, every number x is represented as a "mantissa" m (the
  fractional part of the number, 0 <= m < 1) and an "exponent" e (we also
  need a "base" b, which is usually 2 for computers) so that x = m*b^e.
  Other bases in use are 8 (octal numbers), 10 (decimal numbers), and 16
  (hexadecimal numbers).

* In the computer, one bit is used for the sign of the number, some fixed
  number of bits for the exponent (including the sign of the exponent), and
  the rest of the bits for the mantissa; e.g., the 32-bit IEEE
  single-precision format is laid out as follows:

  - 1 bit used for the sign
  - 8 bits used for the exponent
  - 23 bits used for the mantissa
  - the exponent is stored with a fixed offset (a "bias"), so it needs no
    separate sign bit, which saves space
  - also, for normalized numbers the first digit of the mantissa is always
    1, so we don't have to store it

* One easy improvement can be made to the above basic notation, which can
  be motivated by the following example: if we wanted to represent the
  number 1/29 = 0.0344827...
  using a base-10, 4-digit mantissa and 1-digit exponent floating-point
  representation, we would get the approximation 0.0344e+0 (or 0.0345e+0 if
  we round the result). The result contains only 3 significant figures.
  Notice that by simply "shifting" the exponent, we can represent the same
  number with 4 significant figures instead, by representing it as
  0.3448e-1. This is called "normalization", and is used in the IEEE
  standard: the idea is that the first digit of the mantissa is *never* 0
  (this is achieved by shifting the exponent when necessary).

* Notice that very large numbers cannot be represented (which is no
  different than for integers, and is called "overflow"), and numbers very
  close to 0 cannot be represented either ("underflow"). One last "trick"
  can be used in the latter case: when the value of the exponent is the
  smallest possible, the requirement that the mantissa be normalized is
  dropped. This helps to extend the range of numbers that can be
  represented near 0, but there is still a limit that cannot be exceeded.

* Round-off errors are introduced when representing "exact" numbers. For
  example, suppose that we wanted to represent the value of "e" (the base
  of the natural logarithms) using a base-10, 5-digit mantissa and 1-digit
  exponent floating-point number. The exact value of e is
  2.718281828459045..., and it can be represented by one of two numbers:

  - If we simply chop the extra digits, we get 0.27182e+1, with a true
    error of E_t = 0.00008182... (a relative error of e_t = 0.00301...%).

  - If we round the value, we get 0.27183e+1, with a true error of
    E_t = 0.00001817... (a relative error of e_t = 0.000668...%). Here the
    approximation is larger than the exact value, but E_t and e_t are
    absolute values, so they are reported without a sign.

  Rounding is the technique that is usually employed (why?).

* What's the maximum error that could be introduced by rounding?
  Essentially, it's half of the distance between two consecutive
  floating-point numbers (denoted 'delta' x). There's just one problem: the
  distance between consecutive floating-point numbers is not constant (it
  depends on the exponent)!
  For example, if we are using a base-10, 4-digit mantissa and 1-digit
  exponent floating-point representation, then the distance between
  consecutive numbers is equal to 0.0001e-1 = 0.00001 when the exponent is
  -1 (e.g., 0.3454e-1 = 0.03454 and 0.3455e-1 = 0.03455 are consecutive
  numbers 0.00001 apart), while the distance between consecutive numbers
  becomes 0.0001e+3 = 0.1 when the exponent is 3. Luckily, the *relative*
  round-off error, which is equal to |x - fl(x)| / |x|, has the same upper
  bound regardless of the exponent.

Machine Precision
-----------------

* Let epsilon be the smallest positive floating-point number such that,
  when added to one, it produces a result greater than one; i.e.,
  1 + epsilon > 1. This number "epsilon" is called the "machine epsilon".
  Another way to define it is as the smallest number which can be added to
  1.0 to produce a different floating-point number.
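
* The definition above can be explored experimentally. The following is a
  minimal Python sketch, not a definitive implementation: it repeatedly
  halves a candidate value and stops at the smallest power of two that
  still changes 1.0 when added to it. The function name machine_epsilon is
  our own choice for this example. Python floats are assumed to be IEEE
  double-precision numbers (which is the case on typical platforms), so the
  result is compared with the value the standard library reports in
  sys.float_info.epsilon.

```python
import sys


def machine_epsilon():
    """Return the smallest power of two eps such that 1.0 + eps > 1.0."""
    eps = 1.0
    # Keep halving while the next smaller candidate still changes 1.0.
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


eps = machine_epsilon()
print(eps)                            # 2.220446049250313e-16 for IEEE doubles
print(eps == sys.float_info.epsilon)  # True on IEEE double-precision platforms
print(1.0 + eps / 2.0 == 1.0)         # True: half of eps is lost to rounding
```

  Because the loop only tries powers of two, it reports the gap between 1.0
  and the next larger floating-point number. This matches the second
  definition given above.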