Generating random numbers ------------------------- - A "random variable" is simply a value associated with some random event. For example, if we are throwing a die, we could define a random variable to be equal to the value showing on the die (1-6), and if we were throwing two dice, we could define a random variable to be equal to the sum of the values (2-12). - Given a "discrete" random variable X (that can only have one of finitely many possible values), there is a fixed finite probability that the variable will have a particular value x, for each value x. Obviously, the probabilities do *not* have to be the same for different values of the random variable (e.g., we don't expect that the sum of two dice will be 2 as often as it will be 6). This gives rise to the idea of a "probability distribution": given a random variable X, a probability distribution is simply a function f_X that assigns a probability f_X(x) to each possible value x of the variable, respecting the condition that the sum of the probabilities over all values is equal to 1 (i.e., sum_{possible x} f_X(x) = 1). - Some common distributions: uniform (e.g., when throwing one die), normal (e.g., when throwing 2 dice, or more), etc. - Given a "continuous" random variable X (that can have one of infinitely many possible values), the probability that it is exactly equal to one single value x is always 0, so we need to talk of the probability of the value lying in some interval. We define a "probability density function" f_X, representing the probability that the value of X lies within a certain interval [a,b], as follows: /b Pr[ a <= X <= b ] = | f_X(x) dx. /a By analogy with the discrete case, we want f_X to "sum up to 1" over all possible values of the random variable X, which we can express as follows: /oo | f_X(x) dx = 1. /-oo - Again, there are various common distributions. The "uniform" distribution over an interval [a,b] is given by: { 1/(b-a) if a <= x <= b, f_X(x) = { { 0 otherwise. It represents a distribution where X has equal probability of having any single value in [a,b]. - The "exponential" distribution is given by { r e^{-rt} if t >= 0, f_X(t) = { { 0 otherwise, for some real number r > 0, and represents the times between consecutive events when they occur at a uniform rate r over time, i.e., r is the average number of events that occur per unit time. - Now, we want to be able to generate random numbers with these distributions, and in order to do this, we must talk about one more concept: the "cumulative distribution function" F_X(x), represents the probability that the value of X is *less than or equal to* x, i.e., /x F_X(x) = | f_X(z) dz. /-oo For example, for the exponential distribution, f_X(t) = r e^{-rt} for t >= 0 so /x |x F_X(x) = | r e^{-rt} dt = r (e^{-rt}/-r) | = 1 - e^{-rx}. /0 |0 - How does this help us generate random values with an exponential distribution? Well, first of all we need to have at our disposal a source of random numbers with a uniform distribution in the *half-open* interval [0,1). Then, notice that the value of F_X(x) is always between 0 and 1 (since it expresses a probability) and F_X is a nondecreasing function. So in order to generate a sequence of values x_1,x_2,... for the random variable X having cumulative distribution function F_X, we can simply generate a sequence of values y_1,y_2,... having a continuous [0,1) uniform distribution and let x_k = F^{-1}_X(y_k), i.e., x_k is the number s.t. y_k = F_X(x_k). - For example, the cumulative distribution function for a continuous uniform random variable in [a,b] is F_X(x) = (x-a)/(b-a), so we can generate values from this distribution from uniform y in [0,1) by letting x = a + y*(b-a) (which is just what you'd expect). - To get random values x with a continuous exponential distribution, we can invert F_X(x) = 1 - e^{-rx} to get F^{-1}_X(y) = (-1/r) ln(1-y). Note that this is OK since y is uniformly distributed in the half-open interval [0,1), so that 1-y is distributed in (0,1]. Also, since r is the average number of events per unit time, (1/r) is the average (or "mean") time between events. Using `random()' ---------------- * The function call `random()' returns a pseudo-random long integer in the range 0 .. RANDOM_MAX. Why "pseudo-random"? Well, computers behave in a totally predictable manner, so they don't really have any natural randomness to use. They generate these pseudo-random numbers according to a well-defined deterministic computation. What this means is that `random' has access to a certain amount of data that it uses to determine the next number to return. And every time that this data has a particular value, random will return the same number. * `srandom(unsigned)' is used to initialize the state of the data that `random' uses to produce these pseudo-random values. This function should be called ONLY ONCE in the entire program, at the very beginning of the program. Whenever srandom is called with a particular "seed" as argument, `random' will generate a particular fixed sequence of numbers. This means that if srandom is called with a fixed value, say `srandom(25)', then the program will behave in exactly the same way each time that it is run. To get a program to have more "random" behaviour, we need to use a seed that's a little bit different every time we run the program. A standard trick is to use the value of the "system clock" for this: `srandom(time(NULL))' will do the trick (the `time(...)' function is declared in the header ). * All of this has *already* been taken care of for you in file random.c, which you can use by #including "random.h" in every file that needs to generate random numbers. Have a closer look at those files to see the functions they define.