This page is about deterministic computer algorithms that can be used to generate (pseudo-) random numbers (sometimes call PRNGs). These are sequences that share certain statistical properties with sequences of iid (independent identical distributed) samples of a uniformly distributed random variable. (Variates from other distributions are almost always obtained by processing series of uniform psuedo-random variates; see references. The only exception is when “random” numbers are obtained from some physical process which is believed to follow a different distribution.)
These algorithms are used in many different contexts, Azimuth will concentrate on simulations of processes in Earth sciences, simulations in the context of Time series analysis.
Random numbers play a central role in computer simulations that incorporate some sort of random influence. Most simulations use a deterministic algorithm to generate sequences of numbers as a substitute for true random numbers.
Of course the output of a deterministic algorithm cannot be random in the sense that it shares all mathematical properties with a sample of iid random variables, for this reason numbers generated this way are also sometimes called “pseudo random numbers”. One deficiency of deterministic algorithms is that the generated numbers will exhibit some sort of long range correlation, that is they fail to be independent, or to become more independent with time as is the case with the theoretical construct of a Mixing random process?.
While long range correlation is an issue within a single “run”, one often runs a stocchastic simulation multiple times to rule out a conclusions being dependent on the exact random numbers. In these cases if the sequences of random numbers used by two of the runs overlaps again one loses confidence in the conclusions. Contemporary computer code generally uses data types with a few, fixed storage capacity, the integer data type “long” of Java, for example, has a fixed length of 64 bits. This implies that any deterministic state machine with a state encoded by a single long has a period of at most $2^64$. This includes random number generators, so one should choose a PRNG with a period long enough that the chance of the sequences (with randomly chosen initial seeds) used by multiple runs overlapping is negligable.
The initial state of a random number generator is specified by an input parameter commonly called seed. Since seeds are also data types of finite length, the number of “independent runs” of a generator is limited by the range of the seed. In particular, one should be very careful when seeding a PRNG from a low entopy source (eg, some function of the current time).
Finally, be aware that for some structures, particularly ones based on combinatorics, the number of potential elements rises so rapidly that being the random generation process being able to construct all of them may be beyond the capacity (seeding and period) of a PRNG. (For example, being able to sample uniformly from the set of permutations on “just” 500 items.)
From a practical point of view a random number generator is good if it is random enough for the application that consumes the numbers it produces, in the sense that the application does not produce any artifacts that depend on the generator used. Since no random number generator is truly random, there is no objective mathematical criterion to classify generators; in practice, users have specified a list of statistical tests that a “good” generator should pass - on the average - while “bad” generators do not - on the average (see the references for details).
From a pure mathematical point of view this means that we know some ways in which the “bad” generators fail, while we did not figure out how to unmask the “good” ones. So the bad ones are actually those generators that we happen to know more about!
Since this is an example where there is no satisfying solution from a pure mathematical point of view, practitioners have to fall back on some rules of thumb, here are some:
Always be suspicious of built-in generators of operating systems or programming languages. However, it is likely that a language’s standard library implementation of a PRNG is likely to be better than a user programmed one unless a large amount of time is spent on it.
Don’t use generators that are known to fail certain statistical tests.
Try to estimate if the methods used by a generator could produce patterns that effect the application using the generated numbers (tough! almost always hand-waving in practice).
Use several different generators, based on different methods, to find out if your results depend on the generator.
The primordial goal of a generator is to produce bits that appear to be random. Since the simplest encoding of an array of bits in most programming languages is an integer data type, the first step of most generators is to produce such a data type, int or long in Java.
For probability distributions that are continuous with respect to the Lesbegue measure, the second step consists in mapping the generated integers to a floating point data type contained in the interval $[0, 1]$, distributed according to the probability density $\chi_{[0, 1]}$, the characteristic function of the interval $[0, 1]$.
Most common probability distributions that have a density with respect to the Lesbegue measure can be computed efficiently with the input of step number two.
The following is a standalone Java implementation of the algorithm “ran” from “Numerical Recipes” (see references).
/**
* Based on Ran from ran.h of numerical recipes
* This class is stateful and not threadsafe!
*/
public class Ran {
private long u;
private long v;
private long w;
/**
* Constructor.
* @param pSeed the seed of this instance
*/
public Ran(final long pSeed)
{
v = 4101842887655102017L;
w = 1;
u = pSeed ^ v;
generateRandomLong();
v = u;
generateRandomLong();
w = v;
generateRandomLong();
}
/**
* generates a "random" long, this method will change the state
* of the instance as a side effect!
* @return long the generated "random" long
*/
public long generateRandomLong()
{
u = u * 2862933555777941757L + 7046029254386353087L;
v ^= v >> 17;
v ^= v << 31;
v ^= v >> 8;
w = 4294957665L*(w & 0xffffffff) + (w >> 32);
long x = u ^ (u << 21);
x ^= x >> 35;
x ^= x << 4;
return (x + v) ^ w;
}
/**
* returns uniform distributed double numbers between 0 and 1
* @return double
*/
public double generateUniformDouble()
{
// since in Java there are no unsigned integer simple data types,
// int and long are always signed, we need an additional offset of 0.5 to
// get a result in [0, 1] instead of [-0.5, 0.5], compared to the original C++ implementation
return 5.42101086242752217E-20 * generateRandomLong() + 0.5;
}
}
(Note that this has an internal state of – naively – 192 bits, i.e., 3 64-bit words.)
For generating variates from other distributions, see Azimuth page on Transforming uniform random variables to other distributions.
Wikipedia: Random number generation
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical Recipes, The Art of Scientific Computing (3rd edition, chapter 7, webpage, ZMATH)
James E. Gentle: Random Number Generation and Monte Carlo Methods (Springer, 2nd edition, ZMATH)
Tests for random number generators: CICS
We guarantee that each number is random individually, but we don’t guarantee that more than one of them is random.
anonymous computer center’s programming consultant on misusing the random number generator