Random number generator *

Often, when we use the term random number we actually mean arbitrary. If we ask for an arbitrary number, we are saying that we don't really care what number we get: almost any number will do. By contrast, a random number is a precisely defined mathematical concept: every number should be equally likely to occur. That definition, however, makes sense only for a sequence of numbers retrieved from some finite domain. We cannot have a random integer, we can have only a random integer in some range. Thus, random numbers is a sequence of random numbers from a given range (domain).

There is no way to produce true random numbers on a computer. Once a program is written, the numbers it will produce can be deduced (see example), so how could they be random? The best we can hope to do is to write programs which produce sequence of numbers having many of the same properties as random numbers. Such numbers are commonly called pseudo-random numbers: they are not really random, but they can be useful as an approximations to random numbers.

It's easy to see that approximate the property "each number is equally likely to occur" in a long sequence is not enough. For example, each number from the range [1, 100] appears exactly once in the sequence {1, 2, 3, ..., 99, 100}, but that sequence is unlikely to be useful as an approximation to a random sequence.In fact, in a random sequence of length 100 of numbers in the range [1, 100], it is likely that a few numbers will appear more than once and a few will not appear at all. If this doesn't happen in a sequence of pseudo-random numbers, then there is something wrong with the random number generator.

Linear congruential method

The best-known method for generating random numbers, which has been used almost exclusively since it was introduced by D. Lehmer in 1951, is so called linear congruential method. If seed contains some arbitrary number, then the following statement fills up an array with N random numbers using this method:
a[0] = seed;
for(i=1;i<N;i++)
   a[i] = (a[i-1]*b + c) % M;
All the generated numbers are integers between 0 and M-1.

The algorithm looks very simple, but this simplicity is misleading because the devil lies in the details of choosing the algorithm's parameters. Here is a partial list of recommendations based on the results of sophisticated mathematical analysis (see Knuth, volume 2 for details):

Knuth shows that these chices make the linear congruential method produce good random numbers which pass several sophisticated tests. The most serios potential problem, which can quickly become apparent, is that the generator can get caught in a cycle. For example, the choice produces the sequence 0, 1, 20, 0, 1, 20, 0, 1, 20 , ..., a non-very-random sequence of integers between 0 and 380. Unfortunately, not all such difficulties are easy to spot.

Random number generator - version 1

Suppose that our computer has a 32-bit word, and we choose: Then our first version of the class Random can look like
class Random {
   public:
      Random() { a = seed = 1234567; };
      int next();
   private:
      int seed;
      int a;
      enum { M=100000000, b=31415821, c=1 };
};

int Random::next()
{
   a = (a*b + c) % M;
   return a;
}
This simple implementation has several problems. The first of them is possible overflow. Although all our parameters are less than the largest integer that can be represented, but the first operation a*b+c can cause an overflow. However, since the part of the product that causes overflow is not relevant to our computation (remember, we are interested in only the last 8 digits), we can use the following trick that helps us to avoid overflow.

Assume that we need to compute the product of two integer numbers p and q modulo M. We will write p as

 p = 104*p1 + p0
and q as
 q = 104*q1 + q0
Then their product can be written as
p*q = (104*p1+p0) * (104*q1+q0)
    = 108*p1*q1 + 104*(p1*q0+p0*q1) + p0*q0
Now, since we only want the last 8 digits of the result, we can ignore the first term. Thus,
 (p*q) mod 108 = (104*(p1*q0+p0*q1)+p0*q0) mod 108
To prevent the result of computation 104*(p1*q0+p0*q1) from overflow, we can compute this
((p1*q0+p0*q1) mod 104 )104
instead.

Random number generator - version 2

This trick will result in the following modification of the previous code
class Random {
   public:
      Random() { seed = 1234567; };
      int next();
   private:
      int seed;
      int a;
      union { M1=10000, M=M1*M1, b=31415821, c=1 };
      int mult(int p, int q);
};

int Random::mult(int p, int q)
{
   int p1 = p/M1, p0 = p%M1,
       q1 = q/M1, q0 = q%M1;
   return (((p0*q1+p1*q0)%M1)*M1 + p0*q0) % M;
}

int Random::next()
{
   a = (mult(a, b) + c) % M;
   return a;
}

Now, if we run the following code to generate 20 random numbers,

Random r;
for(i=0;i<20;i++)
   cout<<r.next()<<endl;
we will receive
35884508, 80001069, 63512650, 43635651, 1034472, 87181513, 6917174, 209855, 67115956, 59939877,
46594018, 29158779, 81642560, 50941761, 45000782, 12172023, 95775884, 27860765, 6163066, 78267187
There is some obvious non-randomness in these numbers: for example, the last digits cycle through the digits 0 - 9. In general, the digits on the right are not random, a fact that is the source of a common and serious mistake in the use of random number generator of this type. Typically, in applications one wants to random numbers in the range [0, r), where r is application dependent and cannot conveniently be built in to the random number generator.

The following is a naive attempt that results in a bad program for generating random numbers is a small range:

int Random::next(int r)
{
   return next() % r;
}
For example, the following code, which is trying to retrieve 20 random numbers in the range [0, 9]
Random r;
for(i=0;i<20;i++)
   cout<<r.next(10)<<endl;
will result in
8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7
This problem is easily fixed by using the digits on the left instead. To retrieve the desired digits on the left we need to compute (a*r)/M, but, again, overflow must be circumvented,a s in the following implementation:
int Random::next(int r)
{
   return ((next()/M1) * r) / M1;
}
If we use this implementation of the next(10) method, then the same code will produce
3, 8, 6, 4, 0, 8, 0, 0, 6, 5, 4, 2, 8, 5, 4, 1, 9, 2, 0, 7

The final version of the implementation of our random generator can be found on the example page.

Testing pseudo-random number generator

Many sophisticated tests based on specific observations have been devised for random number generators, in order to test whether a long sequence of of pseudo-random numbers has some property that random numbers would. In this lecture, we will look in detail at one of the most important such tests, the χ2 (chi-square) test.

The idea of the χ2 test is to check whether or not the numbers produced are spread out reasonably. If we generate N positive numbers less than r, then we'd expect to get about N/r numbers of each value. But - and this is the essence of the matter - the frequencies of occurrence of all the values should not be exactly the same. It turns out that calculating whether or not a sequence of numbers is distributed as well as a random sequence is not difficult. The "χ2 statistic" may be expressed mathematically as

χ2 =       Σi=0 r-1 (fi-N/r)2
N/r
If the χ2 statistic is close to r, then the numbers are random; if it is too far away, then they are not. The notions of "close" and "far away" can be more precisely defined, but for the simple test that we are performing, the statistic should be within 2*sqrt(r) or r. That is,
r - 2*sqrt(r) < χ2 < r + 2*sqrt(r)


This material is from the book of Robert Sedgewick Algorithms in C++, ISBN:0-201-51059-6, Chapter 35.