When you type 2+2 into a calculator, it’s always going to spit out the number 4. The same is generally true for computer programs whether it’s in Microsoft Excel or the calculator on your iPhone. This is because computers are deterministic machines who are notoriously good at following explicit directions and are capable of performing thousands of operations each second. In short, you expect the same computer input to always give you the same output.
What happens when you want a computer to do something random like virtually roll a die 5 times? It’s easy enough to do this in real life, but how do you get truly random results from machine designed to not to leave output to chance?
The need for truly random numbers goes well beyond degenerate gamblers looking to get their fix of virtual rolling dice. Online poker sites are such an attractive gambling option because the ‘randomness’ behind what cards are virtually dealt are a pretty reasonable substitute for the hot hands and bad beats of real poker. More practically, encryption technologies behind e-commerce security depend on the same pseudorandom-ness.
Just like millenials at Coachella pretending they know obscure (sometimes even fake) bands, computers do a good enough job of convincing you that their random numbers are in fact random. The most widely used random number generator is called the Mersenne Twister (which is coincidentally my new fantasy football team name). The Mersenne Twister is so-called because the linchpin of the algorithm is the number 2^199937 – 1, referred to as the period of the algorithm. Mersenne Primes are prime numbers of the form 2^199937 – 1, named after the French Mathematician Marin Mersenne who studied them. The smallest of them is 2^2 – 1,=3, and the prime at the heart of the Mersenne Twister is one of the largest known of this form. Each iteration of the algorithm pops out a number between 0 and 2^199937 – 1 and divides it by 2^199937 – 1 so that the final output lies somewhere on the interval of 0 to 1. With such a large period, the algorithm could pop out pseudo-random numbers without repeating until the Sun burns out.
I tried following the algebra through several references but struggled to get through the bit-shifting parts of the algorithm. You can find the original paper here, but I found the NumPy implemention of random numbers to be a much more worthwhile lesson in theory and writing robust, extendable code. The algorithm has two parts; the first is the derivation of the next raw number from a linear recurrence, essentially finding out which number is next in line, followed by tempering which uses a series of bit shifts and masks to transform the raw bits into the final output. In the randomkit.c file, the rk_random() function is doing all of the heavy lifting.
Most of the other functions just wrap rk_random to perform other common random functions like choosing random integers or returning numbers from a Normal distribution.
Note the significance of the seed in generating random numbers. Throughout the post I’ve referred to ‘pseudo-random’ numbers because the seed dictates the order in which numbers are generated. If two people use the same random number generator with the same seed value set, they will generate the same numbers in the same order. Nothing about that last sentence implies randomness in the way that people usually think of it. The flaw of the Mersenne Twister and all random number generators are that if you observe enough of their numbers in succession you can infer the underlying seed of the generator and figure out the next pseudo-random number simply with pen and paper. This flaw is what makes the Mersenne Twister insecure for certain levels of cryptographic security.
What blows my mind is just how many areas of math and computer science this algorithm is connected to.
- The validity of Monte Carlo simulations and many other statistical inference techniques depend on the randomness of the Mersenne Twister.
- Mersenne primes and their foundations in number theory lie at the heart of this algorithm.
- Even the ability to ‘randomly’ generate samples from the range of infinite rational numbers between 0 and 1 highlights the level of precision that computers can carry.
I really hope this first post hasn’t scared anyone too much. It’s so easy to rely on our computers for everything from personal finances to blogging without knowing the first thing about disk memory or CPUs, but there’s a lot of elegant machinery going on behind the scenes that deserves some overdue appreciation. It’s been tough looking for a paper on Mersenne Twisters that I can understand and even harder putting together a post that isn’t too deep to lose everyone, but still has enough substance to appreciate this modern idea of pseudo-randomness. Check back in a week for some more knowledge.