The Mersenne Twister

When you type 2+2 into a calculator, it’s always going to spit out the number 4. The same is generally true for computer programs whether it’s in Microsoft Excel or the calculator on your iPhone. This is because computers are deterministic machines who are notoriously good at following explicit directions and are capable of performing thousands of operations each second. In short, you expect the same computer input to always give you the same output.

What happens when you want a computer to do something random like virtually roll a die 5 times? It’s easy enough to do this in real life, but how do you get truly random results from machine designed to not to leave output to chance?

 The need for truly random numbers goes well beyond degenerate gamblers looking to get their fix of virtual rolling dice. Online poker sites are such an attractive gambling option because the ‘randomness’ behind what cards are virtually dealt are a pretty reasonable substitute for the hot hands and bad beats of real poker. More practically, encryption technologies behind e-commerce security depend on the same pseudorandom-ness.

Just like millenials at Coachella pretending they know obscure (sometimes even fake) bands, computers do a good enough job of convincing you that their random numbers are in fact random. The most widely used random number generator is called the Mersenne Twister (which is coincidentally my new fantasy football team name). The Mersenne Twister is so-called because the linchpin of the algorithm is the number 2^199937 – 1, referred to as the period of the algorithm.  Mersenne Primes are prime numbers of the form 2^199937 – 1, named after the French Mathematician Marin Mersenne who studied them. The smallest of them is 2^2 – 1,=3, and the prime at the heart of the Mersenne Twister is one of the largest known of this form. Each iteration of the algorithm pops out a number between 0 and 2^199937 – 1 and divides it by 2^199937 – 1 so that the final output lies somewhere on the interval of 0 to 1. With such a large period, the algorithm could pop out pseudo-random numbers without repeating until the Sun burns out.

I tried following the algebra through several references but struggled to get through the bit-shifting parts of the algorithm. You can find the original paper here, but I found the NumPy implemention of random numbers to be a much more worthwhile lesson in theory and writing robust, extendable code. The algorithm has two parts; the first is the derivation of the next raw number from a linear recurrence, essentially finding out which number is next in line, followed by tempering which uses a series of bit shifts and masks to transform the raw bits into the final output.   In the randomkit.c file, the rk_random() function is doing all of the heavy lifting.

rk_random

Most of the other functions just wrap rk_random to perform other common random functions like choosing random integers or returning numbers from a Normal distribution.

Note the significance of the seed in generating random numbers. Throughout the post I’ve referred to ‘pseudo-random’ numbers because the seed dictates the order in which numbers are generated. If two people use the same random number generator with the same seed value set, they will generate the same numbers in the same order. Nothing about that last sentence implies randomness in the way that people usually think of it. The flaw of the Mersenne Twister and all random number generators are that if you observe enough of their numbers in succession you can infer the underlying seed of the generator and figure out the next pseudo-random number simply with pen and paper. This flaw is what makes the Mersenne Twister insecure for certain levels of cryptographic security.

What blows my mind is just how many areas of math and computer science this algorithm is connected to.

  • The validity of Monte Carlo simulations and many other statistical inference techniques depend on the randomness of the Mersenne Twister.
  • Mersenne primes and their foundations in number theory lie at the heart of this algorithm.
  • Even the ability to ‘randomly’ generate samples from the range of infinite rational numbers between 0 and 1 highlights the level of precision that computers can carry.

I really hope this first post hasn’t scared anyone too much. It’s so easy to rely on our computers for everything from personal finances to blogging without knowing the first thing about disk memory or CPUs, but there’s a lot of elegant machinery going on behind the scenes that deserves some overdue appreciation. It’s been tough looking for a paper on Mersenne Twisters that I can understand and even harder putting together a post that isn’t too deep to lose everyone, but still has enough substance to appreciate this modern idea of pseudo-randomness. Check back in a week for some more knowledge.

5 thoughts on “The Mersenne Twister

  1. Mark

    “Each iteration of the algorithm pops out a number between 0 and…” – Maybe I missed this, but how does the algorithm decide which number “pops out”?

    Like

    1. Hey Mark, thanks for the question. So the seed for this algorithm, usually an arbitrary value from 1-99, sets the initial state or a starting point from which the algorithm follows a discrete set of steps to find the next number. From there, the algorithm follows the same steps each iteration to get to a new state. Remember that numbers are guaranteed not to repeat for the length of the period.

      By default, some functions set the seed for you based on the system clock so you can call random(samples=10) twice, milliseconds apart, and two completely different sets of random numbers are sampled. Alternatively, if you and your friend each run the Mersenne twister on your own computers but both set the same seed (ie random(samples=10, seed=42) ), you will generate the same numbers. This determinism is completely opposite of what we consider random behavior, but keep in mind it’s not necessarily their predictability that’s important but moreso that the numbers the algorithm outputs follow a uniform distribution over the interval.

      As if this blog couldn’t get nerdier, I’ll try to better explain using the example of a rubiks cube. If you and your friend start out with rubiks cubes oriented exactly the same with each square matching (ie you start with the same seeds), performing the same twists on your respective cubes (ie turn the right face clockwise, front face counterclockwise, and so on) will leave you with matching cubes. However if you and your friend each scramble your cubes differently without telling each other (ie using different seeds), then performing the same moves as before will leave you completely different looking cubes.

      Like

  2. I’m not as versed in this area as you, but it seems like this essential ability to unravel the algorithm’s seed for these random number generators could lead to problems for gambling websites/casinos in terms of hacking and what not. Has this been an issue or is there really no easy/feasible way to do this?

    Like

    1. Observing enough random numbers from a generator definitely makes it possible to infer the seed and figure out which number is coming next, but we’re talking about observing a lot of numbers. I don’t know too much about the implementations online gambling sites use but I’m sure they mask how they translate random numbers on the uniform interval (0,1) to different cards in poker or spaces on the roulette wheel. It’s simple enough to change the seed after a deck of cards has been exhausted, and I don’t think 52 turns (or even 208 for a four-deck poker shoe) is enough to determine a seed and start predicting what’s coming next. At that point, you might do better off guessing based on what cards you’ve already seen come out than to solve a recurrence relation from 52 out of 2^199937-1 possible observations.

      Like

  3. “Each iteration of the algorithm pops out a number between 0 and 2^199937 – 1 and divides it by 2^199937 – 1. With such a large period, the algorithm could pop out pseudo-random numbers without repeating until the Sun burns out.”

    In terms of *output* MT19937 pops out 32-bit uints or 53-bit floats (from batching 2 iterations on 32-bit architecture). MT19937-64 produces 64-bit uints if you have a uniform 64-bit environment.

    So it’s impossible to have more than 2^64 iterations without repeating *a number that has been seen before*. It’s easy to measure this by generating 4 billion iterations of MT19937 (or far less depending on the seed) and deduping the results.

    The period, of course, refers to the *longest possible* run of numbers, including intra-sequence repeats, before the whole sequence repeats. With a seedable algorithm you don’t necessarily get the full period for a given seed.

    Maybe this was already what you meant, but it isn’t totally clear from the phrasing, and I know there’s a lot of confusion out there about repeating output vs. repeating sequence vs. period vs. full period.

    Cheers!

    Like

Leave a comment