Computer Dice


Randomness testing

From:   Brett Meyer
Date:   12 December 2010
Subject:   Meyer Dice Tube VIDEO, Randomness & Cheating
Forum: Forums

There have been a multitude of comments and concerns expressed regarding the
use of the Meyer Dice Tube for Backgammon. One concern was whether the tube
indeed produces random results. I think most everyone is convinced at this
point that it does: several independent randomness studies have been
conducted and have verified this. I believe that Mr. Bob Koca also conducted
his own study and agreed that the tube produces random results (I'm sure
he'll correct me if I'm wrong).

If you want to see some actual data, go to my site and click on the
"Statistical Data" tab. These are the raw data from a 3000-flip trial that
my wife and I conducted several years ago using a tube with no rods. If
someone would like to actually "crunch" the data and report how good (or
bad) the results are statistically, that would be greatly appreciated. I
have not processed the data, but just by looking at the results I can say
that the numbers seem to be fairly tight.

Nick Kravitz  writes:


I am not sure if you are still watching this thread, but I did not see a
response to your request to test randomness in your results. I am a
quantitative analyst (quant) and decent backgammon player (at least I like
to think so) and am giving my opinion on your request for randomness
analysis below:

There are a few statistical methods to test for randomness. I have used
Pearson's chi-squared test, which is probably the best-known and most
thoroughly tested method for doing so (see the Wikipedia article on
Pearson's chi-squared test). Before I give the results, here are some
general comments on statistical testing for people without the requisite
background. (You can also find this in any text on statistical testing.)
In the same way that we cannot poll an entire voting population to predict
election results, we cannot roll dice an infinite number of times to
conclude definitively whether they are fair. As such, statistical tests are
not foolproof; they are set up to acknowledge the possibility of an
incorrect conclusion. For example, if we are testing the randomness of a
single die within the dice tube by rolling it 500 times, we might get the
same number on all tosses and conclude the dice tube was loaded. However,
it is still possible (but extremely unlikely) that the tube was indeed fair
and we simply happened to be unlucky. In the event we roll 500 of the same
number, we can only state there is strong evidence (but no proof) that we
were rolling with an unfair (non-random) dice tube. Likewise, our dice tube
could be loaded to roll more 1's than it should - for example, 1's with
probability 50% and 2, 3, 4, 5, and 6 each with probability 10%. However,
when testing this loaded dice tube, we might roll approximately equal
frequencies of each number by chance, in which case we would incorrectly
conclude the tube was fair when in fact it was not. (Side note: Dewey vs.
Truman was a famous example of a statistical poll failing.)

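The loaded-tube example above can be sketched numerically. This is a
hypothetical illustration (not code from the original analysis): it
simulates 500 rolls of a die loaded to show a 1 half the time, then
computes Pearson's chi-squared statistic against the fair-die expectation.

```python
import random

def chi_squared(counts, n_rolls):
    """Pearson's chi-squared statistic against a fair six-sided die."""
    expected = n_rolls / 6
    return sum((c - expected) ** 2 / expected for c in counts)

def roll_loaded(rng):
    # Loaded as in the example: 1 with probability 50%,
    # and 2-6 with probability 10% each.
    return 1 if rng.random() < 0.5 else rng.randint(2, 6)

rng = random.Random(42)
counts = [0] * 6
for _ in range(500):
    counts[roll_loaded(rng) - 1] += 1

stat = chi_squared(counts, 500)
# The 5% critical value for 5 degrees of freedom is about 11.07;
# a die this badly loaded almost always lands far above it.
print(counts, round(stat, 1))
```

Once in a great while, though, even this loaded die happens to produce
near-equal counts and slips under the threshold, which is exactly the
second kind of error described above.
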
By convention, statistical tests are most commonly set up to reject the
null hypothesis when it is in fact true with probability 5%, although this
value is arbitrary and purely conventional. We simply need some threshold
to start being suspicious of non-randomness. Sometimes a truly random
process produces results that look non-random; when this happens it is
called a type I error, or false positive, and would be equivalent to the
first case above (concluding the dice tube is non-random when in fact it is
random).

The output of running the test on a set of random frequencies generated or
observed would be a "p-value" - which can be interpreted as probability of
obtaining a test statistic at least as extreme as the one that was actually
observed, assuming that the null hypothesis is true (in our case, that the
underlying process is indeed random).
By construction, we would expect that if we run many experiments of
throwing dice, the p-values would be uniformly distributed, and that given
a truly random process, over many experiments there would be a 5% chance we
would return a false positive. (Equivalently, our p-value would be between
0 and 0.05 with 5% probability.)

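To sketch how such a p-value can be computed (an illustrative Monte Carlo
version; the analysis here may well have used standard chi-squared tables
instead): take the observed counts, compute the chi-squared statistic, and
estimate how often a genuinely fair die does at least as badly.

```python
import random

def chi_squared(counts, n_rolls):
    """Pearson's chi-squared statistic against a fair six-sided die."""
    expected = n_rolls / 6
    return sum((c - expected) ** 2 / expected for c in counts)

# Example frequencies from one 500-roll experiment (quoted below).
observed = [94, 86, 89, 79, 77, 75]
n = sum(observed)
stat = chi_squared(observed, n)  # about 3.38

# Monte Carlo p-value: how often does a truly fair die produce a
# statistic at least as extreme as the one observed?
rng = random.Random(0)
trials = 10000
extreme = 0
for _ in range(trials):
    counts = [0] * 6
    for _ in range(n):
        counts[rng.randrange(6)] += 1
    if chi_squared(counts, n) >= stat:
        extreme += 1

p_value = extreme / trials
print(round(stat, 2), round(p_value, 3))  # p-value near 0.64
```
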
Before I ran the test on the numbers from the Meyer website, I first ran it
on my own numbers, which I generated from a computer program that I am
certain produces unbiased random numbers. I generated 1000 experiments of
500 die rolls each. In most of the trials I got numbers that looked random
enough (for example, 94, 86, 89, 79, 77, 75), which returned a p-value of
0.642. However, around 5% of the time (or around 50 times out of 1000) the
numbers looked non-random enough to trip a false positive (for example, 59,
84, 73, 98, 88, 98), which returned a p-value of 0.016. It is actually a
good thing to get some false positives; this indicates the test is working
as expected. If all 1000 experiments had produced p-values above 5%, I
would be suspicious that the underlying random process was not working
properly.

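That sanity check can be reproduced along the following lines (a sketch
that uses the 5% critical value of the chi-squared distribution with 5
degrees of freedom, about 11.07, in place of full p-values):

```python
import random

def chi_squared(counts, n_rolls):
    # Pearson's chi-squared statistic against a fair six-sided die.
    expected = n_rolls / 6
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(1)
CRITICAL_5PCT = 11.07  # chi-squared, 5 degrees of freedom, 5% tail

false_positives = 0
for _ in range(1000):        # 1000 experiments...
    counts = [0] * 6
    for _ in range(500):     # ...of 500 fair rolls each
        counts[rng.randrange(6)] += 1
    if chi_squared(counts, 500) > CRITICAL_5PCT:
        false_positives += 1

# A fair die should trip the 5% threshold in roughly 50 of the
# 1000 experiments.
print(false_positives)
```
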
Next, I applied the test to the numbers on the Meyer website, which
provided results for a total of 12 experiments, one for each starting
number for each die. If the rolls were truly random, we would expect the
results to look similar to the process described above that we know to be
random; i.e., p-values approximately uniform between 0 and 1 (in
particular, about half the p-values above 0.5 and half below, with maybe
one observation close to or exceeding the 5% threshold of non-random
suspicion).

The p-values I calculated ranged from 0.55 (least random, Blue 5) to 0.997
(most random, Red 1). These results look too good to be true. In fact, if
we rolled dice from a process we know in advance to be purely random (for
example, rolling a precision die, or having a computer generate random
numbers for us), the probability we would get results at least this good by
pure chance would be 0.0000143 (equivalently, about 1 in 70,000). To put
this into backgammon perspective, there would be a better chance of your
first 3 rolls coming out all double sixes (a mere 1 in 47,000).

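The post does not spell out how the 1-in-70,000 figure was computed, so as
a back-of-envelope check only: if the twelve p-values really were uniform
on [0, 1], the chance that every one of them lands at or above the observed
minimum of 0.55 is 0.45 to the 12th power, on the order of 1 in 15,000.
The exact figure differs, but the conclusion is the same: results this
uniformly "good" are very unlikely by chance.

```python
# If twelve independent p-values are uniform on [0, 1], the chance
# that all twelve are at least 0.55 (the smallest value reported
# above) is:
p_all_high = 0.45 ** 12
print(p_all_high)        # about 6.9e-05, roughly 1 in 14,500
```
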
I do not know how the experiment was run. Although there is nothing to
indicate that the results were somehow doctored (or that the most random
results were "selected" from a larger set of experiments), the results look
suspect, so I would recommend having them re-sampled independently by
someone without an interest in the outcome of the test.

Brett Meyer  writes:

The data posted on my website from the 3000-flip randomness trial were the
result of my wife and me spending 3 days flipping a red and a blue die in a
9" tall, 2" ID acrylic tube with no baffles/rods. Therefore, the trial
emulated simply dropping 2 dice onto a flat surface from a height of 9",
with the only "randomizing" effect being the interaction of the 2 dice and
the contact/rebound with the interior surface of the tube.

Neither of us is a statistician, but the results looked fairly random to
us. The data were published so that those who are indeed statisticians
could crunch the numbers and provide an analysis of our results.
Computer Dice

Dice on backgammon servers  (Hank Youngerman, July 2001) 
Does Agushak Backgammon cheat?  (Mr Nabutovsky, June 2000) 
Does BG by George cheat?  (George Sutty, Nov 1995) 
Does Backgammon NJ cheat?  (Greg+, June 2010) 
Does Cybergammon cheat?  (Goto Informatique, Aug 1996) 
Does David's Backgammon cheat?  (Joseph B. Calderone, June 1998) 
Does GNU Backgammon cheat?  (Robert-Jan Veldhuizen, Nov 2002) 
Does Gammontool cheat?  (Jim Hurley, Sept 1991) 
Does Hyper-Gammon cheat?  (ZZyzx, June 1996) 
Does Jellyfish cheat?  (Fredrik Dahl, June 1997) 
Does MVP Backgammon cheat?  (Mark Betz, Oct 1996) 
Does MonteCarlo cheat?  (Matt Reklaitis, June 1998) 
Does Motif cheat?  (Rick Kiesau+, Mar 2004)  [Long message]
Does Motif cheat?  (Billie Patterson, Feb 2003) 
Does Motif cheat?  (Robert D. Johnson, Oct 1996) 
Does Snowie cheat?  (André Nicoulin, Sept 1998) 
Does TD-Gammon cheat?  (Gerry Tesauro, Feb 1997) 
Error rates with computer dice  (NoChinDeluxe+, Feb 2011) 
FIBS: Analysis of 10 million rolls  (Stephen Turner, Apr 1997)  [Recommended reading]
FIBS: Are the dice biased?  (Kit Woolsey, Oct 1996) 
FIBS: Entering from the bar  (Tom Keith+, Apr 1997) 
GamesGrid: Too many jokers?  (Gregg Cattanach, Sept 2001) 
GridGammon: Are the dice random?  (leobueno+, Sept 2011) 
Jellyfish: How to check the dice  (John Goodwin, May 1998)  [Recommended reading]
Jellyfish: Proof it doesn't cheat  (Gary Wong, July 1998) 
MSN Zone: Security flaw  (happyjuggler0, June 2004) 
Official complaint form  (Gary Wong, June 1998)  [Recommended reading]
Randomness testing  (Brett Meyer+, Dec 2010) 
Safe Harbor Games dice  (Michael Petch+, Aug 2011) 
Synopsis of "cheating" postings  (Ray Karmo, Feb 2002) 
Testing for bias  (Kit Woolsey, Jan 1995) 
The dice sure seem unfair!  (Michael Sullivan, Apr 2004) 
Too many repeated rolls?  (Stephen Turner, Mar 1994) 
Winning and losing streaks  (Daniel Murphy, Mar 1998) 


Return to:  Backgammon Galore : Forum Archive Main Page