Backgammon Computer Dice

Computer Dice

Randomness testing

From:   Brett Meyer
Address:   mrbmeyer@comcast.net
Date:   12 December 2010
Subject:   Meyer Dice Tube VIDEO, Randomness & Cheating
Forum:   BGonline.org Forums

There have been a multitude of comments and concerns expressed regarding the use of the Meyer Dice Tube for Backgammon. One concern was whether the tube indeed produces random results. I think most everyone is convinced at this point that randomness is achieved. Several independent randomness studies have been conducted, and they verified that randomness is achieved. I believe that Mr. Bob Koca also conducted his own study and agreed that the tube produces random results (I'm sure he'll correct me if I'm wrong). If you want to see some actual data, go to my site (http://www.BrettMeyer.com) and click on the "Statistical Data" tab. These are the raw data from a 3000-flips trial that my wife and I conducted several years ago using a tube with no rods. If someone would like to actually "crunch" the data and report how good (or bad) the results are statistically, that would be greatly appreciated. I have not processed the data, but I can say just by looking at the results that the numbers seem to be fairly tight.

Nick Kravitz writes:

Brett, I am not sure if you are still watching this thread, but I did not see a response to your request to test your results for randomness. I am a quantitative analyst (quant) and a decent backgammon player (at least I like to think so), and I am giving my opinion on your request for randomness analysis below.

There are a few statistical methods to test for randomness. I have used Pearson's chi-squared test, which is probably the best known and tested method for doing so (see http://en.wikipedia.org/wiki/Pearson's_chi-squared_test).

Before I give the results, here are some general comments on statistical testing for people without the requisite background. (You can also find this in any text on statistical testing or here: http://en.wikipedia.org/wiki/Statistical_hypothesis_testing)

In the same way that we cannot poll an entire voting population to predict election results, we cannot roll dice an infinite number of times to conclude definitively whether they are fair. As such, statistical tests are not foolproof and are set up to acknowledge the possibility of an incorrect conclusion.

For example, if we are testing the randomness of a single die within the dice tube by rolling it 500 times, we might get the same number on all tosses and conclude the dice tube was loaded. However, it is still possible (but extremely unlikely) that the tube was indeed fair and we simply happened to be unlucky. In the event we roll 500 of the same number, we can only state there is strong evidence (but no proof) that we were rolling with an unfair (non-random) dice tube.

Likewise, our dice tube could be loaded to roll more 1's than it should: for example, 1's with probability 50% and 2, 3, 4, 5, and 6 each with probability 10%. However, when testing this loaded dice tube, we might roll approximately equal frequencies of each number by chance, in which case we would incorrectly conclude the tube was fair when in fact it was not.
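As a rough illustration of what Pearson's test measures, here is a minimal Python sketch of the test statistic for a single die assumed to be fair. It is not part of the original analysis; the function name and the example counts are made up for illustration. The second set of counts is simply what the loaded tube described above (50% ones, 10% each for the other faces) would produce on average over 500 rolls.

```python
def chi_squared_statistic(observed):
    """Pearson's statistic for a die assumed fair: sum of (O - E)^2 / E."""
    total = sum(observed)
    expected = total / len(observed)  # fair die: every face equally likely
    return sum((count - expected) ** 2 / expected for count in observed)


# 500 rolls that all show the same face (the extreme case in the text).
all_same = [500, 0, 0, 0, 0, 0]

# Average counts for a tube loaded 50/10/10/10/10/10 over 500 rolls.
loaded_on_average = [250, 50, 50, 50, 50, 50]

# 500 is not divisible by 6, so this is as even as the counts can get.
nearly_even = [84, 84, 83, 83, 83, 83]

for label, counts in [
    ("all the same face", all_same),
    ("loaded, on average", loaded_on_average),
    ("nearly even", nearly_even),
]:
    print(f"{label}: {chi_squared_statistic(counts):.2f}")
```

The larger the statistic, the further the observed counts are from what a fair die would give. Turning the statistic into a probability statement requires comparing it with a chi-squared distribution with 5 degrees of freedom (six faces minus one).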
(Side note: Dewey vs Truman was a famous example of a statistical poll failing: http://en.wikipedia.org/wiki/Dewey_Defeats_Truman)

By convention, statistical tests are most commonly set up to reject the null hypothesis when it is in fact true with probability 5%, although this value is arbitrary and purely conventional. We simply need some threshold to start being suspicious of non-randomness. Sometimes a truly random process produces results that look non-random; when this happens it is called a type I error, or false positive, and would be equivalent to the first case above (concluding the dice tube is non-random when in fact it is fair).

The output of running the test on a set of generated or observed frequencies is a "p-value", which can be interpreted as the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true (in our case, that the underlying process is indeed random). See http://en.wikipedia.org/wiki/P-value

By construction, we would expect that if we run many experiments of throwing dice, the p-values would be uniformly distributed between 0 and 1, and that given a truly random process, over many experiments there would be a 5% chance we would return a false positive. (Equivalently, our p-value would be between 0 and 0.05 with 5% probability.)

Before I ran the test on the numbers from the Meyer website, I first ran the test on my own numbers, which I generated from a computer program that I am certain can produce unbiased random numbers. I generated 1000 experiments of 500 die rolls each. In most of the trials I got numbers that looked random enough (for example, 94, 86, 89, 79, 77, 75), which returned a p-value of 0.642. However, around 5% of the time (or around 50 times out of 1000) the numbers looked non-random enough to trip a false positive (for example, 59, 84, 73, 98, 88, 98), which returned a p-value of 0.016.
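The experiment described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the program used in the post: the function names, the seed, and the use of Python's built-in generator are all assumptions. The p-value uses a closed-form expression for the upper tail of the chi-squared distribution that holds only for 5 degrees of freedom (six faces minus one). Under this formula the two example sets of counts come out at roughly 0.64 and 0.017, in line with the values quoted above.

```python
import math
import random


def chi_squared_statistic(observed):
    """Pearson's statistic for a die assumed fair: sum of (O - E)^2 / E."""
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


def p_value_5_dof(statistic):
    """Upper-tail probability of a chi-squared variable with exactly 5
    degrees of freedom (closed form; not valid for other dof)."""
    x = math.sqrt(statistic)
    normal_tail = math.erfc(x / math.sqrt(2))
    density_term = (
        math.sqrt(2 / math.pi) * math.exp(-statistic / 2) * (x + x ** 3 / 3)
    )
    return normal_tail + density_term


def roll_counts(rng, rolls=500):
    """Count how often each face appears in `rolls` simulated fair rolls."""
    counts = [0] * 6
    for _ in range(rolls):
        counts[rng.randrange(6)] += 1
    return counts


# The two example experiments quoted in the post.
p_typical = p_value_5_dof(chi_squared_statistic([94, 86, 89, 79, 77, 75]))
p_unusual = p_value_5_dof(chi_squared_statistic([59, 84, 73, 98, 88, 98]))
print(f"typical-looking counts: p = {p_typical:.3f}")
print(f"unusual-looking counts: p = {p_unusual:.3f}")

# 1000 simulated experiments of 500 fair rolls each.
rng = random.Random(2010)  # arbitrary seed, chosen only for repeatability
p_values = [
    p_value_5_dof(chi_squared_statistic(roll_counts(rng))) for _ in range(1000)
]
false_positives = sum(p < 0.05 for p in p_values)
print(f"{false_positives} of 1000 fair experiments fell below the 5% threshold")
```

The exact false-positive count depends on the seed, but for a fair generator it should land in the neighbourhood of 50 out of 1000.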
It is actually a good thing to get some false positives; this indicates the test is working as expected. If all 1000 experiments produced p-values above 5%, I would be suspicious that the underlying random process was not working correctly.

Next, I applied the test to the numbers on the Meyer website, which provided results for a total of 12 experiments, one for each starting number for each die. If the rolls were truly random, we would expect the results to look similar to the process described above that we know to be random; i.e. p-values approximately uniform between 0 and 1 (in particular, about half the p-values above 0.5 and half below, with maybe 1 observation close to or exceeding the 5% threshold of non-random suspicion).

The p-values I calculated ranged from 0.55 (least random, Blue 5) to 0.997 (most random, Red 1). These results look too good to be true. In fact, if we rolled dice from a process we know in advance to be purely random (for example, rolling a precision die, or having a computer generate random numbers for us), the probability that we would get results at least this good by pure chance would be 0.0000143 (equivalently, about 1 in 70,000). To put this into backgammon perspective, there would be a better chance of your first 3 rolls all coming out double sixes (a mere 1 in 47,000).

I do not know how the experiment was run. There is nothing to indicate that the results were somehow doctored (or that the most random results were "selected" from a larger set of experiments). However, because the results look suspect, I would recommend having them re-sampled independently by someone without an interest in the outcome of the test.
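The odds quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the post does not say how the 0.0000143 figure was derived, so the last calculation is a cruder cross-check under the assumption that the 12 p-values are independent and uniformly distributed, not a reconstruction of the post's method.

```python
from fractions import Fraction

# Chance that three consecutive rolls are all double sixes.
three_double_sixes = Fraction(1, 36) ** 3
print(f"three double sixes: 1 in {three_double_sixes.denominator:,}")

# The probability quoted in the post, expressed as odds.
quoted_probability = 0.0000143
print(f"quoted figure: about 1 in {round(1 / quoted_probability):,}")

# Cruder cross-check (an assumption, not the post's method): if 12
# p-values were independent and uniform on [0, 1], the chance that every
# one of them is at least 0.55 is (1 - 0.55) ** 12.
lowest_p_value = 0.55
experiments = 12
all_at_least = (1 - lowest_p_value) ** experiments
print(f"all {experiments} p-values >= {lowest_p_value}: "
      f"about 1 in {round(1 / all_at_least):,}")
```

The three-double-sixes figure is exactly 1 in 46,656, which the post rounds to 47,000. The cruder cross-check comes out at about 1 in 14,500, which is less extreme than the quoted figure but still points the same way: p-values this uniformly high would be very unusual for a truly random process.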

Brett Meyer writes:

The data posted on my website from the 3000-flips randomness trial were the result of my wife and me spending 3 days flipping a red and a blue die in a 9" tall, 2" ID acrylic tube with no baffles/rods. Therefore, the trial emulated simply dropping 2 dice onto a flat surface from a height of 9", with the only "randomizing" effect being the interaction of the 2 dice and their contact/rebound with the interior surface of the tube. Neither of us is a statistician, but the results looked fairly random to us. The data were published so that those who are indeed statisticians could crunch the numbers and provide an analysis of our results.

Computer Dice

Dice on backgammon servers (Hank Youngerman, July 2001)

Does Agushak Backgammon cheat? (Mr Nabutovsky, June 2000)

Does BG by George cheat? (George Sutty, Nov 1995)

Does Backgammon NJ cheat? (Greg+, June 2010)

Does Cybergammon cheat? (Goto Informatique, Aug 1996)

Does David's Backgammon cheat? (Joseph B. Calderone, June 1998)

Does GNU Backgammon cheat? (Robert-Jan Veldhuizen, Nov 2002)

Does Gammontool cheat? (Jim Hurley, Sept 1991)

Does Hyper-Gammon cheat? (ZZyzx, June 1996)

Does Jellyfish cheat? (Fredrik Dahl, June 1997)

Does MVP Backgammon cheat? (Mark Betz, Oct 1996)

Does MonteCarlo cheat? (Matt Reklaitis, June 1998)

Does Motif cheat? (Rick Kiesau+, Mar 2004) [Long message]

Does Motif cheat? (Billie Patterson, Feb 2003)

Does Motif cheat? (Robert D. Johnson, Oct 1996)

Does Snowie cheat? (André Nicoulin, Sept 1998)

Does TD-Gammon cheat? (Gerry Tesauro, Feb 1997)

Error rates with computer dice (NoChinDeluxe+, Feb 2011)

FIBS: Analysis of 10 million rolls (Stephen Turner, Apr 1997) [Recommended reading]

FIBS: Are the dice biased? (Kit Woolsey, Oct 1996)

FIBS: Entering from the bar (Tom Keith+, Apr 1997)

GamesGrid: Too many jokers? (Gregg Cattanach, Sept 2001)

GridGammon: Are the dice random? (leobueno+, Sept 2011)

Jellyfish: How to check the dice (John Goodwin, May 1998) [Recommended reading]

Jellyfish: Proof it doesn't cheat (Gary Wong, July 1998)

MSN Zone: Security flaw (happyjuggler0, June 2004)

Official complaint form (Gary Wong, June 1998) [Recommended reading]

Randomness testing (Brett Meyer+, Dec 2010)

Safe Harbor Games dice (Michael Petch+, Aug 2011)

Synopsis of "cheating" postings (Ray Karmo, Feb 2002)

Testing for bias (Kit Woolsey, Jan 1995)

The dice sure seem unfair! (Michael Sullivan, Apr 2004)

Too many repeated rolls? (Stephen Turner, Mar 1994)

Winning and losing streaks (Daniel Murphy, Mar 1998)
