Probability and Statistics

Clumping of random numbers

From: Gary Wong
Address: gary@cs.arizona.edu
Date: 30 September 1998
Subject: Re: Dice ('RANDOM' DICE?)
Forum: rec.games.backgammon
Google: wtyar1crad.fsf@brigantine.CS.Arizona.EDU

```Gregg Cattanach writes:
> One thing I've sometimes worried about with the dice generators at various
> servers, is whether they have a tendency to 'clump' rolls together in
> sequence, more than would be expected in a truly 'random' set of rolls.
> Almost every generator can demonstrate that over a large number of rolls,
> it rolls the 'correct' number of each combination.  But the pattern could
> be flawed, perhaps.  For example, rolling a 1 in every roll 10 times in a
> row, or not rolling a 4 in 15 rolls in a row, or many doubles in sequence.
>
> I'm sure this is a nontrivial statistical analysis, to decide whether there
> is excessive 'clumping' from a generator (how much 'clumping' should be
> expected??) but if it did exist, it could affect game results. Examples
> could be: dancing MANY times against a 3,4 point board, or consistently
> rolling a 1 during a bearoff.
>
> BTW, this post has nothing to do with NetGammon, just all dice generators
> in general.
>
> Any comments on the 'clumping' concept??

It's certainly a sensible question to ask, and though it's relatively
simple to test a specific hypothesis ("do I dance too many times on a 3
point board?"), testing vague ones ("are the dice fair?") gets very hard
very fast.

If you are interested in analysis of computer-generated pseudo-random
sequences, I suggest you browse the web, starting with:

http://www.npac.syr.edu/projects/random/other-info.html

In particular, the "clumping" concept you mention is measured by the
spectral test.  Loosely speaking, a spectral test in dimension n measures
the distribution of "n samples in a row" (ie. in dimension 2 it measures
pairs, in dimension 3 it measures triples, etc.)  It gets very expensive
to compute in high dimensions, but is quite useful in low (less than 10)
dimensions, because most "ordinary" random number generators tend to get
passable (but not spectacularly good) results in these dimensions.  That
makes it a simple way to weed out bad generators.

A more strenuous theoretical requirement is for a generator to be
"k-distributed" (see Knuth's _The Art of Computer Programming_) -- a
sequence is 1-distributed if every number it generates occurs equally
often; 2-distributed if every _pair_ of numbers occurs equally often;
etc.  In a backgammon context, where each roll is considered a
single "number", 1-distribution would mean that all 36 rolls occurred
equally often and 2-distribution would mean all 1296 _pairs_ of rolls
occurred equally often.  The series "11 12 13 ... 65 66 11 12 13..."
is obviously 1-distributed, but not 2-distributed.  (Technically we
should say _k-distributed to n-bit accuracy_, where n-bit accuracy
means the output is in the range 0..(2^n)-1.  You can generally
achieve higher k-distribution by sacrificing accuracy, and vice
versa.)  k-distribution becomes overwhelmingly expensive to test
for large k and n (to test k-distribution to n-bit accuracy, you'd
need to test at least 2^(kn) numbers), so it is generally shown theoretically
without performing actual measurements (as opposed to the spectral
test).  "Ordinary" random number generators generally cannot claim
any better than 1-distribution.

The best generator I know of (this is my favourite, so sorry if I keep
plugging it :-) is the Mersenne Twister (look for a description on the
web at http://www.math.keio.ac.jp/~matumoto/emt.html) which, if you
used it to generate numbers in the range 1..36 for use as backgammon
dice, would be 3115-distributed -- ie. there is no biased "clumping"
whatsoever for any sequence shorter than 3116 rolls!

Getting back to backgammon: accusations of biased dice are very common
(for some reason the dice are always biased AGAINST the complainer --
nobody ever posts to r.g.b. saying that they just beat Jellyfish in a
long match which they clearly deserved to lose, or that the dice on
Netgammon clearly favour them), but detailed analyses are rare.  A
couple of good examples:

Stephen Turner performs a chi-squared test of the "matrix" output of
10,000,000 FIBS rolls at http://www.dejanews.com/getdoc.xp?AN=230890310
and finds no evidence of bias.  This is loosely equivalent to showing the
FIBS dice pass the spectral test in dimension 2.  It also shows no
signs of 2-undistribution (you can't ever prove k-distribution just by
measuring random samples, but if a sequence were badly k-undistributed,
then you could find evidence of that).

Tom Keith gives a summary of 3,000,000 rolls generated by Motif at
http://www.bkgm.com/motif/stats.html and breaks them down by player
and position (overall, entering from the bar, and in races).

Cheers,
Gary.
--
Gary Wong, Department of Computer Science, University of Arizona
gary@cs.arizona.edu     http://www.cs.arizona.edu/~gary/
```
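As an illustration of Gary's 1- vs. 2-distribution example, the short Python sketch below (my own, not from the post) counts single rolls and consecutive pairs in the cyclic sequence "11 12 13 ... 65 66 11 12 ...": all 36 rolls occur equally often, but only 36 of the 1296 possible pairs ever appear.

```python
from collections import Counter

# The 36 backgammon rolls in a fixed cycle: 11, 12, ..., 16, 21, ..., 66.
cycle = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# Repeat the cycle, as in the post's example sequence "11 12 13 ... 66 11 ...".
seq = cycle * 100

singles = Counter(seq)              # 1-distribution: counts of each roll
pairs = Counter(zip(seq, seq[1:]))  # 2-distribution: counts of each pair

print(len(singles), set(singles.values()))  # 36 rolls, all equally frequent
print(len(pairs))                           # only 36 of 1296 pairs occur
```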
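The dimension-2 check that Stephen Turner performed can also be sketched empirically. The Python code below (a rough sketch of the idea, not Turner's actual procedure; the seed and sample size are arbitrary) tallies non-overlapping pairs of simulated rolls into a 36×36 matrix and computes a Pearson chi-squared statistic against a uniform expectation. Conveniently, Python's `random` module is itself an implementation of the Mersenne Twister that Gary recommends.

```python
import random

random.seed(12345)  # Python's random module uses the Mersenne Twister

N = 200_000
rolls = [random.randrange(36) for _ in range(N)]  # each roll coded 0..35

# Tally the 36 x 36 = 1296 ordered pairs, using non-overlapping pairs
# so that the pair counts are independent of one another.
counts = [[0] * 36 for _ in range(36)]
for a, b in zip(rolls[0::2], rolls[1::2]):
    counts[a][b] += 1

# Pearson chi-squared statistic against a uniform expectation.
expected = (N / 2) / 1296
chi2 = sum((c - expected) ** 2 / expected for row in counts for c in row)

# With 1295 degrees of freedom, a fair generator should give a statistic
# of roughly 1295 +/- sqrt(2 * 1295), i.e. about 1295 +/- 51.
print(f"chi-squared = {chi2:.1f} on 1295 degrees of freedom")
```

A statistic far outside that range would be the kind of evidence of pairwise "clumping" the original question asks about.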

### Probability and Statistics

Average game and match length  (JP White, Dec 2000)
Average luck of each roll  (Timothy Chow+, Mar 2013)
Average luck of each roll  (Jørn Thyssen+, Feb 2004)
Calculating winning chances  (Douglas Zare, June 2000)
Chance of rolling x doubles in y rolls  (Raccoon+, July 2007)
Chance of rolling x or more pips in y rolls  (Tom Keith, Feb 2004)
Clumping of random numbers  (Gary Wong, Sept 1998)
Counting shots  (Koyunbaba+, June 2007)
Counting shots  (John Little+, Mar 2007)
Distribution of points per game  (Roland Sutter, June 1999)
Distribution of points per game  (Stig Eide+, Sept 1995)
Expected variation in points after a series of games  (Achim Müller+, Feb 1999)
How many games to decide who's better?  (Stephen Turner, Mar 1997)
How often is too often?  (Gary Wong, Oct 1998)
Losing after bearing off 14 checkers  (Daniel Murphy, July 1999)
Number of games per match  (Jason Lee+, Jan 2005)
Number of rolls to enter x checkers from bar  (Michael Depreli+, Mar 2011)
Visualizing odds  (Daithi, Mar 2011)