Gregg Cattanach writes:
> One thing I've sometimes worried about with the dice generators at various
> servers, is whether they have a tendency to 'clump' rolls together in
> sequence, more than would be expected in a truly 'random' set of rolls.
> Almost every generator can demonstrate that over a large number of rolls,
> it rolls the 'correct' number of each combination. But the pattern could
> be flawed, perhaps. For example, rolling a 1 in every roll 10 times in a
> row, or not rolling a 4 in 15 rolls in a row, or many doubles in sequence.
>
> I'm sure this is a not trivial statistical analysis to decide if there is
> excessive 'clumping' from a generator, (how much 'clumping' should be
> expected??) but if it did exist, it could affect game results. Examples
> could be: dancing MANY times against a 3,4 point board, or consistently
> rolling a 1 during a bearoff.
>
> BTW, this post has nothing to do with NetGammon, just all dice generators
> in general.
>
> Any comments on the 'clumping' concept??

It's certainly a sensible question to ask. Although it's relatively
simple to test a specific hypothesis ("do I dance too many times against
a 3-point board?"), testing vague ones ("are the dice fair?") gets very
hard very fast.
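
To make the "specific hypothesis" case concrete, here is a minimal Python
sketch of a one-sided binomial test. It assumes a single checker on the
bar, so the chance of dancing against a 3-point board is (3/6)^2 = 1/4.
The function names and the figure of 35 dances in 100 attempts are
invented for illustration; they are not taken from any real log.

```python
from math import comb

def dance_probability(points_closed):
    """Chance that both dice land on closed points (one checker on the bar)."""
    return (points_closed / 6) ** 2

def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

p = dance_probability(3)  # 0.25 against a 3-point board

# Hypothetical record: 35 dances in 100 attempts to enter.
p_value = binomial_upper_tail(100, 35, p)
print(f"dance chance = {p}, P(at least 35 dances in 100) = {p_value:.4f}")
```

A small p-value only says that this one count would be unusual for fair
dice. If you go looking through your logs for whichever statistic looks
worst, the test no longer means much.
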
If you are interested in analysis of computer-generated pseudo-random
sequences, I suggest you browse the web, starting with:
http://www.npac.syr.edu/projects/random/otherinfo.html
http://random.mat.sbg.ac.at/links/

In particular, the "clumping" concept you mention is measured by the
spectral test. Loosely speaking, a spectral test in dimension n measures
the distribution of "n samples in a row" (i.e. in dimension 2 it measures
pairs, in dimension 3 it measures triples, etc.). It gets very expensive
to compute in high dimensions, but is quite useful in low (less than 10)
dimensions, because most "ordinary" random number generators tend to get
passable (but not spectacularly good) results in these dimensions. That
makes it a simple test for easily weeding out bad generators.

A more strenuous theoretical requirement is for a generator to be
"k-distributed" (see Knuth's _The Art of Computer Programming_): a
sequence is 1-distributed if every number it generates occurs equally
often; 2-distributed if every _pair_ of numbers occurs equally often;
etc. In a backgammon context, where each roll is considered a
single "number", 1-distribution would mean that all 36 rolls occurred
equally often and 2-distribution would mean all 1296 _pairs_ of rolls
occurred equally often. The series "11 12 13 ... 65 66 11 12 13..."
is obviously 1-distributed, but not 2-distributed. (Technically we
should say _k-distributed to n-bit accuracy_, where n-bit accuracy
means the output is in the range 0..(2^n)-1. You can generally
achieve higher k-distribution by sacrificing accuracy, and vice
versa.) k-distribution becomes overwhelmingly expensive to test
for large k and n (to test k-distribution to n-bit accuracy, you'd
need to test at least 2^(kn) numbers), so it is generally shown
theoretically without performing actual measurements (as opposed to
the spectral test). "Ordinary" random number generators generally
cannot claim any better than 1-distribution.
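
The cyclic series above can be checked directly. The Python sketch below
(the helper name window_counts is my own) treats each of the 36 rolls as
one symbol, repeats the cycle 100 times, and counts which single rolls
and which consecutive pairs ever appear.

```python
from collections import Counter
from itertools import product

# The 36 ordered rolls "11 12 13 ... 65 66", each treated as one symbol.
ROLLS = [f"{a}{b}" for a, b in product(range(1, 7), repeat=2)]

def window_counts(seq, k):
    """Count every overlapping window of k consecutive symbols in seq."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))

series = ROLLS * 100  # 11 12 ... 66 11 12 ... 66, and so on

singles = window_counts(series, 1)
pairs = window_counts(series, 2)

print(len(singles), set(singles.values()))  # prints: 36 {100}
print(len(pairs), "of", 36 ** 2, "possible pairs occur")  # prints: 36 of 1296 possible pairs occur
```

Every roll turns up exactly 100 times, so the single-roll counts are
perfect, yet only 36 of the 1296 possible pairs ever occur: each roll is
always followed by the same successor.
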
The best generator I know of (this is my favourite, so sorry if I keep
plugging it :) is the Mersenne Twister (look for a description on the
web at http://www.math.keio.ac.jp/~matumoto/emt.html) which, if you
used it to generate numbers in the range 1..36 for use as backgammon
dice, would be 3115-distributed, i.e. there is no biased "clumping"
whatsoever for any sequence shorter than 3116 rolls!
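
For anyone who wants to experiment: Python's standard random module uses
the Mersenne Twister as its core generator, so a rough sketch of drawing
backgammon rolls from it looks like the following. The roll_dice helper,
the seed, and the mapping from 0..35 onto two dice are my own assumptions,
and the 3115 figure describes the generator's raw output rather than this
particular mapping.

```python
import random

def roll_dice(rng):
    """Draw one of the 36 ordered rolls and split it into two dice."""
    n = rng.randrange(36)  # uniform integer in 0..35
    return n // 6 + 1, n % 6 + 1

rng = random.Random(12345)  # arbitrary seed, used only for repeatability
rolls = [roll_dice(rng) for _ in range(10)]
print(rolls)
```

A real server would of course seed from something unpredictable rather
than a fixed constant.
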
Getting back to backgammon: accusations of biased dice are very common
(for some reason the dice are always biased AGAINST the complainer;
nobody ever posts to r.g.b. saying that they just beat Jellyfish in a
long match which they clearly deserved to lose, or that the dice on
Netgammon clearly favour them), but detailed analyses are rare. A
couple of good examples:

Stephen Turner performs a chi-squared test of the "matrix" output of
10,000,000 FIBS rolls at http://www.dejanews.com/getdoc.xp?AN=230890310
and finds no evidence of bias. This is loosely equivalent to showing the
FIBS dice pass the spectral test in dimension 2. It also shows no
signs of 2-undistribution (you can't ever prove k-distribution just by
measuring random samples, but if a sequence was badly k-undistributed,
then you could find evidence of that).
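
As a rough idea of what such a test involves, here is a minimal Python
sketch of a chi-squared statistic over a 6x6 table of (first die, second
die) counts. The table layout is my assumption about what a roll matrix
contains, the function name is hypothetical, and the rolls are stand-in
data from Python's own generator, not real server output; this is not a
reconstruction of the analysis in the linked post.

```python
import random
from collections import Counter

def matrix_chi_squared(rolls):
    """Chi-squared statistic for a 6x6 table of (die1, die2) counts.

    Under the null hypothesis every one of the 36 cells is equally likely.
    Returns the statistic and its degrees of freedom (36 cells - 1).
    """
    counts = Counter(rolls)
    expected = len(rolls) / 36
    stat = sum((counts.get((a, b), 0) - expected) ** 2 / expected
               for a in range(1, 7) for b in range(1, 7))
    return stat, 35

rng = random.Random(2024)  # arbitrary seed; stand-in for a server's log
rolls = [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(100_000)]
stat, df = matrix_chi_squared(rolls)
print(f"chi-squared = {stat:.1f} on {df} degrees of freedom")
```

With 35 degrees of freedom, a statistic above roughly 49.8 would be
significant at the 5% level. A value near 35 is what fair dice should
produce on average.
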
Tom Keith gives a summary of 3,000,000 rolls generated by Motif at
http://www.bkgm.com/motif/stats.html and breaks them down by player
and position (overall, entering from the bar, and in races).
Cheers,
Gary.

Gary Wong, Department of Computer Science, University of Arizona
gary@cs.arizona.edu http://www.cs.arizona.edu/~gary/
