The following article is by Stig Eide; I'm posting it for him
because he doesn't have posting privileges himself.
CAN WE TRUST THE ROLLOUTS?
Now that a growing number of backgammon programs play a decent game,
we have a powerful tool: the rollout feature.
I want to present a statistical tool that should accompany any rollout:
the confidence interval.
What is a confidence interval? After you have performed a rollout,
you'll have an estimate of the probability of a 'success'. A 'success'
can be winning or losing; it doesn't matter which. The confidence
interval is an interval with the estimate in the centre, and it tells you
how sure you can be that the true probability is inside that interval.
The variables:
n is the number of games played in the rollout.
y is the number of 'successes' that occurred during the rollout.
y/n is the estimated probability that a 'success' occurs.
a is the deviation from the estimated probability y/n:
a = z*sqrt(y/n*(1-y/n)/n).
The confidence interval is (y/n-a,y/n+a).
z is chosen to set the reliability of the confidence interval.
You can choose z to be any positive real number, and get any confidence
level you want, but here are the three most used values of z and their
respective confidence levels:
z=1.96 gives you a 95% confidence interval
z=2.17 gives you a 97% confidence interval
z=3 gives you a 99.73% confidence interval
So, if you choose z to be 1.96, then you can be 95% sure that the
probability of a success is between y/n-a and y/n+a.
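The relation between z and the confidence level comes from the normal
distribution, so the short table above can be checked or extended with a
few lines of Python. This is only a sketch using the standard library; the
function names confidence_from_z and z_from_confidence are made up for
this illustration and are not part of any backgammon program.

```python
import math
from statistics import NormalDist


def confidence_from_z(z):
    """Probability that a standard normal value lies within +/- z."""
    return math.erf(z / math.sqrt(2.0))


def z_from_confidence(level):
    """The z for which a standard normal value lies within +/- z
    with probability `level` (for example 0.95)."""
    # Half of the leftover probability sits in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2.0)


if __name__ == "__main__":
    # The three z values listed in the article.
    for z in (1.96, 2.17, 3.0):
        print(f"z={z}: {100 * confidence_from_z(z):.2f}% confidence")
    # Going the other way: which z gives a 99% interval?
    print(f"99% confidence needs z={z_from_confidence(0.99):.3f}")
```

Going from a confidence level to z is handy when you want a level that is
not in the table.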
An example:
You have rolled out a position that occurred during a game 4000 times.
The computer tells you that when it played both you and your opponent,
you won 3037 of those 4000 games (75.925%). You want
to make a confidence interval that is 97% reliable. The variables:
z is 2.17
y is 3037
n is 4000
a = z*sqrt(y/n*(1-y/n)/n) = 2.17*sqrt(3037/4000*(1-3037/4000)/4000) = 0.0147
The 97% confidence interval is now (0.75925-0.0147,0.75925+0.0147) or
(0.745,0.774). This means that you can be 97% sure that the chance of
winning the position is between 74.5% and 77.4%. That interval still
contains 75%, the usual dividing line between a take and a drop. If you
want to claim that this position is either a drop or a take, you have to
perform a new rollout with more than 4000 games, because that will narrow
the confidence interval (give you a smaller a).
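The arithmetic of the example can be reproduced with a small Python
sketch. It evaluates the formula for a with the example's y, n, and z,
and then turns the formula around to estimate how many games a new
rollout would need for a given a. The function names and the target of
a = 0.005 are assumptions made for this illustration; the estimate also
assumes the winning percentage stays near y/n in the longer rollout.

```python
import math


def rollout_interval(y, n, z):
    """Return (estimate, a) for y successes in n games, where the
    confidence interval is (estimate - a, estimate + a)."""
    p = y / n
    a = z * math.sqrt(p * (1.0 - p) / n)
    return p, a


def games_needed(p, z, target_a):
    """Smallest n for which z*sqrt(p*(1-p)/n) is at most target_a,
    assuming the estimated probability stays at p."""
    return math.ceil(z * z * p * (1.0 - p) / (target_a * target_a))


if __name__ == "__main__":
    p, a = rollout_interval(y=3037, n=4000, z=2.17)
    print(f"estimate = {p:.5f}, a = {a:.4f}")
    print(f"97% interval = ({p - a:.3f}, {p + a:.3f})")
    # Hypothetical target: shrink a to half a percentage point.
    print("games needed for a = 0.005:", games_needed(p, 2.17, 0.005))
```

Because a shrinks only with the square root of n, halving a takes about
four times as many games.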
Stig Eide (firstname.lastname@example.org)