# Variance reduction of rollouts

From: Michael J. Zehr
Address: tada@mit.edu
Date: 24 August 1998
Subject: Re: Rollouts and variance reduction
Forum: rec.games.backgammon
Google: 6rrum2$l2d@senator-bedfellow.MIT.EDU

```
Midas wrote:
> Hi there.  Could one of you statistically minded readers
> let me know if there are many different ways to apply variance
> reduction to backgammon rollouts.

I can trivially answer "yes" to this but in a way that might not be
useful to you.  Jellyfish uses one method (the name of the algorithm is
in the manual) and Snowie uses a method that might or might not be the
same.  (I'll speculate that it isn't the same since Snowie generally
reports a higher "equivalent to X games" value for the same size
rollout, which suggests that it reduces variance faster.)

In addition there's at least one variance reduction method that isn't
used by any computer programs that I am aware of.  This is the (hereby
named, although reported on this newsgroup previously) Zehr
Cube-reduction algorithm:  when rolling out with a live cube and a
takeable cube is given, keep the cube at the same value and play two
games from the position where the cube was turned.  (Do this on all
subsequent cube turnings too.)  The purpose of this is to allow a live
cube on rollouts and yet keep a single game with a high cube value from
affecting the results too much.

> I.e. is there more than one formula and if so, is
> it possible they may result in differing results
> significant enough to matter when relating to
> backgammon checker play analysis.

I'm sure one can use different algorithms to get different results from
the same neural net and dice, however one would also get different
results from different neural nets when using the same algorithm.

The basic idea of any variance reduction method is to estimate the
amount of luck each side has and factor that out.  For example, if one
side has two on the bar against a 5-point board and rolls the necessary
doubles to come in that's a very lucky roll.  A variance reduction
algorithm would measure that luck and change the results of the game in
some way to take account of it.

In the example above, if the neural net's evaluation of rolling the
double to enter is +.243 equity, and the evaluation of the position
before rolling was -.324, that roll was worth +.567.  If the player who
rolled the lucky double goes on to win then instead of getting 1 point
for the win they might get 1 point less some fraction of the equity from
their lucky rolls.  So you might deduct half of .567 (as well as half of
other lucky rolls, but also adding in half of the unlucky rolls, and
doing the same thing for the opponents).

Some variations on this might include:

o vary the fraction of the adjustment (half in the example above)

o only make the adjustment if the roll was especially lucky or unlucky
and make no adjustments for rolls whose value is within some range of
the position's value before rolling

o change the adjustment based on whether the roll was close to the start
of the game or close to the end of the game

Any such algorithm is going to have results vary depending on the
evaluation used to decide what is a lucky roll.  In the example above a
different program rolling out the same position with the same dice might
think that rolling the double was worth only .425 and hence make a
smaller adjustment.

An example of another kind of variance reduction is one that Snowie uses
-- you can enable bearoff database truncation.  When using this a
rollout stops at the point when Snowie can look it up in its bearoff
database.  This database is extremely accurate and can reduce all of the
variance from luck during the bearoff.  For example, Snowie might
determine that one side has a 63% chance of winning.  At that point it
can stop that game and use a result of .26 equity.

> Also with regards to checker play comparison,
> should we lean towards truncated rollouts with a set
> horizon with no variance reduction or are the results
> likely to better when applying
> variance reduction a full rollout?

First one should specify the context a bit more.  A full rollout will
almost always give a better evaluation if time were not a factor.  What
one is usually interested in is the fastest way to get a reliable
answer.

For some positions a truncated rollout will get a reliable answer
fastest and for some it won't.  If the position is one that the
evaluator understands well, or the position is one that will soon
simplify (for example a holding game) then a truncated rollout can get a
very reliable answer very quickly.  One can think of these as
evaluations with a higher ply.  (A 1296 game rollout with the initial
rolls varied, truncated after 2 rolls and played n-ply is equal to an
n+2-ply evaluation.)

If the evaluator doesn't understand the position well or nothing is
going to change much for a while (for example a prime vs. prime
position) then it is probably best to do the full rollout.

The best way to decide which one to do is to start with a full rollout,
compare the results to the evaluator, and if they are close then trust a
truncated rollout.  Unfortunately that doesn't save any time.  But with
experience one can judge which kind of rollout to do.

One final comment about computer rollouts -- in those positions in
which a truncated rollout is not accurate we might not be able to trust
the full rollout results anyway.  If the evaluator doesn't understand
the position very well, it might not even understand it well enough to
do a rollout.  It's easy to fall into the trap of doing a bazillion game
rollout, getting a standard deviation of .001 on the results, and
believing you've found the right answer.  In reality you've found the
best answer that particular tool can give you, which isn't necessarily
the right answer.

-Michael J. Zehr
```
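The luck-adjustment scheme Zehr describes (deduct a fraction of each roll's equity swing from the final result) can be sketched in a few lines. The function names and interface below are my own invention; the one-half deduction fraction follows the example in the text.

```python
def roll_luck(eq_before, eq_after):
    # Luck of one roll: the evaluator's equity after the roll minus its
    # equity for the position before the dice were thrown.
    return eq_after - eq_before

def adjusted_result(result, luck_swings, fraction=0.5):
    # Deduct a fraction of the player's accumulated luck from the raw
    # game result (lucky swings are positive, unlucky ones negative).
    return result - fraction * sum(luck_swings)

# The entering-double example: -.324 before the roll, +.243 after.
swing = roll_luck(-0.324, 0.243)       # +.567, a very lucky roll
score = adjusted_result(1.0, [swing])  # 1 point less half the luck
```

As Sheppard points out later on this page, summing these swings over a whole game telescopes to E0 - V, so the adjusted score is really a weighted average of the raw result and the initial evaluation.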

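The bearoff-truncation example above (63% wins stopping the game at +.26) uses the standard conversion from winning probability to equity when gammons and the cube are ignored, as in a bearoff:

```python
def cubeless_equity(win_prob):
    # Win +1 with probability p, lose -1 with probability 1 - p:
    # E = p - (1 - p) = 2p - 1.  (Gammonless, cubeless.)
    return 2 * win_prob - 1

eq = cubeless_equity(0.63)  # 0.26, matching the 63%-wins example
```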
Brian Sheppard writes:

```
One significant technique [of variance reduction] is called "stratified
sampling."  The goal is to choose your samples according to a
theoretical model of how samples ought to be distributed.  For example,
when you do a JF rollout with a multiple of 36 trials, JF allocates
trials equally over the 36 possible dice rolls.  This ensures that the
variations exactly match the results of the first ply of rolls.  And if
the number of trials is a multiple of 1296, then JF will do an
exhaustive enumeration out to 2 ply.

Another technique is to concentrate effort on those rolls that have the
highest variance.  For example, suppose that the side to move is on the
bar, and 25 numbers fan.  Then all 25 of those numbers transpose to the
same variation.  Moreover, suppose that we observe the results of
fanning to be consistent: the opponent always wins.  Accordingly, this
category has 0 variance.  Since other rolls have higher variance, we
should concentrate our efforts on other rolls.

To compensate for the fact that we distribute trials non-uniformly, we
scale the results for each roll.  To continue our example, suppose we do
50 trials of the 25 fanning rolls (losing a single point each time), and
1000 trials for each of the other 11 rolls, losing 0.5 points each time.
Our equity is computed as follows:

    average value in 50 trials is -1, times 25/36 = -25/36
    average value in 11000 trials is -0.5, times 11/36 = -5.5/36
    Total value: -30.5/36

Michael J Zehr wrote:
> In addition there's at least one variance reduction method that isn't
> used by any computer programs that I am aware of.  This is the (hereby
> named, although reported on this newsgroup previously) Zehr
> Cube-reduction algorithm:  when rolling out with a live cube and a
> takeable cube is given, keep the cube at the same value and play two
> games from the position where the cube was turned.  (Do this on all
> subsequent cube turnings too.)  The purpose of this is to allow a live
> cube on rollouts and yet keep a single game with a high cube value
> from affecting the results too much.

This is clever.  Doubling the number of trials in variations in which a
cube is accepted reduces the variance of our estimate in those
variations by 50%.  The impact on run time is reasonable.  Even though
we double the double/take trial count, we do not double the run-time,
since the variation is only split *after* the double.  Moves prior to
the double are common to the two trials.  The bottom line is that the
method halves the variance of double/take trials with considerably less
than a doubling of the run-time.

A few implementation details for implementation-minded readers.  First,
there are pathological situations where the rollout would explode.
(These situations are the same as the situations where the cube-using
equity value is undefined.  Check out the Deja News archive for related
threads.)  The number of "splits" of the variation tree must be limited
to prevent explosions.  Second, you have to weigh the double/take
variations on an equal footing with the double/drop variations.  This
means that the pair of double/take games must be counted as a single
game, not two games.

Michael J Zehr wrote:
> The basic idea of any variance reduction method is to estimate the
> amount of luck each side has and factor that out.  For example, if one
> side has two on the bar against a 5-point board and rolls the
> necessary doubles to come in that's a very lucky roll.  A variance
> reduction algorithm would measure that luck and change the results of
> the game in some way to take account of it.
>
> In the example above, if the neural net's evaluation of rolling the
> double to enter is +.243 equity, and the evaluation of the position
> before rolling was -.324, that roll was worth +.567.  If the player
> who rolled the lucky double goes on to win then instead of getting 1
> point for the win they might get 1 point less some fraction of the
> equity from their lucky rolls.  So you might deduct half of .567 (as
> well as half of other lucky rolls, but also adding in half of the
> unlucky rolls, and doing the same thing for the opponents).

Whoa!  This is not "variance reduction."  This is "bias introduction."
(Math-phobic readers may wish to avert their eyes at this juncture.)

The algorithm you just described computes the move-by-move change in the
neural net evaluation over the course of a single trial.  The final
outcome of the trial is offset against the sum of the changes.  This
reduces variance, to be sure.  But *how* does it reduce variance?
Let's take a look.

Let E0 be the evaluation of the root position of the rollout.  After 1
turn the evaluation changes to E1, then to E2, and so on.  In the final
position the outcome is V.  What is the sum of the changes from turn to
turn?  In the first turn the change is E0 - E1.  In the second turn the
change is E1 - E2.  In the third turn the change is E2 - E3, and so on.
The sum is

    (E0 - E1) + (E1 - E2) + (E2 - E3) + (E3 - E4) + ... + (En - V)

This type of sum is called a "telescoping series," because it resembles
a collapsible telescope.  To see why, let's rearrange the parentheses a
little:

    E0 + (-E1 + E1) + (-E2 + E2) + (-E3 + E3) + ... + (-En + En) - V

So the sum is E0 - V!  This amount will be offset against V.  If the
fraction we offset against V is f, then the final answer is

    f * E0 + (1 - f) * V

Well no wonder we reduce the variance!  All the algorithm accomplishes
is to compute a weighted average of the true statistical answer with our
initial estimate.  This is obviously the wrong answer; our goal is to
eliminate our initial estimate from the equation.  The dependency on E0
is a bias.  True variance reduction does not introduce bias into the
result.

I believe I have reconstructed JF's algorithm based upon the cryptic
description in JF's user manual.  Before I describe it, I want to
provide some motivation for it, so that you can understand how the
algorithm reduces variance.

Let E be the true cubeless equity function, and e be our static
evaluation function (e.g. a neural net).  Rollouts estimate E by playing
out a number of trial games and averaging the results.  A simple rollout
does not exploit the fact that e is a very good approximation to E.
Variance reduction reduces the number of trials we need to achieve a
specific level of accuracy by exploiting that fact.

How does it work?  Instead of regarding the rollout as establishing a
value for E, let's think of it as establishing a value for E - e.  In
words, the rollout will estimate the error in our static evaluation.

Let's begin by doing a 1-ply search, then estimate E as the average of
the 1-ply values of e.  This would be completely correct if e were
perfect, but since e isn't perfect, we need to estimate the error
involved.  How can we do that?  Simple.  Let's just pick a 1-ply
position at random, and estimate the error involved in its evaluation.
Note that I have just recreated the problem that I posed at the root
(i.e., estimating the difference between e and E) but one ply deeper
into the game tree.  I can continue this process until I reach a
terminal node.

To estimate E, we start at the end of the variation we have selected at
random and apply the formula

    E = Delta + Average(e)

where the average is taken over all next-ply nodes and Delta is the
recursively computed estimate of the inaccuracy of e for the
randomly-selected successor.

How does this procedure reduce variance?  Let's take a look.  The
variance of our estimate of E equals the variance of our estimate of
Delta plus the variance of our estimate of Average(e).  But Average(e)
is a constant, so its variance is zero.  So the variance of our estimate
of E is simply the variance of our estimate of Delta.  Now, Delta is the
estimate of E - e for a single successor position.  And since e is very,
very accurate, the variance of Delta is very, very low.

The standard deviation of E - e is 0.035 or so.  Compare this low error
rate with the typical standard deviation of 0.5 for a single game
rollout.  Reducing the standard deviation by a factor of 0.5 / 0.035 =
14.28 would normally require 14.28 * 14.28 = 203.9 times as many
trials.  We need vastly fewer trials to get a specific level of
accuracy.

The factor of 203.9 is not pure profit, alas, because we have to do
1-ply searches at every node when we use variance reduction.  By
contrast, a normal rollout does only static evaluation.  This typically
costs a factor of 20 or so, which is a shame.  But we still should be
very happy with a net increase of a factor of 10 in overall throughput.

As I stated before, I believe that this is the algorithm used in JF.
However, this belief is based upon deciphering the user's manual, which
refers only to certain theorems from the theory of Markov chains.  There
may be other formulations of this algorithm, or there may be other ways
to apply the theory to reduce variance.

Warm Regards,
Brian Sheppard
```
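Sheppard's non-uniform allocation arithmetic is easy to check with a short script. The interface below is hypothetical; the point is that each stratum is weighted by its true probability (25/36 and 11/36), not by how many trials it happened to receive.

```python
def stratified_equity(strata):
    # strata: list of (num_rolls, avg_result) pairs, where num_rolls is
    # how many of the 36 dice rolls fall in the stratum and avg_result
    # is the average outcome observed for it, whatever its trial count.
    total_rolls = sum(n for n, _ in strata)
    return sum(n * avg for n, avg in strata) / total_rolls

# 25 fanning rolls always lose 1 point (50 trials sufficed);
# the other 11 rolls average -0.5 (1000 trials each).
eq = stratified_equity([(25, -1.0), (11, -0.5)])  # -30.5/36
```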

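The recursion Sheppard reconstructs, E = Delta + Average(e), can be demonstrated on a toy game. Everything below is invented for illustration; the "game" is a fair random walk rather than backgammon. With a perfect evaluator, Delta is always zero and every trial returns the exact equity, which mirrors his argument that the variance of the estimate is just the variance of Delta.

```python
import random

def vr_estimate(pos, evaluate, successors, terminal_value):
    # One variance-reduced trial: Average(e) over all 1-ply successors,
    # plus Delta, the recursively estimated error of e at one randomly
    # chosen successor.
    value = terminal_value(pos)
    if value is not None:
        return value
    succ = successors(pos)
    avg_e = sum(evaluate(s) for s in succ) / len(succ)  # Average(e)
    pick = random.choice(succ)                          # random line of play
    delta = (vr_estimate(pick, evaluate, successors, terminal_value)
             - evaluate(pick))                          # Delta, est. of E - e
    return avg_e + delta

# Toy game: fair random walk on 0..4; win (+1) at 4, lose (-1) at 0.
# The true equity from point i is E(i) = i/2 - 1.
def terminal_value(i):
    return {0: -1.0, 4: 1.0}.get(i)

def successors(i):
    return [i - 1, i + 1]

def perfect_eval(i):
    return i / 2 - 1

random.seed(1)
est = vr_estimate(2, perfect_eval, successors, terminal_value)  # exactly 0.0
```

A plain rollout of this toy game from position 2 would return +1 or -1 on each trial (variance 1); the variance-reduced trial has zero variance here because e is exact.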
### Programming

Adjusting to a weaker opponent  (Brian Sheppard, July 1997)
Anticomputer positions  (Bill Taylor+, June 1998)
BKG 9.8 vs. Villa  (Raccoon+, Aug 2006)
BKG 9.8 vs. Villa  (Andreas Schneider, June 1992)
BKG beats world champion  (Marty Storer, Sept 1991)
Backgames  (David Montgomery+, June 1998)
Blockading feature  (Sam Pottle+, Feb 1999)
Board encoding for neural network  (Brian Sheppard, Feb 1997)
Bot weaknesses  (Douglas Zare, Mar 2003)
Building and training a neural-net player  (Brian Sheppard, Aug 1998)
How to count plies?  (Chuck Bower+, Jan 2004)
How to count plies?  (tanglebear+, Mar 2003)
Ideas for improving computer play  (David Montgomery, Feb 1994)
Ideas on computer players  (Brian Sheppard, Feb 1997)
Introduction  (Gareth McCaughan, Oct 1994)
Measuring Difficulty  (John Robson+, Feb 2005)
Methods of encoding positions  (Gary Wong, Jan 2001)
N-ply algorithm  (eXtreme Gammon, Jan 2011)
Neural net questions  (Brian Sheppard, Mar 1999)
Pruning the list of moves  (David Montgomery+, Feb 1994)
Search in Trees with Chance Nodes  (Thomas Hauk, Feb 2004)
Source code  (Gary Wong, Dec 1999)
TD-Gammon vs. Robertie  (David Escoffery, June 1992)
Training for different gammon values  (Gerry Tesauro, Feb 1996)
Training neural nets  (Walter Trice, Nov 2000)
Variance reduction in races  (David Montgomery+, Dec 1998)
Variance reduction of rollouts  (Michael J. Zehr+, Aug 1998)
Variance reduction of rollouts  (Jim Williams, June 1997)
What is a "neural net"?  (Gary Wong, Oct 1998)
Writing a backgammon program  (Gary Wong, Jan 1999)
