Forum Archive :
Programming
Variance reduction of rollouts

Midas wrote:
> Hi there. Could one of you statistically minded readers
> let me know if there are many different ways to apply variance
> reduction to backgammon rollouts.
I can trivially answer "yes" to this but in a way that might not be
useful to you. Jellyfish uses one method (the name of the algorithm is
in the manual) and Snowie uses a method that might or might not be the
same. (I'll speculate that it isn't the same since Snowie generally
reports a higher "equivalent to X games" value for the same size
rollout, which suggests that it reduces variance faster.)
In addition there's at least one variance reduction method that isn't
used by any computer programs that I am aware of. This is the (hereby
named, although reported on this newsgroup previously) Zehr
Cubereduction algorithm: when rolling out with a live cube and a
takeable cube is given, keep the cube at the same value and play two
games from the position where the cube was turned. (Do this on all
subsequent cube turnings too.) The purpose of this is to allow a live
cube on rollouts and yet keep a single game with a high cube value from
affecting the results too much.
> I.e. is there more than one formula and if so, is
> it possible they may result in differing results
> significant enough to matter when relating to
> backgammon check play analysis.
I'm sure one can use different algorithms to get different results from
the same neural net and dice, however one would also get different
results from different neural nets when using the same algorithm.
The basic idea of any variance reduction method is to estimate the
amount of luck each side has and factor that out. For example, if one
side has two on the bar against a 5point board and rolls the necessary
doubles to come in that's a very lucky roll. A variance reduction
algorithm would measure that luck and change the results of the game in
some way to take account of it.
In the example above, if the neural net's evaluation of rolling the
double to enter is +.243 equity, and the evaluation of the position
before rolling as .324, that roll was worth +.567. If the player who
rolled the lucky double goes on to win then instead of getting 1 point
for the win they might get 1 point less some fraction of the equity from
their lucky rolls. So you might deduct half of .567 (as well as half of
other lucky rolls, but also adding in half of the unlucky rolls, and
doing the same thing for the opponents).
Some variations on this might include:
o vary the fraction of the adjustment (half in the example above)
o only make the adjustment if the roll was especially lucky or unlucky
and make no adjustments for rolls whose value is within some range of
the position's value before rolling
o change the adjustment based on whether the roll was close to the start
of the game or close to the end of the game
Any such algorithm is going to have results vary depending on the
evaluation used to decide what is a lucky roll. In the example above a
different program rolling out the same position with the same dice might
think that rolling the double was worth only .425 and hence make a
smaller adjustment.
An example of another kind of variance reduction is one that Snowie uses
 you can enable bearoff database truncation. When using this a
rollout stops at the point when Snowie can look it up in its bearoff
database. This database is extremely accurate and can reduce all of the
variance from luck during the bearoff. For example, Snowie might
determine that one side has a 63% chance of winning. At that point it
can stop that game and use a result of .26 equity.
> Also with regards to checker play comparison,
> should we lean towards truncated rollouts with a set
> horizon with no variance reduction or are the results
> likely to better when applying
> variance reduction a full rollout?
First one should specify the context a bit more. A full rollout will
almost always give a better evaluation if time were not a factor. What
one is usually interested in is the fastest way to get a reliable
answer.
For some positions a truncated rollout will get a reliable answer
fastest and for some it won't. If the position is one that the
evaluator understand well, or the position is one that will soon
simplify (for example a holding game) then a truncated rollout can get a
very reliable answer very quickly. One can think of these as
evaluations with a higher ply. (A 1296 game rollout with the initial
rolls varied, truncated after 2 rolls and played nply is equal to an
n+2ply evaluation.)
If the evaluator doesn't understand the position well or nothing is
going to change much for a while (for example a prime vs. prime
position) then it is probably best to do the full rollout.
The best way to decide which one to do is to start with a full rollout,
compare the results to the evaluator, and if they are close then trust a
truncated rollout. Unfortunately that doesn't save any time. But with
experience one can judge which kind of rollout to do.
One final comment about computer rollouts  in those positions in
which a truncated rollout is not accurate we might not be able to trust
the full rollout results anyway. If the evaluator doesn't understand
the position very well, it might not even understand it well enough to
do a rollout. It's easy to fall into the trap of doing a bazillion game
rollout, getting a standard deviation of .001 on the results, and
believing you've found the right answer. In reality you've found the
best answer that particular tool can give you, which isn't necessarily
the right answer.
Michael J. Zehr


Brian Sheppard writes:
One significant technique [of variance reduction] is called "striated
sampling." The goal is to choose your samples according to a theoretical
model of how samples ought to be distributed.
For example, when you do a JF rollout with a multiple of 36 trials, JF
allocates trials equally over the 36 possible dice rolls. This ensures that
the variations exactly match the results of the first ply of rolls. And if
the number of trials is a multiple of 1296, then JF will do an exhaustive
enumeration out to 2 ply.
Another technique is to concentrate effort on those rolls that have the
highest variance. For example, suppose that the sidetomove is on the bar,
and 25 numbers fan. Then all 25 of those numbers transpose to the same
variation. Moreover, suppose that we observe the results of fanning to be
consistent: the opponent always wins. Accordingly, this category has 0
variance. Since other rolls have higher variance, we should concentrate our
efforts on other rolls.
To compensate for the fact that we distribute trials nonuniformly, we
scale the results for each roll. To continue our example, suppose we do 50
trials of the 25 fanning rolls (losing a single point each time), and 1000
trials for each of the other 11 rolls, losing 0.5 points each time. Our
equity is computed as follows:
average value in 50 trials is 1, times 25/36 = 25/36.
average value in 11000 trials is 0.5, times 11/36 = 5.5/36.
Total value: 30.5/36

Michael J Zehr wrote:
> In addition there's at least one variance reduction method that isn't
> used by any computer programs that I am aware of. This is the (hereby
> named, although reported on this newsgroup previously) Zehr
> Cubereduction algorithm: when rolling out with a live cube and a
> takeable cube is given, keep the cube at the same value and play two
> games from the position where the cube was turned. (Do this on all
> subsequent cube turnings too.) The purpose of this is to allow a live
> cube on rollouts and yet keep a single game with a high cube value from
> affecting the results too much.
This is clever. Doubling the number of trials in variations in which a cube
is accepted reduces the variance of our estimate in those variations by
50%.
The impact on run time is reasonable. Even though we double the double/take
trial count, we do not double the runtime, since the variation is only
split *after* the double. Moves prior to the double are common to the two
trials.
The bottom line is that the method halves the variance of double/take
trials with considerably less than a doubling of the runtime.
A few implementation details for implementationminded readers. First,
there are pathological situations where the rollout would explode. (These
situations are the same as the situations where the cubeusing equity value
is undefined. Check out the Deja News archive for related threads.) The
number of "splits" of the variation tree must be limited to prevent
explosions.
Second, you have weigh the double/take variations on an equal footing with
the double/drop variations. This means that the pair of double/take games
must be counted as a single game, not two games.

Michael J Zehr wrote:
> The basic idea of any variance reduction method is to estimate the
> amount of luck each side has and factor that out. For example, if one
> side has two on the bar against a 5point board and rolls the necessary
> doubles to come in that's a very lucky roll. A variance reduction
> algorithm would measure that luck and change the results of the game in
> some way to take account of it.
>
> In the example above, if the neural net's evaluation of rolling the
> double to enter is +.243 equity, and the evaluation of the position
> before rolling as .324, that roll was worth +.567. If the player who
> rolled the lucky double goes on to win then instead of getting 1 point
> for the win they might get 1 point less some fraction of the equity from
> their lucky rolls. So you might deduct half of .567 (as well as half of
> other lucky rolls, but also adding in half of the unlucky rolls, and
> doing the same thing for the opponents).
Whoa! This is not "variance reduction." This is "bias introduction."
(Math phobic readers may wish to avert their eyes at this juncture.)
The algorithm you just described computes the movebymove change in the
neural net evaluation over the course of a single trial. The final outcome
of the trial is offset against the sum of the changes. This reduces
variance, to be sure. But *how* does it reduce variance? Let's take a look.
Let E0 be the evaluation of the root position of the rollout. After 1 turn
the evaluation changes to E1, then to E2, and so on. In the final position
the outcome is V. What is the sum of the changes from turn to turn?
In the first turn the change is E0  E1. In the second turn the change is
E1  E2. In the third turn the change is E2  E3, and so on. The sum is
(E0  E1) + (E1  E2) + (E2  E3) + (E3  E4) + ... + (En  V)
This type of sum is called a "telescoping series," because it resembles a
collapsible telescope. To see why, let's rearrange the parentheses a
little:
E0 (E1 + E1) (E2 + E2) (E3 + E3) ... (En + En)  V
So the sum is E0  V! This amount will be offset against V. If the fraction
we offset against V is f, then the final answer is
f * E0 + (1  f) * V
Well no wonder we reduce the variance! All the algorithm accomplishes is to
compute a weighted average of the true statistical answer with our initial
estimate. This is obviously the wrong answer; our goal is to eliminate our
initial estimate from the equation. The dependency on E0 is a bias.
True variance reduction does not introduce bias into the result. I believe
I have reconstructed JF's algorithm based upon the cryptic description in
JF's user manual. Before I describe it, I want to provide some motivation
for it, so that you can understand how the algorithm reduces variance.
Let E be the true cubeless equity function, and e be our static evaluation
function (e.g. a neural net). Rollouts estimate E by playing out a number
of trial games and averaging the results. A simple rollout does not exploit
the fact that e is a very good approxmation to E. Variance reduction
reduces the number of trials we need to achieve a specific level of
accuracy by exploiting the fact that e is a very good approximation of E.
How does it work?
Instead of regarding the rollout as establishing a value for E, let's think
of it as establishing a value for E  e. In words, the rollout will
estimate the error in our static evaluation.
Let's begin by doing a 1ply search, then estimate E as the average of the
1 ply values of e. This would be completely correct if e were perfect, but
since e isn't perfect, we need to estimate the error involved. How can we
do that?
Simple. Let's just pick a 1ply position at random, and estimate the error
involved in its evaluation. Note that I have just recreated the problem
that I posed at the root (i.e., estimating the difference between e and E)
but one ply deeper into the game tree. I can continue this process until I
reach a terminal node.
To estimate E, we start at the end of the variation we have selected at
random and apply the formula
E = Delta + Average(e)
where the average is taken over all nextply nodes and Delta is the
recursively computed estimate of the inaccuracy of e for the
randomlyselected successor.
How does this procedure reduce variance? Let's take a look. The variance of
our estimate of E equals the variance of our estimate of Delta plus the
variance of our estimate of Average(e). But Average(e) is a constant, so
its variance is zero. So the variance of our estimate of E is simply the
variance of our estimate of Delta.
Now, Delta is the estimate of Ee for a single successor position. And
since e is very, very accurate, the variance of Delta is very, very low.
The standard deviation of Ee is 0.035 or so. Compare this low error rate
with the typical standard deviation of 0.5 for a single game rollout.
Reducing the standard deviation by a factor of 0.5 / 0.035 = 14.28 would
normally require 14.28 * 14.28 = 203.9 times as many trials. We need vastly
fewer trials to get a specific level of accuracy.
The factor of 203.9 is not pure profit, alas, because we have to do 1ply
searches at every node when we use variance reduction. By contrast, a
normal rollout does only static evaluation. This typically costs a factor
of 20 or so, which is a shame. But we still should be very happy with a net
increase of a factor of 10 in overall throughput.
As I stated before, I believe that this is the algorithm used in JF.
However, this belief is based upon deciphering the user's manual, which
refers only to certain theorems from the theory of Markov chains. There may
be other formulations of this algorithm, or there may be other ways to
apply the theory to reduce variance.
Warm Regards,
Brian Sheppard




Programming
 Adjusting to a weaker opponent (Brian Sheppard, July 1997)
 Anticomputer positions (Bill Taylor+, June 1998)
 BKG 9.8 vs. Villa (Raccoon+, Aug 2006)
 BKG 9.8 vs. Villa (Andreas Schneider, June 1992)
 BKG beats world champion (Marty Storer, Sept 1991)
 Backgames (David Montgomery+, June 1998)
 Blockading feature (Sam Pottle+, Feb 1999)
 Board encoding for neural network (Brian Sheppard, Feb 1997)
 Bot weaknesses (Douglas Zare, Mar 2003)
 Building and training a neuralnet player (Brian Sheppard, Aug 1998)
 How to count plies? (Chuck Bower+, Jan 2004)
 How to count plies? (tanglebear+, Mar 2003)
 Ideas for improving computer play (David Montgomery, Feb 1994)
 Ideas on computer players (Brian Sheppard, Feb 1997)
 Introduction (Gareth McCaughan, Oct 1994)
 Measuring Difficulty (John Robson+, Feb 2005)
 Methods of encoding positions (Gary Wong, Jan 2001)
 Nply algorithm (eXtreme Gammon, Jan 2011)
 Neural net questions (Brian Sheppard, Mar 1999)
 Pruning the list of moves (David Montgomery+, Feb 1994)
 Search in Trees with Chance Nodes (Thomas Hauk, Feb 2004)
 Source code (Gary Wong, Dec 1999)
 TDGammon vs. Robertie (David Escoffery, June 1992)
 Training for different gammon values (Gerry Tesauro, Feb 1996)
 Training neural nets (Walter Trice, Nov 2000)
 Variance reduction in races (David Montgomery+, Dec 1998)
 Variance reduction of rollouts (Michael J. Zehr+, Aug 1998)
 Variance reduction of rollouts (Jim Williams, June 1997)
 What is a "neural net"? (Gary Wong, Oct 1998)
 Writing a backgammon program (Gary Wong, Jan 1999)
From GammOnLine
Long message
Recommended reading
Recent addition

 
