Forum Archive :
Programming
Variance reduction in races

Assume we have a database of single-sided evaluations of
non-contact positions. This database is built up assuming that
each move is made to minimize the expected number of rolls left.
Although the database is highly accurate for predicting cubeless
win probability, we still may want to roll out a position from the
database, either to get a more accurate evaluation, or because
we are doing a cubeful rollout.
We can use the database to get tremendous variance reduction
in our rollout.
As with lookahead variance reduction, the idea is to accumulate
a "luck adjustment" over the course of a game. For each turn
the luck adjustment should have a mean of zero and correlate
well with how lucky the roll was.
We can easily estimate how lucky a roll was by simply using the
database to compare the cpw of the position before the roll with
the cpw after the roll. However, these luck ratings might not
have mean zero if sometimes we make moves that do not minimize
the expected rolls to bear off. Because of the way a single-sided
database is built up, we can ensure a mean luck adjustment of zero
by calculating the post-roll cpw assuming that we had made the
mean roll minimizing play.
Below is pseudocode.
turn = 0
old_cpw = cpw( position we're rolling out )
cumulativeLuck = 0.0
while ( game not over ) {
get the dice roll
get the best move
new_cpw = cpw of the *mean roll minimizing move*
luck = new_cpw - old_cpw;
if ( turn == 1 ) cumulativeLuck = cumulativeLuck - luck
else cumulativeLuck = cumulativeLuck + luck
switch sides
old_cpw = 1.0 - cpw( move that was made )
if ( turn == 1 ) turn = 0
else turn = 1
}
if ( turn == 1 ) final_cpw = 1.0
else final_cpw = 0.0
final_cpw = final_cpw - cumulativeLuck
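As a sanity check, here is one way the pseudocode can be made runnable, sketched in Python on a toy race in which each side has a single checker (an assumed simplification: a lone checker always plays its full pip total, which is then also the mean roll minimizing play; dist, cpw, and rollout_game are illustrative names, not from any real program):

```python
import random
from functools import lru_cache

ROLLS = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def pips(a, b):
    # Doubles move four times the die, non-doubles the sum.
    return 4 * a if a == b else a + b

@lru_cache(maxsize=None)
def dist(p):
    """P(bear off in exactly k rolls) for one checker p pips from off."""
    if p <= 0:
        return (1.0,)                        # already off: 0 rolls
    out = {}
    for a, b in ROLLS:
        for k, q in enumerate(dist(max(p - pips(a, b), 0))):
            out[k + 1] = out.get(k + 1, 0.0) + q / 36.0
    return tuple(out.get(k, 0.0) for k in range(max(out) + 1))

def cpw(p_roll, p_other):
    """Database cpw: roller off in i rolls wins iff opponent needs >= i."""
    r, n = dist(p_roll), dist(p_other)
    return sum(r[i] * sum(n[i:]) for i in range(1, len(r)))

def rollout_game(p_a, p_b, rng):
    """One luck-adjusted game; returns the adjusted cpw estimate for A."""
    turn = 0                                 # 0: A's luck adds, 1: B's subtracts
    on_roll, other = p_a, p_b
    old_cpw = cpw(on_roll, other)            # mover's chances before rolling
    cumulative_luck = 0.0
    while True:
        roll = (rng.randint(1, 6), rng.randint(1, 6))
        moved = max(on_roll - pips(*roll), 0)
        # In this toy the played move IS the mean roll minimizing move.
        new_cpw = 1.0 - cpw(other, moved)    # mover's chances after the move
        luck = new_cpw - old_cpw
        cumulative_luck += luck if turn == 0 else -luck
        if moved == 0:                       # mover bore off: game over
            result = 1.0 if turn == 0 else 0.0
            return result - cumulative_luck
        on_roll, other = other, moved        # switch sides
        old_cpw = cpw(on_roll, other)
        turn = 1 - turn

rng = random.Random(1)
games = [rollout_game(7, 4, rng) for _ in range(200)]
estimate = sum(games) / len(games)
```

Because a lone checker has no real move choice, the adjustment in this toy removes all variance and every game returns exactly cpw(7, 4). With genuine move choices the played move can differ from the mean roll minimizing move, and a little variance remains.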
This variance reduction method is extremely powerful. Because
the database evaluations are already so accurate, the luck
adjustments are very good, much better than the lookahead luck
adjustments. In the handful of positions I have checked, a few
hundred games generally gives the exact cpw to four or five decimal
places (based on Sconyers' CD).
Furthermore, this method is very fast. The cpw of the mean roll
minimizing move can be identified while determining the best move
with virtually no extra work. Unlike in lookahead reduction, we
only evaluate continuations based on one roll of the dice.
David Montgomery
monty@cs.umd.edu
monty on FIBS


Brian Sheppard writes:
Instead of rolling out a race, just do a two-ply lookahead. That is much
faster than a rollout, and the answer is just as good.
The variance-reduction method you give is wrong. See later comments.
The best rollout technique is to statically evaluate racing positions as
soon as contact is broken. This way there is no statistical error from
rolling out the rest of the game, and you can get an evaluation long before
the game reaches a database position.
Race evaluation functions are exceptionally accurate. For example, my race
estimation function has an average error that corresponds to misestimating
a race by 1/10 of a pip. For a moderate-length race where each side has a
50% chance of winning there may be an error of 0.004; certainly this is
good enough for any practical purpose. JF is not quite as good as this, but
it isn't necessary to be that good; I have gone overboard on this
particular feature.
(BTW: One neat feature of evaluating immediately upon breaking contact is
that the race at that point must be either long or lopsided. Racing
evaluators are highly accurate in such cases. The average error from
evaluating such cases may be under 0.001.)
And it is much faster to statically evaluate. This is a big advantage
because you can complete more iterations that way.
My racing evaluator is cubeless, but it could be extended to cube-using
with equally good accuracy. It can even be extended to match play with a
little more effort, but still with equally good accuracy.
> turn = 0
> old_cpw = cpw( position we're rolling out )
> cumulativeLuck = 0.0
> while ( game not over ) {
> get the dice roll
> get the best move
> new_cpw = cpw of the *mean roll minimizing move*
> luck = new_cpw - old_cpw;
> if ( turn == 1 ) cumulativeLuck = cumulativeLuck - luck
> else cumulativeLuck = cumulativeLuck + luck
> switch sides
> old_cpw = 1.0 - cpw( move that was made )
> if ( turn == 1 ) turn = 0
> else turn = 1
> }
> if ( turn == 1 ) final_cpw = 1.0
> else final_cpw = 0.0
> final_cpw = final_cpw - cumulativeLuck
The "cumulativeLuck" variable is a "telescoping series" that collapses.
Isn't this just an expensive way to compute the initial value of the
series, which is the static evaluation at the start of the run?
On each turn you compute the change in evaluation, which you sum in the
CumulativeLuck variable. At each stage, therefore, new_cpw = original_cpw +
CumulativeLuck. At the end of the game, final_cpw is set to be the winner,
but that also happens to be the last value of new_cpw as well. So when you
subtract CumulativeLuck you are left with original_cpw.
Variance reduction has been incorrectly described previously in this
newsgroup. I described a correct variance reduction algorithm for this
newsgroup. To find that article, go to dejanews.com, and search
rec.games.backgammon for articles containing the words "variance reduction"
written by bsheppard@hasbro.com.
Brian


David Montgomery writes:
Brian Sheppard writes:
> Instead of rolling out a race, just do a two-ply lookahead.
> That is much faster than a rollout, and the answer is just as good.
It is faster than a rollout, but it may or may not be just as good.
It depends on the quality of the basic evaluator that you use in
your lookahead evaluation. And it depends on what you consider
"just as good."
I can't comment on the quality of your race evaluator. I'm sure it's
very good. However, my article was about evaluating races with
single-sided databases. This approach is in wide use; I can name
a dozen people who have created single-sided databases, including
the creators of Jellyfish and Snowie. It was because of the
widespread use of this technique that I posted my idea.
Position 1.
 +-24-23-22-21-20-19-+-18-17-16-15-14-13-+
 |  O  O  O  O  O    |                   |
 |  O  O             |                   |
 |  O                |                   |   X on roll, cpw?
 |                   |                   |
 |  X  X  X  X       |                   |
 |  X  X  X  X  X    |                   |
 +--1--2--3--4--5--6-+--7--8--9-10-11-12-+
If you use a single-sided database to evaluate Position 1 you get
80.6%. If you look ahead two ply (one move for each side), you get
80.1%. The true percentage is 79.6%, and you get this figure if
you do a short rollout using the method I presented. I don't
consider a half percent error to be as good as an error of less
than .01%, but you may.
Position 2.
 +-24-23-22-21-20-19-+-18-17-16-15-14-13-+
 |  O  O             |              X  X |
 |  O  O             |                   |
 |  O  O             |                   |   X on roll, cpw?
 |  O  O             |                   |
 |                   |                   |
 |  X  X             |                   |
 +--1--2--3--4--5--6-+--7--8--9-10-11-12-+
In Position 2, if you use a single-sided database you get 14.1%.
The same if you look ahead 2 ply. The real cpw is 14.7%. I
don't consider a .6% error as good as no error, but you may.
Brian Sheppard writes:
> The best rollout technique is to statically evaluate racing
> positions as soon as contact is broken. This way there is no
> statistical error from rolling out the rest of the game, and
> you can get an evaluation long before the game reaches a database
> position.
Assuming you are rolling out a position with substantial contact
and you have a good non-contact evaluator, I agree with you. However,
there are other reasons for evaluating and rolling out racing positions.
- You may have an interest in a non-contact position by itself.
  Perhaps you want to check whether your take of that 16 cube last
  night was correct. Although your non-contact evaluations
  may be accurate, the evaluations of Jellyfish and Snowie are
  often off by several percent, so I would want to check them with
  a rollout.
- You might want to verify that your non-contact evaluator is as
  good as you think it is. Rolling positions out is a great way
  to do this.
- You might actually be creating a new non-contact evaluator that
  isn't based on database lookup, and need training data. Again,
  rolling out positions is a good approach.
- If you use the cubeful rollout method of Jellyfish and Snowie, you
  have to do rollouts rather than truncate upon reaching non-contact.
  There are certainly other algorithms for live cube rollouts, and I
  am not endorsing theirs, but if you want to use that algorithm, you
  have to keep rolling.
Brian Sheppard writes:
> Race evaluation functions are exceptionally accurate. [...]
> My static analyzer is accurate to two or three decimal places,
> and the extra accuracy [of a rollout] does not matter compared to
> the expense of rolling out a few hundred games.
I'm happy for you that your evaluator is so good. Jellyfish and Snowie
can be off in the second digit. I've seen evaluations that are
off by 7%, and rollouts that are off by 3.5%.
It depends on why we are evaluating a position. If the only reason
we are looking at a position is because it came up in a rollout,
then it is probably better to just evaluate it and get on to the next
game. But if we care about the position itself, the expense of rolling
out a few hundred games (which takes about a second on my slow Pentium
100) is well worth it, especially if by doing so we get an answer that
is essentially exact.
There is another issue apart from accuracy per se. Bias.
Single-sided database evaluators are pretty darn accurate, but they
are biased. They systematically underrate certain kinds of positions.
One reason I am rolling out positions for which I have database
evaluations is so I can eliminate this bias.
Brian Sheppard writes:
> My racing evaluator is cubeless, but it could be extended to cube-using
> with equally good accuracy. It can even be extended to match play with
> a little more effort, but still with equally good accuracy.
I don't know of any way that single-sided non-contact database evaluators
can be extended to cube-using or match-play situations. You can build
a two-sided database that is exact for the cube, but these are huge.
Hugh Sconyers sells these databases, and on three CDs you still have
only about a third of the positions where both sides are all home. You
can write cube-handling and match-play code on top of a single-sided
evaluator, just as JF and Snowie add their cube handling on top of cubeless
evaluations, but this is a completely distinct system that has nothing to
do with the single-sided database.
Brian Sheppard writes:
> The variance-reduction method you give is wrong.
Uh, no, you're wrong. The method is right. It's implemented and heavily
tested. It works great, making each game worth about 2000 unadjusted games.
> The "cumulativeLuck" variable is a "telescoping series" that collapses.
No, it's not a telescoping series. If you look carefully, you will
see that I put asterisks in the important line
> > new_cpw = cpw of the *mean roll minimizing move*
This may or may not equal cpw( move that was made ), depending on
whether the move that was made was the mean roll minimizing move.
Often the best move is not the mean roll minimizing move. The most
well-known example is catering to doubles when you have only one
possible shake left, but there are many others.
As you wrote, if it were a telescoping series, I would get the answer
I started with. But I don't get this. Instead I get the same cpws
as in Sconyers' CD, which are often half a percent or more different
from the single-sided database cpws.
Brian Sheppard writes:
> Variance reduction has been incorrectly described previously in this
> newsgroup. I described a correct variance reduction algorithm for this
> newsgroup. To find that article, go to dejanews.com, and search
> rec.games.backgammon for articles containing the words "variance
> reduction" written by bsheppard@hasbro.com.
Thanks for the tip, Brian, but I figured out lookahead variance reduction
about fifteen minutes after I read about it in the Jellyfish 2 manual.
But just for the heck of it, let's take a look at what you wrote on
variance reduction. The quoted text below is taken from Tom Keith's
newsgroup archive, Programming section. You were responding to Michael
J. Zehr, who wrote:
> The basic idea of any variance reduction method is to estimate the
> amount of luck each side has and factor that out. For example,
> if one side has two on the bar against a 5point board and rolls the
> necessary doubles to come in that's a very lucky roll. A variance
> reduction algorithm would measure that luck and change the results of
> the game in some way to take account of it.
>
> In the example above, if the neural net's evaluation of rolling the
> double to enter is +.243 equity, and the evaluation of the position
> before rolling as -.324, that roll was worth +.567. If the player who
> rolled the lucky double goes on to win then instead of getting 1 point
> for the win they might get 1 point less some fraction of the equity
> from their lucky rolls.
This is correct as long as the -.324 was the lookahead evaluation.
Michael here wasn't spelling out the algorithm; he was giving "[t]he
basic idea." Michael may have missed this point, but it's more likely
in my opinion that he assumed that readers sophisticated enough to
understand what he was talking about would be sophisticated enough
to supply this detail. After all, the variance reduction algorithm
had already been described in the newsgroup in more detail by Jim
Williams, over a year earlier. (This article is also in the Backgammon
Galore archive.)
Brian Sheppard wrote:
> Whoa! This is not "variance reduction." This is "bias introduction."
No, it's not. It's the same algorithm as you presented, as I will show
below.
Brian Sheppard wrote:
> Let E0 be the evaluation of the root position of the rollout. After 1
> turn the evaluation changes to E1, then to E2, and so on. In the final
> position the outcome is V. What is the sum of the changes from turn to
> turn?
>
> In the first turn the change is E0 - E1. In the second turn the change
> is E1 - E2. In the third turn the change is E2 - E3, and so on. The
> sum is
>
> (E0 - E1) + (E1 - E2) + (E2 - E3) + (E3 - E4) + ... + (En - V)
>
> This type of sum is called a "telescoping series," because it
> resembles a collapsible telescope.
Let's call the lookahead estimate of a position N laEN. Then, if you
correctly use lookahead, the sum is
(laE0 - E1) + (laE1 - E2) + (laE2 - E3) + (laE3 - E4) + ... + (laEn - V)
which is not a telescoping series.
Brian Sheppard wrote:
> Let E be the true cubeless equity function, and e be our static
> evaluation function (e.g. a neural net).
>
> Instead of regarding the rollout as establishing a value for E, let's
> think of it as establishing a value for E - e.
>
> Let's begin by doing a 1-ply search, then estimate E as the average
> of the 1-ply values of e.
Okay. We want delta = E0 - e0. To start you estimate delta as lae0 - e0.
> This would be completely correct if e were perfect, but since e isn't
> perfect, we need to estimate the error involved. [...] Let's just pick
> a 1-ply position at random, and estimate the error involved in its
> evaluation. [...] I can continue this process until I reach a
> terminal node.
Okay. We started with delta ~= lae0 - e0. To correct for the error in
lae0, we first select a next move. The evaluation of this move was
part of lae0. We estimate the error in lae0 using an estimate of
the error in the evaluation of the continuation: lae1 - e1. This
correction is added to our estimate of delta. We continue with these
corrections during the rest of the rollout, ending up with
delta = (lae0 - e0) + (lae1 - e1) + ... + (lae(n-1) - e(n-1)) + (V - en)
When we are done, since delta is an estimate of E0 - e0, we estimate
E0 as e0 + delta. So we have
E0 ~= e0 + (lae0 - e0) + (lae1 - e1) + ... + (lae(n-1) - e(n-1)) + (V - en)
Canceling the e0 and rearranging terms we have
E0 ~= (lae0 - e1) + (lae1 - e2) + ... + (lae(n-1) - en) + V
This is exactly the same thing as the algorithm described by Jim Williams
and alluded to by Michael Zehr. It's the same thing as my race algorithm
except now we don't have to do lookahead, and each term in the summation
is much more accurate because the database evaluations are so good.
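The cancellation in that last rearrangement is easy to verify numerically with arbitrary values (a throwaway check; the numbers below are meaningless placeholders):

```python
import random

rng = random.Random(0)
n = 6                                       # plies in a pretend game
e = [rng.random() for _ in range(n + 1)]    # static evals e0 .. en
lae = [rng.random() for _ in range(n)]      # lookahead evals lae0 .. lae(n-1)
V = 1.0                                     # final game result

# e0 + (lae0 - e0) + ... + (lae(n-1) - e(n-1)) + (V - en)
lhs = e[0] + sum(lae[k] - e[k] for k in range(n)) + (V - e[n])
# (lae0 - e1) + ... + (lae(n-1) - en) + V
rhs = sum(lae[k] - e[k + 1] for k in range(n)) + V
```

The e terms cancel pairwise, so lhs equals rhs whatever the values; nothing telescopes away the lae terms.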
David Montgomery
monty@cs.umd.edu
monty on FIBS


David Montgomery writes:
A couple people have told me that it isn't immediately clear that
the expected luck adjustment at each ply is zero. To understand
this you have to understand how single-sided racing databases work.
Here is a detailed explanation.
For each position in the database you will calculate the probability
that the position will be borne off in 0, 1, 2, etc. rolls. The
database is built up by beginning with the simplest position,
which is everybody borne off. Obviously, for everybody borne off you
are completely borne off 100% of the time in 0 rolls.
Then you move to the next most complex position: a single checker
on the one point. You look ahead and get the best move for each roll,
and determine that after all of the 36 rolls, in your next position,
you will be off 100% of the time in 0 rolls. Those positions are one
roll later, so here you are off 100% of the time in 1 roll.
A little later you get to the position with a single checker on the
seven point, let's say. Again, for each roll you find the best move
and get the probabilities of being off in each number of rolls. Here
you find
Number of      Off     Off     Off
Successors    in 0    in 1    in 2    Rolls
     2        0.000   0.944   0.056   21
    11        0.000   1.000   0.000   11 31 41 51 32 42
    23        1.000   0.000   0.000   everything else
Combining these and adding in the roll for looking ahead you can
determine the probability distribution for one checker on the 7 point:
  Off     Off     Off
 in 1    in 2    in 3
0.639   0.358   0.003
Since the database is single-sided, you are never concerned with
the opponent's checkers. Instead, the best move is just considered
to be the one that will get off in the fewest rolls, on average.
You can easily calculate the average number of rolls that it will
take to bear off using the probability distribution. For example,
with a single checker on the seven point, on average you bear off
in .639x1 + .358x2 + .003x3 = 1.364 rolls.
By proceeding through the positions in an orderly fashion (there is
more than one way to do it), you ensure that whenever you want to
calculate the database entry for a position P, you have already
calculated the entries for all the successors to P. Thus you never
have to look ahead more than 1 ply.
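For the simplest case, a single checker, the construction above fits in a few lines of Python (a sketch under the assumption that a lone checker always plays its full pip total, which is also the mean roll minimizing play; the function names are mine):

```python
from functools import lru_cache

ROLLS = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def pips(a, b):
    # Doubles move four times the die, non-doubles the sum.
    return 4 * a if a == b else a + b

@lru_cache(maxsize=None)
def dist(p):
    """P(off in exactly k rolls) for one checker p pips from off.

    The recursion only ever looks 1 ply ahead: each of the 36 rolls
    leads to a successor whose entry is computed (and cached) first.
    """
    if p <= 0:
        return (1.0,)                       # borne off: 0 rolls, 100%
    out = {}
    for a, b in ROLLS:
        for k, q in enumerate(dist(max(p - pips(a, b), 0))):
            out[k + 1] = out.get(k + 1, 0.0) + q / 36.0
    return tuple(out.get(k, 0.0) for k in range(max(out) + 1))

def mean_rolls(p):
    """Expected rolls to bear off, straight from the distribution."""
    return sum(k * q for k, q in enumerate(dist(p)))
```

In this toy, dist(7) reproduces the distribution above, about (0.639, 0.358, 0.003) for 1, 2, and 3 rolls, and mean_rolls(7) gives about 1.364.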
Once you have constructed a database like this, you can use it to
get (usually) very accurate estimates of the cubeless probability
of winning (cpw) in positions with checkers for both sides.
Here is an example. X has a single checker on the 7 point. His
distribution, as calculated above, is:
  Off     Off     Off
 in 1    in 2    in 3
0.639   0.358   0.003
His opponent O has a single checker on the 4 point. Her distribution is:
  Off     Off
 in 1    in 2
0.944   0.056
If X is on roll he will win whenever he is off in 1 roll (.639), plus
whenever he is off in 2 rolls and O is off in 2 or more (.358x.056)
for a total of .659 (65.9%).
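The combination rule is a one-liner; here it is in Python, using the two distributions quoted above as literals (cpw is my name for it):

```python
def cpw(roller, nonroller):
    """P(roller wins): if the roller needs i rolls, his i-th roll comes
    before the opponent's i-th, so he wins iff she needs i or more."""
    return sum(roller[i] * sum(nonroller[i:])
               for i in range(1, len(roller)))

# Index k holds P(off in exactly k rolls).
x_dist = (0.0, 0.639, 0.358, 0.003)   # X, one checker on the 7 point
o_dist = (0.0, 0.944, 0.056)          # O, one checker on the 4 point

print(round(cpw(x_dist, o_dist), 3))  # prints 0.659
```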
Although these probabilities are usually very accurate, they are not
necessarily exact. This is because the probability distributions
assume that a player is always making the move that will bear off the
fastest on average. Sometimes a different move is best when the
opposing position is considered.
Now let's look at the luck adjustments. First, the equation-free
presentation. The cpws that you get using the database are all
calculated assuming that both sides play the rest of the game
moving to minimize the rolls left to bear off. Let's call these
database estimates of cpws dcpws. If both sides move this way,
every dcpw is exact because the probability distributions are
exact.
Let's assume that both sides play this way. Now note that every dcpw
must necessarily be the expectation of the dcpws for successor positions.
Since the dcpws are exact, it can't be any other way. If this isn't
clear to you, you may want to look at the equations further on.
Since the dcpw for a position is the expectation of the dcpws for
successor positions, the expected difference between the dcpw and
the dcpws for successors must be zero. This is exactly the property
that the variance reduction algorithm uses.
The expected difference between the dcpw for the current position
and the dcpw for the position that would follow it if you move to
minimize mean bearoff rolls must be zero. Thus the expected luck
adjustment at each ply is zero. Thus the expected overall luck
adjustment for each game is zero. Thus the method introduces no bias.
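In the single-checker toy model (same assumption as before: a lone checker plays its full pip total, so the played move is also the mean roll minimizing move), this zero-mean property can be checked directly by averaging the luck adjustment over all 36 rolls:

```python
from functools import lru_cache

ROLLS = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def pips(a, b):
    # Doubles move four times the die, non-doubles the sum.
    return 4 * a if a == b else a + b

@lru_cache(maxsize=None)
def dist(p):
    """P(off in exactly k rolls), one checker p pips from off."""
    if p <= 0:
        return (1.0,)
    out = {}
    for a, b in ROLLS:
        for k, q in enumerate(dist(max(p - pips(a, b), 0))):
            out[k + 1] = out.get(k + 1, 0.0) + q / 36.0
    return tuple(out.get(k, 0.0) for k in range(max(out) + 1))

def cpw(p_roll, p_other):
    r, n = dist(p_roll), dist(p_other)
    return sum(r[i] * sum(n[i:]) for i in range(1, len(r)))

# Position: roller 7 pips from off, opponent 4 pips from off.
old_cpw = cpw(7, 4)
luck = [(1.0 - cpw(4, max(7 - pips(a, b), 0))) - old_cpw
        for a, b in ROLLS]
mean_luck = sum(luck) / 36.0       # zero, up to float rounding
```

Individual rolls have decidedly nonzero luck, but the 36-roll average vanishes, exactly as the argument above says it must.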
For those not satisfied with that, here are some equations.
Let n[j] be the probability that the non-roller will be off in j rolls.
Let r[i] be the probability that the roller will be off in i rolls.
Let m[d][i] be the probability that the roller will be off in i more rolls
after playing roll d in the mean roll minimizing manner.
That is,

    r[i+1] = (1/36) * sum_{d=1..36} m[d][i]                            (1)
Let M be the maximum possible rolls that can be taken by either side.
There are many ways of expressing the roller's chances. Let's start
with this one:
    roller's dcpw = sum_{j=1..M} n[j] * sum_{i=1..j} r[i]              (2)
This says that the roller will win whenever the non-roller takes
j rolls and the roller takes anywhere from 1 to j rolls.
Okay, now let's look ahead, after the roller plays roll d, and
express the non-roller's chances following the form of equation (2):
    non-roller's dcpw after roll d = sum_{i=1..M} m[d][i] * sum_{j=1..i} n[j]    (3)
This says that whenever the roller still has i rolls left after playing
roll d (m[d][i]), the non-roller will win if she can take i or fewer rolls.
The roller's chances in the original position are just 1 minus the
average of (3) taken over all the dice rolls:
    roller's dcpw = 1 - (1/36) * sum_{d=1..36} [ sum_{i=1..M} m[d][i] * sum_{j=1..i} n[j] ]
We can change the order of summation:
    roller's dcpw = 1 - sum_{i=1..M} [ { (1/36) * sum_{d=1..36} m[d][i] } * sum_{j=1..i} n[j] ]
This can be simplified using equation (1) to get:
    roller's dcpw = 1 - sum_{i=1..M} r[i+1] * sum_{j=1..i} n[j]
This says: if the roller takes i+1 rolls, the non-roller will win by
taking i rolls or fewer; the roller will therefore win 1 minus that.
Clearly this is the same quantity as equation (2), showing that the
expected dcpw you get looking ahead is the same as the dcpw you
calculate directly.
David Montgomery
monty on FIBS
monty@cs.umd.edu



