Variance reduction in races

From:   David Montgomery
Address:   monty@cs.umd.edu
Date:   1 December 1998
Subject:   Variance Reduction in Races
Forum:   rec.games.backgammon
Google:   741fjo$42d@krackle.cs.umd.edu

Assume we have a database of single-sided evaluations of
non-contact positions.  This database is built up assuming that
each move is made to minimize the expected number of rolls left.

Although the database is highly accurate for predicting cubeless
win probability, we still may want to roll out a position from the
database, either to get a more accurate evaluation, or because
we are doing a cubeful rollout.

We can use the database to get tremendous variance reduction
in our rollout.

As with lookahead variance reduction, the idea is to accumulate
a "luck adjustment" over the course of a game.  For each turn
the luck adjustment should have a mean of zero and correlate
well with how lucky the roll was.

We can easily estimate how lucky a roll was by simply using the
database to compare the cpw of the position before the roll with
the cpw after the roll.  However, these luck ratings might not
have mean zero if sometimes we make moves that do not minimize
the expected rolls to bear off.  Because of the way a single-sided
database is built up, we can ensure a mean luck adjustment of zero
by calculating the post-roll cpw assuming that we had made the
mean roll minimizing play.

Below is pseudo-code.

    turn = 0
    old_cpw = cpw( position we're rolling out )
    cumulativeLuck = 0.0

    while ( game not over ) {

        get the dice roll
        get the best move
        new_cpw = cpw of the *mean roll minimizing move*

        luck = new_cpw - old_cpw;
        if ( turn == 1 ) cumulativeLuck = cumulativeLuck - luck
        else             cumulativeLuck = cumulativeLuck + luck

        switch sides
        old_cpw = 1.0 - cpw( move that was made )
        if ( turn == 1 ) turn = 0
        else             turn = 1
    }

    if ( turn == 1 ) final_cpw = 1.0
    else             final_cpw = 0.0
    final_cpw = final_cpw - cumulativeLuck
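
For concreteness, here is a minimal Python sketch of the same loop.  The
helper names are hypothetical -- win_prob, mean_rolls, candidate_plays,
choose_play and all_off stand in for whatever single-sided database and
move generator you already have -- and only the luck bookkeeping follows
the pseudo-code above.

    import random

    def variance_reduced_game(roller, other, win_prob, mean_rolls,
                              candidate_plays, choose_play, all_off):
        # roller/other are the two sides' checker configurations, with
        # "roller" on roll.  win_prob(r, o) is the database cubeless win
        # probability for the side on roll; mean_rolls(side) the database
        # mean number of rolls to bear off; candidate_plays(side, dice)
        # the positions that side can move to; choose_play(...) the play
        # actually made; all_off(side) is true once that side is borne off.
        turn = 0
        old_cpw = win_prob(roller, other)
        cumulative_luck = 0.0

        while not all_off(other):
            dice = (random.randint(1, 6), random.randint(1, 6))
            plays = candidate_plays(roller, dice)

            played = choose_play(roller, other, dice, plays)
            mrm = min(plays, key=mean_rolls)       # mean-roll-minimizing play

            # cpw after the mean-roll-minimizing play, still from the
            # current roller's point of view
            new_cpw = 1.0 - win_prob(other, mrm)

            luck = new_cpw - old_cpw
            cumulative_luck += -luck if turn == 1 else luck

            roller, other = other, played          # switch sides
            old_cpw = win_prob(roller, other)      # same as 1 - cpw(move made)
            turn = 1 - turn

        result = 1.0 if turn == 1 else 0.0         # did the side rolled out win?
        return result - cumulative_luck

Averaging variance_reduced_game() over a few hundred games gives the
variance-reduced estimate of the cpw of the position being rolled out.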

This variance reduction method is extremely powerful.  Because
the database evaluations are already so accurate, the luck
adjustments are very good -- much better than the lookahead luck
adjustments.  In the handful of positions I have checked, a few
hundred games generally gives the exact cpw to four or five decimal
places (based on Sconyers' CD).

Furthermore, this method is very fast.  The cpw of the mean roll
minimizing move can be identified while determining the best move
with virtually no extra work.  Unlike in lookahead reduction, we
only evaluate continuations based on one roll of the dice.

David Montgomery
monty@cs.umd.edu
monty on FIBS

Brian Sheppard writes:

Instead of rolling out a race, just do a two-ply lookahead. That is much
faster than a rollout, and the answer is just as good.

The variance-reduction method you give is wrong. See later comments.

The best rollout technique is to statically evaluate racing positions as
soon as contact is broken. This way there is no statistical error from
rolling out the rest of the game, and you can get an evaluation long before
the game reaches a database position.

Race evaluation functions are exceptionally accurate. For example, my race-
estimation function has an average error that corresponds to misestimating
a race by 1/10 of a pip. For a moderate-length race where each side has a
50% chance of winning there may be an error of 0.004; certainly this is
good enough for any practical purpose. JF is not quite as good as this, but
it isn't necessary to be that good; I have gone overboard on this
particular feature.

(BTW: One neat feature of evaluating immediately upon breaking contact is
that the race at that point must be either long or lopsided. Racing
evaluators are highly accurate in such cases. The average error from
evaluating such cases may be under 0.001.)

And it is much faster to statically evaluate. This is a big advantage
because you can complete more iterations that way.

My racing evaluator is cubeless, but it could be extended to cube-using
with equally good accuracy. It can even be extended to match-play with a
little more effort, but still with equally good accuracy.

>     turn = 0
>     old_cpw = cpw( position we're rolling out )
>     cumulativeLuck = 0.0
>     while ( game not over ) {
>         get the dice roll
>         get the best move
>         new_cpw = cpw of the *mean roll minimizing move*
>         luck = new_cpw - old_cpw;
>         if ( turn == 1 ) cumulativeLuck = cumulativeLuck - luck
>         else             cumulativeLuck = cumulativeLuck + luck
>         switch sides
>         old_cpw = 1.0 - cpw( move that was made )
>         if ( turn == 1 ) turn = 0
>         else             turn = 1
>     }
>     if ( turn == 1 ) final_cpw = 1.0
>     else             final_cpw = 0.0
>     final_cpw = final_cpw - cumulativeLuck

The "cumulativeLuck" variable is a "telescoping series" that collapses.
Isn't this just an expensive way to compute the initial value of the
series, which is the static evaluation at the start of the run?

On each turn you compute the change in evaluation, which you sum in the
CumulativeLuck variable. At each stage, therefore, new_cpw = original_cpw +
CumulativeLuck. At the end of the game, final_cpw is set to be the winner,
but that also happens to be the last value of new_cpw as well. So when you
subtract CumulativeLuck you are left with original_cpw.

Variance reduction has been incorrectly described previously in this
newsgroup. I described a correct variance reduction algorithm for this
newsgroup. To find that article, go to dejanews.com, and search
rec.games.backgammon for articles containing the words "variance reduction"
written by bsheppard@hasbro.com.

Brian

David Montgomery writes:

Brian Sheppard writes:
> Instead of rolling out a race, just do a two-ply lookahead.
> That is much faster than a rollout, and the answer is just as good.

It is faster than a rollout, but it may or may not be just as good.
It depends on the quality of the basic evaluator that you use in
your lookahead evaluation.  And it depends on what you consider
"just as good."

I can't comment on the quality of your race evaluator.  I'm sure it's
very good.  However, my article was about evaluating races with
single-sided databases.  This approach is in wide use -- I can name
a dozen people who have created single-sided databases, including
the creators of Jellyfish and Snowie.  It was because of the
widespread use of this technique that I posted my idea.

Position 1.
 +24-23-22-21-20-19-+---+18-17-16-15-14-13-+
 | O     O  O  O  O |   |                  |
 |          O     O |   |                  |
 |          O       |   |                  |  X on roll, cpw?
 |                  |   |                  |
 | X  X  X  X       |   |                  |
 | X  X  X  X  X    |   |                  |
 +-1--2--3--4--5--6-+---+-7--8--9-10-11-12-+

If you use a single-sided database to evaluate Position 1 you get
80.6%.  If you look ahead two ply (one move for each side), you get
80.1%.  The true percentage is 79.6%, and you get this figure if
you do a short rollout using the method I presented.  I don't
consider a half percent error to be as good as an error of less
than .01%, but you may.

Position 2.
 +24-23-22-21-20-19-+---+18-17-16-15-14-13-+
 | O  O             |   |         X     X  |
 | O  O             |   |                  |
 | O  O             |   |                  |  X on roll, cpw?
 | O  O             |   |                  |
 |                  |   |                  |
 |          X     X |   |                  |
 +-1--2--3--4--5--6-+---+-7--8--9-10-11-12-+

In Position 2, if you use a single-sided database you get 14.1%.
The same if you look ahead 2 ply.  The real cpw is 14.7%.  I
don't consider a .6% error as good as no error, but you may.

Brian Sheppard writes:
> The best rollout technique is to statically evaluate racing
> positions as soon as contact is broken. This way there is no
> statistical error from rolling out the rest of the game, and
> you can get an evaluation long before the game reaches a database
> position.

Assuming you are rolling out a position with substantial contact
and you have a good non-contact evaluator, I agree with you.  However,
there are other reasons for evaluating and rolling out racing positions.

- You may have an interest in a non-contact position by itself.
  Perhaps you want to check whether your take of that 16 cube last
  night was correct.  Although your non-contact evaluations
  may be accurate, the evaluations of Jellyfish and Snowie are
  often off by several percent, so I would want to check them with
  a rollout.
- You might want to verify that your non-contact evaluator is as
  good as you think it is.  Rolling positions out is a great way
  to do this.
- You might actually be creating a new non-contact evaluator that
  isn't based on database lookup, and need training data.  Again,
  rolling out positions is a good approach.
- If you use the cubeful rollout method of Jellyfish and Snowie, you
  have to do rollouts rather than truncate upon reaching non-contact.
  There are certainly other algorithms for live cube rollouts, and I
  am not endorsing theirs, but if you want to use that algorithm, you
  have to keep rolling.

Brian Sheppard writes:
> Race evaluation functions are exceptionally accurate. [...]
> My static analyzer is accurate to two or three decimal places,
> and the extra accuracy [of a rollout] does not matter compared to
> the expense of rolling out a few hundred games.

I'm happy for you that your evaluator is so good.  Jellyfish and Snowie
can be off in the second digit.  I've seen evaluations that are
off by 7%, and rollouts that are off by 3.5%.

It depends on why we are evaluating a position.  If the only reason
we are looking at a position is because it came up in a rollout,
then it is probably better to just evaluate it and get on to the next
game.  But if we care about the position itself, the expense of rolling
out a few hundred games (which takes about a second on my slow Pentium
100) is well worth it, especially if by doing so we get an answer that
is essentially exact.

There is another issue apart from accuracy per se.  Bias.
Single-sided database evaluators are pretty darn accurate, but they
are biased.  They systematically underrate certain kinds of positions.
One reason I am rolling out positions for which I have database
evaluations is so I can eliminate this bias.

Brian Sheppard writes:
> My racing evaluator is cubeless, but it could be extended to cube-using
> with equally good accuracy. It can even be extended to match-play with
> a little more effort, but still with equally good accuracy.

I don't know of any way that single-sided non-contact database evaluators
can be extended to cube-using or match-play situations.  You can build
a two-sided database that is exact for the cube, but these are huge.
Hugh Sconyers sells these databases, and on three CDs you still have
only about a third of the positions where both sides are all home.  You
can write cube-handling and match-play code on top of a single-sided
evaluator, just as JF and Snowie add their cube handling on top of cubeless
evaluations, but this is a completely distinct system that has nothing to
do with the single-sided database.

Brian Sheppard writes:
> The variance-reduction method you give is wrong.

Uh, no, you're wrong.  The method is right.  It's implemented and heavily
tested.  It works great, making each rollout game worth roughly 2000
ordinary games.

> The "cumulativeLuck" variable is a "telescoping series" that collapses.

No, it's not a telescoping series.  If you look carefully, you will
see that I put asterisks in the important line

> >         new_cpw = cpw of the *mean roll minimizing move*

This may or may not equal cpw( move that was made ), depending on
whether the move that was made was the mean roll minimizing move.
Often the best move is not the mean roll minimizing move.  The most
well-known example is catering to doubles when you have only one
possible shake left, but there are many others.

As you wrote, if it were a telescoping series, I would get the answer
I started with.  But I don't get this.  Instead I get the same cpws
as in Sconyers' CD, which are often half a percent or more different
from the single-sided database cpws.

Brian Sheppard writes:
> Variance reduction has been incorrectly described previously in this
> newsgroup. I described a correct variance reduction algorithm for this
> newsgroup. To find that article, go to dejanews.com, and search
> rec.games.backgammon for articles containing the words "variance
> reduction" written by bsheppard@hasbro.com.

Thanks for the tip, Brian, but I figured out lookahead variance reduction
about fifteen minutes after I read about it in the Jellyfish 2 manual.

But just for the heck of it, let's take a look at what you wrote on
variance reduction.  The quoted text below is taken from Tom Keith's
newsgroup archive, Programming section.  You were responding to Michael
J. Zehr, who wrote:

> The basic idea of any variance reduction method is to estimate the
> amount of luck each side has and factor that out.  For example,
> if one side has two on the bar against a 5-point board and rolls the
> necessary doubles to come in that's a very lucky roll.  A variance
> reduction algorithm would measure that luck and change the results of
> the game in some way to take account of it.
>
> In the example above, if the neural net's evaluation of rolling the
> double to enter is +.243 equity, and the evaluation of the position
> before rolling as -.324, that roll was worth +.567.  If the player who
> rolled the lucky double goes on to win then instead of getting 1 point
> for the win they might get 1 point less some fraction of the equity
> from their lucky rolls.

This is correct as long as the -.324 was the lookahead evaluation.
Michael here wasn't spelling out the algorithm -- he was giving "[t]he
basic idea."  Michael may have missed this point, but it's more likely
in my opinion that he assumed that readers sophisticated enough to
understand what he was talking about would be sophisticated enough
to supply this detail.  After all, the variance reduction algorithm
had already been described in the newsgroup in more detail by Jim
Williams, over a year earlier.  (This article is also in the Backgammon
Galore archive.)

Brian Sheppard wrote:
> Whoa! This is not "variance reduction." This is "bias introduction."

No, it's not.  It's the same algorithm as you presented, as I will show
below.

Brian Sheppard wrote:
> Let E0 be the evaluation of the root position of the rollout. After 1
> turn the evaluation changes to E1, then to E2, and so on. In the final
> position the outcome is V. What is the sum of the changes from turn to
> turn?
>
> In the first turn the change is E0 - E1. In the second turn the change
> is E1 - E2. In the third turn the change is E2 - E3, and so on. The
> sum is
>
> (E0 - E1) + (E1 - E2) + (E2 - E3) + (E3 - E4) + ... + (En - V)
>
> This type of sum is called a "telescoping series," because it
> resembles a collapsible telescope.

Let's call the lookahead estimate of a position N laEN.  Then, if you
correctly use lookahead, the sum is

(laE0 - E1) + (laE1 - E2) + (laE2 - E3) + (laE3 - E4) + ... + (laEn - V)

which is not a telescoping series.
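
To make the algebra concrete, here is a tiny numeric check with made-up
evaluations (E holds static evaluations E0..E3 along one imaginary game,
laE the corresponding lookahead evaluations, and V = 1 because the side
being rolled out happened to win; none of these numbers come from a real
position):

    E   = [0.60, 0.68, 0.55, 0.80]      # static evaluations E0..E3
    laE = [0.61, 0.66, 0.57, 0.83]      # lookahead evaluations laE0..laE3
    V   = 1.0                           # game result for the side rolled out

    # With the static evaluation as the pre-roll reference, the series
    # collapses to E0 - V, so the adjusted result V + sum is always
    # exactly E0 -- the rollout adds no information.
    brian = sum(E[k] - E[k + 1] for k in range(3)) + (E[3] - V)
    print(brian + V)                    # prints E[0] (0.60, up to rounding)

    # With the lookahead evaluation as the pre-roll reference, the interior
    # terms no longer cancel, and V + sum genuinely depends on the game.
    david = sum(laE[k] - E[k + 1] for k in range(3)) + (laE[3] - V)
    print(david + V)                    # 0.64, not E[0]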

Brian Sheppard wrote:
> Let E be the true cubeless equity function, and e be our static
> evaluation function (e.g. a neural net).
>
> Instead of regarding the rollout as establishing a value for E, let's
> think of it as establishing a value for E - e.
>
> Let's begin by doing a 1-ply search, then estimate E as the average
> of the 1-ply values of e.

Okay.  We want delta = E0 - e0.  To start you estimate delta as lae0 - e0.

> This would be completely correct if e were perfect, but since e isn't
> perfect, we need to estimate the error involved.   [...] Let's just pick
> a 1-ply position at random, and estimate the error involved in its
> evaluation. [...]  I can continue this process until I reach a
> terminal node.

Okay.  We started with delta ~= lae0 - e0.  To correct for the error in
lae0, we first select a next move.  The evaluation of this move was
part of lae0.  We estimate the error in lae0 using an estimate of
the error in the evaluation of the continuation: lae1 - e1.  This
correction is added to our estimate of delta.  We continue with these
corrections during the rest of the rollout, ending up with

delta = (lae0 - e0) + (lae1 - e1) + ... + (lae[n-1] - e[n-1]) + (V - e[n])

When we are done, since delta is an estimate of E0-e0, we estimate
E0 as e0 + delta.  So we have

E0 ~=  e0 + (lae0 - e0) + (lae1 - e1) + ... + (lae[n-1] - e[n-1]) + (V - e[n])

Canceling the e0 and rearranging terms we have

E0 ~=  (lae0 - e1) + (lae1 - e2) + ... + (lae[n-1] - e[n]) + V

This is exactly the same thing as the algorithm described by Jim Williams
and alluded to by Michael Zehr.  It's the same thing as my race algorithm
except now we don't have to do lookahead, and each term in the summation
is much more accurate because the database evaluations are so good.

David Montgomery
monty@cs.umd.edu
monty on FIBS

David Montgomery writes:

A couple people have told me that it isn't immediately clear that
the expected luck adjustment at each ply is zero.  To understand
this you have to understand how single-sided racing databases work.
Here is a detailed explanation.

For each position in the database you will calculate the probability
that the position will be borne off in 0, 1, 2, etc. rolls.  The
database is built up by beginning with the simplest position,
which is everybody borne off.  Obviously, for everybody borne off you
are completely borne off 100% of the time in 0 rolls.

Then you move to the next most complex position -- a single checker
on the one point.  You look ahead and get the best move for each roll,
and find that after any of the 36 rolls the resulting position has you off
100% of the time in 0 rolls.  Those positions are one roll later, so from
here you are off 100% of the time in 1 roll.

A little later you get to the position with a single checker on the
seven point, let's say.  Again, for each roll you find the best move
and get the probabilities of being off in each number of rolls.  Here
you find

Number of    Off    Off    Off
dice rolls  in 0   in 1   in 2     Rolls
    2      0.000  0.944  0.056     21
   11      0.000  1.000  0.000     11 31 41 51 32 42
   23      1.000  0.000  0.000     everything else

Combining these, and adding one roll for the look-ahead itself, you can
determine the probability distribution for one checker on the 7 point:

  Off    Off    Off
 in 1   in 2   in 3
0.639  0.358  0.003

Since the database is single-sided, you are never concerned with
the opponent's checkers.  Instead, the best move is just considered
to be the one that will get off in the fewest rolls, on average.
You can easily calculate the average number of rolls that it will
take to bear off using the probability distribution.  For example,
with a single checker on the seven point, on average you bear off
in .639x1 + .358x2 + .003x3 = 1.364 rolls.

By proceeding through the positions in an orderly fashion (there is
more than one way to do it), you ensure that whenever you want to
calculate the database entry for a position P, you have already
calculated the entries for all the successors to P.  Thus you never
have to look ahead more than 1 ply.
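
As a concrete illustration, here is a short Python sketch that builds this
kind of distribution for the simplest non-trivial case -- a single checker
on a single point.  The names are mine, not from any existing program.  With
one checker the mean-roll-minimizing play is simply to move it as far as the
dice allow, and the memoization plays the role of the orderly pass through
the positions described above.

    from functools import lru_cache
    from itertools import product

    def pips(d1, d2):
        # Doubles are played four times.
        return d1 + d2 if d1 != d2 else 4 * d1

    @lru_cache(maxsize=None)
    def off_dist(point):
        # {number of rolls: probability} of bearing a lone checker off
        # from the given point; point 0 means it is already off.
        if point <= 0:
            return {0: 1.0}
        dist = {}
        for d1, d2 in product(range(1, 7), repeat=2):    # 36 equally likely rolls
            successor = max(0, point - pips(d1, d2))     # move as far as possible
            for rolls, prob in off_dist(successor).items():
                dist[rolls + 1] = dist.get(rolls + 1, 0.0) + prob / 36.0
        return dist

    d7 = off_dist(7)                             # about {1: 0.639, 2: 0.358, 3: 0.003}
    mean7 = sum(r * p for r, p in d7.items())    # about 1.364 rolls, as above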

Once you have constructed a database like this, you can use it to
get (usually) very accurate estimates of the cubeless probability
of winning (cpw) in positions with checkers for both sides.

Here is an example.  X has a single checker on the 7 point.  His
distribution, as calculated above, is:

  Off    Off    Off
 in 1   in 2   in 3
0.639  0.358  0.003

His opponent O has a single checker on the 4 point.  Her distribution is:

  Off    Off
 in 1   in 2
0.944  0.056

If X is on roll he will win whenever he is off in 1 roll (.639), plus
whenever he is off in 2 rolls and O is off in 2 or more (.358x.056)
for a total of .659 (65.9%).
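
In code, combining the two single-sided distributions might look like the
sketch below (each distribution is a dict mapping "rolls needed" to
probability; off_dist() is the single-checker helper from the sketch above):

    def win_prob(roller_dist, other_dist):
        # The side on roll wins whenever it needs no more rolls than the
        # side not on roll -- ties go to the roller, who rolls first.
        return sum(p_r * sum(p_o for j, p_o in other_dist.items() if j >= i)
                   for i, p_r in roller_dist.items())

    # X's lone checker on the 7 point, on roll, against O's on the 4 point:
    print(win_prob(off_dist(7), off_dist(4)))    # about 0.659, as above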

Although these probabilities are usually very accurate, they are not
necessarily exact.  This is because the probability distributions
assume that a player is always making the move that will bear off the
fastest on average.  Sometimes a different move is best when the
opposing position is considered.

Now let's look at the luck adjustments.  First, the equation-free
presentation.  The cpws that you get using the database are all
calculated assuming that both sides play the rest of the game
moving to minimize the rolls left to bear off.  Let's call these
database estimates of cpws d-cpws.  If both sides move this way,
every d-cpw is exact because the probability distributions are
exact.

Let's assume that both sides play this way.  Now note that every d-cpw
must necessarily be the expectation of the d-cpws for successor positions.
Since the d-cpws are exact, it can't be any other way.  If this isn't
clear to you, you may want to look at the equations further on.

Since the d-cpw for a position is the expectation of the d-cpws for
successor positions, the expected difference between the d-cpw and
the d-cpws for successors must be zero.  This is exactly the property
that the variance reduction algorithm uses.

The expected difference between the d-cpw for the current position
and the d-cpw for the position that would follow it if you move to
minimize mean bearoff rolls must be zero.  Thus the expected luck
adjustment at each ply is zero.  Thus the expected overall luck
adjustment for each game is zero.  Thus the method introduces no bias.

For those not satisfied with that, here are some equations.

Let n[j] be the probability that the non-roller will be off in j rolls.
Let r[i] be the probability that the roller will be off in i rolls.
Let m[d][i] be the probability that the roller will be off in i more rolls
  after playing roll d in the mean roll minimizing manner.

That is,
                 36
                ---
r[i+1] = (1/36) \  m[d][i]                                (1)
                /
                ---
                d=1

Let M be the maximum possible rolls that can be taken by either side.

There are many ways of expressing the roller's chances.  Let's start
with this one:

                    M             j
                   ---           ---
roller's d-cpw =   \   {  n[j] * \  r[i]  }               (2)
                   /             /
                   ---           ---
                   j=1           i=1

This says that the roller will win whenever the non-roller takes
j rolls and the roller takes anywhere from 1 to j rolls.

Okay, now let's look ahead, after the roller plays roll d, and
express the non-roller's chances following the form of equation (2):

                        M                i
                       ---              ---
non-roller's d-cpw  =  \   {  m[d][i] *  \  n[j]  }       (3)
after roll d played    /                 /
                       ---              ---
                       i=1              j=1


This says that whenever the roller still has i rolls left after playing
roll d (m[d][i]), the non-roller will win if she can take i or fewer rolls.

The roller's chances in the original position are just 1 minus the
average of (3) taken over all the dice rolls:

roller's d-cpw =

            36     M                i
           ---    ---              ---
1 - (1/36) \   [  \   {  m[d][i] *  \  n[j]  } ]
           /      /                 /
           ---    ---              ---
           d=1    i=1              j=1


We can change the order of summation:

     M               36               i
    ---             ---             ---
1 - \   [ { (1/36)  \  m[d][i] } *  \  n[j]  ]
    /               /               /
    ---             ---             ---
    i=1             d=1             j=1


This can be simplified using equation (1) to get:

          M               i
         ---             ---
  1  -   \   {  r[i+1] * \  n[j]  }
         /               /
         ---             ---
         i=1             j=1

This says: if the roller takes i+1 rolls, the non-roller will win by
taking i rolls or fewer -- the roller will therefore win 1 minus that.
Clearly this is the same quantity as equation (2), showing that the
expected d-cpw you get looking ahead is the same as the d-cpw you
calculate directly.
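
For what it's worth, the single-checker sketches earlier on this page can be
used to check this identity numerically (this reuses the hypothetical pips(),
off_dist() and win_prob() helpers; with a lone checker the mean-roll-minimizing
play is effectively the only play, so the two sides of the identity agree
exactly, up to rounding):

    from itertools import product

    # Equation (2) computed directly: X's lone checker on the 7 point,
    # on roll, against O's lone checker on the 4 point.
    direct = win_prob(off_dist(7), off_dist(4))

    # The same thing via the look-ahead form: average, over the 36 rolls,
    # of one minus the non-roller's chances after X's play.
    lookahead = sum(1.0 - win_prob(off_dist(4),
                                   off_dist(max(0, 7 - pips(d1, d2))))
                    for d1, d2 in product(range(1, 7), repeat=2)) / 36.0

    print(direct, lookahead)    # both about 0.659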

David Montgomery
monty on FIBS
monty@cs.umd.edu
 