Programming

Forum Archive : Programming

 
Variance reduction of rollouts

From:   Jim Williams
Address:   jimw@giga-net.com
Date:   11 June 1997
Subject:   Proposed Algorithm for Roll Outs
Forum:   rec.games.backgammon
Google:   339F21F8.3585@giga-net.com

The classic algorithm for doing roll outs is to start in
position which is to be analyzed.  From this position,
the game is repeatedly rolled out to completion.  A equity
value is assigned to the completion result.  The equity
of the starting position is approximated by the average value
of the completion equities.  The implicit assumption is that
the best move is always made, but subject to this assumption
the average value of the completion equity will converge
stochastically to the true equity of the starting position.

The proposed algorithm will converge to the same limit, but can
potentially converge much faster.  This algorithm assumes
the existance of an evaluation function which approximates
the equity of a given position.  The important feature of
this algorithm is that the better the evaluation function is,
the faster the convergence will be, but no matter how inaccurate
the evaluation function is, the convergence limit is unaffected.

For any give  position, the "disparity" of that position is computed
as follows.  For each possible dice roll, the best move is determined
and the evaluation function is applied to the resulting position.
The disparity is then equal to the weighted average of the evaluations
of all these resulting positions minus the evalution of the
original position.  The better the evaluation funcation is,
the smaller the disparity will be.

The roll out proceeds normally, but the result of each trial is
equal to the equity value determined by applying the evalution function
to the original position plus the sum of the disparities of all the
positions that occurred in the roll out (including the original
position).

The proof of correctness is inductive.  It is assumed that the
evaluation function correctly evaluates a "game over" position.
The expected value of the roll out result of a position is equal to
the weighted average of the expected values of the roll outs of
each of the next possible positions,
plus the evaluation of the current position, minus the weighted average
of the evaluations of the next postions, plus the disparity of the
current position.  This in turn is equal to just the weighted average
of the expected values of the roll outs of the next positions.
These are correct by the induction hypothesis.
 
Did you find the information in this article useful?          

Do you have any comments you'd like to add?     

 

Programming

Adjusting to a weaker opponent  (Brian Sheppard, July 1997) 
Anticomputer positions  (Bill Taylor+, June 1998) 
BKG 9.8 vs. Villa  (Raccoon+, Aug 2006) 
BKG 9.8 vs. Villa  (Andreas Schneider, June 1992) 
BKG beats world champion  (Marty Storer, Sept 1991) 
Backgames  (David Montgomery+, June 1998)  [Long message]
Blockading feature  (Sam Pottle+, Feb 1999)  [Long message]
Board encoding for neural network  (Brian Sheppard, Feb 1997) 
Bot weaknesses  (Douglas Zare, Mar 2003) 
Building and training a neural-net player  (Brian Sheppard, Aug 1998) 
How to count plies?  (Chuck Bower+, Jan 2004)  [GammOnLine forum]
How to count plies?  (tanglebear+, Mar 2003) 
Ideas for improving computer play  (David Montgomery, Feb 1994) 
Ideas on computer players  (Brian Sheppard, Feb 1997) 
Introduction  (Gareth McCaughan, Oct 1994) 
Measuring Difficulty  (John Robson+, Feb 2005)  [GammOnLine forum]
Methods of encoding positions  (Gary Wong, Jan 2001) 
N-ply algorithm  (eXtreme Gammon, Jan 2011) 
Neural net questions  (Brian Sheppard, Mar 1999) 
Pruning the list of moves  (David Montgomery+, Feb 1994) 
Search in Trees with Chance Nodes  (Thomas Hauk, Feb 2004) 
Source code  (Gary Wong, Dec 1999) 
TD-Gammon vs. Robertie  (David Escoffery, June 1992) 
Training for different gammon values  (Gerry Tesauro, Feb 1996) 
Training neural nets  (Walter Trice, Nov 2000) 
Variance reduction in races  (David Montgomery+, Dec 1998)  [Long message]
Variance reduction of rollouts  (Michael J. Zehr+, Aug 1998)  [Long message]
Variance reduction of rollouts  (Jim Williams, June 1997) 
What is a "neural net"?  (Gary Wong, Oct 1998) 
Writing a backgammon program  (Gary Wong, Jan 1999) 

[GammOnLine forum]  From GammOnLine       [Long message]  Long message       [Recommended reading]  Recommended reading       [Recent addition]  Recent addition
 

  Book Suggestions
Books
Cheating
Chouettes
Computer Dice
Cube Handling
Cube Handling in Races
Equipment
Etiquette
Extreme Gammon
Fun and frustration
GNU Backgammon
History
Jellyfish
Learning
Luck versus Skill
Magazines & E-zines
Match Archives
Match Equities
Match Play
Match Play at 2-away/2-away
Miscellaneous
Opening Rolls
Pip Counting
Play Sites
Probability and Statistics
Programming
Propositions
Puzzles
Ratings
Rollouts
Rules
Rulings
Snowie
Software
Source Code
Strategy--Backgames
Strategy--Bearing Off
Strategy--Checker play
Terminology
Theory
Tournaments
Uncategorized
Variations

 

Return to:  Backgammon Galore : Forum Archive Main Page