Forum Archive :
Programming
Variance reduction of rollouts

The classic algorithm for doing rollouts is to start in the
position which is to be analyzed. From this position,
the game is repeatedly rolled out to completion, and an equity
value is assigned to each completion result. The equity
of the starting position is approximated by the average value
of the completion equities. The implicit assumption is that
the best move is always made; subject to this assumption,
the average value of the completion equity converges
stochastically to the true equity of the starting position.
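As a concrete illustration, here is a sketch of the classic rollout in Python for a toy race game (not backgammon, and every name here is hypothetical): a position is just a pip count n, each turn a single die 1-6 is rolled and subtracted, and the completion equity is taken to be the non-positive overshoot when n reaches zero or below.

```python
import random

def play_out(n, rng):
    """Roll the toy game to completion; return the completion equity."""
    while n > 0:
        n -= rng.randint(1, 6)   # the "best move" is forced in this toy game
    return float(n)              # completion equity: overshoot past the finish

def classic_rollout(n, trials, seed=0):
    """Estimate the equity of position n as the average completion equity."""
    rng = random.Random(seed)
    return sum(play_out(n, rng) for _ in range(trials)) / trials
```

With enough trials, `classic_rollout(20, trials)` settles near the true equity of the toy position, but only at the slow Monte Carlo rate.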
The proposed algorithm converges to the same limit, but can
potentially converge much faster. It assumes
the existence of an evaluation function which approximates
the equity of a given position. The important feature of
this algorithm is that the better the evaluation function is,
the faster the convergence will be, but no matter how inaccurate
the evaluation function is, the convergence limit is unaffected.
For any given position, the "disparity" of that position is computed
as follows. For each possible dice roll, the best move is determined
and the evaluation function is applied to the resulting position.
The disparity is then equal to the weighted average of the evaluations
of all these resulting positions minus the evaluation of the
original position. The better the evaluation function is,
the smaller the disparity will be.
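The disparity computation can be sketched for the same kind of toy race game (pip count n, one die 1-6, completion equity equal to the overshoot), using a deliberately crude evaluation function; the names `evaluate` and `disparity` are illustrative, not from the original.

```python
def evaluate(n):
    """Crude evaluation function: exact for game-over positions (as the
    correctness argument requires), a constant guess otherwise."""
    return float(n) if n <= 0 else -2.0

def disparity(n):
    """Weighted average of the evaluations after each possible roll (the
    best move is forced in this toy game), minus the evaluation of n."""
    avg = sum(evaluate(n - d) for d in range(1, 7)) / 6.0
    return avg - evaluate(n)
```

Far from the finish the crude evaluation is self-consistent, so the disparity is exactly zero; near the finish, where the constant guess is wrong, the disparity picks up the error (for instance `disparity(1)` is -0.5 here).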
The rollout proceeds normally, but the result of each trial is
equal to the equity value obtained by applying the evaluation function
to the original position, plus the sum of the disparities of all the
positions that occurred in the rollout (including the original
position).
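Putting the pieces together for the toy race game (pip count n, one die 1-6, completion equity equal to the overshoot): each trial plays out normally, but records the evaluation of the start position plus the disparities of the positions visited, rather than the raw completion equity. This is only a sketch under the toy-game assumption; all function names are hypothetical.

```python
import random
from statistics import fmean, pvariance

def evaluate(n):
    # exact at game over (required by the correctness argument), crude elsewhere
    return float(n) if n <= 0 else -2.0

def disparity(n):
    # weighted average of evaluations after each roll, minus evaluate(n)
    return sum(evaluate(n - d) for d in range(1, 7)) / 6.0 - evaluate(n)

def vr_trial(n, rng):
    """One variance-reduced trial: evaluation of the starting position plus
    the disparities of every position that still has a roll to make."""
    result = evaluate(n)
    while n > 0:
        result += disparity(n)
        n -= rng.randint(1, 6)
    return result

def plain_trial(n, rng):
    """One classic trial: just the completion equity."""
    while n > 0:
        n -= rng.randint(1, 6)
    return float(n)

def compare(n, trials, seed=0):
    """Mean and variance of both estimators on the same number of trials."""
    rng = random.Random(seed)
    vr = [vr_trial(n, rng) for _ in range(trials)]
    plain = [plain_trial(n, rng) for _ in range(trials)]
    return fmean(vr), pvariance(vr), fmean(plain), pvariance(plain)
```

Both means converge to the same equity, but even with this crude evaluation function the corrected trials have much smaller variance, because only the handful of positions near the finish contribute nonzero disparities.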
The proof of correctness is inductive. It is assumed that the
evaluation function correctly evaluates a "game over" position.
The expected value of the rollout result of a position is equal to
the weighted average of the expected values of the rollouts of
each of the next possible positions,
plus the evaluation of the current position, minus the weighted average
of the evaluations of the next positions, plus the disparity of the
current position. Since the disparity is by definition that weighted
average minus the evaluation of the current position, the last three
terms cancel, leaving just the weighted average
of the expected values of the rollouts of the next positions.
These are correct by the induction hypothesis.
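The inductive step can be written out in symbols; a sketch, writing $e(p)$ for the evaluation function, $w_r$ for the probability of roll $r$, $p_r$ for the position reached by the best move for roll $r$, and $V(p)$ for the result of a trial rolled out from $p$:

```latex
% disparity of a position with rolls remaining
d(p) = \sum_r w_r \, e(p_r) - e(p)

% a trial from p is a trial from the realized successor p_R, shifted by
% e(p) + d(p) - e(p_R); taking expectations over the roll R:
\mathbb{E}[V(p)]
  = e(p) + d(p) + \sum_r w_r \bigl( \mathbb{E}[V(p_r)] - e(p_r) \bigr)
  = \sum_r w_r \, \mathbb{E}[V(p_r)]

% substituting d(p) cancels e(p) and the weighted average of the e(p_r).
% By the induction hypothesis each \mathbb{E}[V(p_r)] is the true equity,
% so \mathbb{E}[V(p)] is a weighted average of true equities, i.e. the
% true equity of p.  The base case is a game-over position, where
% V(p) = e(p) is exact by assumption.
```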



