
 TD-Gammon vs. Robertie

From: David Escoffery
Address: davide@sco.COM
Date: 29 June 1992
Subject: Re: 3 cheers! ....and a history question.
Forum: rec.games.backgammon
Google: 1992Jun29.045215.1707@sco.com

```
The March-April (Volume 2, Number 2) and May-June (Volume 2, Number 3)
issues of "Inside Backgammon" featured a two-part article by twice world
champion Bill Robertie, who reports on the state of backgammon computers
and annotates his match against Dr. Gerald Tesauro's program, TD-Gammon. I
trust that Kent will not mind me reproducing some of Robertie's article for
the benefit of those on the net.

Anyways, Dr. Gerald Tesauro is employed at the IBM Research Labs in White
Plains, New York. His latest backgammon program is called TD-Gammon and is
based on neural network theory. TD-Gammon runs on an RS-6000 workstation.
In the beginning, it was programmed only with the actual rules of
backgammon and the ability to generate legal moves. No knowledge was
programmed into it as to what constituted a good or a bad move. It "knew"
nothing about making points, hitting blots, or anything else. Over the
course of several months, it played 300,000 games against itself. At
first, it picked its moves at random from the list of legal moves. After
each game, a logic routine made informed guesses as to which moves in that
game may or may not have been errors, based upon a sophisticated
mathematical theory of learning. The program's positional evaluator
routine continuously modified itself, based upon its results in previous
games.

The article and the games are interesting, as Robertie annotates and
analyzes each move. At the end he summarizes:

"TD-Gammon and I played most of a day, a total of 31 games. I won 19
points, an average of 0.61 points per game. On balance, I was lucky. Game
16, as you saw, could easily have gone the other way at the end, a 16-point
swing by itself [Robertie had to throw a double on his last roll to win].
My estimate, after reviewing the entire session, was that I would do well
to average 0.20 to 0.25 points per game. This figure makes TD-Gammon the
strongest backgammon program in existence, most likely better than
Berliner's program of 13 years ago, although that's no longer available for
comparison.

"Not only is TD-Gammon interesting as a backgammon program, it represents
an astonishing achievement for the neural network approach to artificial
intelligence. Remember that this program has no human knowledge built into
it. Everything it "knows", it deduced by playing against itself, then
improving by applying sophisticated mathematical learning algorithms to the
results of its games.

"Just before going to press, we received word that Malcolm Davis and Paul
Magriel made journeys up to White Plains to match wits with TD-Gammon.
Malcolm Davis broke even in 12 games, although TD-Gammon won 8 out of 12.
Paul Magriel got backgammoned while playing a favorable back game and ended
up negative for the session."

Robertie relates one anecdote that I found interesting. He felt that, in
its 300,000 games of experience, TD-Gammon had not "learned" to slot the
5-point with an opening 4-1; it regards the split on an opening 2-1, 4-1,
or 5-1 as superior to the slot. After rolling out the opening position
1000 times, the program finds that 13/9 24/23 leaves it exactly even
money, while slotting the 5-point leaves it an underdog by 0.05 points.

===========================================================================
David Escoffery                          Tel:        (408) 427-7718
The Santa Cruz Operation                 Internet:   davide@sco.COM
P.O. Box 1900
Santa Cruz, CA 95061
===========================================================================
```
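
The self-play learning described in the post can be illustrated on a much smaller game. The sketch below is a toy and not Tesauro's program. It assumes a Nim-like game (take 1 or 2 stones, whoever takes the last stone wins), a lookup table in place of a neural network, a one-step temporal-difference update (the "TD" in TD-Gammon) applied after every move, and an occasional random move for exploration, since this game has no dice. The name `train_self_play` and all parameter values are made up for illustration.

```python
import random


def train_self_play(max_pile=10, games=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn value[n]: estimated chance that the player to move wins with n stones left.

    Hypothetical toy setup: nothing here is taken from TD-Gammon's actual code.
    """
    rng = random.Random(seed)
    # value[0] is fixed at 0: with no stones left, the opponent just took the last one.
    # Every other position starts at 0.5, i.e. the evaluator "knows" nothing.
    value = [0.0] + [0.5] * max_pile

    for _ in range(games):
        pile = rng.randint(1, max_pile)  # random start so every position gets visited
        while pile > 0:
            moves = [k for k in (1, 2) if k <= pile]
            if rng.random() < epsilon:
                take = rng.choice(moves)  # exploratory move
            else:
                # Leave the opponent in the position the evaluator rates worst for them.
                take = min(moves, key=lambda k: value[pile - k])
            next_pile = pile - take
            # The opponent moves next, so our winning chance is 1 minus theirs.
            target = 1.0 - value[next_pile]
            # Temporal-difference step: nudge the current estimate toward the next one.
            value[pile] += alpha * (target - value[pile])
            pile = next_pile
    return value


if __name__ == "__main__":
    for n, v in enumerate(train_self_play()):
        print(n, round(v, 2))
```

Starting from nothing but the rules, the table ends up rating piles of 3, 6, and 9 stones as bad for the player to move, which agrees with the known theory of this toy game. The same side plays both sides of every game, so each update to the table changes how both "players" choose their next moves.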
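
The rollout mentioned at the end of the post (playing a position out 1000 times and averaging the results) can be sketched in the same hedged spirit. The code below assumes a caller-supplied function that plays one game and returns the points won or lost. The names `rollout` and `coin_flip_game` are hypothetical, and the stand-in game is a fair coin flip for one point, not backgammon.

```python
import math
import random


def rollout(play_one_game, trials=1000, seed=0):
    """Average the result of many playouts and report the standard error of that average."""
    rng = random.Random(seed)
    results = [play_one_game(rng) for _ in range(trials)]
    mean = sum(results) / trials
    # Sample variance of a single game, then the standard error of the mean.
    variance = sum((r - mean) ** 2 for r in results) / (trials - 1)
    return mean, math.sqrt(variance / trials)


def coin_flip_game(rng):
    """Stand-in for a real playout (assumption): win or lose one point with equal chance."""
    return 1 if rng.random() < 0.5 else -1


if __name__ == "__main__":
    mean, std_err = rollout(coin_flip_game)
    print(f"estimated equity {mean:+.3f} points per game, standard error {std_err:.3f}")
```

For this stand-in game the standard error after 1000 trials is about 0.03 points, which is the same order of magnitude as the 0.05-point gap quoted in the post. The sketch says nothing about how TD-Gammon itself ran or scored its rollouts.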

### Programming

Adjusting to a weaker opponent  (Brian Sheppard, July 1997)
Anticomputer positions  (Bill Taylor+, June 1998)
BKG 9.8 vs. Villa  (Raccoon+, Aug 2006)
BKG 9.8 vs. Villa  (Andreas Schneider, June 1992)
BKG beats world champion  (Marty Storer, Sept 1991)
Backgames  (David Montgomery+, June 1998)
Blockading feature  (Sam Pottle+, Feb 1999)
Board encoding for neural network  (Brian Sheppard, Feb 1997)
Bot weaknesses  (Douglas Zare, Mar 2003)
Building and training a neural-net player  (Brian Sheppard, Aug 1998)
How to count plies?  (Chuck Bower+, Jan 2004)
How to count plies?  (tanglebear+, Mar 2003)
Ideas for improving computer play  (David Montgomery, Feb 1994)
Ideas on computer players  (Brian Sheppard, Feb 1997)
Introduction  (Gareth McCaughan, Oct 1994)
Measuring Difficulty  (John Robson+, Feb 2005)
Methods of encoding positions  (Gary Wong, Jan 2001)
N-ply algorithm  (eXtreme Gammon, Jan 2011)
Neural net questions  (Brian Sheppard, Mar 1999)
Pruning the list of moves  (David Montgomery+, Feb 1994)
Search in Trees with Chance Nodes  (Thomas Hauk, Feb 2004)
Source code  (Gary Wong, Dec 1999)
TD-Gammon vs. Robertie  (David Escoffery, June 1992)
Training for different gammon values  (Gerry Tesauro, Feb 1996)
Training neural nets  (Walter Trice, Nov 2000)
Variance reduction in races  (David Montgomery+, Dec 1998)
Variance reduction of rollouts  (Michael J. Zehr+, Aug 1998)
Variance reduction of rollouts  (Jim Williams, June 1997)
What is a "neural net"?  (Gary Wong, Oct 1998)
Writing a backgammon program  (Gary Wong, Jan 1999)