Backgammon Programming

TD-Gammon vs. Robertie

From:   David Escoffery
Address:   davide@sco.COM
Date:   29 June 1992
Subject:   Re: 3 cheers! ....and a history question.
Forum:   rec.games.backgammon
Google:   1992Jun29.045215.1707@sco.com

The March-April (Volume 2, Number 2) and May-June (Volume 2, Number 3) issues of "Inside Backgammon" featured a two-part article by two-time world champion Bill Robertie, reporting on the state of backgammon computers and annotating his match against Dr. Gerald Tesauro's program, TD-Gammon. I trust that Kent will not mind me reproducing some of Robertie's article for the benefit of those on the net.

Anyway, Dr. Gerald Tesauro is employed at the IBM Research Labs in White Plains, New York. His latest backgammon program is called TD-Gammon and is based on neural network theory. TD-Gammon runs on an RS-6000 workstation. In the beginning, it was programmed only with the actual rules of backgammon and the ability to generate legal moves. No knowledge was programmed into it as to what constituted a good or a bad move. It "knew" nothing about making points, hitting blots, or anything else.

Over the course of several months, it played 300,000 games against itself. In the beginning, it picked its moves at random from the list of possible legal moves. After each game, a logic routine made informed guesses as to which moves in the previous game may or may not have been errors, based upon a sophisticated mathematical theory of learning. The program's positional evaluator routine continuously modified itself, based upon its results in previous games.

The article and the games are interesting, as Robertie annotates and analyzes each move. At the end he summarizes:

"TD-Gammon and I played most of a day, a total of 31 games. I won 19 points, an average of 0.61 points per game. On balance, I was lucky. Game 16, as you saw, could easily have gone the other way at the end, a 16-point swing by itself [Robertie had to throw a double on his last roll to win]. My estimate, after reviewing the entire session, was that I would do well to average 0.20 to 0.25 points per game. This figure makes TD-Gammon the strongest backgammon program in existence, most likely better than Berliner's program of 13 years ago, although that's no longer available for comparison.

"Not only is TD-Gammon interesting as a backgammon program, it represents an astonishing achievement for the neural network approach to artificial intelligence. Remember that this program has no human knowledge built into it. Everything it "knows", it deduced by playing against itself, then improving by applying sophisticated mathematical learning algorithms to the results of its games.

"Just before going to press, we received word that Malcolm Davis and Paul Magriel made journeys up to White Plains to match wits with TD-Gammon. Malcolm Davis broke even in 12 games, although TD-Gammon won 8 out of 12. Paul Magriel got backgammoned while playing a favorable back game and ended up negative for the session."

One anecdote Robertie relates struck me as interesting. Robertie felt that, in its 300,000 games of experience, TD-Gammon had not "learned" to slot the 5-point with an opening 4-1; it regards the split on an opening 21, 41, or 51 as superior to the slot. After rolling out the opening position 1000 times, the program finds that while 13/9 24/23 makes it exactly even money, slotting the 5-point leaves it an underdog by 0.05 points.

===========================================================================
David Escoffery                  Tel: (408) 427-7718
The Santa Cruz Operation         Internet: davide@sco.COM
P.O. Box 1900
Santa Cruz, CA 95061
===========================================================================
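
The rollout mentioned in the last anecdote amounts to playing the same position out many times and averaging the points won. What follows is only a minimal sketch of that averaging step, not Tesauro's code: `rollout`, `play_out`, and `toy_play_out` are hypothetical names, and the outcome distribution inside `toy_play_out` is made up purely so the example runs without a backgammon engine.

```python
import random
import statistics


def rollout(play_out, trials, seed=0):
    """Average the result of `trials` independent play-outs of one position.

    `play_out` is a hypothetical callable: given a random.Random, it plays
    the position to the end and returns the points won (negative if lost).
    """
    rng = random.Random(seed)
    results = [play_out(rng) for _ in range(trials)]
    mean = statistics.fmean(results)
    # Standard error of the mean: sample standard deviation / sqrt(trials).
    std_err = statistics.stdev(results) / trials ** 0.5
    return mean, std_err


def toy_play_out(rng):
    """Stand-in for a real game engine, with a made-up outcome distribution."""
    # Assumed for illustration only: one result in four is a gammon
    # (2 points), and either side is equally likely to win.
    points = 2 if rng.random() < 0.25 else 1
    return points if rng.random() < 0.5 else -points


if __name__ == "__main__":
    mean, std_err = rollout(toy_play_out, trials=1000)
    print(f"equity estimate: {mean:+.3f} +/- {std_err:.3f} points per game")
    # Robertie's session result, from the figures quoted in the post.
    print(f"19 points over 31 games: {19 / 31:.2f} points per game")
```

The standard error gives a rough sense of how much a 1000-trial estimate can move from sampling noise alone, which is worth keeping in mind when two plays are separated by only a few hundredths of a point.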
