Kevin Whyte wrote:
> The recent thread about weaknesses of Jellyfish got
> me moving on my plan to train my own program. I know I'm
> not the only person to do this: first there was TD-Gammon,
> but then JF, Loner, Motif, SnowWhite(?), etc. It seems
> to me that with a few more good ideas we could have a
> program that plays noticeably better than humans, and
> uniformly well. Its rollouts would be, for practical
> purposes, "the truth".
Dear Kevin,
I'm glad to see you are setting your sights high!
In this forum I have lately been sounding off against blind
acceptance of rollout data. You know the type ("A JF rollout
shows that 22/18 is .00001 better, so that settles the
issue...").
However, I do not have a negative opinion of JF as a player.
Far from it. My complaint is that analysis of positions has
become too mechanical. Positional analysis in this forum has
deteriorated into cut-and-paste of rollout data.
Back to the JF-as-player issue: you will find that there is not
much distance between JF and perfection. I believe that the
practical issue of playing to exploit a weaker opponent is a
far more significant factor than JF's occasional error.
> The basic idea is temporal difference training, as with
> all the neural-net programs I know of. The big difference
> is that I'm doing a "parasite-driven" pool of starting
> positions. What that means is that rather than just doing
> training games from the initial position, I have a collection
> of positions I train from. This collection evolves over
> time, with the fitness criterion being the difference between
> the 1-ply lookahead eval and the static eval.
This is a promising approach, which I believe has been used by
other backgammon developers. (Tom Keith uses a similar pattern
to develop Motif, for instance.)
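To make that fitness criterion concrete, here is a minimal sketch
in Python of how it might be computed. Every name in it
(static_eval, gen_moves, and so on) is hypothetical; this is not
code from your program or anyone else's.

    # Average the best successor value over the 21 distinct rolls,
    # then score a position by how much that 1-ply value disagrees
    # with the static value. (A real program would also flip the
    # evaluation perspective for the side on roll.)
    ROLLS = [(d1, d2) for d1 in range(1, 7) for d2 in range(d1, 7)]

    def one_ply_eval(position, static_eval, gen_moves):
        total, weight = 0.0, 0.0
        for d1, d2 in ROLLS:
            w = 1.0 if d1 == d2 else 2.0  # non-doubles twice as likely
            succ = gen_moves(position, (d1, d2))
            if succ:
                best = max(static_eval(s) for s in succ)
            else:                         # dance: no legal move
                best = static_eval(position)
            total += w * best
            weight += w
        return total / weight

    def parasite_fitness(position, static_eval, gen_moves):
        one_ply = one_ply_eval(position, static_eval, gen_moves)
        return abs(one_ply - static_eval(position))

A position scores high, and so survives in the pool, exactly when
the static evaluation and the 1-ply lookahead disagree about it.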
There are a couple of things you should watch out for. I offer
this advice in the hope that you will avoid some pitfalls and
set aside some time to solve problems that you might have
overlooked.
First, straight TD training, as described by Sutton, is about 20
times as fast as doing a 1-ply search on every position. That is,
you can complete 20 times as many games of training in the same
amount of time. 1-ply lookahead training does produce lower
statistical variance, so you get some of that advantage back: you
need fewer training games to reach the same level of play. But
nothing like a factor of 20.
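For comparison, the straight TD(0) step Sutton describes looks
roughly like this. I am assuming an evaluator object with numpy
weight and gradient arrays; that is my own invention for the
sketch, not a description of any existing program.

    import numpy as np

    ALPHA = 0.1   # learning rate

    def td0_update(net, x_t, x_next, terminal_reward=None):
        # The target is simply the value of the next position
        # reached (or the game result at the end), so each move
        # costs one forward pass instead of a full 1-ply search.
        v_t = net.value(x_t)
        if terminal_reward is not None:
            target = terminal_reward
        else:
            target = net.value(x_next)
        delta = target - v_t
        net.weights += ALPHA * delta * net.grad(x_t)

The 1-ply variant replaces net.value(x_next) with an average over
all 21 rolls of the best successor value, which is where the big
difference in cost per move comes from.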
Second, training is non-stationary. That is: as your evaluation
trains, the evaluation of each training position changes. This
implies that you need a strategy for recomputing the evaluations
of the parasite pool as evolution proceeds.
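One simple strategy, offered only as a sketch: re-score the whole
pool against the current network every so many training games, so
that fitness always reflects the network you actually have. The
names here (parasite_fitness, the pool layout) are the same
hypothetical pieces sketched above.

    RESCORE_EVERY = 1000   # games between re-scorings; a pure guess

    def maybe_rescore(pool, games_played, fitness_fn):
        # pool is a list of (position, fitness) pairs.
        if games_played % RESCORE_EVERY == 0:
            return [(pos, fitness_fn(pos)) for pos, _ in pool]
        return pool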
Third, straight TD from no initial knowledge has proven to be
sufficient for this task, so an advanced training mechanism may
not be necessary.
Fourth, there is a condition attached to the sufficiency of TD:
you must have inputs to the neural network that are sufficient
to capture expert judgment. You might find that constructing
such a set of features is a very difficult task.
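For a sense of what "inputs" means here, this is roughly the raw
encoding Tesauro described for TD-Gammon: four units per point
per side, a truncated unary code for the first three checkers
plus a linear unit for the excess. The function name is mine, and
a serious feature set would be layered on top of something like
this.

    def encode_point(n_checkers):
        # Four-unit encoding of one side's checkers on one point.
        return [
            1.0 if n_checkers >= 1 else 0.0,
            1.0 if n_checkers >= 2 else 0.0,
            1.0 if n_checkers >= 3 else 0.0,
            (n_checkers - 3) / 2.0 if n_checkers > 3 else 0.0,
        ]

Raw inputs like these leave concepts such as outfield primes or
backgame timing for the network to discover on its own, which is
exactly where a carefully constructed feature set earns its keep.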
> right now NN programs seem to have trouble with backgames
> and outfield primes. Why? Well, the earlier training teaches
> them to play in such a way as to avoid having that many men
> back, so they rarely get into such a situation. Even when they
> do, they play it badly and don't learn the right things.
This is a promising theory, but isn't there an alternative
explanation? JF might not have the positional features it needs
to play such situations tactically well. For instance, does it
have a pattern for slotting the back of a prime? What about
a pattern for bringing checkers to bear on escape points? What
about rules for when it should split an anchor in a backgame?
If a neural network is missing an essential piece of positional
knowledge, then the network has very little chance of synthesizing
that knowledge.
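To illustrate the kind of knowledge I mean, here is a toy
hand-crafted feature, the length of the longest prime a side
holds. It is my own example, not a feature from JF or any other
program.

    def longest_prime(points):
        # points[i] = this side's checker count on point i+1;
        # a point is "made" when it holds two or more checkers.
        best = run = 0
        for count in points:
            run = run + 1 if count >= 2 else 0
            best = max(best, run)
        return best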
Also, I recall that Tesauro had a trick for inducing TD-Gammon to
train on positions that were unlikely to occur. The purpose of this
trick was to make TD-Gammon explore the search space thoroughly. It
was something like "start with a neural network that has a high
opinion of every position." Details are probably in one of his
TD-Gammon papers.
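If the trick was indeed optimistic initial values, a sketch of it
might look like this: give the output unit a large positive bias
so that, before training, every position looks like a probable
win and the self-play games wander widely. All of the names and
numbers below are my own guesses, not Tesauro's actual setup.

    import numpy as np

    def init_optimistic(n_inputs, n_hidden, out_bias=3.0, seed=0):
        # Small random weights, but a big positive output bias:
        # sigmoid(3.0) is about 0.95, a "high opinion" of every
        # position until experience argues otherwise.
        rng = np.random.default_rng(seed)
        w_hid = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs))
        b_hid = np.zeros(n_hidden)
        w_out = rng.normal(0.0, 0.1, size=n_hidden)
        return w_hid, b_hid, w_out, out_bias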
OK, that does it. I am looking forward to the fruits of your research,
and if you ever want to pass an idea by me, feel free.
Warm Regards,
Brian