Programming

 
Ideas on computer players

From:   Brian Sheppard
Address:   brians@mstone.com
Date:   21 February 1997
Subject:   Re: Ideas on computer players
Forum:   rec.games.backgammon
Google:   01bc1ff8$fd19acc0$3ac032cf@polaris.mstone.com

Kevin Whyte wrote:
>   The recent thread about weaknesses of Jellyfish got
> me moving on my plan to train my own program.  I know I'm
> not the only person to do this: first there was TD gammon,
> but then JF, loner, motif, SnowWhite(?), etc.  It seems
> to me that with a few more good ideas we could have a
> program that plays noticeably better than humans, and
> uniformly well.  Its roll-outs would be, for practical
> purposes, "the truth".

Dear Kevin,

I'm glad to see you are setting your sights high!

In this forum I have lately been sounding off against blind
acceptance of rollout data. You know the type ("A JF rollout
shows that 22/18 is .00001 better, so that settles the
issue...").

However, I do not have a negative opinion of JF as a player.
Far from it. My complaint is that analysis of positions has
become too mechanical. Positional analysis in this forum has
deteriorated into cut-and-paste of rollout data.

Back to the JF-as-player issue: you will find that there is not
much distance between JF and perfection. I believe that the
practical issue of playing to exploit a weaker opponent is a
far more significant factor than JF's occasional error.

>   The basic idea is temporal difference training, as with
> all the neural-net programs I know of.  The big difference
> is that I'm doing a "parasite driven" pool of starting
> positions.  What that means is that rather than just doing
> training games from the initial position, I have a collection
> of positions I train from.  This collection evolves over
> time, with the fitness criterion being the difference of
> 1ply look ahead eval and static eval.

This is a promising approach, and I believe it has been used by
other backgammon developers. (Tom Keith used a similar scheme to
develop Motif, for instance.)
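
To make this concrete, here is a rough Python sketch of how I picture
the parasite pool working. The helpers lookahead_eval() and perturb()
are stand-ins of my own, not anyone's real code.

    import random

    def fitness(position, net):
        # How much the static evaluation disagrees with a 1-ply lookahead,
        # i.e. how "surprised" the net still is by this position.
        # lookahead_eval() is an assumed helper, not shown here.
        return abs(lookahead_eval(position, net) - net.evaluate(position))

    def evolve_pool(pool, net, keep=0.5):
        # Keep the positions the net understands least, then refill the
        # pool with perturbed copies of the survivors (the "parasites").
        # perturb() is an assumed helper that jiggles a position slightly.
        ranked = sorted(pool, key=lambda p: fitness(p, net), reverse=True)
        survivors = ranked[:int(len(pool) * keep)]
        children = [perturb(random.choice(survivors))
                    for _ in range(len(pool) - len(survivors))]
        return survivors + children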

There are a couple of things you should watch out for. I offer
this advice in the hope that you will avoid some pitfalls and
set aside some time to solve problems that you might have
overlooked.

First, straight TD training, as described by Sutton, is about 20
times as fast as doing a 1-ply search on every position. That is,
you can complete 20 times as many games of training in the same
amount of time. 1-ply lookahead training does develop lower
statistical variance as training progresses, so you get some of
that advantage back because you require fewer training games to
reach the same level of play. But nothing like a factor of 20.
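
The difference is easy to see if you write the two training targets
side by side. The net.evaluate() interface and legal_moves() helper
below are stand-ins, not any particular program's API.

    # The 21 distinct rolls: doubles count once, non-doubles twice (of 36).
    DICE_ROLLS = [((a, b), 1 if a == b else 2)
                  for a in range(1, 7) for b in range(a, 7)]

    def td_target_static(net, next_position):
        # Straight TD: the target is the static eval of the position the
        # game actually reached -- one evaluation per move of training.
        return net.evaluate(next_position)

    def td_target_lookahead(net, position):
        # 1-ply target: best play for each of the 21 rolls, weighted by
        # probability -- dozens of evaluations per move, hence a slowdown
        # of roughly a factor of 20.  (Side-to-move bookkeeping omitted.)
        total = 0.0
        for roll, weight in DICE_ROLLS:
            best = max((net.evaluate(p) for p in legal_moves(position, roll)),
                       default=net.evaluate(position))
            total += weight * best
        return total / 36.0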

Second, training is non-stationary. That is: as your evaluation
trains, the evaluation of each training position changes. This
implies that you need a strategy for recomputing the evaluations
of the parasite pool as evolution proceeds.
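
One simple strategy, purely by way of illustration: re-rank the pool
against the current weights at a fixed interval, so that stale fitness
scores never survive for long.

    def refresh_pool(pool, net, games_played, every=1000):
        # Fitness values computed against old weights go stale as the net
        # trains; every so often, re-rank and re-breed against the current
        # net (evolve_pool is the sketch given earlier).
        if games_played % every == 0:
            pool = evolve_pool(pool, net)
        return pool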

Third, straight TD from no initial knowledge has proven to be
sufficient for this task. An advanced training mechanism
might not be the answer.

Fourth, there is a condition attached to the sufficiency of TD:
you must have inputs to the neural network that are sufficient
to capture expert judgment. You might find that constructing
such a set of features is a very difficult task.
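
For illustration, the raw half of such an input layer might look
roughly like the encoding Tesauro describes (this is from memory, and
the Board accessors are invented):

    def encode(board, player_on_roll):
        # Truncated-unary encoding of each point for each side, plus bar,
        # borne-off checkers, and side to move: 198 raw inputs in all.
        # Hand-built expert features would be appended after these.
        inputs = []
        for side in (0, 1):
            for point in range(24):
                n = board.checkers(side, point)   # assumed accessor
                inputs += [1.0 if n >= 1 else 0.0,
                           1.0 if n >= 2 else 0.0,
                           1.0 if n >= 3 else 0.0,
                           (n - 3) / 2.0 if n > 3 else 0.0]
            inputs.append(board.on_bar(side) / 2.0)      # assumed accessor
            inputs.append(board.borne_off(side) / 15.0)  # assumed accessor
        inputs += [1.0, 0.0] if player_on_roll == 0 else [0.0, 1.0]
        return inputs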

>   right now NN programs seem to have trouble with backgames
> and outfield primes.  Why?  Well, the earlier training teaches
> them to play in such a way as to avoid having that many men
> back, so they rarely get into such a situation.  Even when they
> do, they play it badly and don't learn the right things.

This is a promising theory, but isn't there an alternative
explanation? JF might simply lack the positional features it needs
to play such positions well. For instance, does it
have a pattern for slotting the back of a prime? What about
a pattern for bringing checkers to bear on escape points? What
about rules for when it should split an anchor in a backgame?

If a neural network is missing an essential piece of positional
knowledge, then the network has very little chance of synthesizing
that knowledge.
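
To give one small example of what I mean: a feature like "length of
the longest prime" is trivial to hand the network as an input but very
hard for it to synthesize from raw checker counts. A toy version of
such a feature (my own illustration):

    def longest_prime(points_made):
        # points_made[i] is True if we hold point i (two or more checkers).
        # Returns the length of the longest run of consecutive made points.
        best = run = 0
        for made in points_made:
            run = run + 1 if made else 0
            best = max(best, run)
        return best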

Also, I recall that Tesauro had a trick for inducing TD-Gammon to
train on positions that were unlikely to occur. The purpose of this
trick was to make TD-Gammon explore the search space thoroughly. It
was something like "start with a neural network that has a high
opinion of every position." Details are probably in one of his
TD-Gammon papers.
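
If memory serves, it amounts to what would now be called optimistic
initialization. Something like the following, which is entirely my own
paraphrase of the trick, not Tesauro's code:

    def optimistic_init(net, optimism=3.0):
        # Push the output bias up so that sigmoid(bias) is near 1.0: every
        # unexplored position initially looks like a probable win, which
        # tempts early self-play into regions of the game it would
        # otherwise learn to avoid before ever exploring them.
        net.output_bias += optimism
        return net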

OK, that does it. I am looking forward to the fruits of your research,
and if you ever want to pass an idea by me, feel free.

Warm Regards,
Brian
 
