Kevin Whyte wrote:
> The recent thread about weaknesses of Jellyfish got
> me moving on my plan to train my own program. I know I'm
> not the only person to do this: first there was TD-Gammon,
> but then JF, Loner, Motif, SnowWhite(?), etc. It seems
> to me that with a few more good ideas we could have a
> program that plays noticeably better than humans, and
> uniformly well. Its rollouts would be, for practical
> purposes, "the truth".
Dear Kevin,
I'm glad to see you are setting your sights high!
In this forum I have lately been sounding off against blind
acceptance of rollout data. You know the type ("A JF rollout
shows that 22/18 is .00001 better, so that settles the
issue...").
However, I do not have a negative opinion of JF as a player.
Far from it. My complaint is that analysis of positions has
become too mechanical. Positional analysis in this forum has
deteriorated into cut-and-paste of rollout data.
Back to the JF-as-player issue: you will find that there is not
much distance between JF and perfection. I believe that the
practical issue of playing to exploit a weaker opponent is a
far more significant factor than JF's occasional error.
> The basic idea is temporal difference training, as with
> all the neural-net programs I know of. The big difference
> is that I'm doing a "parasite-driven" pool of starting
> positions. What that means is that rather than just doing
> training games from the initial position, I have a collection
> of positions I train from. This collection evolves over
> time, with the fitness criterion being the difference between
> the 1-ply lookahead eval and the static eval.
This is a promising approach, which I believe has been used by
other backgammon developers. (Tom Keith uses a similar pattern
to develop Motif, for instance.)
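To make that fitness criterion concrete, here is a minimal sketch
in Python of how it might be computed. Every name in it
(static_eval, gen_moves, and so on) is hypothetical; this is not
code from your program or anyone else's.

    # Average the best successor value over the 21 distinct rolls,
    # then score a position by how much that 1-ply value disagrees
    # with the static value. (A real program would also flip the
    # evaluation perspective for the side on roll.)
    ROLLS = [(d1, d2) for d1 in range(1, 7) for d2 in range(d1, 7)]

    def one_ply_eval(position, static_eval, gen_moves):
        total, weight = 0.0, 0.0
        for d1, d2 in ROLLS:
            w = 1.0 if d1 == d2 else 2.0  # non-doubles twice as likely
            succ = gen_moves(position, (d1, d2))
            if succ:
                best = max(static_eval(s) for s in succ)
            else:                         # dance: no legal move
                best = static_eval(position)
            total += w * best
            weight += w
        return total / weight

    def parasite_fitness(position, static_eval, gen_moves):
        one_ply = one_ply_eval(position, static_eval, gen_moves)
        return abs(one_ply - static_eval(position))

A position scores high, and so survives in the pool, exactly when
the static evaluation and the 1-ply lookahead disagree about it.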
There are a couple of things you should watch out for. I offer
this advice in the hope that you will avoid some pitfalls and
set aside some time to solve problems that you might have
overlooked.
First, straight TD training, as described by Sutton, is about 20
times as fast as doing a 1-ply search on every position. That is,
you can complete 20 times as many games of training in the same
amount of time. 1-ply lookahead training does produce lower
statistical variance, so you get some of that advantage back: you
need fewer training games to reach the same level of play. But
nothing like a factor of 20.
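For comparison, the straight TD(0) step Sutton describes looks
roughly like this. I am assuming an evaluator object with numpy
weight and gradient arrays; that is my own invention for the
sketch, not a description of any existing program.

    import numpy as np

    ALPHA = 0.1   # learning rate

    def td0_update(net, x_t, x_next, terminal_reward=None):
        # The target is simply the value of the next position
        # reached (or the game result at the end), so each move
        # costs one forward pass instead of a full 1-ply search.
        v_t = net.value(x_t)
        if terminal_reward is not None:
            target = terminal_reward
        else:
            target = net.value(x_next)
        delta = target - v_t
        net.weights += ALPHA * delta * net.grad(x_t)

The 1-ply variant replaces net.value(x_next) with an average over
all 21 rolls of the best successor value, which is where the big
difference in cost per move comes from.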
Second, training is non-stationary. That is: as your evaluation
trains, the evaluation of each training position changes. This
implies that you need a strategy for recomputing the evaluations
of the parasite pool as evolution proceeds.
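One simple strategy, offered only as a sketch: re-score the whole
pool against the current network every so many training games, so
that fitness always reflects the network you actually have. The
names here (parasite_fitness, the pool layout) are the same
hypothetical pieces sketched above.

    RESCORE_EVERY = 1000   # games between re-scorings; a pure guess

    def maybe_rescore(pool, games_played, fitness_fn):
        # pool is a list of (position, fitness) pairs.
        if games_played % RESCORE_EVERY == 0:
            return [(pos, fitness_fn(pos)) for pos, _ in pool]
        return pool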
Third, straight TD from no initial knowledge has proven to be
sufficient for this task, so an advanced training mechanism may
not be necessary.
Fourth, there is a condition attached to the sufficiency of TD:
you must have inputs to the neural network that are sufficient
to capture expert judgment. You might find that constructing
such a set of features is a very difficult task.
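For a sense of what "inputs" means here, this is roughly the raw
encoding Tesauro described for TD-Gammon: four units per point
per side, a truncated unary code for the first three checkers
plus a linear unit for the excess. The function name is mine, and
a serious feature set would be layered on top of something like
this.

    def encode_point(n_checkers):
        # Four-unit encoding of one side's checkers on one point.
        return [
            1.0 if n_checkers >= 1 else 0.0,
            1.0 if n_checkers >= 2 else 0.0,
            1.0 if n_checkers >= 3 else 0.0,
            (n_checkers - 3) / 2.0 if n_checkers > 3 else 0.0,
        ]

Raw inputs like these leave concepts such as outfield primes or
backgame timing for the network to discover on its own, which is
exactly where a carefully constructed feature set earns its keep.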
> right now NN programs seem to have trouble with backgames
> and outfield primes. Why? Well, the earlier training teaches
> them to play in such a way as to avoid having that many men
> back, so they rarely get into such a situation. Even when they
> do, they play it badly and don't learn the right things.
This is a promising theory, but isn't there an alternative
explanation? JF might not have the positional features it needs
to play such situations tactically well. For instance, does it
have a pattern for slotting the back of a prime? What about
a pattern for bringing checkers to bear on escape points? What
about rules for when it should split an anchor in a backgame?
If a neural network is missing an essential piece of positional
knowledge, then the network has very little chance of synthesizing
that knowledge.
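To illustrate the kind of knowledge I mean, here is a toy
hand-crafted feature, the length of the longest prime a side
holds. It is my own example, not a feature from JF or any other
program.

    def longest_prime(points):
        # points[i] = this side's checker count on point i+1;
        # a point is "made" when it holds two or more checkers.
        best = run = 0
        for count in points:
            run = run + 1 if count >= 2 else 0
            best = max(best, run)
        return best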
Also, I recall that Tesauro had a trick for inducing TD-Gammon to
train on positions that were unlikely to occur. The purpose of this
trick was to make TD-Gammon explore the search space thoroughly. It
was something like "start with a neural network that has a high
opinion of every position." Details are probably in one of his
TD-Gammon papers.
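If the trick was indeed optimistic initial values, a sketch of it
might look like this: give the output unit a large positive bias
so that, before training, every position looks like a probable
win and the self-play games wander widely. All of the names and
numbers below are my own guesses, not Tesauro's actual setup.

    import numpy as np

    def init_optimistic(n_inputs, n_hidden, out_bias=3.0, seed=0):
        # Small random weights, but a big positive output bias:
        # sigmoid(3.0) is about 0.95, a "high opinion" of every
        # position until experience argues otherwise.
        rng = np.random.default_rng(seed)
        w_hid = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs))
        b_hid = np.zeros(n_hidden)
        w_out = rng.normal(0.0, 0.1, size=n_hidden)
        return w_hid, b_hid, w_out, out_bias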
OK, that does it. I am looking forward to the fruits of your research,
and if you ever want to pass an idea by me, feel free.
Warm Regards,
Brian