Gary Wong wrote:
> It's only a little stronger (of the order of 0.1ppg cubeless after
> 100,000 training games).
PUBEVAL is surprisingly resilient. Of course you can win almost 1.0 ppg
against it if you adopt intelligently defined anti-PUBEVAL tactics, but if
you just train a neural network to play well and pit it against PUBEVAL you
won't do as well.
My network scores around 0.6 ppg (cubeless) versus PUBEVAL. I believe this
is about as well as you can do. (If you read the literature you may have
read that HCGAMMON wins 85% against PUBEVAL, but that turns out to be
erroneous.)
> I haven't yet been patient enough to give it a decent amount of training
> because I keep coming up with ideas for new inputs and end up starting
> again...
This is a reasonable plan, but just be sure that your inputs really make a
difference. You need nothing beyond the raw input description (as defined
by Tesauro), priming (as defined by Berliner), and shot-counting. These
alone are sufficient to play at championship caliber when combined with a
1- or 2-ply search engine.
My recommendation is to pare down your inputs to these alone, then debug
those terms exhaustively. Check the value of these terms in every
conceivable natural and unnatural situation. Then train your network to
exhaustion.
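For reference, the raw input description is roughly the TD-Gammon encoding.
A minimal sketch in Python (this is my recollection of Tesauro's scheme, so
treat the exact thresholds and scaling as assumptions; encode one side and
mirror the call for the opponent):

    # Sketch of a Tesauro-style raw encoding (assumed variant, not gospel).
    # board is indexable by point 1..24 and gives the count of my checkers
    # on that point; n_bar and n_off are my checkers on the bar / borne off.
    def encode_side(board, n_bar, n_off):
        units = []
        for point in range(1, 25):
            n = board[point]
            units.append(1.0 if n >= 1 else 0.0)            # at least a blot
            units.append(1.0 if n >= 2 else 0.0)            # point is made
            units.append(1.0 if n >= 3 else 0.0)            # a spare
            units.append((n - 3) / 2.0 if n > 3 else 0.0)   # extra spares, scaled
        units.append(n_bar / 2.0)                           # checkers on the bar
        units.append(n_off / 15.0)                          # checkers borne off
        return units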
Then profile your program's play using test cases. Where does the program
make a mistake? It helps a lot to profile many test cases, so that you can
prioritize your attack on the most-significant type of error. Once you
choose a representative test case to work on, you need a procedure that
often results in improving the program.
Every mistaken checkerplay results in two positions to analyze, which I call
Predicted (the right move) and Actual (the program's move). Predicted and
Actual will differ in certain input terms. Verify that the input terms are
correctly computed in every case. Correct any bugs, and proceed either to
retrain the evaluation or to choose another test case, depending on the
extent of the changes you have made.
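That verification step is mechanical enough to automate. A minimal sketch
(compute_inputs, the position objects, and the term names are placeholders
for whatever your program already uses):

    # List the input terms that differ between Predicted and Actual;
    # those are the terms to hand-check for bugs.
    def diff_terms(predicted_pos, actual_pos, compute_inputs, names):
        p = compute_inputs(predicted_pos)
        a = compute_inputs(actual_pos)
        for name, pv, av in zip(names, p, a):
            if abs(pv - av) > 1e-9:
                print(f"{name}: predicted={pv:.3f} actual={av:.3f}")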
If the inputs are computed correctly, then you must extend the input space
in order to give the program a chance to correct its mistake. To design an
input term that has the best chance of making progress, follow these
guidelines:
1) Never include a term that is a linear combination of existing terms.
Such terms (e.g. inner board point count, number of men borne off, pip
count, number of anchors, and many other backgammon concepts) can be
synthesized by the neural network if they are relevant. You need to add
something new to the program.
2) Make sure that the term you add distinguishes Actual and Predicted. If
it does not, then the chance you will solve this problem is very slight.
3) Make sure that every term has a tactical basis in backgammon theory.
The simplest way is to encapsulate something that would ordinarily
require lookahead to discover. For example, you can create a term that
estimates the number of return shots. Or blot exposure in bearoffs. Or
racing equity.
4) Make sure your term has broad generality, but no broader than
appropriate.
5) Give your term a natural scale. By this I mean that the term has a
range roughly comparable to the other inputs in the system; see the
sketch just after this list. (The downside of not doing this is that some
weights in your network will train very slowly compared to others.)
> has anybody had any luck
> experimenting with somehow saving (at least some of) the information
> in previously trained weights while introducing new inputs (or hidden
> nodes, for that matter)?
Of course! I just initialize new weights to small random numbers.
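For example (a numpy sketch; the shapes are whatever your network happens to
use, and only the new columns start near zero):

    import numpy as np

    # Grow the input-to-hidden weight matrix from n_old to n_old + n_new
    # inputs while keeping every trained weight.
    def add_inputs(w_input_hidden, n_new_inputs, scale=0.01):
        n_hidden, n_old = w_input_hidden.shape
        new_cols = np.random.uniform(-scale, scale,
                                     size=(n_hidden, n_new_inputs))
        return np.hstack([w_input_hidden, new_cols])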
> Is initialising the new weights to random
> values while maintaining the existing weights, and starting training
> all over (back to the initial high alpha, etc.) reasonable?
Never restart training from scratch.
At the very least keep a record of the games you played while training the
first time, so that you can retrain on those games without generating them.
Generating a game is 20 times as expensive as training on it.
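A minimal sketch of that bookkeeping (the per-game training call is a
placeholder for your own TD update):

    import pickle

    # During the first run, save each game as the list of positions it
    # visited; a later run can train on the saved games without paying the
    # cost of generating them again.
    def save_games(games, path="training_games.pkl"):
        with open(path, "wb") as f:
            pickle.dump(games, f)

    def retrain_from_file(net, train_on_game, path="training_games.pkl"):
        with open(path, "rb") as f:
            games = pickle.load(f)
        for positions in games:
            train_on_game(net, positions)   # your existing update over one game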
Don't open alpha all the way back up. It might make sense to increase it a
little, but not much. You will probably undo your progress by opening it up
a lot.
Remember that increasing alpha has the effect of weighing future experience
more heavily than past experience. This makes sense if your network (which
encapsulates past experience) is inexperienced. But when you make a minor
change to an experienced network, it is probably a bad idea to weight the
future much more heavily.
> Does maintaining different values of alpha for different weights help?
Different learning rates help in theory. One point is that if two inputs
differ radically in scale you can compensate by giving each its own learning
rate. But in practice it matters little provided that you give your inputs
roughly comparable scales.
Another way that varying learning rates helps is in identifying inputs that
do not matter. For example, if you have an adaptive algorithm that varies
learning rates according to the accuracy gained from adjusting that input,
then the learning rates for irrelevant inputs rapidly drive to 0. Sutton
wrote a paper entitled "Gain Adaptation beats Least Squares" (or something
like that) in which he describes such an algorithm. However, it doesn't
work in practice because it is much slower than backprop. Backpropagation
is fast; adaptive-gain algorithms can take many times as long (Sutton's
algorithm takes 13 times as much time) and that is a huge disadvantage.
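For reference, a per-weight gain-adaptation step in the spirit of Sutton's
IDBD looks roughly like this for a single linear output (a sketch under my
reading of the paper; check it for the exact rule):

    import numpy as np

    # Per-weight step sizes alpha_i = exp(beta_i) adapt so that weights on
    # irrelevant inputs end up with tiny step sizes.
    def idbd_update(w, beta, h, x, target, meta_rate=0.01):
        delta = target - np.dot(w, x)          # prediction error
        beta += meta_rate * delta * x * h      # adapt the log step sizes
        alpha = np.exp(beta)                   # per-weight learning rates
        w += alpha * delta * x                 # usual LMS-style update
        h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        return w, beta, h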
The concept was so attractive that I also tried a variant of RProp. It,
too, proved slower than backprop.
My judgement: discard the whole idea.
> How much training does it really take to get an idea of how effective
> your inputs are -- is it reasonable to train different networks (ie.
> networks with different inputs) with a limited number of hidden nodes
> over a short number of training games, and then take the "best" inputs
> and expect them to perform well in a network with lots of hidden nodes
> and extensive training?
I think the procedure you described will work, but it can't be the fastest
way to make progress. Surely you are better off using your expert judgment!
There are descriptions in the literature of constructive methods for
identifying neural network inputs. I haven't found any of them to be
useful.
The method you are describing sounds a lot like coevolution or genetic
engineering of neural inputs. This may work, but you are certainly better
off on your own.
> What's a good value for lambda?
Zero.
Theoretical models suggest that lambda should be 1/branching-factor, which
would be zero to 6%, depending on how you measure the branching factor.
Anyway it hardly matters, because lambda==0 is a good approximation to
lambda==0.05.
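In eligibility-trace form the role of lambda is easy to see (a generic
sketch; grad is the gradient of your value estimate with respect to the
weights, and delta is the TD error):

    # One TD(lambda) weight update.  With lam == 0 the trace is just the
    # current gradient, i.e. plain TD(0), which is why lambda == 0.05
    # changes so little.
    def td_update(w, trace, grad, delta, alpha, lam):
        trace = lam * trace + grad       # decay old credit, add new
        w = w + alpha * delta * trace    # delta = r + V(s') - V(s)
        return w, trace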
> Does combining MC and TD training (ie. TD(lambda) at each step followed
> by a TD(1) on the terminal step) buy you anything? What about MC vs. TD
> in general?
TD is better. MC does not learn enough about intermediate stages.
Don't combine MC and TD. The impact of MC is simply to weaken training.
> Is using lookahead evaluations during training ever worth
> the expense?
The answer is no if you mean that you train each position on the 1-ply
lookahead value. Doing a 1-ply lookahead costs you a factor of 21 in
training speed, but you only gain about a factor of 5 in reduction of
training noise over TD. A clear loss.
However, if you reuse the 1-ply search value for at least 5 training
iterations, then you have a clear gain. In practice this is easy to do. But
then you are not using TD, but rather supervised training. I maintain that
supervised training is the right way to go as soon as you have a player of
expert caliber that you can use in a 1-ply search.
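A sketch of that arrangement (one_ply_value and the training step are
placeholders for your own evaluator and update rule):

    # Compute the expensive 1-ply lookahead target once per position, then
    # make several cheap supervised passes over the cached pairs.
    def build_targets(positions, one_ply_value):
        return [(pos, one_ply_value(pos)) for pos in positions]

    def supervised_passes(net, dataset, train_step, n_passes=5):
        for _ in range(n_passes):
            for pos, target in dataset:
                train_step(net, pos, target)   # ordinary supervised update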
> It only takes 10 seconds to come up with another question,
> but days or weeks to answer it...
I can answer your questions in a day, since I have made all these mistakes
myself!
> > In fact, if you choose checker plays using rollouts from the weakest
> > neural network you achieve a skill level at least equal to 3-ply
> > searching.
>
> What kind of positions does that apply to? In non-contact positions I
> believe you unconditionally. But in positions the net misplays
> (especially if it plays one side worse than the other), how much faith do
> you need to have in the original evaluator before you have reasonable
> trust in its rollouts?
Pretty much any position.
You need a very good checkerplayer to have confidence in the absolute equity
value returned by a rollout, but the same does not apply to checkerplay
decisions. The great majority of checkerplays are decided by near-term
phenomena that will be apparent in a rollout.
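A sketch of the idea (legal_moves, rollout_equity, and position.play stand in
for whatever your program already provides):

    # Choose the checker play whose rollout equity (average cubeless result
    # over n_trials, played out by the evaluator itself) is highest.
    def rollout_best_move(position, dice, legal_moves, rollout_equity,
                          n_trials=144):
        best_move, best_equity = None, float("-inf")
        for move in legal_moves(position, dice):
            equity = rollout_equity(position.play(move), n_trials)
            if equity > best_equity:
                best_move, best_equity = move, equity
        return best_move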
Warm regards,
Brian Sheppard