Gary Wong wrote:
> It's only a little stronger (of the order of 0.1ppg cubeless after
> 100,000 training games).
PUBEVAL is surprisingly resilient. Of course you can win almost 1.0 ppg
against it if you adopt intelligently defined anti-PUBEVAL tactics, but if
you just train a neural network to play well and pit it against PUBEVAL you
won't do as well.
My network scores around 0.6 ppg (cubeless) versus PUBEVAL. I believe this
is about as well as you can do. (If you read the literature you may have
read that HCGAMMON wins 85% against PUBEVAL, but that turns out to be
erroneous.)
> I haven't yet been patient enough to give it a decent amount of training
> because I keep coming up with ideas for new inputs and end up starting
> again...
This is a reasonable plan, but just be sure that your inputs really make a
difference. You need nothing beyond the raw input description (as defined
by Tesauro), priming (as defined by Berliner), and shot-counting. These
alone are sufficient to play at championship caliber when combined with a
1- or 2-ply search engine.
My recommendation is to pare down your inputs to these alone, then debug
those terms exhaustively. Check the value of these terms in every
conceivable natural and unnatural situation. Then train your network to
exhaustion.
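For reference, the raw input description is roughly the TD-Gammon encoding.
A minimal sketch in Python (this is my recollection of Tesauro's scheme, so
treat the exact thresholds and scaling as assumptions; encode one side and
mirror the call for the opponent):

    # Sketch of a Tesauro-style raw encoding (assumed variant, not gospel).
    # board is indexable by point 1..24 and gives the count of my checkers
    # on that point; n_bar and n_off are my checkers on the bar / borne off.
    def encode_side(board, n_bar, n_off):
        units = []
        for point in range(1, 25):
            n = board[point]
            units.append(1.0 if n >= 1 else 0.0)            # at least a blot
            units.append(1.0 if n >= 2 else 0.0)            # point is made
            units.append(1.0 if n >= 3 else 0.0)            # a spare
            units.append((n - 3) / 2.0 if n > 3 else 0.0)   # extra spares, scaled
        units.append(n_bar / 2.0)                           # checkers on the bar
        units.append(n_off / 15.0)                          # checkers borne off
        return units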
Then profile your program's play using test cases. Where does the program
make a mistake? It helps a lot to profile many test cases, so that you can
prioritize your attack on the most-significant type of error. Once you
choose a representative test case to work on, you need a procedure that
often results in improving the program.
Every mistaken checkerplay results in two positions to analyze, which I call
Predicted (the right move) and Actual (the program's move). Predicted and
Actual will differ in certain input terms. Verify that the input terms are
correctly computed in every case. Correct any bugs, and proceed either to
retrain the evaluation or to choose another test case, depending on the
extent of the changes you have made.
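That verification step is mechanical enough to automate. A minimal sketch
(compute_inputs, the position objects, and the term names are placeholders
for whatever your program already uses):

    # List the input terms that differ between Predicted and Actual;
    # those are the terms to hand-check for bugs.
    def diff_terms(predicted_pos, actual_pos, compute_inputs, names):
        p = compute_inputs(predicted_pos)
        a = compute_inputs(actual_pos)
        for name, pv, av in zip(names, p, a):
            if abs(pv - av) > 1e-9:
                print(f"{name}: predicted={pv:.3f} actual={av:.3f}")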
If the inputs are computed correctly, then you must extend the input space
in order to give the program a chance to correct its mistake. To design an
input term that has the best chance of making progress, follow these
guidelines:
1) Never include a term that is a linear combination of existing terms.
Such terms (e.g. inner board point count, number of men borne off, pip
count, number of anchors, and many other backgammon concepts) can be
synthesized by the neural network if they are relevant. You need to add
something new to the program.
2) Make sure that the term you add distinguishes Actual and Predicted. If
it does not, then the chance you will solve this problem is very slight.
3) Make sure that every term has a tactical basis in backgammon theory.
The simplest way is to encapsulate something that would ordinarily
require lookahead to discover. For example, you can create a term that
estimates the number of return shots. Or blot exposure in bearoffs. Or
racing equity.
4) Make sure your term has broad generality, but no broader than
appropriate.
5) Give your term a natural scale. By this I mean that the term has a
range roughly comparable to the other inputs in the system; see the
sketch just after this list. (The downside of not doing this is that some
weights in your network will train very slowly compared to others.)
> has anybody had any luck
> experimenting with somehow saving (at least some of) the information
> in previously trained weights while introducing new inputs (or hidden
> nodes, for that matter)?
Of course! I just initialize new weights to small random numbers.
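For example (a numpy sketch; the shapes are whatever your network happens to
use, and only the new columns start near zero):

    import numpy as np

    # Grow the input-to-hidden weight matrix from n_old to n_old + n_new
    # inputs while keeping every trained weight.
    def add_inputs(w_input_hidden, n_new_inputs, scale=0.01):
        n_hidden, n_old = w_input_hidden.shape
        new_cols = np.random.uniform(-scale, scale,
                                     size=(n_hidden, n_new_inputs))
        return np.hstack([w_input_hidden, new_cols])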
> Is initialising the new weights to random
> values while maintaining the existing weights, and starting training
> all over (back to the initial high alpha, etc.) reasonable?
Never restart training from scratch.
At the very least keep a record of the games you played while training the
first time, so that you can retrain on those games without generating them.
Generating a game is 20 times as expensive as training on it.
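A minimal sketch of that bookkeeping (the per-game training call is a
placeholder for your own TD update):

    import pickle

    # During the first run, save each game as the list of positions it
    # visited; a later run can train on the saved games without paying the
    # cost of generating them again.
    def save_games(games, path="training_games.pkl"):
        with open(path, "wb") as f:
            pickle.dump(games, f)

    def retrain_from_file(net, train_on_game, path="training_games.pkl"):
        with open(path, "rb") as f:
            games = pickle.load(f)
        for positions in games:
            train_on_game(net, positions)   # your existing update over one game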
Don't open alpha all the way back up. It might make sense to increase it a
little, but not much. You will probably undo your progress by opening it up
a lot.
Remember that increasing alpha has the effect of weighing future experience
more heavily than past experience. This makes sense if your network (which
encapsulates past experience) is inexperienced. But when you make a minor
change to an experienced network, it is probably a bad idea to weight the
future much more heavily.
> Does maintaining different values of alpha for different weights help?
Different learning rates help in theory. One point is that if two inputs
differ radically in scale you can compensate by giving each its own learning
rate. But in practice it matters little provided that you give your inputs
roughly comparable scales.
Another way that varying learning rates helps is in identifying inputs that
do not matter. For example, if you have an adaptive algorithm that varies
learning rates according to the accuracy gained from adjusting that input,
then the learning rates for irrelevant inputs rapidly drive to 0. Sutton
wrote a paper entitled "Gain Adaptation beats Least Squares" (or something
like that) in which he describes such an algorithm. However, it doesn't
work in practice because it is much slower than backprop. Backpropagation
is fast; adaptive-gain algorithms can take many times as long (Sutton's
algorithm takes 13 times as much time) and that is a huge disadvantage.
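For reference, a per-weight gain-adaptation step in the spirit of Sutton's
IDBD looks roughly like this for a single linear output (a sketch under my
reading of the paper; check it for the exact rule):

    import numpy as np

    # Per-weight step sizes alpha_i = exp(beta_i) adapt so that weights on
    # irrelevant inputs end up with tiny step sizes.
    def idbd_update(w, beta, h, x, target, meta_rate=0.01):
        delta = target - np.dot(w, x)          # prediction error
        beta += meta_rate * delta * x * h      # adapt the log step sizes
        alpha = np.exp(beta)                   # per-weight learning rates
        w += alpha * delta * x                 # usual LMS-style update
        h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        return w, beta, h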
The concept was so attractive that I also tried a variant of RProp. It,
too, proved slower than backprop.
My judgement: discard the whole idea.
> How much training does it really take to get an idea of how effective
> your inputs are -- is it reasonable to train different networks (ie.
> networks with different inputs) with a limited number of hidden nodes
> over a short number of training games, and then take the "best" inputs
> and expect them to perform well in a network with lots of hidden nodes
> and extensive training?
I think the procedure you described will work, but it can't be the fastest
way to make progress. Surely you are better off using your expert judgment!
There are descriptions in the literature of constructive methods for
identifying neural network inputs. I haven't found any of them to be
useful.
The method you are describing sounds a lot like coevolution or genetic
engineering of neural inputs. This may work, but you are certainly better
off on your own.
> What's a good value for lambda?
Zero.
Theoretical models suggest that lambda should be 1/branching-factor, which
would be zero to 6%, depending on how you measure the branching factor.
Anyway it hardly matters, because lambda==0 is a good approximation to
lambda==0.05.
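In eligibility-trace form the role of lambda is easy to see (a generic
sketch; grad is the gradient of your value estimate with respect to the
weights, and delta is the TD error):

    # One TD(lambda) weight update.  With lam == 0 the trace is just the
    # current gradient, i.e. plain TD(0), which is why lambda == 0.05
    # changes so little.
    def td_update(w, trace, grad, delta, alpha, lam):
        trace = lam * trace + grad       # decay old credit, add new
        w = w + alpha * delta * trace    # delta = r + V(s') - V(s)
        return w, trace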
> Does combining MC and TD training (ie. TD(lambda) at each step followed
> by a TD(1) on the terminal step) buy you anything? What about MC vs. TD
> in general?
TD is better. MC does not learn enough about intermediate stages.
Don't combine MC and TD. The impact of MC is simply to weaken training.
> Is using lookahead evaluations during training ever worth
> the expense?
The answer is no if you mean that you train each position on the 1-ply
lookahead value. Doing a 1-ply lookahead costs you a factor of 21 in
training speed, but you only gain about a factor of 5 in reduction of
training noise over TD. A clear loss.
However, if you reuse the 1-ply search value for at least 5 training
iterations, then you have a clear gain. In practice this is easy to do. But
then you are not using TD, but rather supervised training. I maintain that
supervised training is the right way to go as soon as you have a player of
expert caliber that you can use in a 1-ply search.
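A sketch of that arrangement (one_ply_value and the training step are
placeholders for your own evaluator and update rule):

    # Compute the expensive 1-ply lookahead target once per position, then
    # make several cheap supervised passes over the cached pairs.
    def build_targets(positions, one_ply_value):
        return [(pos, one_ply_value(pos)) for pos in positions]

    def supervised_passes(net, dataset, train_step, n_passes=5):
        for _ in range(n_passes):
            for pos, target in dataset:
                train_step(net, pos, target)   # ordinary supervised update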
> It only takes 10 seconds to come up with another question,
> but days or weeks to answer it...
I can answer your questions in a day, since I have made all these mistakes
myself!
> > In fact, if you choose checker plays using rollouts from the weakest
> > neural network you achieve a skill level at least equal to 3-ply
> > searching.
>
> What kind of positions does that apply to? In non-contact positions I
> believe you unconditionally. But in positions the net misplays
> (especially if it plays one side worse than the other), how much faith do
> you need to have in the original evaluator before you have reasonable
> trust in its rollouts?
Pretty much any position.
You need a very good checkerplayer to have confidence in the absolute equity
value returned by a rollout, but the same does not apply to checkerplay
decisions. The great majority of checkerplays are decided by near-term
phenomena that will be apparent in a rollout.
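A sketch of the idea (legal_moves, rollout_equity, and position.play stand in
for whatever your program already provides):

    # Choose the checker play whose rollout equity (average cubeless result
    # over n_trials, played out by the evaluator itself) is highest.
    def rollout_best_move(position, dice, legal_moves, rollout_equity,
                          n_trials=144):
        best_move, best_equity = None, float("-inf")
        for move in legal_moves(position, dice):
            equity = rollout_equity(position.play(move), n_trials)
            if equity > best_equity:
                best_move, best_equity = move, equity
        return best_move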
Warm regards,
Brian Sheppard