How Motif Works
A number of people of written to ask how Motif works: How does it choose the plays it makes? How does it learn? Here are a few notes on the program.
(Motif is a Java applet that plays the game of backgammon. You can play backgammon against Motif at
http://www.bkgm.com/motif.html. Motif was created in 1996.)
The heart of Motif is the position evaluator. The evaluator takes a given board position and returns an estimate of the probability of winning the game from that position for the player whose turn it is to roll.
The position evaluator is used for making both checker plays and cube decisions. The choice of how to play a given dice roll is made by consulting the evaluator for each different position that may result from the play of the roll. The selected play is the one which gives the opponent (who will then be "on roll") the lowest probability of winning.
The fundamental part of position evaluation is the "static evaluator," so called because it performs no lookahead in its analysis. The static evaluator works by taking a number of measurements of the board position and combining them in a formula to produce an estimate of the position's value. I will describe the details of Motif's static evaluator in the next section.
The static evaluator can be improved upon by building it into an evaluator which looks ahead to future possibilities. The depth of lookahead is measured in "plies." In backgammon, one ply refers to a roll of the dice by one of the players and the play he makes for that roll.
Writing a position evaluator which looks ahead one ply is quite simple. To estimate the value of a position, the program goes through each of the 21 possible rolls and uses the static evaluator to determine the best way to play each roll. The static evaluations of all the resulting positions are then added together with each evaluation multiplied by its probability of occurrence (mixed rolls being twice as likely as doubles rolls). The final result is generally more accurate than simply doing a static evaluation.
The lookahead evaluator is simple in concept, but there are a lot of calculations involved. For a typical position and dice roll, there might be 20 or so possible ways to play the roll. That means that the static evaluator is called 21 x 20, or about 400 times, just to look one ply ahead in the game. Each additional ply of lookahead uses 21 times more calls to the static evaluator.
Motif performs a one-ply evaluation of each play alternative when it makes its play selections. Because Motif is written in Java, and because Java is an interpreted language, the time required to look farther ahead is too great if the game is to proceed at a reasonable pace.
Details of the Static Evaluator
As I mentioned above, the static evaluator works by taking a number of measurements of the current position and somehow combining them to produce an estimate of the position's equity. It is this "somehow" that we will now discuss.
The sort of measurements Motif takes of the position are all quite simple ones: It looks at things such as the pip count for each player, the number of points each player holds, the number of blots, the number of direct shots, the number of indirect shots, etc. They are all things that are easy to calculate.
I have tried performing more sophisticated measurements. For example, I tried to do a fairly accurate calculation of the probability of a blot being hit. But this did not produce a significantly better evaluator, so I rejected the idea and stayed simple.
Each parameter (measured value) is then multiplied by a "weight." The magnitude of the weight determines how big a contribution that parameter makes in the final result. The sign of the weight determines whether the parameter is to have a positive or negative impact.
The proper contribution for each parameter often depends on other aspects of the game. The value of hitting, for example, is usually greater when you are behind. So there is a relationship between (1) the number of checkers your opponent has on the bar and (2) the relative pip count.
We can account for such relationships by adding more terms to the static evaluator. To the linear terms we have already described, we add second-degree terms consisting of pairs of parameters multiplied together. Each second-degree term also has its own weight.
The value produced by the static evaluator is most useful to us if we can interpret it as the probability of the on-roll player winning from that position. In other words, we want the evaluator to give us a number between zero and one. This permits the lookahead evaluator described above to work properly, and it gives us an estimate of our chances in the game, which is necessary for making correct doubling decisions.
To help the evaluator distribute its results properly within the required range, and to guarantee that it never produces a value outside this range, we take the output of the sum of the weighted parameters and put it through a "squashing function". The function we use is y = (tanh(x)+1)/2 . This is a smooth and increasing function which takes any real value as input and produces a result between 0 and 1. Large negative numbers give values close to zero and large positive numbers give values close to one.
How Motif Learns
Determining the Proper Weights
The final problem, then, is to come up with the proper weightings for parameters in the evaluation function. This is a difficult but important question. The quality of the weights has a significant impact on the playing ability of the program.
Here is the plan:
- Obtain a representative sample of the positions that come up in a typical game of backgammon.
- Determine each position's true value.
- Systematically adjust the weights in Motif's evaluation function until the values it returns for the sample positions come as close as possible to the positions' true values.
Obtaining a Sample of Positions
To get a representative sample of backgammon positions, I wrote a little program that played backgammon against itself. This program had a very simple position evaluator that did not work particularly well when used directly. But it worked fairly well for "rolling out" positions.
Rolling out a backgammon position is a Monte Carlo technique for estimating the probability of winning from that position. The program simply plays against itself starting from the position under consideration. It plays the same position over and over again, using different dice rolls each time, and counts the number of times each side wins.
The program would, therefore, make its decision of how to play a given dice roll based on the rollout results of all the possible ways to play that roll. The play that gave the best rollout results was the one that was selected.
I had the program play against itself in this manner many times and the positions that came up in those games were recorded. These became the representative sample of backgammon positions that I used.
Evaluating the Positions
The next task was to find the true value of each position. This is impossible, because nobody has written a program which can give an accurate assessment of every backgammon position. But any reasonable estimate of each position's value would give us something to work with.
The best information I had was the rollout results from the previous step. Part of the process of generating the sample positions above involved rolling out each position that came up. I could use those rollout results as an estimate of the position's equity. The fraction of wins in each case was deemed to be the probability of winning the game from that position.
I don't imagine many of these estimates where very accurate, but it was a start.
Optimizing the Weights
The final step was to adjust the weights in Motif's static evaluator so that the value it computed for each of the sample positions agreed as closely as possible with the rollout results (i.e., the position's presumed value).
First, I defined an error function, which was the sum of the squares of the differences between the calculated and rolled out estimate of each position. The goal was now to minimize this error. The smaller the error, the more closely the static evaluator's estimate would mimic the rollout results.
To optimize the weights, I started with all the weights at zero. I proceeded to adjust one of the weights to find the setting that produced the minimum error. After determining the best value for that weight, I would go on to optimize the next one and continue until all the weights had been optimized.
Of course, adjusting subsequent weights meant that the first ones would no longer be at their optimal values. So, after the first optimization pass, I went through and adjusted all the weights again. This process was repeated several times until no further reduction in the error was possible.
The program that resulted from this optimization was dubbed Motif Version Zero.
Motif Version One
With Motif Version Zero now available to perform rollouts, I could proceed to get much better estimates of the values of the sample positions. In addition, I decided that a larger number of samples was called for. So all the steps above were repeated.
I had Version Zero play several hundred games against itself. Each position that came up was rolled out a few hundred times and the results tabulated. There were now approximately 50,000 sample positions with rollout estimates that were much more accurate than those used to create Version Zero.
The weights for the static evaluator were then all reoptimized to agree with the new positions and rollout data. The result of this process was Motif Version One.
Jay Scott has written an overview of how many of the best backgammon
programs work, entitled
Neural Net Backgammon Programs.
It is part of his web page,
Learning in Games.
The first computer program to achieve a level of play close to the best
human players was TD-Gammon, written by Gerald Tesauro.
This program is a neural network that is able to teach itself to play
backgammon by playing against itself and learning from the results.
Tesauro describes TD-Gammon in an article originally published in
the March 1995 issue of Communications of the ACM, entitled
Learning and TD-Gammon.
Last updated: September 2004.
: Backgammon Galore
: Motif Backgammon