This article originally appeared in the October 2000 issue of GammOnLine.
Thank you to Kit Woolsey for his kind permission to reproduce it here.
The emergence of bots which play a competent game has led to a great increase
in our understanding of backgammon. Theories which were previously just
guesses could be put to the test by having the bots roll out positions and
plays. Old concepts were discarded, new concepts were formed, and the general
level of skill in backgammon improved. The bots are definitely a very
valuable learning tool. However they can still give us false or misleading
information if we are not careful. The purpose of this article is to
examine how we can make the best use of the bots in order to improve our game.
I want to stress that this is one area where my expertise is limited. I do not have a working knowledge of neural networks, nor do I understand exactly how the bots are programmed in many cases. A lot of what I will be saying in this article will be conclusions based on what I have observed, and some of these conclusions may be quite wrong. I encourage knowledgeable readers who are more familiar with these areas to send in articles themselves correcting my mistakes or adding to the concepts I am discussing. In this way I am hoping this article will lead to a collection of articles by the best computer and backgammon minds in the world, which will be of the most value to everybody.
There are several bots which play a competent game. The two commercial ones are Jellyfish and Snowie. I am going to limit this discussion to use of Snowie for one main reason: Snowie has the ability to import a match (in several different formats) and then automatically run through the match, giving its evaluation of all the plays. This is an extremely valuable tool for improving one's game. The question is, how can we make the most of it?
I am using Snowie version 3.2, on a fairly fast computer. Earlier versions of Snowie may have features different from what I am describing. Slower computers will take more time, making some of the rollout suggestions too time-consuming.
First, a little background. The idea of a neural network learning to play backgammon was originally conceived by Gerald Tesauro around 1991. His program, TD-Gammon, was the pioneer for the neural nets. The way the neural nets work is roughly as follows: Starting with little more than the rules of the game, they play thousands of games against themselves. From the results of these games, they "learn" through trial and error the values of important parameters such as points, blots, etc. and construct their own equations for weighing these parameters for any given position. At the conclusion of this training period, the bot is able to look at any given position and give an estimate of the number of wins, gammons, and backgammons which each side will win. This estimate is what the bot uses for playing.
How good is this estimate? It depends a lot upon the type of position. Oddly enough, for simple positions such as races the bots make strange misevaluations and thus do weird things. In order to get around this, I believe that Snowie uses a database (which is 100% accurate) to make its racing plays when both sides have all their men in. Other relatively simple positions such as coming in against an anchor cause the bot to do some strange-looking things, although a surprising number of these plays turn out to be correct upon analysis. Other types of positions can give the bots problems. Priming battles where timing is critical can cause the bots to be way off. Back games, which require special technique, create problems for the bots. However in the average position the bots are generally correct in their assessments.
This snapshot evaluation is what is called the 1-ply evaluation. There is no looking ahead by the bot, just a weighing of parameters to evaluate the position. If the bot can look ahead a bit, this evaluation improves markedly. Just examining all the possible dice rolls for the opponent, which is the 2-ply evaluation, makes the evaluations much more accurate. If the bot looks further down the road at not only the opponent's rolls but its own rolls, the 3-ply evaluation, its accuracy is better still. The problem with these 2- and 3-ply evaluations is time. There are 21 different possible dice rolls (grouping 4-3 and 3-4 as the same). Thus to do a full 3-ply evaluation the bot has to examine 21 × 21 or 441 sequences. It then has to find the best plays for each of these sequences (using its snapshot or 1-ply evaluation). Even with our ever-faster computers, this can take some time. However with a fast computer the bot can play at 3-ply in what would be considered a normal playing pace, and it is then a formidable opponent.
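The counting in the paragraph above is easy to check. The following is a minimal Python sketch, not anything taken from Snowie itself, and the variable names are made up for illustration. It lists the 21 distinct rolls, counts the two-roll sequences, and notes that the 21 rolls are not equally likely (a double comes up 1 time in 36, any other roll 2 times in 36), which any look-ahead has to weigh accordingly.

```python
from itertools import combinations_with_replacement

# Distinct dice rolls, treating 4-3 and 3-4 as the same roll.
distinct_rolls = list(combinations_with_replacement(range(1, 7), 2))

# Two rolls deep (the opponent's roll, then our own): 21 x 21 sequences.
two_roll_sequences = len(distinct_rolls) ** 2

# Doubles occur 1 way out of 36, non-doubles 2 ways out of 36.
weights = {roll: (1 if roll[0] == roll[1] else 2) for roll in distinct_rolls}

print(len(distinct_rolls), two_roll_sequences, sum(weights.values()))
# prints: 21 441 36
```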
So how well do the bots really play at their 3-ply analysis? The answer is very well, probably as well as or better than the best players in the world. They may slip up on some of the technical plays, but they more than compensate for this with their judgment in the murky but critical positional decisions and balance of priorities.
A true test of this came a few years ago. Jellyfish (backed by Malcolm Davis) played 300 games against Mike Senkiewicz and 300 games against Nack Ballard. The stakes were significant, so the players were taking this very seriously. The conditions simulated actual play: dice were rolled by hand, the players were playing on a regular backgammon board, and a human opponent was playing Jellyfish's moves. The moves and dice rolls were keyed into the computer by a third person, and that person called out the Jellyfish plays which were then made by the human opponent. Thus, from the players' point of view it was a regular backgammon game. It worked quite smoothly; I was there and helped set it up and do much of the inputting. The result was that Jellyfish came out dead even: +58 points vs. Senkiewicz and −58 points vs. Ballard. Since it would be difficult to dispute a claim that Senkiewicz and Ballard were the two best players in the world, I believe this match was pretty solid proof that Jellyfish can play at world class level. Snowie plays at least as well as Jellyfish in my opinion, so we are talking about top level play.
The really nice feature on Snowie is the ability to have it run through an entire match, evaluating your play. Using the File/Import File button, you can import a match into Snowie. There are several different formats which can be accepted. For example, if you play a match on GamesGrid, you can then save it in .sgg format, and can import this match directly into Snowie.
Now to have Snowie run through it. If you click on Batch/Specify Analysis, a dialog box comes up which allows you to prepare the runthrough. In order to get the most out of Snowie's analytical capabilities, I recommend using the 3-ply analysis for the match, search space huge, 3-ply speed 100%. I'm not exactly sure what is going on here, but I believe these parameters control how the 3-ply analysis is done. If you use a smaller search space, the analysis is shortcut in some way, so it won't be as accurate. The tradeoff of course is that it goes faster. On my computer the analysis of a 7-point match generally takes about 10 or 20 minutes, which I find quite acceptable. I can play a match, run the analysis, and play through it to see where I screwed up. You can also set the program to automatically roll out what it considers an error or a blunder; this takes some extra time of course. Fiddle around with the parameters and find what best suits you.
Now the hard part: interpreting what Snowie is saying. First of all, how does Snowie handle things? Obviously it would be silly for it to run a full 3-ply analysis on every legal move, since some of the moves are ridiculous. What it does, I believe, is as follows: It checks out every legal move on the 1-ply (this it can do very quickly). Those moves which are within some amount of the "best" play are then run on a 2-ply. After that second screening, candidates which are still close are run on the 3-ply. That is why you will see some candidates on 3-ply, some on 2-ply, and some on 1-ply. And if the move is 100% obvious the 3-ply or even the 2-ply may be skipped altogether. I don't know exactly what parameters are used to determine when to include a candidate in the next pass, but for the most part the screening seems to be accurate. It is rare that a serious candidate doesn't make it to the 3-ply stage. And of course if when going through the match you are suspicious about a move which didn't make it to the 3-ply stage you can just do a 3-ply check on it very easily.
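The screening idea described above can be sketched in a few lines of Python. This is only a guess at the general shape of such a filter, not Snowie's actual code: the function name, the `margins` cut-offs, and the toy equities in the example are all hypothetical.

```python
def screen_candidates(moves, evaluators, margins):
    """Re-evaluate at deeper plies only the moves that stay close to the leader.

    moves      -- candidate moves (any hashable objects)
    evaluators -- [eval_1ply, eval_2ply, eval_3ply]; each maps a move to an equity
    margins    -- hypothetical cut-offs, one per screening pass: a move survives
                  a pass if it is within that much equity of the pass's best move
    Returns {move: (ply, equity)}: the deepest evaluation each move received.
    """
    results = {}
    survivors = list(moves)
    for ply, evaluate in enumerate(evaluators, start=1):
        scores = {m: evaluate(m) for m in survivors}
        for m, equity in scores.items():
            results[m] = (ply, equity)
        if ply > len(margins):          # deepest level reached, nothing to screen
            break
        best = max(scores.values())
        survivors = [m for m in survivors if best - scores[m] <= margins[ply - 1]]
        if len(survivors) <= 1:         # an obvious move: skip the deeper passes
            break
    return results


# Made-up equities for four candidate moves A-D at each ply.
one_ply = {"A": -0.60, "B": -0.62, "C": -0.70, "D": -0.95}
two_ply = {"A": -0.61, "B": -0.66, "C": -0.75}
three_ply = {"A": -0.613, "B": -0.70}

report = screen_candidates(
    list(one_ply),
    [one_ply.get, two_ply.get, three_ply.get],
    margins=[0.15, 0.08],
)
for move, (ply, equity) in report.items():
    print(f"{move}: {ply}-ply {equity:+.3f}")
```

In this toy run D is dropped after the 1-ply pass and C after the 2-ply pass, which is the pattern of mixed 1-ply, 2-ply, and 3-ply candidates one sees in the move list.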
Okay, so now we have a bunch of information on each of the candidate moves for a position. What does all this info tell us? The move at the top is the move Snowie thought was the best. Other moves are inferior according to Snowie, and the numbers will hopefully tell us how much inferior they are. The move actually played will have an asterisk next to it.
For each candidate move, there are six numbers at the bottom. These, in order, tell us Snowie's estimates (in percentages) of backgammons won, gammons won, single games won, single games lost, gammons lost, and backgammons lost. It should be noted that the single games won column includes gammons and backgammons, and the gammons column includes backgammons. Thus, the single games won and the single games lost will always add up to 100%. These are very important results to keep an eye on, as they will be telling the true story of what Snowie is thinking.
To the right of the move is a number which is some kind of measure of the equity of the move. What does this number mean? I'll admit I don't understand it, and I would be very appreciative of a good explanation. Of course this is the number by which the candidate moves are ranked, and the number which Snowie uses to rank our overall play in the match. While it might not be clear what it means, the difference between that number for the best play and the number for our actual play is supposed to be some kind of measure of the magnitude of our error.
A third piece of information can be gotten by simply pointing the cursor inside the box where the move is. A yellow popup box comes up giving us what it calls score based cubeless equities: current, doubled, and redoubled. Once again, it is not clear to me exactly what these figures stand for at match play; for money, it is just the normal cubeless equity. However the difference between the current figure for the best play and the current figure for the actual play signifies something. Interestingly enough, the best play doesn't always come out on top. Go figure.
It would seem as though the magnitude of the difference between the best play and the actual play (which is expressed by a number in parentheses) should be a good measure of how bad the play is. Unfortunately the makers of Snowie are trying to figure in the cube, and by doing so the results can be quite distorted. For example:
Blue has just rolled a pretty bad number from the bar. Let's suppose that we thought the priority was to play as safely as possible, and we had played B/23, 8/2. As we will see, this is not a good play. But we want to know more than that: we want to get a feel for how bad a play we have made, so that we can judge correctly when we see a similar position (but one for which the arguments for the safe play are stronger, say with one of the checkers on the midpoint moved down to the eight point). In other words, when we see an error we have made we need to know how bad the error was so that we have mentally built up a reference position for the future.
The Snowie results at full 3-ply, 100% are as follows:
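For money play, the normal cubeless equity can be worked out directly from the six percentages. The sketch below is a minimal Python illustration; the function name and the `gammons_count` switch are made up for this example. Because the columns are cumulative, each one is simply worth one extra point. The sample figures are the ones Snowie reports for B/23 13/7 in the position discussed below. With gammons switched off, the result matches the "current" figure Snowie shows there with the Jacoby rule and a centered cube, which suggests (though this is only an inference) that gammons are simply not counted in that case.

```python
def cubeless_money_equity(bg_won, g_won, won, lost, g_lost, bg_lost,
                          gammons_count=True):
    """Cubeless money equity from Snowie-style cumulative percentages.

    The 'won' column already includes gammons and backgammons, and the gammon
    column already includes backgammons, so each column adds one point.
    Set gammons_count=False to count every game as a single point.
    """
    equity = won - lost
    if gammons_count:
        equity += (g_won - g_lost) + (bg_won - bg_lost)
    return equity / 100.0


# Percentages for B/23 13/7 as listed in this article.
full = cubeless_money_equity(0.2, 7.0, 37.3, 62.7, 25.3, 0.8)
singles_only = cubeless_money_equity(0.2, 7.0, 37.3, 62.7, 25.3, 0.8,
                                     gammons_count=False)
print(round(full, 3), round(singles_only, 3))
# prints: -0.443 -0.254
```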
    B/23 13/7   -.613             0.2%   7.0%   37.3%   62.7%   25.3%   0.8%
        Cubeless equities current: -0.254; Jacoby doubled: -0.443
    B/23 8/2    -.764 (-0.151)    0.3%   6.8%   36.1%   63.9%   29.5%   1.2%
        Cubeless equities current: -0.278 (-0.024); Jacoby doubled: -0.514 (-0.071)

As those of you who use Snowie know, a difference of −.151 represents a monstrous blunder. One would conclude that the concept of playing safe in this sort of position was way off, and that even with similar positions which would argue more for the safe play it wouldn't be correct.
But wait. As an experiment, let's look at exactly the same position but give Blue ownership of the cube. Suddenly we have a different story! Snowie says:
    B/23 13/7   -.307             0.2%   7.0%   37.3%   62.7%   25.3%   0.8%
        Cubeless equities current: -0.443
    B/23 8/2    -.382 (-0.075)    0.3%   6.8%   36.1%   63.9%   29.5%   1.2%
        Cubeless equities current: -0.514 (-0.071)

What is going on? All the win, loss, and gammon percentages are the same. However with the big number, the difference is now only −.075. Suddenly our whopper of an error has been cut in half, and is now only a fairly serious error. How can this be?
What it looks like is going on is as follows: Clearly White has a strong position after either play, and the cubeless equities of −.443 and −.514 indicate that after either play the proper cube action (if the cube is in the center) is double and take. So Snowie, in its infinite wisdom, doubles everything because it anticipates a cube turn and is trying to churn out numbers based on the cube. If you weren't aware of this, you would have a very distorted picture of how large the error actually was.
When we get to match situations, these distortions can be even worse. As an illustration, I'll give you the "size of error" for the above play at each match score in a 5-point match. You can try to figure out for yourself what is going on. However if you were just trying to analyze your checker play, you can come up with some very wrong impressions.
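The doubling effect shows up plainly if the two sets of figures quoted above are put side by side. This is just arithmetic on Snowie's reported numbers, written as a small Python sketch; the dictionary names are made up for illustration.

```python
# Snowie's "big number" for each play, as quoted above.
centered = {"B/23 13/7": -0.613, "B/23 8/2": -0.764}    # cube in the center
blue_owns = {"B/23 13/7": -0.307, "B/23 8/2": -0.382}   # Blue owns the cube

# With the cube in the center each figure is about twice the cube-owned one.
ratios = {play: centered[play] / blue_owns[play] for play in centered}
for play, ratio in ratios.items():
    print(f"{play}: centered / owned = {ratio:.2f}")

# The reported size of the error is magnified by the same factor.
error_centered = centered["B/23 13/7"] - centered["B/23 8/2"]
error_owned = blue_owns["B/23 13/7"] - blue_owns["B/23 8/2"]
print(f"error with cube centered:    {error_centered:.3f}")
print(f"error with Blue owning cube: {error_owned:.3f}")
```

The checker-play error itself has not changed at all; only the anticipated cube turn has doubled the stakes at which it is being measured.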
In each case the cube will be assumed to be in the center, and if one player has 4 points it will be assumed to be the Crawford game. Blue's score will be given first.
    Score Error    Score Error    Score Error    Score Error    Score Error
    0-0   .165     1-0   .196     2-0   .184     3-0   .090     4-0   .030
    0-1   .194     1-1   .171     2-1   .000     3-1   .000     4-1   .069
    0-2   .073     1-2   .095     2-2   .175     3-2   .000     4-2   .033
    0-3   .085     1-3   .086     2-3   .095     3-3   .064     4-3   .041
    0-4   .023     1-4   .027     2-4   .023     3-4   .029     4-4   .024

These numbers make sense if you think about them. The numbers around .180 are at match scores where Snowie thinks it will be double and take after the play. The .000 numbers of course are when Snowie thinks it will be double and pass, since then it doesn't matter what you do. The numbers around .080 are at match scores where Snowie doesn't think it is a double. And the numbers around .030 are at scores where getting gammoned doesn't matter.
The problem is that when you are zipping through a match examining your blunders and errors, you aren't thinking about the cube magnification effect. You just want a quick look at the number to see what is going on. Until a future version of Snowie changes this and puts things in the proper perspective, the user has to be very aware of this situation.
A slight modification in the position illustrates how serious this problem can be:
Once again, B/23, 13/7 is the correct play. This time, however, the difference between that and B/23, 8/2 is only .083. It isn't that the two men on the 11 point structurally change much. The difference is that due to the improvement in Blue's position Snowie doesn't think White has a double after either play, so the difference isn't magnified.
One more minor modification:
Now the difference jumps all the way to .196! What happened was that we weakened Blue's position just enough so that Snowie thinks White has a double after the poorer B/23, 8/2, but not after B/23, 13/7. This assessment may well be correct, but the results magnify the size of the error way out of proportion.
I believe the above distortions only occur when a cube turn is imminent, so they won't happen too often. However they can be very distracting when they do occur.
Snowie attempts to take the match score into account when evaluating various candidate plays. If some plays are much more gammonish than others, this can lead to weird-looking results which may be difficult to interpret properly. The following position is a typical example:
Blue's obvious choices are to attack with 8/4(2), 6/2(2)* or to play positionally with 24/20(2), 13/9(2). Both plays leave him with a strong position. The attacking play will generate far more gammons, and will also lose a few more gammons. The positional play figures to win the game more often. For money (assuming no Jacoby rule), Snowie's 3-ply opinion is:
    8/4(2) 6/2(2)*     +.588             1.2%   25.6%   61.5%   38.5%   9.7%   0.4%
        Cubeless equity: +.397
    24/20(2) 13/9(2)   +.536 (-0.052)    0.6%   15.9%   63.8%   36.2%   5.7%   0.2%
        Cubeless equity: +.381 (-0.016)

On the cubeless equity, the plays appear to be very close. As expected the pure play wins a bit more, while the attacking play gets a lot more gammons. The attacking play comes out a fair amount better on the Snowie evaluation. This may have something to do with the expected cube action. If White flunks, Snowie thinks that Blue will have a very efficient cube (equity of +.570), which makes the attacking play quite attractive. Of course the positional play may well lead to an efficient cube a few moves down the road, but Snowie's 3-ply can't see that far. Thus, Snowie concludes (rightly or wrongly) that the attacking play is considerably superior.
Now let's look at what Snowie thinks of this play at a couple of different match scores in a 5-point match:
0-0: The attacking play is superior by .078
Can these figures be right? True, one should go more for gammonish plays when behind in the match, but can the difference between 1-0 and 0-1 in a five-point match with the cube still in the center be this great? Snowie thinks the plays are a photo when ahead 1-0, but considers the pure play a fairly major blunder when behind 1-0.
What is the reason for these big differences? I can't figure it out. In Snowie's opinion if Blue attacks and White flunks, it is double and pass at each of these three match scores. Snowie thinks the pass is closest at the 0-0 score compared to the other scores, but not by that much. It doesn't make a whole lot of sense to me.
Let's try following the same theme, but strengthening White's position a bit:
For money Snowie rates the two plays as follows:
    8/4(2) 6/2(2)*     +.434            1.0%   22.7%   59.0%   41.0%   10.6%   0.4%
        Cubeless equity: .306
    24/20(2) 13/9(2)   +.431 (-.003)    0.5%   13.8%   61.7%   38.3%   6.0%    0.2%
        Cubeless equity: .313 (+0.007)

Now Snowie rates the plays as a photo. This makes sense. With White having advanced the two men to the eight point, the attacking play won't pick up as many gammons. Also making the anchor for defense becomes more important. Incidentally, if Blue makes the attacking play and White flunks, Snowie now rates it as a borderline double for money (and of course a trivial take).
In a 5-point match, Snowie thinks a little differently:
0-0: Attacking play better by .044
This seems odd. Why should the attacking play be so much better at 0-0 in a five-point match than it is for money? Part of the reason might be that in the match Snowie thinks the double is clearer (rather than borderline) if White flunks after the attacking play, although the take is still very clear.
I took a look at the figures assuming that White held the cube. The wins, gammons, backgammons, and cubeless equity remain the same, of course. However, for money Snowie now rates the pure play as superior by .018. It looks like Snowie is taking into account the cube potential after the attacking play. Whether this is the proper approach is questionable. The problem, I think, is that Snowie is only looking at the cube potential on its next roll.
For the match scores we have been considering (and White owning the cube), Snowie says:
0-0: Attacking play better by .002
This makes some sense, since the player behind in the match can use gammons more than the leader. However the differences are pretty small, so the match score doesn't appear to be too relevant a factor. The larger differences in the play evaluation appear to come when a cube turn is possible in some variations.
As a final check, let's strengthen White's position a bit more so that there won't be a cube involved next turn.
Snowie's money estimates are:
    24/20(2) 13/9(2)   0.405             0.5%   14.6%   60.4%   39.6%   6.1%   0.2%
        Cubeless equity: .296
    8/4(2) 6/2(2)*     0.273 (-0.126)    0.6%   15.0%   56.8%   43.2%   6.9%   0.3%
        Cubeless equity: .220 (-0.077)

Now Snowie clearly prefers the pure play, which obviously makes sense with White having the stronger offense. Also, Snowie thinks that if Blue makes the attacking play and White flunks, Blue definitely does not have a double.
For the match scores, we have the following results:
0-0: Pure play better by .103
In none of these cases does Snowie think that Blue has a double after attack and flunk (although it is close). Thus, the differences in the Snowie estimates of the plays don't change much. I don't understand why Snowie prefers the pure play more when behind 1-0 than at the other scores; that may have something to do with the quirks of the match equity table. Anyway, the difference isn't much. The key appears to be that when no cube turn is coming on the next roll, the results will be pretty much consistent regardless of the match score. However if there is a potential cube turn on the next roll, then the results of play vs. play evaluations may be quite distorted from reality. This is important to remember when running through a match.
When Snowie gives its cube analysis, there is plenty of information available to the user. The full wins, gammons, and backgammons to each side of course, as well as the cubeless equity. In addition, Snowie presents three numbers. These are the equities if it goes double-take, if it goes double-pass, and if it goes no double. The equity for double-pass is always 1.000, of course. The other equities take the match score into account; how this is done is not totally obvious to me. However Snowie seems to do a pretty good job on this. If the equity after double-take is less than 1.000, the double should be taken; if it is greater, the double should be passed. If the equity after no double is greater than after double-take, then doubling is wrong; if the equity after double-take is greater, then doubling is correct (except if the equity after no double is greater than 1.000, in which case the position is too good to double). For example, look at the following position, where Blue is ahead 2-1 in a 5-point match:
The Snowie estimates are:
    Money equity: .483
    0.1%   2.0%   73.9%   26.1%   1.5%   0.1%
    Double, take:   .986
    No Double:      .888 (-0.097)
    Double, pass:  1.000 (+0.014)

According to these estimates, double-take is the proper cube action. If White passes, he costs himself .014 in equity. If Blue fails to double, he costs himself .097 in equity. I'm not exactly sure what these numbers mean, but relatively speaking they are probably some kind of decent indication of how serious a cube error is. Also, given the estimated money equity and the various percentages, the final conclusions look somewhat reasonable. When gammons come into play things get a lot trickier as far as cube action goes in matches, but from what I have seen Snowie seems to handle these problems pretty well.
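The rule of thumb for reading the three cube numbers can be written out as a short function. This is a minimal Python sketch of the rule as described in this article, not Snowie's own logic; the function name and return strings are made up for illustration. The example call uses the Double, take and No Double figures from the position above.

```python
def cube_action(no_double, double_take, double_pass=1.000):
    """Read off the cube action from Snowie's three equities.

    All equities are from the doubler's point of view.
    Returns (doubler's decision, opponent's correct response).
    """
    # The opponent takes if being doubled in costs him less than giving up.
    response = "take" if double_take < double_pass else "pass"

    if no_double > double_pass:
        decision = "too good to double"
    elif double_take > no_double:
        decision = "double"
    else:
        decision = "no double"
    return decision, response


print(cube_action(no_double=0.888, double_take=0.986))
# prints: ('double', 'take')
```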
Of course, Snowie's cube actions have to be dependent upon Snowie's evaluation of the position. If Snowie is way off in the evaluations, then the cube actions will be correspondingly wrong. In positions such as the above it is likely that the estimates are on target. However for certain types of games, Snowie's estimates can be far off. In general, Snowie (like the other bots) has trouble evaluating timing positions and back games. The results are more serious for the user than with play decisions. Even if Snowie is misestimating the overall equity for a given type of position, the difference between two plays is likely to be accurate as long as the position type resulting from the two plays isn't too different. When it comes to cube action, it is necessary to get the absolute equity correct in order to make a proper decision.
Here is an example of a sequence of plays from a match I played recently which illustrates the difficulties Snowie has:
White is on roll. At the time I thought my opponent should have redoubled. He was a favorite to make his four point, and then the timing would probably go his way. If he made his four point and I busted my prime, he would lose his market by quite a lot. Of course it looked like I still had a take, since he might roll a horror number or I might survive the priming battle even if he does make his four point. Snowie said:
    1-ply
    Money equity: -0.007
    0.5%   10.9%   48.1%   51.9%   8.1%   0.2%
    No redouble       0.189
    Redouble, take   -0.197 (-0.387)
    Redouble, pass    1.000 (+0.811)

What is this? Snowie thinks that the game is pick-em, and that redoubling would be a gross error. Can this be? Was my judgment that far off base? But wait! What's this 1-ply stuff? Clearly what happened was that Snowie first evaluated the cube decision on a 1-ply, and since it came out extremely clear on that 1-ply Snowie didn't bother looking any further. Well, let's look a little further. How about 2-ply:

    2-ply
    Money equity: 0.228
    0.4%   10.6%   57.8%   42.2%   3.8%   0.1%
    No redouble      0.458
    Redouble, take   0.292 (-0.166)
    Redouble, pass   1.000 (+0.542)

Well, this looks a little more like it. At least on the 2-ply Snowie recognizes that White is a moderate favorite. Apparently Snowie just didn't "look ahead" on its 1-ply sufficiently to see the timing advantage White would have after making the prime. This is a good illustration of the problems Snowie can have with priming and timing battles. Well, let's look another ply:

    3-ply
    Money equity: 0.352
    0.7%   16.1%   61.5%   38.5%   4.4%   0.1%
    No redouble      0.616
    Redouble, take   0.587 (-0.028)
    Redouble, pass   1.000 (+0.384)

This is looking much more reasonable. Now Snowie recognizes that White is a clear favorite. It still thinks a redouble would be wrong, but that it is a very close decision. Obviously by looking ahead to see what is likely to happen on the next couple of rolls Snowie can "see" how the timing is likely to go White's way.
How about rolling the position out? I will discuss the bot rollouts later in this article in detail, but for now accept that a 2-ply rollout of 72 trials with no truncation, which is what I did, is likely to give a pretty accurate result. And the rollout said:
    Rollout
    Money equity: 0.405
    1.1%   21.5%   62.9%   37.1%   7.9%   0.2%
    Redouble, take   0.757
    No redouble      0.682 (-0.075)
    Redouble, pass   1.000 (+0.243)

Now that's more like it. According to the rollout, White does in fact have a quite proper redouble (and a very easy take for Blue). Just about what I had thought. However had I not been suspicious of the original results I would have simply thought that I had mis-estimated the position badly and been left with the wrong impression. This shows that Snowie's screening powers to determine how far to search can occasionally come up with a very bad result.
The game continued as follows: My opponent did not redouble, rolled 6-5, and played 10/4, 9/4. I rolled 5-2 and played 8/3*, 5/3. This was the position:
Now what is going on? I was sure my opponent was supposed to redouble. All he had to do was flunk and my position would probably crack, and even if he entered, the priming battle would probably be in his favor. Frankly I wasn't sure whether or not I was supposed to take. This time Snowie was willing to produce a 3-ply result, which said:

    3-ply
    Money equity: 0.375
    1.0%   19.6%   60.7%   39.3%   4.4%   0.2%
    No redouble      0.656
    Redouble, take   0.624 (-0.032)
    Redouble, pass   1.000 (+0.344)

This wasn't near my estimate. I thought the redouble was clear, and it was the take which was the question. On its 3-ply, Snowie thought the redouble was a photo (and the take very clear); in fact, Snowie opted not to redouble. Perhaps our good friend the rollout (same parameters as before) would shed some light on what is really going on.

    Rollout
    Money equity: 0.588
    1.4%   27.9%   67.7%   32.3%   5.8%   0.1%
    Redouble, pass   1.000
    No redouble      0.838 (-0.162)
    Redouble, take   1.138 (+0.138)

This looks quite different. According to the rollout not only is it a clear redouble but I have a solid pass (for money it would have been closer). Not redoubling is a big blunder, and a blunder Snowie would have made. The bots play very well, but they are far from perfect.
My opponent chose not to redouble and rolled 5-2, playing B/23, 6/1*. I responded with 1-1, playing B/24*, 24/23, 9/8(2). Here we were again. This time it was quite clear that he didn't have a redouble. If he entered he would probably be the one to crack, and if he flunked I would still have a reasonable chance to win the priming battle with my spare on the eight point able to absorb some pips. Snowie's opinion was that I was the slight favorite. This may or may not be accurate, but it is definitely not a redouble for him. No need to roll this one out.
He now danced. I rolled 6-2, and played 8/6, 8/2*. New cube problem. This is what the position now looked like.
My reaction was that this is a monster double. I wouldn't take it in a million years. The timing was now very likely to go against me. Snowie didn't see it that way, however.
    3-ply
    Money equity: 0.436
    0.5%   12.6%   68.5%   31.5%   6.0%   0.4%
    Redouble, take   0.846
    No redouble      0.787 (-0.058)
    Redouble, pass   1.000 (+0.154)

According to Snowie on the 3-ply both the redouble and the take were clear, with the take being clearer than the redouble. Once again, a rollout told a different story.

    Rollout
    Money equity: 0.689
    1.0%   22.3%   76.1%   23.9%   6.2%   0.4%
    No redouble      1.006
    Redouble, pass   1.000 (-0.006)
    Redouble, take   1.431 (+0.425)

Not only is this a monster pass, it might even be worth playing on for the gammon. Not correct to play on against Snowie, however, since he would have taken! That take would have been a colossal blunder.
Predictably enough, my opponent didn't redouble. This may have been the theoretically correct action, but from his previous cube action it is likely he thought he wasn't good enough rather than thinking he was too good. The game continued with him rolling 6-2, playing the forced B/23*. I rolled 3-1, playing B/24, 8/5. The position now was:
Now he finally redoubled (indicating that his reason for not redoubling the roll before was the wrong reason), and I passed of course. On the 3-ply Snowie finally recognized that his position was pretty strong.
    3-ply
    Money equity: 0.530
    0.9%   20.8%   70.0%   30.0%   8.2%   0.4%
    Redouble, pass   1.000
    No redouble      0.783 (-0.217)
    Redouble, take   1.086 (+0.086)

Of course, Snowie only would have passed because of the match score. The money equity of .530 indicates that Snowie would have taken for money. A rollout gave a more realistic appraisal:

    Rollout
    Money equity: 0.651
    1.5%   23.0%   73.7%   26.3%   6.6%   0.3%
    Redouble, pass   1.000
    No redouble      0.874 (-0.126)
    Redouble, take   1.326 (+0.326)

Not surprisingly, a huge pass.
So we have seen how Snowie even on its powerful 3-ply can make very large misevaluations on cube decisions in some types of positions. In order to get closer to the real truth, we need to go to rollouts.
Having the bot roll out the position is an ideal way to see what is going on. Unfortunately there are two drawbacks to rollouts. One is that they are time-consuming. The other is that the bot may not handle the position well enough for the rollout results to be meaningful.
How many times do we need to roll out a position before we can trust the results from a statistical point of view? Assuming we are rolling out the position all the way to the end, it can take quite a few trials. It might seem like 1000 or so trials would be sufficient, but the truth is that there can be quite a bit of variance (or luck) involved, and it is not uncommon for a rollout result to be several percent away from what it should be on 1000 trials. You need 4000 or 5000 trials to be fairly safe on statistical grounds, and this takes a fair amount of time even with today's fast computers.
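A rough feel for these numbers comes from the standard error of a win percentage over independent trials. The Python sketch below is only an illustration: it looks at the win percentage alone (not gammons or the cube), assumes a position that is close to even, and the function name is made up.

```python
import math


def win_rate_standard_error(p, trials):
    """Standard error of an observed win fraction after independent trials."""
    return math.sqrt(p * (1.0 - p) / trials)


for trials in (1000, 5000):
    se = win_rate_standard_error(0.5, trials)
    print(f"{trials} trials: standard error {100 * se:.1f}%, "
          f"two standard errors {200 * se:.1f}%")
# prints:
# 1000 trials: standard error 1.6%, two standard errors 3.2%
# 5000 trials: standard error 0.7%, two standard errors 1.4%
```

So a swing of a few percent on 1000 plain trials is nothing unusual, while 5000 trials cuts that range by more than half.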
One way around the time problem is to use truncated rollouts (or what are called mini-rollouts by Snowie). The idea is that instead of having the bot play the position to the end, you have it play it out a few moves, and then simply use its equity estimate after those few moves as the result of the trial. This is a time-saver for two reasons. First of all, it takes much less time to play out a few moves than to complete the whole game. Secondly, a smaller number of trials is generally sufficient. The reason is that the luck factor is cut down, because you don't have the wild one-roll swings which occur at the end of the game. The program has already averaged them out when forming its estimate. Thus, 1296 trials is usually quite sufficient. The default setting on Snowie is 5 rolls deep, but my personal preference is 7 rolls. I believe it is worth a little extra computer time in order to get a bit farther into the position.
Truncated rollouts are dependent, of course, on Snowie's ability to estimate the equity of the resulting positions accurately. For most simple positions Snowie does a pretty good job on this. However for some complex backgame types of positions which won't change much in nature over the next few rolls, Snowie's estimate may be far off. For these types of positions, truncated rollouts may not be the answer.
The other problem with the rollouts is that Snowie might not be playing too well. Due to time considerations, we can't have Snowie play at its powerful 3-ply level for the rollouts. They will just take too long. We have to live with the 1-ply level of play if we are going to get a sufficient number of trials in. As we have seen, Snowie can come up with some pretty bad plays on its 1-ply level. If these bad plays occur at the wrong time, they may mess up the rollout results.
The answer to this problem is to have Snowie play at a higher level for the rollouts. It may seem as though this requires too small a number of trials for the results to be meaningful, but it turns out that this is not the case. By using a method called variance reduction, Snowie can make the results of about 50 trials equivalent to about 1000 trials. The general idea is that Snowie evaluates the luck factor from each roll in the trial and filters it in. If Snowie's evaluation of this luck factor is accurate the number of trials represented will be far more than the actual number rolled out.
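A minimal sketch of the luck-subtraction idea, again on a hypothetical toy game and not Snowie's actual implementation. A token moves one step left or right on a coin flip, winning at GOAL and losing at 0. The evaluator `estimate` is deliberately imperfect. For each roll, the "luck" is the evaluation after the roll minus the average evaluation over all possible rolls; subtracting the accumulated luck from the final result leaves the average unchanged but removes most of the noise.

```python
import random
import statistics

GOAL = 10  # toy game: token wins on reaching GOAL, loses on reaching 0

def estimate(pos):
    """Imperfect stand-in evaluator (the true win chance is pos / GOAL)."""
    return (pos / GOAL) ** 1.3

def play_trial(start, rng):
    """Play one trial; return (raw result, luck-adjusted result)."""
    pos, luck = start, 0.0
    while 0 < pos < GOAL:
        # Average evaluation over both possible rolls from this position.
        expected = 0.5 * (estimate(pos + 1) + estimate(pos - 1))
        pos += rng.choice((1, -1))
        # Luck of this roll: how much better it was than the average roll.
        luck += estimate(pos) - expected
    raw = 1.0 if pos == GOAL else 0.0
    return raw, raw - luck

def rollout(start, trials, seed=0):
    rng = random.Random(seed)
    results = [play_trial(start, rng) for _ in range(trials)]
    raw = [r for r, _ in results]
    adjusted = [a for _, a in results]
    return raw, adjusted

if __name__ == "__main__":
    raw, adjusted = rollout(start=5, trials=2000)
    print("raw      :", statistics.mean(raw), statistics.pvariance(raw))
    print("adjusted :", statistics.mean(adjusted), statistics.pvariance(adjusted))
```

Because each roll's luck averages to zero whatever the evaluator says, the adjusted results stay unbiased even when the evaluator is off. How much variance disappears depends on how accurate the evaluator is.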
How much can we trust this variance reduction method? David Montgomery wrote a truly excellent article on variance reduction in the February 2000 issue of GammOnLine. I fully recommend this article to anybody who is interested in this sort of material. I'll admit I don't understand the article or variance reduction fully, but it does seem to make sense. Mathematicians and other knowledgeable people I have talked to about it all say that the approach looks valid. And, most importantly, from what I have seen the results appear to be very reasonable. When I roll a position out 1-ply (no variance reduction) and 2-ply (with variance reduction), if the results differ the 2-ply results are almost always more in line with my intuition than the 1-ply results. Thus, I have concluded that 2-ply rollouts using variance reduction are pretty valid.
What setting should you use for the 2-ply, and how many trials? This is largely a matter of time and computer power. I generally run 72 trials, played to completion. I would rather not get involved with truncated rollouts with the variance reduction, since then I have to worry about the Snowie evaluations. It appears as though 72 trials is usually roughly equivalent to 1296 trials without variance reduction. For most positions it takes from 2 to 10 minutes to roll out those 72 trials on my computer. This time will vary depending on one's computer speed, of course.
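The equivalence figures quoted in this article can be turned into an implied reduction factor with a little arithmetic. The sketch below assumes the usual statistical reading: if N variance-reduced trials are as reliable as M plain trials, then each trial's variance has been cut by about M / N. The function names are illustrative.

```python
import math

def variance_ratio(equivalent_trials, actual_trials):
    """Implied per-trial variance reduction factor."""
    return equivalent_trials / actual_trials

def noise_shrink(equivalent_trials, actual_trials):
    """Implied shrink in per-trial standard deviation (square root of the ratio)."""
    return math.sqrt(variance_ratio(equivalent_trials, actual_trials))

if __name__ == "__main__":
    # 72 trials roughly equivalent to 1296; about 50 roughly equivalent to 1000.
    for equivalent, actual in ((1296, 72), (1000, 50)):
        print(actual, equivalent, variance_ratio(equivalent, actual),
              round(noise_shrink(equivalent, actual), 2))
```

Both pairs of figures imply a factor of about 18 to 20, so they are at least consistent with each other.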
When should you use 2-ply and when should you be satisfied with the 1-ply rollouts? You just have to get a feel for this. Simple positions such as holding games, blitzes, and races are generally handled okay by the 1-ply rollout. More complex games usually need the 2-ply rollout to get close to the truth. After having tried several different types of positions you get to know pretty well what you should be using.
The advantage of the 2-ply is that the bot plays considerably better than it does on 1-ply, making the rollout results more trustworthy from that point of view. So how about using 3-ply rollouts? This doesn't gain you anything as far as the variance reduction goes, it turns out. If 72 trials on the 2-ply are about equivalent to 1296 on 1-ply, then 72 trials on 3-ply will also be about equivalent to 1296 on 1-ply. The gain is simply that the bot plays better on 3-ply than on 2-ply. The loss is that it takes a lot longer.
How about cubeful rollouts? The rollouts can be set so that Snowie takes the cube into account during the rollout. My personal preference is to not get involved with these when I am examining a play vs. play problem which I appear to have bungled. There are enough things which might go wrong with the rollout without getting Snowie's cube opinions involved in its evaluations. If I get a look at the cubeless equities, these will generally be sufficient for me to form my own conclusions about the cube considerations if need be.
So, what procedures should one follow to use the bots for improvement? This is largely a matter of one's personal taste. This is what I do. First I will play a match, say on GamesGrid. After completing the match, I save the game on my computer (in .sgg format), and then import the file to Snowie. I then have Snowie run through the match. My settings are 3-ply with huge search space and 100% 3-ply speed. With these settings, it generally takes Snowie about 10 to 20 minutes to run through a 7-point match. I have it look at both my opponent's plays and mine; you never know when something of interest might have come up on the other side of the board.
Now for the analysis. I don't just look at my errors and blunders, although obviously these are the most important. I go through every move step by step. It is just as important to see the difficult positions which you have handled correctly as the ones you got wrong. It is also important to note by how much Snowie prefers move A to move B. If you thought the choice was very close but Snowie says they are miles apart, then you have some rethinking to do about the position.
What about my errors or blunders? Generally I will roll them out, unless I can agree immediately that I was wrong. It is the results of these rollouts which give me the most insight into the position, and tell me where I need to improve my thinking. Also I will roll out some very pivotal plays where I didn't really know what to do, even though Snowie may have thought I got it right. In addition I may roll out a play of my opponent if I am surprised by the Snowie result. Always keep in mind that it isn't just which play Snowie thinks is best that is important; it is the magnitude of the difference between the two plays which matters. If Snowie thinks your play is .020 worse than the best play you can probably ignore its opinion if you still think your play is better; quite likely it is. However if Snowie believes your play is .150 worse than the best play you can be pretty sure that your play is wrong, and it is up to you to figure out why.
In addition to the above, it is vital to keep an eye open for the types of quirks described earlier in this article which may magnify an error out of proportion due to the match score or potential cube action. Always be willing to look at the cubeless equities and at the percentage of wins and gammons for each side. These figures don't lie, and will often give you a truer picture of what is going on than the number Snowie uses to rank the plays and evaluate the size of your errors. And of course if you are still suspicious, roll it out, 2-ply if necessary. If the rollout confirms what Snowie is saying, you can be pretty sure that it is right.
One further thing. While it is in our nature to think that we are right and Snowie is wrong, most of the time when Snowie says we make a blunder we have made one. It is true that occasionally Snowie will give us false information, as has been seen from several of the examples. But when in doubt, believe Snowie and try to understand what it is trying to tell you. Snowie usually knows what it is doing.
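The rule of thumb about error size can be written down as a small triage helper. This is only a sketch: the .020 and .150 figures come from the paragraph above, but treating them as hard cutoffs, and the labels for the three bands, are assumptions made for illustration.

```python
def triage(equity_loss, ignore_below=0.020, trust_above=0.150):
    """Classify how seriously to take the bot's stated equity difference.

    equity_loss is how much worse the bot rates your play than its best play.
    The two thresholds are illustrative defaults, not fixed rules.
    """
    if equity_loss <= ignore_below:
        return "probably noise"
    if equity_loss >= trust_above:
        return "almost certainly an error"
    # The range in between is a judgment call, so a rollout is the safer choice.
    return "roll it out"

if __name__ == "__main__":
    for loss in (0.015, 0.080, 0.200):
        print(loss, triage(loss))
```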