This article originally appeared in the February 2000 issue of GammOnLine. Thank you to Kit Woolsey for his kind permission to reproduce it here. |
1. HistoryIn early 1996 Fredrik Dahl and Effect Software AS released JellyFish 2.0. One of its features was the ability to roll out positions with "variance reduction." This feature allowed it to make short rollouts as accurate as much longer rollouts. Using this feature a rollout of a hundred games might well be just as accurate as a rollout of 2500 games done without the feature.This was a breakthrough in backgammon technology. With variance reduction it became feasible to roll positions out using a program's higher levels, like JellyFish level 6 or Snowie 2- or 3-ply. On these higher levels the programs think longer about each move, playing slower and better. But the loss of speed is mostly compensated for by the increased accuracy of variance reduced rollouts. Variance reduction also made it practical—for the first time—to get trustworthy results from human rollouts. JellyFish's interactive rollout feature allowed a user to play either or both sides of a position in a rollout, with the program then analyzing the games to calculate an equity far more accurate than the average result. Although players had long been rolling positions out by hand, the reliability of these rollouts was very poor. It simply takes too many games to get an accurate result. This was not fully appreciated. Throughout the backgammon literature writers describe rollouts and propositions of—gasp—a hundred games. Playing them, this is a lot of games. But a hundred games aren't much if we are hoping that the results will tell us the true value of a position. When variance reduction makes those hundred games worth 2500—now we have something. Now Snowie 3 is out, and for the first time we have variance reduction for cubeful rollouts. This is another breakthrough in backgammon technology. Our ability to do meaningful cubeful analysis skyrockets with this new tool. But shockingly it appears that many users are still doing rollouts as though it were 1995. Because of a distrust of variance reduction, they stick with rollouts performed by weaker players (JellyFish level 5 and Snowie 1-ply) and don't bother to do their manual rollouts with JellyFish's interactive rollout feature. This distrust stems from a lack of understanding of how variance reduction works. Fredrik Dahl deserves great credit for bringing variance reduction to backgammon. But Fredrik is also partly responsible for the mistrust people have of variance reduction. When JellyFish 2.0 was released, Fredrik wanted to keep the exact formulation of his algorithm secret, for competitive advantage. Thus he couldn't tell us exactly what he was doing. On the other hand, he must certainly have wanted buyers to know what a great and wondrous new feature this was (it was great and wondrous), so he couldn't just report what we are now used to calling the "equivalent" games as the number of games rolled out. (If he had done this, it would probably have prevented a lot of misunderstanding.) Instead, the algorithm was described in general terms. The JellyFish 2.0 User Manual says that the algorithm "can be interpreted as compensating for the luck of the rolls." This hint was not enough to make the procedure clear, but just enough that people mistrusted the algorithm. Compensating for the luck must rely on programs' evaluations, and as Rob Maier points out, "aren't we doing the rollouts because we don't really trust their evaluations?" Variance reduction is no longer a secret. Many people have independently worked out how the algorithm works. Descriptions and variations of the basic idea have been posted to the backgammon newsgroup. Snowie has had it from its first release. It's time for variation reduction to be described plainly. 2. Variance Reduction ProcedureLet's work through an example to see what variance reduction actually does:
The position is simple so that we can easily walk through several complete rollouts and so that we can check our rollouts against the exact answer. Blue's exact winning chances are 1/6 (when he rolls doubles) + 5/6 × 1/4 (when he doesn't and White fails to bear off) = 9/24 = 37.5%. We'll roll this position out using variance reduction, doing all our calculations in terms of Blue's winning percentage. Each trial the game is played out to the end, just as in a normal rollout, and we note the final result. In addition, for each roll of the dice we estimate how lucky Blue was, and we accumulate these estimates over the whole game. When the game is finished we'll subtract this accumulated luck from the game result to "cancel out" Blue's luck, and this will give our variance reduced result for the game. We'll discuss how a computer program estimates the luck later. But for now we'll use the luck adjustments in the table below. The numbers in the "Quantified" luck column aren't the "right" values—I just made them up. Later on we'll see that the exact adjustments aren't particularly important.
When we say that Blue has been lucky, we mean that his chances are higher after the roll than they were before. Conversely with unlucky rolls: Blue would prefer to roll over. Each turn is considered independently for deciding whether a roll is lucky or not. Here, whenever White gets to roll she has already had a little good fortune (Blue didn't roll doubles), but we don't consider the past when estimating the luck for a particular turn. With our luck estimates in hand we're ready to do a variance reduced rollout. Game 1Blue rolls a non-double and takes two checkers off. Looking at the table above we see that Blue was "marginally unlucky", with an estimated luck value of −10%. Then White fails to bear off her last checker, which is +45% lucky for Blue. The final result is that Blue has 100% chances, since Blue has won the game. Blue's accumulated luck is (−10) + (+45) = +35%. We subtract Blue's luck from the final result to get a variance reduced result of 100 − 35 = 65%. Game 2Blue rolls a non-double, taking two checkers off and getting a luck value of −10%. White bears off her last checker, with a luck value of −15%. The game result is 0%—White has won the game. The total luck is (−10) + (−15) = −25%. The variance reduced result is 0 − (−25) = 25%. Game 3Blue rolls a double and wins the game. Blue's luck value was +50%. The game result is 100%. The variance reduced result is 100 − 50 = 50%. These games and seven more are summarized in the table below. All figures are percentages from Blue's perspective.
After ten games, Blue has won five, so that the result of the unadjusted rollout is that Blue wins 50%. The variance reduced results give Blue 4.2 games out of ten, for 42%. Blue's true chances are 37.5%, so the variance reduced result is closer to the true value. This is generally the case with variance reduction. How a Program Estimates LuckTo estimate the luck for a particular roll, every turn computer programs actually evaluate the positions that arise after playing each of the 36 rolls. The average of these values is the value of the position before rolling. And the difference between the equity after the roll and the equity before rolling is the estimated luck for that roll. Let's work through this for our example, starting with the second roll.
The program actually goes through all 36 rolls and plays them out on the board. After 3/4 of the rolls the program sees that Blue has lost the game, so Blue has 0% chances. After the other 1/4 of the rolls the program sees that Blue has the game won, so Blue has 100% chances. Putting all the rolls together the program calculates that before the roll Blue has 25% chances. Thus, when Blue loses, the luck of the dice have cost him 25%, but when Blue wins the dice have given him 75% he didn't have before. Similarly, for the first roll:
The luck estimates in these two tables are the correct values, and any computer program with even a tiny racing database should use these numbers. Procedure SummaryThe program plays out each game just as in a normal rollout. Each turn it considers the position after all 36 rolls and calculates the average value of the possible continuations. The program estimates the luck for a particular roll as the difference between the evaluation after the roll and the average evaluation before the roll. The program totals these luck values as it plays out the game. When the game ends, the program subtracts the luck from the actual result, and that gives the variance reduced result for that game.3. Why Variance Reduced Results Are More AccurateThe main problem with rollouts is that there is too much luck. When we play out a game and somebody wins, it doesn't tell us much about the position—maybe the side that won just got lucky. In fact, the side that wins a game always has had the better luck, assuming perfect play and that the position wasn't gin to begin with.By doing many trials we reduce the effect of this luck. Sometimes one side gets lucky, and sometimes the other. Most of the luck will get canceled out if we do many trials because one side's good luck in game 7 is offset by the other side's good luck in game 13.
Variance reduction takes a more aggressive approach. Rather than relying on the luck to even out, variance reduction tries to remove the luck directly. Every turn the procedure estimates how lucky a player was, and this luck is subtracted out at the end. If the luck estimates are any good, they cancel out much of the luck. Let's look at our ten game rollout from before, but now using the correct luck values:
With the correct luck values, variance reduction cancels the luck perfectly, and every game gives a result of Blue winning 37.5%. When luck estimates are perfect, a single game will always give the correct result, regardless of what happened during that game. But in general a program cannot estimate the luck perfectly. Fortunately, it doesn't need to estimate the luck very well for variance reduction to be successful. All that is necessary is to have some idea of which rolls are good and which bad. A program will know that in a race 66 is good and 21 is bad. A program knows that (in the great majority of cases) entering is good and dancing bad; that hitting is good and missing bad; that making points is good and stacking up bad; and so forth. As long as a program is able to meet this crude level of reliability, it will be able to cancel some of the luck in a rollout, and thus make the rollout more accurate. Even if positions come up where a program can't tell a good roll from a bad one, all is not lost. Typically for these positions a program thinks that all rolls are roughly equivalent, and so it won't add a lot of luck one way or the other. But when a real joker comes along, a program is very likely to understand that one side was very lucky, and it will compensate for this. In our example, our original luck values were not very good. But they did recognize that it was good for Blue to roll a double, and bad for Blue when White bears off. That's sufficient. With these luck values, a variance reduced rollout of our example position is typically as accurate as a regular rollout done for eight times as many games. Variance reduction makes rollouts more accurate by subtracting some of the luck that occurs each game. This works as long as a program usually has a rough idea of which rolls are good and which rolls are bad in the positions that arise during the rollout. 4. Why Bad Evaluations Don't Introduce BiasVariance reduction relies on a program's evaluator to estimate the luck of each roll. Many people fear that this makes variance reduction unreliable. But in fact, even if we have very poor evaluations, a variance reduced rollout will on average give us the same answer as a regular rollout. To see why this is so, we need just a little math.Adding and subtracting zeroesIf we add a bunch of zeroes, the total is just zero. 0 + 0 + 0 + ... + 0 = 0.Similarly, if we add a bunch of numbers, where each is zero on average, then on average the final result will be zero. Here's an example of a number that is zero on average. We roll a red die and a green die, and we subtract the red die from the green die. If we roll a green 3 and a red 1, we have 3 − 1 = 2. If we roll a green 1 and a red 5, we have 1 − 5 = −4. On any particular throw the dice might combine for anything from −5 to +5, but on average the result will be zero. If we add the result of many tosses of these dice, the total will be zero on average. Probably we won't get exactly zero, but if we do the experiment many times and take the average, we are very likely to get a result close to zero. If we take a number and subtract zero, we get the original number back. 7 − 0 = 7. Similarly, if we subtract from a number another number that is zero on average, then on average we get back the starting number. If we take our starting number 7 and modify it by subtracting the result of a toss of the red and green dice, we could get a value as low as 2 (7 − (green 6 − red 1)) or as high as 12 (7 − (green 1 − red 6)), but on average we'll just get back 7. Modifying the 7 in this way doesn't change the average result at all; it just introduces some luck into what we get. Because adding many numbers that average zero gives a total that also averages zero, we could roll the red and green dice fifty times, total the result, and subtract that from 7. The result will still be 7 on average. Each toss of the dice will be zero on average; thus the sum of fifty tosses will be zero on average; thus subtracting the total from 7 will leave us with 7, on average. On average, no one gets luckyVariance reduction modifies our rollout result by subtracting an estimate of the luck. The way this estimate is calculated ensures that its average value is zero.Let's look again at a program's luck estimates for our example position:
On the first roll, 1/6 of the time Blue's luck is +62.5%, 5/6 of the time it is −12.5%. The average luck is zero: 1/6 × 62.5 + 5/6 × −12.5 = 0.
On the second roll, 3/4 of the time Blue's luck is −25%, 1/4 of the time it is +75%. The average luck is zero: 3/4 × −25 + 1/4 × 75 = 0. This isn't true just for our example. Because of the way a program estimates luck, the average luck is always zero. The luck associated with a roll is the difference between the evaluation after the roll and the average of this value over all 36 rolls. The difference between any value and its average averages zero. That's the way averages work. The average total luck for a game is also zero. When we add a series of values, where each value is zero on average, the total will also be zero on average. But what if the evaluations are really bad?Bad evaluations don't bias the result. All variance reduction does is subtract the total estimated luck, a value that averages zero. Subtracting zero can't change the expected value of a rollout. No matter how poorly a program understands a position, luck estimates will still average zero, because of the way they are calculated, and so a variance reduced rollout will produce the same result as a regular rollout, on average.Let's work through a variance reduced rollout using the following extremely poor luck estimates:
Here, when Blue rolls doubles, our quantified luck says that his chances have gone down 62.5%. When White bears off on the second turn, we're saying that Blue's chances have gone up 25%. In fact, these adjustments are exactly backwards. We are saying Blue is lucky when he was unlucky, and vice-versa. But note that the luck values still average zero for each roll. On the first roll, 1/6 of the time Blue's luck is −62.5%, 5/6 of the time it is +12.5%, for an average of zero. On the second roll, 3/4 of the time Blue's luck is +25%, 1/4 of the time it is −75%, also averaging zero. Because the average luck is zero, a variance reduced rollout with these values will still give the right answer, despite the horrifically bad evaluations. Let's see how our previous ten game rollout would play out with these luck estimates:
After ten games, the variance "reduced" result is 62.5%, and is less accurate than the actual game result of 50%. Let's continue the rollout for another twenty games:
Both rollouts are moving closer to the true value of 37.5%. Despite using evaluations that are the exact opposite of the correct values, the variance reduced result is not too far off. With enough trials, both procedures will tend to give answers closer and closer to the true value. The skeptical reader is encouraged to extend the rollout for a few hundred games. (They go fast.) With such terrible evaluations variance reduction actually becomes variance increasing. This is the reason the regular rollout is closer to the true value than the variance reduced rollout. Using these luck values it takes about four variance reduced games to equal one game of a regular rollout. Here we would be better off to not use variance reduction—not because we will get a different answer, but because we will get the same answer much more slowly. In practice, evaluations are almost never this bad. Programs have at least a vague idea of which rolls are good and which bad over the course of a game for almost every type of position. And should this problem actually occur it is easily diagnosable. A rollout where variance reduction has done poorly will have a large standard deviation (JellyFish) or confidence interval (Snowie). For any kind of rollout we should pay attention to these numbers. With JellyFish, we can be very confident that if we were to roll the position out forever, we would get a result no more than two standard deviations away from the result we have so far. With Snowie, we can be similarly confident that the result of an infinite rollout would be within the confidence interval. If we need a tighter bound on the equity then we need to do more trials, whether or not the rollout was done with variance reduction. 5. SummaryIf a program's evaluations are good, variance reduction will cancel out a great deal of luck, and the result will be much more accurate than an unadjusted rollout. With perfect evaluations, a single game gives the same result as an infinite rollout. Typically variance reduction makes each game worth about twenty-five regular games.Correctly implemented, variance reduction will never change the
expected value of a rollout, because all variance reduction does is
subtract numbers that average zero.
Variance reduced rollouts give the same answers as regular rollouts.
They just get there faster. Follow-up: David answers questions posed by Jake Jacobs about how and why variance reduction works. See: Questions and Answers about Variance Reduction. |
Return to: |