Variance reduction in races

From:   David Montgomery
Address:   monty@cs.umd.edu
Date:   1 December 1998
Subject:   Variance Reduction in Races
Forum:   rec.games.backgammon
Google:   741fjo$42d@krackle.cs.umd.edu

Assume we have a database of single-sided evaluations of
non-contact positions.  This database is built up assuming that
each move is made to minimize the expected number of rolls left.

Although the database is highly accurate for predicting cubeless
win probability, we still may want to roll out a position from the
database, either to get a more accurate evaluation, or because
we are doing a cubeful rollout.

We can use the database to get tremendous variance reduction
in our rollout.

As with lookahead variance reduction, the idea is to accumulate
a "luck adjustment" over the course of a game.  For each turn
the luck adjustment should have a mean of zero and correlate
well with how lucky the roll was.

We can easily estimate how lucky a roll was by simply using the
database to compare the cpw of the position before the roll with
the cpw after the roll.  However, these luck ratings might not
have mean zero if sometimes we make moves that do not minimize
the expected rolls to bear off.  Because of the way a single-sided
database is built up, we can ensure a mean luck adjustment of zero
by calculating the post-roll cpw assuming that we had made the
mean roll minimizing play.

Below is pseudo-code.

    turn = 0
    old_cpw = cpw( position we're rolling out )
    cumulativeLuck = 0.0

    while ( game not over ) {

        get the dice roll
        get the best move
        new_cpw = cpw of the *mean roll minimizing move*

        luck = new_cpw - old_cpw;
        if ( turn == 1 ) cumulativeLuck = cumulativeLuck - luck
        else             cumulativeLuck = cumulativeLuck + luck

        switch sides
        old_cpw = 1.0 - cpw( move that was made )
        if ( turn == 1 ) turn = 0
        else             turn = 1
    }

    if ( turn == 1 ) final_cpw = 1.0
    else             final_cpw = 0.0
    final_cpw = final_cpw - cumulativeLuck
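
For concreteness, here is a minimal Python sketch of the same loop.  The
helper names are hypothetical -- win_prob, mean_rolls, candidate_plays,
choose_play and all_off stand in for whatever single-sided database and
move generator you already have -- and only the luck bookkeeping follows
the pseudo-code above.

    import random

    def variance_reduced_game(roller, other, win_prob, mean_rolls,
                              candidate_plays, choose_play, all_off):
        # roller/other are the two sides' checker configurations, with
        # "roller" on roll.  win_prob(r, o) is the database cubeless win
        # probability for the side on roll; mean_rolls(side) the database
        # mean number of rolls to bear off; candidate_plays(side, dice)
        # the positions that side can move to; choose_play(...) the play
        # actually made; all_off(side) is true once that side is borne off.
        turn = 0
        old_cpw = win_prob(roller, other)
        cumulative_luck = 0.0

        while not all_off(other):
            dice = (random.randint(1, 6), random.randint(1, 6))
            plays = candidate_plays(roller, dice)

            played = choose_play(roller, other, dice, plays)
            mrm = min(plays, key=mean_rolls)       # mean-roll-minimizing play

            # cpw after the mean-roll-minimizing play, still from the
            # current roller's point of view
            new_cpw = 1.0 - win_prob(other, mrm)

            luck = new_cpw - old_cpw
            cumulative_luck += -luck if turn == 1 else luck

            roller, other = other, played          # switch sides
            old_cpw = win_prob(roller, other)      # same as 1 - cpw(move made)
            turn = 1 - turn

        result = 1.0 if turn == 1 else 0.0         # did the side rolled out win?
        return result - cumulative_luck

Averaging variance_reduced_game() over a few hundred games gives the
variance-reduced estimate of the cpw of the position being rolled out.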

This variance reduction method is extremely powerful.  Because
the database evaluations are already so accurate, the luck
adjustments are very good -- much better than the lookahead luck
adjustments.  In the handful of positions I have checked, a few
hundred games generally gives the exact cpw to four or five decimal
places (based on Sconyers' CD).

Furthermore, this method is very fast.  The cpw of the mean roll
minimizing move can be identified while determining the best move
with virtually no extra work.  Unlike in lookahead reduction, we
only evaluate continuations based on one roll of the dice.

David Montgomery
monty@cs.umd.edu
monty on FIBS

Brian Sheppard writes:

Instead of rolling out a race, just do a two-ply lookahead. That is much
faster than a rollout, and the answer is just as good.

The variance-reduction method you give is wrong. See later comments.

The best rollout technique is to statically evaluate racing positions as
soon as contact is broken. This way there is no statistical error from
rolling out the rest of the game, and you can get an evaluation long before
the game reaches a database position.

Race evaluation functions are exceptionally accurate. For example, my race-
estimation function has an average error that corresponds to misestimating
a race by 1/10 of a pip. For a moderate-length race where each side has a
50% chance of winning there may be an error of 0.004; certainly this is
good enough for any practical purpose. JF is not quite as good as this, but
it isn't necessary to be that good; I have gone overboard on this
particular feature.

(BTW: One neat feature of evaluating immediately upon breaking contact is
that the race at that point must be either long or lopsided. Racing
evaluators are highly accurate in such cases. The average error from
evaluating such cases may be under 0.001.)

And it is much faster to statically evaluate. This is a big advantage
because you can complete more iterations that way.

My racing evaluator is cubeless, but it could be extended to cube-using
with equally good accuracy. It can even be extended to match-play with a
little more effort, but still with equally good accuracy.

>     turn = 0
>     old_cpw = cpw( position we're rolling out )
>     cumulativeLuck = 0.0
>     while ( game not over ) {
>         get the dice roll
>         get the best move
>         new_cpw = cpw of the *mean roll minimizing move*
>         luck = new_cpw - old_cpw;
>         if ( turn == 1 ) cumulativeLuck = cumulativeLuck - luck
>         else             cumulativeLuck = cumulativeLuck + luck
>         switch sides
>         old_cpw = 1.0 - cpw( move that was made )
>         if ( turn == 1 ) turn = 0
>         else             turn = 1
>     }
>     if ( turn == 1 ) final_cpw = 1.0
>     else             final_cpw = 0.0
>     final_cpw = final_cpw - cumulativeLuck

The "cumulativeLuck" variable is a "telescoping series" that collapses.
Isn't this just an expensive way to compute the initial value of the
series, which is the static evaluation at the start of the run?

On each turn you compute the change in evaluation, which you sum in the
CumulativeLuck variable. At each stage, therefore, new_cpw = original_cpw +
CumulativeLuck. At the end of the game, final_cpw is set to be the winner,
but that also happens to be the last value of new_cpw as well. So when you
subtract CumulativeLuck you are left with original_cpw.

Variance reduction has been incorrectly described previously in this
newsgroup. I described a correct variance reduction algorithm for this
newsgroup. To find that article, go to dejanews.com, and search
rec.games.backgammon for articles containing the words "variance reduction"
written by bsheppard@hasbro.com.

Brian

David Montgomery writes:

Brian Sheppard writes:
> Instead of rolling out a race, just do a two-ply lookahead.
> That is much faster than a rollout, and the answer is just as good.

It is faster than a rollout, but it may or may not be just as good.
It depends on the quality of the basic evaluator that you use in
your lookahead evaluation.  And it depends on what you consider
"just as good."

I can't comment on the quality of your race evaluator.  I'm sure it's
very good.  However, my article was about evaluating races with
single-sided databases.  This approach is in wide use -- I can name
a dozen people who have created single-sided databases, including
the creators of Jellyfish and Snowie.  It was because of the
widespread use of this technique that I posted my idea.

Position 1.
 +24-23-22-21-20-19-+---+18-17-16-15-14-13-+
 | O     O  O  O  O |   |                  |
 |          O     O |   |                  |
 |          O       |   |                  |  X on roll, cpw?
 |                  |   |                  |
 | X  X  X  X       |   |                  |
 | X  X  X  X  X    |   |                  |
 +-1--2--3--4--5--6-+---+-7--8--9-10-11-12-+

If you use a single-sided database to evaluate Position 1 you get
80.6%.  If you look ahead two ply (one move for each side), you get
80.1%.  The true percentage is 79.6%, and you get this figure if
you do a short rollout using the method I presented.  I don't
consider a half percent error to be as good as an error of less
than .01%, but you may.

Position 2.
 +24-23-22-21-20-19-+---+18-17-16-15-14-13-+
 | O  O             |   |         X     X  |
 | O  O             |   |                  |
 | O  O             |   |                  |  X on roll, cpw?
 | O  O             |   |                  |
 |                  |   |                  |
 |          X     X |   |                  |
 +-1--2--3--4--5--6-+---+-7--8--9-10-11-12-+

In Position 2, if you use a single-sided database you get 14.1%.
The same if you look ahead 2 ply.  The real cpw is 14.7%.  I
don't consider a .6% error as good as no error, but you may.

Brian Sheppard writes:
> The best rollout technique is to statically evaluate racing
> positions as soon as contact is broken. This way there is no
> statistical error from rolling out the rest of the game, and
> you can get an evaluation long before the game reaches a database
> position.

Assuming you are rolling out a position with substantial contact
and you have a good non-contact evaluator, I agree with you.  However,
there are other reasons for evaluating and rolling out racing positions.

- You may have an interest in a non-contact position by itself.
  Perhaps you want to check whether your take of that 16 cube last
  night was correct.  Although your non-contact evaluations
  may be accurate, the evaluations of Jellyfish and Snowie are
  often off by several percent, so I would want to check them with
  a rollout.
- You might want to verify that your non-contact evaluator is as
  good as you think it is.  Rolling positions out is a great way
  to do this.
- You might actually be creating a new non-contact evaluator that
  isn't based on database lookup, and need training data.  Again,
  rolling out positions is a good approach.
- If you use the cubeful rollout method of Jellyfish and Snowie, you
  have to do rollouts rather than truncate upon reaching non-contact.
  There are certainly other algorithms for live cube rollouts, and I
  am not endorsing theirs, but if you want to use that algorithm, you
  have to keep rolling.

Brian Sheppard writes:
> Race evaluation functions are exceptionally accurate. [...]
> My static analyzer is accurate to two or three decimal places,
> and the extra accuracy [of a rollout] does not matter compared to
> the expense of rolling out a few hundred games.

I'm happy for you that your evaluator is so good.  Jellyfish and Snowie
can be off in the second digit.  I've seen evaluations that are
off by 7%, and rollouts that are off by 3.5%.

It depends on why we are evaluating a position.  If the only reason
we are looking at a position is because it came up in a rollout,
then it is probably better to just evaluate it and get on to the next
game.  But if we care about the position itself, the expense of rolling
out a few hundred games (which takes about a second on my slow Pentium
100) is well worth it, especially if by doing so we get an answer that
is essentially exact.

There is another issue apart from accuracy per se.  Bias.
Single-sided database evaluators are pretty darn accurate, but they
are biased.  They systematically underrate certain kinds of positions.
One reason I am rolling out positions for which I have database
evaluations is so I can eliminate this bias.

Brian Sheppard writes:
> My racing evaluator is cubeless, but it could be extended to cube-using
> with equally good accuracy. It can even be extended to match-play with
> a little more effort, but still with equally good accuracy.

I don't know of any way that single-sided non-contact database evaluators
can be extended to cube-using or match-play situations.  You can build
a two-sided database that is exact for the cube, but these are huge.
Hugh Sconyers sells these databases, and on three CDs you still have
only about a third of the positions where both sides are all home.  You
can write cube-handling and match-play code on top of a single-sided
evaluator, just as JF and Snowie add their cube handling on top of cubeless
evaluations, but this is a completely distinct system that has nothing to
do with the single-sided database.

Brian Sheppard writes:
> The variance-reduction method you give is wrong.

Uh, no, you're wrong.  The method is right.  It's implemented and heavily
tested.  It works great, making each rollout game worth roughly 2000
ordinary games.

> The "cumulativeLuck" variable is a "telescoping series" that collapses.

No, it's not a telescoping series.  If you look carefully, you will
see that I put asterisks in the important line

> >         new_cpw = cpw of the *mean roll minimizing move*

This may or may not equal cpw( move that was made ), depending on
whether the move that was made was the mean roll minimizing move.
Often the best move is not the mean roll minimizing move.  The most
well-known example is catering to doubles when you have only one
possible shake left, but there are many others.

As you wrote, if it were a telescoping series, I would get the answer
I started with.  But I don't get this.  Instead I get the same cpws
as in Sconyers' CD, which are often half a percent or more different
from the single-sided database cpws.

Brian Sheppard writes:
> Variance reduction has been incorrectly described previously in this
> newsgroup. I described a correct variance reduction algorithm for this
> newsgroup. To find that article, go to dejanews.com, and search
> rec.games.backgammon for articles containing the words "variance
> reduction" written by bsheppard@hasbro.com.

Thanks for the tip, Brian, but I figured out lookahead variance reduction
about fifteen minutes after I read about it in the Jellyfish 2 manual.

But just for the heck of it, let's take a look at what you wrote on
variance reduction.  The quoted text below is taken from Tom Keith's
newsgroup archive, Programming section.  You were responding to Michael
J. Zehr, who wrote:

> The basic idea of any variance reduction method is to estimate the
> amount of luck each side has and factor that out.  For example,
> if one side has two on the bar against a 5-point board and rolls the
> necessary doubles to come in that's a very lucky roll.  A variance
> reduction algorithm would measure that luck and change the results of
> the game in some way to take account of it.
>
> In the example above, if the neural net's evaluation of rolling the
> double to enter is +.243 equity, and the evaluation of the position
> before rolling as -.324, that roll was worth +.567.  If the player who
> rolled the lucky double goes on to win then instead of getting 1 point
> for the win they might get 1 point less some fraction of the equity
> from their lucky rolls.

This is correct as long as the -.324 was the lookahead evaluation.
Michael here wasn't spelling out the algorithm -- he was giving "[t]he
basic idea."  Michael may have missed this point, but it's more likely
in my opinion that he assumed that readers sophisticated enough to
understand what he was talking about would be sophisticated enough
to supply this detail.  After all, the variance reduction algorithm
had already been described in the newsgroup in more detail by Jim
Williams, over a year earlier.  (This article is also in the Backgammon
Galore archive.)

Brian Sheppard wrote:
> Whoa! This is not "variance reduction." This is "bias introduction."

No, it's not.  It's the same algorithm as you presented, as I will show
below.

Brian Sheppard wrote:
> Let E0 be the evaluation of the root position of the rollout. After 1
> turn the evaluation changes to E1, then to E2, and so on. In the final
> position the outcome is V. What is the sum of the changes from turn to
> turn?
>
> In the first turn the change is E0 - E1. In the second turn the change
> is E1 - E2. In the third turn the change is E2 - E3, and so on. The
> sum is
>
> (E0 - E1) + (E1 - E2) + (E2 - E3) + (E3 - E4) + ... + (En - V)
>
> This type of sum is called a "telescoping series," because it
> resembles a collapsible telescope.

Let's call the lookahead estimate of a position N laEN.  Then, if you
correctly use lookahead, the sum is

(laE0 - E1) + (laE1 - E2) + (laE2 - E3) + (laE3 - E4) + ... + (laEn - V)

which is not a telescoping series.
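
To make the algebra concrete, here is a tiny numeric check with made-up
evaluations (E holds static evaluations E0..E3 along one imaginary game,
laE the corresponding lookahead evaluations, and V = 1 because the side
being rolled out happened to win; none of these numbers come from a real
position):

    E   = [0.60, 0.68, 0.55, 0.80]      # static evaluations E0..E3
    laE = [0.61, 0.66, 0.57, 0.83]      # lookahead evaluations laE0..laE3
    V   = 1.0                           # game result for the side rolled out

    # With the static evaluation as the pre-roll reference, the series
    # collapses to E0 - V, so the adjusted result V + sum is always
    # exactly E0 -- the rollout adds no information.
    brian = sum(E[k] - E[k + 1] for k in range(3)) + (E[3] - V)
    print(brian + V)                    # prints E[0] (0.60, up to rounding)

    # With the lookahead evaluation as the pre-roll reference, the interior
    # terms no longer cancel, and V + sum genuinely depends on the game.
    david = sum(laE[k] - E[k + 1] for k in range(3)) + (laE[3] - V)
    print(david + V)                    # 0.64, not E[0]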

Brian Sheppard wrote:
> Let E be the true cubeless equity function, and e be our static
> evaluation function (e.g. a neural net).
>
> Instead of regarding the rollout as establishing a value for E, let's
> think of it as establishing a value for E - e.
>
> Let's begin by doing a 1-ply search, then estimate E as the average
> of the 1-ply values of e.

Okay.  We want delta = E0 - e0.  To start you estimate delta as lae0 - e0.

> This would be completely correct if e were perfect, but since e isn't
> perfect, we need to estimate the error involved.   [...] Let's just pick
> a 1-ply position at random, and estimate the error involved in its
> evaluation. [...]  I can continue this process until I reach a
> terminal node.

Okay.  We started with delta ~= lae0 - e0.  To correct for the error in
lae0, we first select a next move.  The evaluation of this move was
part of lae0.  We estimate the error in lae0 using an estimate of
the error in the evaluation of the continuation: lae1 - e1.  This
correction is added to our estimate of delta.  We continue with these
corrections during the rest of the rollout, ending up with

delta = (lae0 - e0) + (lae1 - e1) + ... + (lae[n-1] - e[n-1]) + (V - e[n])

When we are done, since delta is an estimate of E0-e0, we estimate
E0 as e0 + delta.  So we have

E0 ~=  e0 + (lae0 - e0) + (lae1 - e1) + ... + (lae[n-1] - e[n-1]) + (V - e[n])

Canceling the e0 and rearranging terms we have

E0 ~=  (lae0 - e1) + (lae1 - e2) + ... + (lae[n-1] - e[n]) + V

This is exactly the same thing as the algorithm described by Jim Williams
and alluded to by Michael Zehr.  It's the same thing as my race algorithm
except now we don't have to do lookahead, and each term in the summation
is much more accurate because the database evaluations are so good.

David Montgomery
monty@cs.umd.edu
monty on FIBS

David Montgomery writes:

A couple people have told me that it isn't immediately clear that
the expected luck adjustment at each ply is zero.  To understand
this you have to understand how single-sided racing databases work.
Here is a detailed explanation.

For each position in the database you will calculate the probability
that the position will be borne off in 0, 1, 2, etc. rolls.  The
database is built up by beginning with the simplest position,
which is everybody borne off.  Obviously, for everybody borne off you
are completely borne off 100% of the time in 0 rolls.

Then you move to the next most complex position -- a single checker
on the one point.  You look ahead and get the best move for each roll,
and find that after any of the 36 rolls the resulting position has you off
100% of the time in 0 rolls.  Those positions are one roll later, so from
here you are off 100% of the time in 1 roll.

A little later you get to the position with a single checker on the
seven point, let's say.  Again, for each roll you find the best move
and get the probabilities of being off in each number of rolls.  Here
you find

Number of    Off    Off    Off
dice rolls  in 0   in 1   in 2     Rolls
    2      0.000  0.944  0.056     21
   11      0.000  1.000  0.000     11 31 41 51 32 42
   23      1.000  0.000  0.000     everything else

Combining these, and adding one roll for the look-ahead itself, you can
determine the probability distribution for one checker on the 7 point:

  Off    Off    Off
 in 1   in 2   in 3
0.639  0.358  0.003

Since the database is single-sided, you are never concerned with
the opponent's checkers.  Instead, the best move is just considered
to be the one that will get off in the fewest rolls, on average.
You can easily calculate the average number of rolls that it will
take to bear off using the probability distribution.  For example,
with a single checker on the seven point, on average you bear off
in .639x1 + .358x2 + .003x3 = 1.364 rolls.

By proceeding through the positions in an orderly fashion (there is
more than one way to do it), you ensure that whenever you want to
calculate the database entry for a position P, you have already
calculated the entries for all the successors to P.  Thus you never
have to look ahead more than 1 ply.
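
As a concrete illustration, here is a short Python sketch that builds this
kind of distribution for the simplest non-trivial case -- a single checker
on a single point.  The names are mine, not from any existing program.  With
one checker the mean-roll-minimizing play is simply to move it as far as the
dice allow, and the memoization plays the role of the orderly pass through
the positions described above.

    from functools import lru_cache
    from itertools import product

    def pips(d1, d2):
        # Doubles are played four times.
        return d1 + d2 if d1 != d2 else 4 * d1

    @lru_cache(maxsize=None)
    def off_dist(point):
        # {number of rolls: probability} of bearing a lone checker off
        # from the given point; point 0 means it is already off.
        if point <= 0:
            return {0: 1.0}
        dist = {}
        for d1, d2 in product(range(1, 7), repeat=2):    # 36 equally likely rolls
            successor = max(0, point - pips(d1, d2))     # move as far as possible
            for rolls, prob in off_dist(successor).items():
                dist[rolls + 1] = dist.get(rolls + 1, 0.0) + prob / 36.0
        return dist

    d7 = off_dist(7)                             # about {1: 0.639, 2: 0.358, 3: 0.003}
    mean7 = sum(r * p for r, p in d7.items())    # about 1.364 rolls, as above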

Once you have constructed a database like this, you can use it to
get (usually) very accurate estimates of the cubeless probability
of winning (cpw) in positions with checkers for both sides.

Here is an example.  X has a single checker on the 7 point.  His
distribution, as calculated above, is:

  Off    Off    Off
 in 1   in 2   in 3
0.639  0.358  0.003

His opponent O has a single checker on the 4 point.  Her distribution is:

  Off    Off
 in 1   in 2
0.944  0.056

If X is on roll he will win whenever he is off in 1 roll (.639), plus
whenever he is off in 2 rolls and O is off in 2 or more (.358x.056)
for a total of .659 (65.9%).
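
In code, combining the two single-sided distributions might look like the
sketch below (each distribution is a dict mapping "rolls needed" to
probability; off_dist() is the single-checker helper from the sketch above):

    def win_prob(roller_dist, other_dist):
        # The side on roll wins whenever it needs no more rolls than the
        # side not on roll -- ties go to the roller, who rolls first.
        return sum(p_r * sum(p_o for j, p_o in other_dist.items() if j >= i)
                   for i, p_r in roller_dist.items())

    # X's lone checker on the 7 point, on roll, against O's on the 4 point:
    print(win_prob(off_dist(7), off_dist(4)))    # about 0.659, as above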

Although these probabilities are usually very accurate, they are not
necessarily exact.  This is because the probability distributions
assume that a player is always making the move that will bear off the
fastest on average.  Sometimes a different move is best when the
opposing position is considered.

Now let's look at the luck adjustments.  First, the equation-free
presentation.  The cpws that you get using the database are all
calculated assuming that both sides play the rest of the game
moving to minimize the rolls left to bear off.  Let's call these
database estimates of cpws d-cpws.  If both sides move this way,
every d-cpw is exact because the probability distributions are
exact.

Let's assume that both sides play this way.  Now note that every d-cpw
must necessarily be the expectation of the d-cpws for successor positions.
Since the d-cpws are exact, it can't be any other way.  If this isn't
clear to you, you may want to look at the equations further on.

Since the d-cpw for a position is the expectation of the d-cpws for
successor positions, the expected difference between the d-cpw and
the d-cpws for successors must be zero.  This is exactly the property
that the variance reduction algorithm uses.

The expected difference between the d-cpw for the current position
and the d-cpw for the position that would follow it if you move to
minimize mean bearoff rolls must be zero.  Thus the expected luck
adjustment at each ply is zero.  Thus the expected overall luck
adjustment for each game is zero.  Thus the method introduces no bias.

For those not satisfied with that, here are some equations.

Let n[j] be the probability that the non-roller will be off in j rolls.
Let r[i] be the probability that the roller will be off in i rolls.
Let m[d][i] be the probability that the roller will be off in i more rolls
  after playing roll d in the mean roll minimizing manner.

That is,
                 36
                ---
r[i+1] = (1/36) \  m[d][i]                                (1)
                /
                ---
                d=1

Let M be the maximum possible rolls that can be taken by either side.

There are many ways of expressing the roller's chances.  Let's start
with this one:

                    M             j
                   ---           ---
roller's d-cpw =   \   {  n[j] * \  r[i]  }               (2)
                   /             /
                   ---           ---
                   j=1           i=1

This says that the roller will win whenever the non-roller takes
j rolls and the roller takes anywhere from 1 to j rolls.

Okay, now let's look ahead, after the roller plays roll d, and
express the non-roller's chances following the form of equation (2):

                        M                i
                       ---              ---
non-roller's d-cpw  =  \   {  m[d][i] *  \  n[j]  }       (3)
after roll d played    /                 /
                       ---              ---
                       i=1              j=1


This says that whenever the roller still has i rolls left after playing
roll d (m[d][i]), the non-roller will win if she can take i or fewer rolls.

The roller's chances in the original position are just 1 minus the
average of (3) taken over all the dice rolls:

roller's d-cpw =

            36     M                i
           ---    ---              ---
1 - (1/36) \   [  \   {  m[d][i] *  \  n[j]  } ]
           /      /                 /
           ---    ---              ---
           d=1    i=1              j=1


We can change the order of summation:

     M               36               i
    ---             ---             ---
1 - \   [ { (1/36)  \  m[d][i] } *  \  n[j]  ]
    /               /               /
    ---             ---             ---
    i=1             d=1             j=1


This can be simplified using equation (1) to get:

          M               i
         ---             ---
  1  -   \   {  r[i+1] * \  n[j]  }
         /               /
         ---             ---
         i=1             j=1

This says: if the roller takes i+1 rolls, the non-roller will win by
taking i rolls or fewer -- the roller will therefore win 1 minus that.
Clearly this is the same quantity as equation (2), showing that the
expected d-cpw you get looking ahead is the same as the d-cpw you
calculate directly.
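
For what it's worth, the single-checker sketches earlier on this page can be
used to check this identity numerically (this reuses the hypothetical pips(),
off_dist() and win_prob() helpers; with a lone checker the mean-roll-minimizing
play is effectively the only play, so the two sides of the identity agree
exactly, up to rounding):

    from itertools import product

    # Equation (2) computed directly: X's lone checker on the 7 point,
    # on roll, against O's lone checker on the 4 point.
    direct = win_prob(off_dist(7), off_dist(4))

    # The same thing via the look-ahead form: average, over the 36 rolls,
    # of one minus the non-roller's chances after X's play.
    lookahead = sum(1.0 - win_prob(off_dist(4),
                                   off_dist(max(0, 7 - pips(d1, d2))))
                    for d1, d2 in product(range(1, 7), repeat=2)) / 36.0

    print(direct, lookahead)    # both about 0.659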

David Montgomery
monty on FIBS
monty@cs.umd.edu
 