Hedging Toward Skill

by Douglas Zare

This article originally appeared in GammonVillage in 2000.
Thank you to Douglas Zare for his kind permission to reproduce it here.

Fundamental Equation of Luck and Skill

Backgammon is a game of luck and skill. To a casual observer, it may appear to be all luck. As one gains more skill and familiarity with the game, the depth of skill required to play well becomes increasingly clear. The luck is still there, though, much to our joy and frustration.

Sometimes one wants to strip away the luck. Is move A better than move B? Is player X stronger than player Y? A proper respect for the luck in the game is needed. Below, we will consider a method for cancelling most of the luck in backgammon, and will apply it to analyze a game between the computer programs Jellyfish and Snowie, both set on levels much stronger than I am.

I call the following equation the fundamental equation of games of luck and skill:

Final − Initial = Net Luck + Net Skill

Final refers to the final score. This might be +1 or −4 for a money game, or 100% mwc (match winning chances) or 0% mwc for a match.
Initial refers to the starting equity or mwc of the situation considered.
Net Luck, as outlined in "A Measure of Luck," has average value 0 on each roll. It also has average value 0 in each game or match.
Net Skill is the difference in the total magnitude of the errors of the players when compared with technically perfect play.

Variance Reduction in Rollouts

Variance reduction for rollouts is described in more detail in David Montgomery's article in the February 2000 issue of Gammonline and in a preprint by Fredrik Dahl, "Variance reduction for Markov processes using state space evaluation for control variates." It is implemented in Jellyfish and Snowie. I summarize it for comparison.

When rolling out a position (comparing play A with play B), we start with a position (or two) whose equity we do not know. We might have Jellyfish play both sides many times and hope that the Net Skill is 0, that is, that Jellyfish makes errors of equal magnitude on both sides of the position. This assumption may be unreasonable if one side's choices are easy and the other side's are difficult, so it must be reassessed for each rollout. Under the assumption that Net Skill is 0, we can rearrange the fundamental equation as Initial = Final − Net Luck.

After a position is rolled out, the final result is clear, but what we want differs from it by the Net Luck. Over a long rollout, the average Net Luck will be close to 0, but we can do better than that. One way is to compute an Estimated Net Luck fairly and subtract it from the result of each game in the rollout. The estimate Initial ≈ Final − Estimated Net Luck will be off by the difference between the actual Net Luck and the Estimated Net Luck, so with an accurate, unbiased estimate of luck, we will get an accurate, unbiased estimate of the equity of the position.
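The rollout correction can be illustrated without a backgammon engine. The sketch below is a toy under stated assumptions, not how Jellyfish or Snowie implement it: the hypothetical "game" is ten rolls of one die, won (+1) if the total reaches 35 and lost (−1) otherwise, so there are no decisions and Net Skill is 0 by construction. The evaluator `estimate` is a deliberately imperfect normal approximation, and all names are made up for illustration. Each roll's estimated luck is the evaluation after the roll minus the average evaluation over the six faces, so it averages to 0 however rough the evaluator is.

```python
import math
import random
import statistics

# Hypothetical toy game, NOT backgammon: roll one die N_ROLLS times;
# the result is +1 if the total reaches TARGET, otherwise -1.
N_ROLLS, TARGET = 10, 35


def estimate(total, rolls_left):
    """Deliberately imperfect evaluator: normal approximation to the equity."""
    if rolls_left == 0:
        return 1.0 if total >= TARGET else -1.0
    mean = total + 3.5 * rolls_left
    sd = math.sqrt(rolls_left * 35 / 12)  # one die has variance 35/12
    z = (mean - (TARGET - 0.5)) / sd      # continuity correction
    p_win = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 2 * p_win - 1


def play_once(rng):
    """Play one game; return (final result, estimated net luck)."""
    total, luck = 0, 0.0
    for rolls_left in range(N_ROLLS, 0, -1):
        # Average evaluation over the six equally likely faces of the next roll.
        average = sum(estimate(total + d, rolls_left - 1) for d in range(1, 7)) / 6
        total += rng.randint(1, 6)
        # Luck of this roll: evaluation after it minus the average roll.
        luck += estimate(total, rolls_left - 1) - average
    return (1.0 if total >= TARGET else -1.0), luck


def rollout(n_games, seed=1):
    """Return plain results and luck-corrected results for n_games games."""
    rng = random.Random(seed)
    plain, corrected = [], []
    for _ in range(n_games):
        final, luck = play_once(rng)
        plain.append(final)
        corrected.append(final - luck)  # Initial ~ Final - Estimated Net Luck
    return plain, corrected


def exact_equity():
    """Exact starting equity of the toy game, for comparison."""
    dist = {0: 1.0}
    for _ in range(N_ROLLS):
        nxt = {}
        for t, p in dist.items():
            for d in range(1, 7):
                nxt[t + d] = nxt.get(t + d, 0.0) + p / 6
        dist = nxt
    return 2 * sum(p for t, p in dist.items() if t >= TARGET) - 1


plain, corrected = rollout(5000)
print(f"exact     {exact_equity():+.4f}")
for name, xs in (("plain", plain), ("corrected", corrected)):
    print(f"{name:9} mean {statistics.mean(xs):+.4f}  variance {statistics.pvariance(xs):.4f}")
```

Both means should land near the exact equity, but the luck-corrected results have far smaller variance, so far fewer games are needed for the same accuracy. A worse evaluator would leave more noise in the corrected results but would not bias them.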

Variance Reduction of Skill

This idea is suggested in the preprint by Fredrik Dahl cited above.

Suppose two excellent players play. To find the Net Skill displayed, we could use Snowie to estimate the errors made by each side. Unfortunately, that method is biased. Suppose one of the players is Snowie: Snowie does not play perfectly, but it would rate its own play as perfect, and would reward those whose play resembles Snowie's rather than perfect play. Instead, let us consider the fundamental equation applied to a game. The initial position is even (we don't know who will win the first roll) so the Initial equity is 0. Thus, we can rewrite the equation as Net Skill = Final − Net Luck.

Just as the final result of a rollout is a fair estimate of the initial equity, the final result of a game is a fair estimate of the Net Skill, but often not a good enough one. It ignores the effect of luck completely, and so is off by the average Net Luck. The idea of variance reduction is to compute an Estimated Net Luck fairly and subtract it from the final result instead. This is off only by the average of Net Luck − Estimated Net Luck.

Snowie estimates the luck in a roll by estimating the equity of the best play it sees after rolling and subtracting the equity of the position before rolling. This is a good estimate, and is equivalent to measuring skill by summing the errors compared with what Snowie thinks is the best play. I have found this to be tremendously helpful, but because Snowie does not estimate the equity perfectly, this method is biased, and unsuitable for analyzing matches between players close to Snowie's level or positions where Snowie is less reliable. However, any estimate of match winning chances can be corrected to produce an unbiased estimate of luck. An evaluation as good as Snowie's may be corrected in this way to eliminate most of the luck in backgammon without bias.

Instead of comparing the evaluation of the apparently best play after rolling with the estimated equity before rolling, just ask whether the roll was above or below average. This ensures that the estimated luck will average to 0. Example: Suppose Snowie currently evaluates a position as worth 0.2, but precisely half of the rolls would leave a position Snowie believes is worth 1 and half would leave a position Snowie believes is worth −1. If one rolls well, Snowie's estimate of the luck is 1 − 0.2 = +0.8. Since the average over all rolls is 0, the unbiased estimate of the luck is 1 − 0 = +1. To be fair, then, one must evaluate the position before rolling one ply deeper than the positions after rolling, by averaging the evaluations after each of the 36 rolls.
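A minimal sketch of this correction, assuming the evaluator's equity after its best play is already available for each of the 21 distinct rolls (doubles have probability 1/36, other rolls 2/36). The function names are hypothetical and do not correspond to any Snowie or Jellyfish API. In the example, which rolls count as "good" is an arbitrary choice made only so that exactly half of the 36 rolls are good.

```python
import math
from itertools import combinations_with_replacement

# The 21 distinct rolls (a <= b) and their probabilities: doubles 1/36, others 2/36.
ROLL_PROBS = {
    (a, b): (1 if a == b else 2) / 36
    for a, b in combinations_with_replacement(range(1, 7), 2)
}


def one_ply_deeper(after):
    """Probability-weighted average of the evaluator's equity after its best
    play for each roll; `after` maps each roll (a, b), a <= b, to that equity."""
    return math.fsum(p * after[roll] for roll, p in ROLL_PROBS.items())


def naive_luck(before, after, roll):
    """Snowie-style estimate: equity after the roll minus a separate static
    evaluation before it. Biased when `before` differs from the average."""
    return after[roll] - before


def unbiased_luck(after, roll):
    """Equity after the roll minus the average over all rolls, so its
    expectation over the 36 rolls is 0 whatever the evaluator's errors."""
    return after[roll] - one_ply_deeper(after)


# The example from the text: the static evaluation is 0.2, but half of the
# 36 rolls leave +1 and half leave -1. Which rolls are "good" is arbitrary
# here: doubles plus non-doubles totalling 8 or more (exactly 18 of 36).
after = {r: 1.0 if r[0] == r[1] or sum(r) >= 8 else -1.0 for r in ROLL_PROBS}
print(naive_luck(0.2, after, (5, 6)))   # 0.8
print(unbiased_luck(after, (5, 6)))     # 1.0
```

Because `unbiased_luck` subtracts the probability-weighted average rather than a separate static evaluation, its expected value over the 36 rolls is 0 regardless of how accurate the evaluator is; a poor evaluator only makes the estimate noisier.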

Jellyfish vs Snowie

The following is a 1-point match between Snowie 3 (3-ply, Tiny, 20%) and Jellyfish 3 (Level 7, 1000). This was the first match I tried. Most plays are straightforward, but not all. Jellyfish primes Snowie, obtains a racing lead as Snowie's board crashes, and bears in safely. Afterwards, I performed a variance reduction using Snowie 3 set to 1-ply evaluation.

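All of the evaluations are in the unusual units of match equity, which is +1 for a won match and −1 for a lost one. The perspective is always that of Jellyfish, so a bad roll for Snowie shows up as positive luck. A/E/L stands for Average/Evaluation/Luck: Average is the higher-ply evaluation of the position before the roll, Evaluation is the 1-ply evaluation after the play, and Luck is Evaluation − Average. Note that the Evaluation after one move does not always agree with the Average for the next move, even when these correspond to the same position. In the diagrams, Red is Jellyfish and White is Snowie.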

Jellyfish Snowie
1 5-2: 13/8, 24/22
Average/Evaluation/Luck:
0.000 / 0.012 / +.012 6-3: 24/15
A/E/L: 0.010 / 0.057 / +.047
2 3-3: 13/10*(2), 8/5(2)
A/E/L: 0.054 / 0.284 / +.230 2-3: B/22, 24/22
A/E/L: 0.283 / 0.261 / −.022
3 4-2: 24/20, 22/20
A/E/L: 0.255 / 0.266 / +.011 2-1: 6/3
A/E/L: 0.265 / 0.307 / +.042
4 3-2: 13/10, 13/11
A/E/L: 0.308 / 0.257 / −.051 4-2: 8/4, 6/4
A/E/L: 0.256 / 0.289 / +.033

[Diagram: Red to play 3-2. Pip counts: Red 137, White 154.]

5 3-2: 11/8, 6/4
SW Tiny, 20% and SW 1-ply prefer 11/8 10/8, but SW Huge, 100% agrees with this play.
A/E/L: 0.282 / 0.238 / −.044 5-4: 13/8, 13/9
A/E/L: 0.226 / 0.307 / +.081
6 3-1: 10/7, 8/7
A/E/L: 0.307 / 0.261 / −.046 6-3: 13/10, 9/3
A/E/L: 0.250 / 0.343 / +.093

[Diagram: Red to play 4-1. Pip counts: Red 128, White 136.]

7 4-1: 20/15*
Holding the anchor with 8/4 8/7 is preferred by SW Huge, 100% (by 0.1% mwc) and by Tiny, 20%, though 1-ply rates them as equal.
A/E/L: 0.362 / 0.238 / −.124 1-2: B/23, 22/21*
A/E/L: 0.232 / 0.137 / −.095
8 4-1: B/20
A/E/L: 0.161 / 0.284 / +.123 5-4: 21/12
A/E/L: 0.277 / 0.386 / +.109
9 1-1: 15/13*, 10/9(2)
A/E/L: 0.391 / 0.510 / +.119 3-2: B/22, 23/21
A/E/L: 0.491 / 0.448 / −.043

[Diagram: Red to play 2-1. Pip counts: Red 135, White 147.]

10 2-1: 13/11, 6/5
SW Huge, 100% prefers 6/4* 6/5 by 0.06% mwc, though Tiny, 20% agrees with not hitting. 1-ply rates them as equal.
A/E/L: 0.455 / 0.464 / +.009 6-3: 21/12
A/E/L: 0.458 / 0.420 / −.038

[Diagram: Red to play 4-2. Pip counts: Red 132, White 138.]

11 4-2: 11/7, 6/4
All Snowie levels preferred 11/5, Huge, 100% by 0.20% mwc.
A/E/L: 0.414 / 0.374 / −.040

[Diagram: White to play 1-4. Pip counts: Red 126, White 138.]

11 1-4: 22/21*, 8/4
SW Huge, 100% prefers 22/21* 12/8 by 0.17% mwc.
A/E/L: 0.351 / 0.181 / −.170
12 3-2: B/20
A/E/L: 0.231 / 0.266 / +.035 3-2: 13/10, 12/10
A/E/L: 0.300 / 0.374 / +.074
13 4-2: 7/3*, 5/3
A/E/L: 0.409 / 0.294 / −.115 5-4: B/21, 13/8
A/E/L: 0.271 / 0.047 / −.224
14 5-2: 20/13
A/E/L: 0.077 / 0.117 / +.040

[Diagram: White to play 5-5. Pip counts: Red 129, White 122.]

14 5-5: 8/3(3), 6/1
Snowie flings the dice across the room. This worst possible roll is almost a full point behind 6-6.
A/E/L: 0.112 / 0.303 / +.191
15 3-4: 13/6
A/E/L: 0.314 / 0.316 / +.002 3-1: 4/1, 3/2
A/E/L: 0.310 / 0.219 / −.091
16 5-4: 20/11
A/E/L: 0.227 / 0.181 / −.046 4-1: 6/2, 6/5*
A/E/L: 0.193 / 0.228 / +.035

[Diagram: Red to play 5-5. Pip counts: Red 118, White 93.]

17 5-5: B/20*, 11/1, 6/1
All levels of Snowie prefer B/20* 11/6 7/2(2), Huge, 100% by 0.62% mwc.
A/E/L: 0.284 / 0.618 / +.334 2-2: B/21, 10/8(2)
A/E/L: 0.590 / 0.468 / −.122
18 5-3: 20/12
A/E/L: 0.526 / 0.517 / −.009 1-2: 8/5
A/E/L: 0.499 / 0.546 / +.047
19 6-5: 12/1
A/E/L: 0.626 / 0.593 / −.033 3-4: 8/1
A/E/L: 0.607 / 0.654 / +.047
20 5-3: 8/3, 8/5
A/E/L: 0.702 / 0.654 / −.048 3-1: 5/2, 3/2
A/E/L: 0.686 / 0.763 / +.077
21 6-1: 7/6, 7/1
A/E/L: 0.761 / 0.695 / −.066 3-3: 21/9
A/E/L: 0.707 / 0.634 / +.073
22 3-2: 6/1
A/E/L: 0.634 / 0.563 / −.071 5-2: 9/4, 3/1
A/E/L: 0.548 / 0.589 / +.041
23 5-1: 6/5, 6/1
A/E/L: 0.589 / 0.499 / −.090 6-3: 21/12
A/E/L: 0.432 / 0.386 / +.046

[Diagram: Red to play 6-6. Pip counts: Red 53, White 63.]

24 6-6: 9/3(2), 5/O(2)
A/E/L: 0.451 / 0.932 / +.481 3-4: 12/5
A/E/L: 0.909 / 0.929 / +.020
25 6-6: 5/O(2), 3/O(2)
A/E/L: 0.910 / 0.991 / +.081 1-2: 21/19, 2/1
A/E/L: 0.994 / 0.995 / +.001
26 4-3: 3/O(2)
A/E/L: 0.998 / 0.997 / −0.001 5-5: 19/4 5/O
A/E/L: 0.998 / 1 / +.002
27 3-3: 3/O 1/O(3)
A/E/L: 1 / 1 / 0 1-2: 3/O
A/E/L: 1 / 1 / 0
28 1-5: 1/O(2)
A/E/L: 1 / 1 / 0 3-3: 4/1(3) 3/O
A/E/L: 1 / 1 / 0
29 4-5: 1/O
A/E/L: 1 / 1 / 0

Evaluations

According to the evaluations of Snowie Huge, 100%, Jellyfish erred by a total of 0.98% mwc and Tiny, 20% erred by 0.17% mwc, so Huge would rate Snowie's play as stronger by 0.8% mwc, and say that Tiny, 20% should win 50.2% of the time.

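The total estimated luck for Jellyfish is +0.865. The outcome was +1, so the estimated Net Skill is 1 − 0.865 = +0.135 in Jellyfish's favor. Since match equity runs from −1 to +1 while match winning chances run from 0% to 100%, an equity difference is twice the corresponding difference in mwc, so Jellyfish played stronger by about 6.8% mwc.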
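The arithmetic of this paragraph as a small sketch with hypothetical function names. It assumes, as in the table, that the per-roll luck values are in match equity units (+1 for a win, −1 for a loss) and that the Initial equity is 0; because match winning chances equal (equity + 1) / 2, an equity difference is halved when expressed in mwc.

```python
def estimated_net_skill(final, luck_per_roll):
    """Net Skill ~ Final - Estimated Net Luck (match equity units; Initial = 0)."""
    return final - sum(luck_per_roll)


def equity_to_mwc_points(equity_diff):
    """mwc = (equity + 1) / 2, so an equity difference d is 50 * d percentage points of mwc."""
    return 50 * equity_diff


# Jellyfish won (+1); its total estimated luck over the match was +0.865.
skill = estimated_net_skill(1.0, [0.865])
print(round(skill, 3), round(equity_to_mwc_points(skill), 1))  # 0.135 6.8
```

In practice `luck_per_roll` would hold one Luck value per roll from the table; passing only the total, as here, gives the same result.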

Is this accurate? No, but it is much closer than saying that Jellyfish will win all the games since it won the first one. We performed variance reduction, not variance elimination. This result is probably further from the correct value than the one obtained by using Snowie's biased analysis directly. There is still some noise on top of the signal, but it will take a much shorter sequence of matches for this noise to become essentially 0 than if one did not try to cancel out the luck. My guess is that there is 1/5 to 1/10 as much noise as before, and that the noise would be reduced much more by using 2-ply or 3-ply evaluations.

One should keep in mind that the level of skill is also subject to change, since the player might be better or worse at handling blitzes, backgames, or holding games. This may be viewed as another source of noise that is not affected by the variance reduction. Even within a type of game, a player might make more or fewer errors at random. So even with a perfect estimate of the Net Luck, one would need a few games of each type to estimate the average skill displayed.

Hedged Backgammon

One can use the idea of variance reduction to remove much of the luck of backgammon as one plays, although it is awkward at the moment to do this for two human players in real time. Play a normal match or money game, but at the end, pay the unbiased estimate of your luck (or receive it if you were unlucky). This is equivalent to making a series of fair side bets suggested by a bot's evaluation: each player bets that he or she will appear unlucky to the bot. By betting against their own luck, both players experience smaller swings, and since the bets are fair, they still have the same incentive to play well. The net payment, the game result minus the side bets, is the estimated Net Skill.

If the example match above were hedged, then Jellyfish would collect 0.135 times the stakes from Snowie: 1 for winning − 0.865 in side bets.

An odd effect of hedging would be that one would probably owe something after winning a short blitz—it is hard to blitz correctly, and easy to be blitzed correctly, so one's luck while blitzing usually exceeds the point won. Would it be worthwhile to avoid positions like this which are hard to play? As with regular backgammon, yes, but only if avoiding the hard position is less of an error than one expects to make while playing it.

What difference does it make if the bot whose evaluations are used is stronger? A better estimate of Net Skill helps, but its quality is not very important as long as it is unbiased. Bots which play better will tend to have better estimates, which means that less noise is added to the payoffs. If one plays through positions that the evaluator does not understand well, there may be larger variations. These sources of noise add to the natural variation in the actual displayed skill. Reducing the noise from the estimate by a few percent by using rollouts would be like adding more insulation to the walls without closing the window. For intermediate players and above, though, the error in the estimate of Net Skill is nowhere near the oscillations from the unhedged luck of the dice.

Suppose I play 25 1-point matches with a bot stronger than I am by 10% mwc, i.e., it wins 3 out of 5 matches on average. It should win 15 of the 25 matches, but due to variations in luck, 15% of the time it will win at most 12, and 15% of the time it will win 18 or more. So there is almost a 1/3 chance that the estimated skill difference would be off by 12% or more. By the FIBS ratings formula, rather than being ahead 350 rating points, the bot might appear to be 825 points ahead or 65 points behind.
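These figures can be checked with a short sketch, assuming each of the 25 matches is an independent win for the bot with probability 0.6, and using the FIBS formula in which the underdog in an N-point match wins with probability 1 / (10^(D·√N/2000) + 1) for a rating difference D. The helper names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def fibs_rating_diff(p_win, match_length=1):
    """Rating difference D implied by a winning probability, found by inverting
    the FIBS formula P(underdog wins) = 1 / (10**(D * sqrt(N) / 2000) + 1)."""
    return 2000 / math.sqrt(match_length) * math.log10(p_win / (1 - p_win))


n, p = 25, 0.6                                            # 25 matches, bot wins 60%
low = sum(binom_pmf(k, n, p) for k in range(0, 13))       # bot wins at most 12
high = sum(binom_pmf(k, n, p) for k in range(18, n + 1))  # bot wins 18 or more
print(f"P(at most 12) = {low:.3f}, P(18 or more) = {high:.3f}, either = {low + high:.3f}")

for wins in (15, 18, 12):
    print(wins, "wins ->", round(fibs_rating_diff(wins / n)), "rating points")
```

This gives tail probabilities of about 0.154 each (0.307 combined) and implied rating differences of about 352, 820 and −70 points for 15, 18 and 12 wins, close to the rounded figures quoted above.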

What happens if we hedge the matches? That depends on a few things. I believe the following are plausible assumptions:

Suppose the Net Skill displayed varies by ±10% from match to match, that is, half the time we play equally well, and half the time I throw away 20% more equity than the bot. Suppose the bot's estimate is off by ±10% each time. Then 88% of the time the adjusted score would be as though the bot won between 14 and 16 matches (between 215 and 505 rating points ahead), and over 99% of the time the hedged score would be as though the bot won between 13 and 17 matches. To achieve the same level of accuracy without hedging, we would need to play more than 10 times as many matches.
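A sketch of the arithmetic behind these percentages, under the paragraph's assumptions. As our own modelling choice, each match is taken to add two independent, equally likely deviations of ±0.1 match to the bot's hedged score: one for the skill actually displayed and one for the error in the bot's estimate. The number of upward deviations among the 50 is then binomial, so the probabilities can be summed exactly. The name `prob_within` is hypothetical.

```python
import math

N_MATCHES = 25
STEP = 0.1  # each deviation is worth +/-0.1 of a match won (the text's "+/-10%")


def prob_within(max_dev, n_matches=N_MATCHES, step=STEP):
    """P(|hedged score - expected score| <= max_dev), assuming each match adds
    two independent, equally likely +/-step deviations: one for the skill
    actually displayed and one for the error in the bot's estimate."""
    flips = 2 * n_matches
    max_net = round(max_dev / step)  # allowed |ups - downs|
    ways = sum(math.comb(flips, ups) for ups in range(flips + 1)
               if abs(2 * ups - flips) <= max_net)
    return ways / 2**flips


print(f"bot scores 14 to 16: {prob_within(1):.3f}")  # text: 88%
print(f"bot scores 13 to 17: {prob_within(2):.3f}")  # text: over 99%

# Variance per match: an unhedged 1-point match is a 60/40 coin flip,
# a hedged one is the sum of two +/-0.1 deviations.
unhedged_var = 0.6 * 0.4
hedged_var = 2 * STEP**2
print(f"variance ratio: {unhedged_var / hedged_var:.1f}")
```

Under these assumptions the sketch gives about 0.881 and 0.997, and a variance ratio of 12 between an unhedged match (0.6 × 0.4 = 0.24) and a hedged one (2 × 0.1² = 0.02). Since the number of matches needed for a given accuracy scales with the variance, this is where "more than 10 times as many matches" comes from.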

Much as I have enjoyed playing hundreds of matches and money games against Jellyfish and Snowie to see if I learned anything from reading a backgammon book, variance reduction of skill is the feature I would most like to see in the next editions of backgammon programs.

Douglas Zare is a mathematician and backgammon theorist. He writes a monthly column at GammonVillage on the theoretical aspects of backgammon. His web site is douglaszare.com.
