Analyzing Performance

Normalizing Errors

Douglas Zare, 2006

GammonVillage Magazine, July 2006

Thank you to Douglas Zare and GammonVillage for their kind permission to republish this article here.

Contents

• mwc
• EMG
• Match Lengths
• A New Normalization
• Unforced Moves
• Summary

How do we measure the size of an error? How do we determine which errors are the most serious, and most worthy of attention? Two possibilities are to use match winning chances (mwc) and the money game equivalent (EMG). We’ll see that neither is ideal, and that other normalizations are possible.

How should we compute the error rate? Should we include the total number of moves or the number of unforced moves? Although it is unpopular, I’ll argue that including all moves is generally better than just the unforced moves. Again, we’ll see that there are other possibilities.

mwc

By using bots, we can quantify the value of a position, and therefore the size of an error. One way to describe the value of an error is to say how much less an ideal player would win the match after making the incorrect play instead of the correct play. Match winning chances are often expressed as a percentage.

There are advantages to using mwc. If you consistently give up less mwc than your opponent, you expect to win more than 50% of the time, and the difference indicates your advantage. However, it is hard to compare errors. Subtle mistakes at double match point (DMP) cost the same as colossal blunders such as overlooking a shot or failing to double a volatile borderline pass at the start of a match. It is not easy to say how much equity a novice versus an expert should expect to give up in a match since the potential for making large errors in mwc is much greater if the match reaches DMP; it’s not good to say you played like an expert primarily because you avoided DMP.

EMG

EMG is a common method for normalizing equities and errors, both in match play and in money play. Winning a single game on the current cube level is worth +1.000. Losing a single game on the current cube level is worth −1.000. Other equities are interpolated linearly, and are often expressed as decimals with 3 digits after the decimal point.

EMG usually is better than mwc at describing the size of a conceptual error that led to a misplay, and it becomes reasonable to judge backgammon performance by an error rate expressed in EMG/move. However, there are some drawbacks. Using EMG gives up the property that making smaller errors than your opponent means you are outplaying your opponent. If your opponent incorrectly takes a 1.100 pass at 3-away 3-away, costing 0.100, about 1% mwc, this is cancelled out if you make a 0.040 EMG error on a 2-cube, which also costs 1% mwc. In addition, EMG distorts the sizes of some errors that should be conceptually similar.
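The cancellation in that example can be checked with a minimal sketch. The function name is hypothetical, and the match winning chances are assumed round figures (about 60%/40% after winning or losing one point at 3-away 3-away, about 75%/25% after winning or losing two points), chosen to agree with the figures quoted in this article rather than taken from a particular match equity table.

```python
def emg_error_to_mwc(emg_error, mwc_if_win, mwc_if_loss):
    """Convert an EMG error into percentage points of mwc.

    EMG maps [mwc_if_loss, mwc_if_win] linearly onto [-1.000, +1.000],
    so one EMG unit is worth half of that mwc range.
    """
    return emg_error * (mwc_if_win - mwc_if_loss) / 2.0


# Assumed match winning chances (percent) at 3-away 3-away.
# 1-cube: a single game leaves 2-away 3-away (about 60% or 40%).
take_error = emg_error_to_mwc(0.100, mwc_if_win=60.0, mwc_if_loss=40.0)
# 2-cube: a single game leaves Crawford 1-away 3-away (about 75% or 25%).
checker_error = emg_error_to_mwc(0.040, mwc_if_win=75.0, mwc_if_loss=25.0)

print(round(take_error, 2), round(checker_error, 2))
```

Under these assumptions both errors come to about 1% mwc, even though one is two and a half times the size of the other in EMG.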

Position 1.
Black to play 6-2 at 3-away 3-away.

Instead of 6/off, 4/2, which wins 44.0%, suppose you play 6/off, 5/3, which wins only 39.4%. After either play, you know you will be redoubled for the match, and you will have an easy take. Leaving 5 and 2 is better by about 4.6% mwc, and about .184 EMG with your opponent holding a 2-cube.

Suppose you already hold the cube on 4. 6/off, 5/3 is still wrong by the same amount of mwc, but now it only costs 0.092 EMG. The difference is that the range [-1.000,+1.000] EMG now represents 100% mwc instead of the 50% mwc between leading Crawford 3-away and trailing Crawford 3-away. One way to look at this example is that the size of an error gets magnified if you are likely to take a double soon (or you will double your opponent in). In this case, the magnification is a factor of 2, but at other scores the magnification can be slightly larger or smaller. If you delay a double at 2-away 2-away, this will magnify the sizes of your checker play errors in EMG by a factor of 3 relative to doubling immediately.
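The arithmetic for Position 1 can be written out as a short sketch. The function name is hypothetical; the mwc figures and the 50% and 100% spans are the ones quoted above.

```python
def mwc_error_to_emg(mwc_error, mwc_span):
    """EMG size of an error, given the mwc span (in percent) that the
    range [-1.000, +1.000] covers at the current cube level."""
    return 2.0 * mwc_error / mwc_span


mwc_error = 44.0 - 39.4  # 6/off, 4/2 versus 6/off, 5/3

# Opponent holds a 2-cube: the game decides between leading and
# trailing Crawford 3-away, a span of 50% mwc.
emg_on_2_cube = mwc_error_to_emg(mwc_error, 50.0)

# You hold a 4-cube: the game decides the match, a span of 100% mwc.
emg_on_4_cube = mwc_error_to_emg(mwc_error, 100.0)

magnification = emg_on_2_cube / emg_on_4_cube
print(round(emg_on_2_cube, 3), round(emg_on_4_cube, 3), round(magnification, 1))
```

The same 4.6% mwc error is 0.184 EMG before the redouble and 0.092 EMG after it, which is the factor of 2 described above.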

The match score affects the sizes of errors in other ways, too. For example, suppose you err in this situation:

Position 2.
Black to play 4-2 in the Crawford game.

If you play 11/5 instead of 11/7, 5/3, you miss on one number, 4-4, when you get a chance to roll. At Crawford 3-away, getting gammoned costs you about 0.060 EMG, so this mistake costs you under 0.002 EMG. If you made the same mistake at Crawford 2-away, getting gammoned costs 2.000 EMG, over 30 times as much. Now playing 11/5 costs 0.049 EMG, even though it may be considered the same conceptual error since your only goal was to save the gammon.
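As a rough check on these figures, the sketch below bounds the cost of 11/5 by assuming Black always gets to roll, so that the only loss is the 1-in-36 chance of rolling 4-4. That assumption and the function name are illustrative; the gammon costs are the ones quoted above.

```python
MISS_PROBABILITY = 1 / 36  # only 4-4 fails to save the gammon after 11/5


def max_cost(gammon_cost_emg):
    """Upper bound on the error in EMG, assuming Black always gets to roll."""
    return MISS_PROBABILITY * gammon_cost_emg


cost_3_away = max_cost(0.060)  # Crawford 3-away
cost_2_away = max_cost(2.000)  # Crawford 2-away
ratio = 2.000 / 0.060          # how much more a gammon costs at 2-away

print(round(cost_3_away, 4), round(cost_2_away, 4), round(ratio, 1))
```

The bound at Crawford 3-away is under 0.002 EMG, and the bound at Crawford 2-away sits slightly above the 0.049 EMG quoted, which is consistent with Black not always getting that roll.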

Match Lengths

Snowie calls an error rate under 4.4 millipoints EMG per move “world-class.” However, it is not uniformly difficult to achieve a low error rate, since the problems with the EMG normalization are severe at some match scores, and some match lengths tend to include games at “easy” scores.

At DMP, it is particularly easy to achieve a low error rate. Many players who normally have an “advanced” error rate can average a “world-class” error rate at DMP. There are several reasons for this.

• There is no magnification from the doubling cube.
• Plays that win more do not also win more gammons and lose fewer gammons.
• The lack of double-outs means there are more simple racing decisions.
• You don’t make cube errors at DMP.

There are some factors that tend to increase the complexity of playing at DMP. We are used to gammons mattering, so plays that might be right for money may be wrong at DMP. Also, the use of the cube at other scores increases the fraction of the moves there that are played during the opening. However, in practice, the simplifying factors dominate. Both experts and novices tend to have much lower error rates at DMP than at other scores.

As I mentioned last month, I’ve decided to warm up my backgammon play again. I analyzed 227 recent matches of mine, plus a couple of money sessions, mainly played in the past 6 weeks on FIBS. Here are my error rates by match lengths in that sample, according to evaluations only:

Match length	Error rate (millipoints EMG per move)	# of matches
1 point	1.71	58
3 point	3.63	51
5 point	3.69	89
7+ point	3.43	27
money	3.74	45 games

My average error rate in the 1-point matches was 1.71, just under half of my error rate at any other match length. Is this just me? No. My human opponents had an average error rate of 6.29 in 1-point matches, and 11.17 in longer matches. (I seem to recall the play being stronger on FIBS in the past. Two players rated about 1800 and 1850 redoubled me leading 2-away 3-away.) My opponents in 1-point matches had a lower average rating than my opponents in longer matches, so I believe it is typical to have about half of your normal error rate at DMP.
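The "about half" claim can be checked directly from the figures above. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# Error rates in millipoints EMG per move, from the table above.
my_rates = {
    "1 point": 1.71,
    "3 point": 3.63,
    "5 point": 3.69,
    "7+ point": 3.43,
    "money": 3.74,
}

dmp_rate = my_rates["1 point"]
ratios = {
    length: dmp_rate / rate
    for length, rate in my_rates.items()
    if length != "1 point"
}

# Opponents: 6.29 in 1-point matches versus 11.17 in longer matches.
opponent_ratio = 6.29 / 11.17

for length, ratio in ratios.items():
    print(f"{length}: {ratio:.2f}")
print(f"opponents: {opponent_ratio:.2f}")
```

Every ratio for my own play falls a little under 0.50, and the ratio for my opponents is a little over it.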

How can this be useful? There are several adjustments I make. I usually consider any opening or middle game checker play mistake larger than 0.080 EMG to be a blunder, but at DMP, I lower that threshold to 0.050 EMG. I don’t consider differences smaller than 0.020 EMG to be errors, but at DMP, I lower that to 0.010 EMG.
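These thresholds can be expressed as a small classifier. The function name and the three labels are hypothetical; the threshold values are the ones given above, and they apply to opening and middle game checker plays.

```python
def classify_checker_play(emg_loss, at_dmp=False):
    """Label an opening or middle game checker play by its EMG loss.

    Thresholds follow the text: 0.080 / 0.020 normally,
    lowered to 0.050 / 0.010 at DMP.
    """
    blunder_threshold = 0.050 if at_dmp else 0.080
    error_threshold = 0.010 if at_dmp else 0.020

    if emg_loss > blunder_threshold:
        return "blunder"
    if emg_loss < error_threshold:
        return "not an error"
    return "error"


# The same EMG loss is judged more harshly at DMP.
print(classify_checker_play(0.060), classify_checker_play(0.060, at_dmp=True))
print(classify_checker_play(0.015), classify_checker_play(0.015, at_dmp=True))
```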

In general, I’m not satisfied with my play if my error rate is 2.0 at DMP, though I would be happy to be able to average that at most scores. This affects my evaluation of my play at match lengths that may pass through DMP or DMP-like scores. For example, should an error rate of 3.0 in a 3-point match be considered a good performance for me? That depends on which match scores were hit within the match. If the scores visited were 0-0, 2-0, and 2-2, then two out of 3 of the scores were DMP and Crawford 3-away, which is very similar. 3-away 3-away is only slightly more complicated than money play, so I would not be satisfied with an error rate of 3.0. If the scores visited were 0-0, 1-0, and 1-2, then all 3 scores can be tough, and an error rate of 3.0 is better than my average.

Many people have suggested that it is easy to have a low error rate in a 3-point match. I have found that 3-point matches with low error rates are common, but so are 3-point matches with high error rates. It is not easy to have a low error rate in 3-point matches across many matches.

A New Normalization

I propose a new normalization of errors for the purpose of computing more meaningful error rates in match play: Adjust the errors by the ratio of the cost of misplaying an opening 3-1 as 8/4 in money play to its cost at the match score. 8/4 wins about 10% less, wins about 4% fewer gammons and loses about 4% more gammons. In money play, it costs about .390 EMG. At DMP, it costs about .195 EMG. That suggests multiplying all errors at DMP by .390/.195 = 2. This would bring my error rate at DMP close to my error rate for money play. It may also be useful to adjust move filters by this factor, as bots now consider too many moves at DMP.
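A minimal sketch of that correction factor follows. The function and constant names are hypothetical; the costs of the reference error and the 1.71 error rate are the figures quoted in this article.

```python
REFERENCE_COST_MONEY = 0.390  # EMG cost of opening 3-1 played 8/4, money play


def correction_factor(reference_cost_at_score):
    """Ratio of the reference error's cost for money to its cost at the score."""
    return REFERENCE_COST_MONEY / reference_cost_at_score


dmp_factor = correction_factor(0.195)        # the same play costs .195 at DMP
normalized_dmp_rate = 1.71 * dmp_factor      # my 1-point error rate, rescaled

print(round(dmp_factor, 2), round(normalized_dmp_rate, 2))
```

The rescaled DMP rate of about 3.42 is close to the 3.74 I averaged in money games.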

This new normalization does not solve the problems illustrated in the two positions above. To handle those issues, one method is to consider the amount of cubeless money equity given up by the move. This ignores the future cube decisions and gammon price. Unfortunately, cubeless money equity also sometimes recommends the wrong plays, but in the above positions it may be a more accurate way to measure the size of the conceptual errors.

The exact values used are not important, but here is a table of suggested correction factors to EMG within the 3-point match:

1-cube

	opponent
	1-away	2-away	3-away
1-away	2.0	1.4	2.0
2-away	1.4	0.7	0.8
3-away	2.0	0.8	0.9

2-cube

	opponent
	2-away	3-away
2-away	2.0	1.4
3-away	1.7	1.3

Using these values would still result in many matches with low error rates and many matches with high error rates. It might raise my error rate for 3-point matches to a value significantly higher than my error rate in money games. The point is to concentrate on the times I made plays indicating large conceptual errors. This normalization would make it easier to find the matches with blunders at DMP while not worrying as much about the matches with the same unnormalized error rate with minor errors at 2-away 3-away, or a delayed double at 2-away 2-away.
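One way to apply the table is sketched below. It assumes that the rows are the player's score and the columns the opponent's, that cube ownership can be ignored, and that each move is recorded with its EMG loss, cube value, and score. The function name and the sample moves are hypothetical.

```python
# Correction factors from the tables above, keyed by cube value and then
# by (player away, opponent away). Row/column orientation is an assumption.
CORRECTION = {
    1: {
        (1, 1): 2.0, (1, 2): 1.4, (1, 3): 2.0,
        (2, 1): 1.4, (2, 2): 0.7, (2, 3): 0.8,
        (3, 1): 2.0, (3, 2): 0.8, (3, 3): 0.9,
    },
    2: {
        (2, 2): 2.0, (2, 3): 1.4,
        (3, 2): 1.7, (3, 3): 1.3,
    },
}


def error_rate(moves, normalize=True):
    """Millipoints per move for (emg_loss, cube, my_away, opp_away) records."""
    total = 0.0
    for emg_loss, cube, my_away, opp_away in moves:
        factor = CORRECTION[cube][(my_away, opp_away)] if normalize else 1.0
        total += emg_loss * factor
    return 1000 * total / len(moves)


# Hypothetical sample: the same 0.030 error at DMP and at 2-away 3-away.
moves = [(0.030, 1, 1, 1), (0.000, 1, 1, 1), (0.030, 1, 2, 3), (0.000, 1, 2, 3)]
print(round(error_rate(moves, normalize=False), 1), round(error_rate(moves), 1))
```

In this sample the unnormalized rate is 15.0 and the normalized rate is 21.0, with the DMP error contributing most of the increase.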

Unforced Moves

To compute the error rate, Snowie divides the total errors in EMG by the total number of moves (for both sides). GNU Backgammon (gnu) divides by the number of your unforced moves (or “close” cube decisions). If you dance or roll 6-6 in the bearoff, Snowie gives you credit for playing perfectly. Gnu does not. Many people have argued that gnu’s method is a clear improvement. I don’t agree.

The number of moves you make is very close to the number of moves your opponent makes, but the number of unforced moves may be quite different. In fact, your playing style can affect the ratio of unforced moves you make to the number of unforced moves your opponent makes. If you blitz your opponent too often at DMP and often achieve strong 5 point boards and closeouts, then you will have many more unforced moves than your opponent will. It is possible to give up more equity overall, while gnu reports that you are giving up less equity per unforced move. It is very hard for this to happen with Snowie’s measurement.
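A made-up example shows how the two denominators can disagree. The sketch simplifies both programs to the division described above, and all of the numbers and names are hypothetical: you blitz and face 36 unforced moves, while your opponent often dances and faces only 20.

```python
def snowie_rate(total_error_emg, moves_both_sides):
    """Millipoints per move, dividing by all moves of both players."""
    return 1000 * total_error_emg / moves_both_sides


def gnu_rate(total_error_emg, own_unforced_moves):
    """Millipoints per move, dividing by the player's own unforced moves."""
    return 1000 * total_error_emg / own_unforced_moves


MOVES_BOTH_SIDES = 80  # 40 moves each
you = {"error": 0.600, "unforced": 36}
opponent = {"error": 0.500, "unforced": 20}

your_snowie = snowie_rate(you["error"], MOVES_BOTH_SIDES)
their_snowie = snowie_rate(opponent["error"], MOVES_BOTH_SIDES)
your_gnu = gnu_rate(you["error"], you["unforced"])
their_gnu = gnu_rate(opponent["error"], opponent["unforced"])

print(round(your_snowie, 2), round(their_snowie, 2))
print(round(your_gnu, 2), round(their_gnu, 2))
```

Here you give up more equity in total, and the Snowie-style rates say so (7.5 against 6.25), but the gnu-style rates reverse the order (about 16.7 against 25.0).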

Suppose you get to a classical high anchor holding game. Perhaps it is a small error to double now, since you are not quite far enough ahead in the race. On the other hand, if you don’t double now, you will be faced with a series of increasingly complicated cube decisions, e.g., you might lose ground in the race, but your opponent might have to leave the anchor with one checker, or might have to kill some checkers. Will that be strong enough to double? To minimize your total error, and your Snowie-method error rate, it may be right to double now, making a small investment of equity to simplify your future decisions. However, this will greatly decrease the number of unforced cube decisions, so it might increase your gnu-method cube error rate. Trying to minimize your Snowie-method error rate will lead you to minimize your total errors. Trying to minimize your gnu-method error rate may not.

Another problem with gnu’s method is that your error rate can be ridiculously high if you encounter few unforced decisions in a match. This happens frequently with the cube decisions, which makes gnu’s cube error rate unreliable. A high error rate often indicates that someone had few decisions rather than that the player lost a lot of equity. In fact, a low cube error rate often means the player had no real decisions, but gnu thought the player had many unforced cube decisions.

If forced moves were to happen at random, and were not affected by previous plays, then considering only unforced moves would be a clear improvement. However, upon closer inspection, the distinction we would like to draw is not between forced and unforced moves. It is between trivial moves and nontrivial moves. You can’t make an error while dancing, but you also have little real opportunity to make an error when you are moving your checkers around the board after getting closed out, when you have negligible chances to win or to get gammoned. It is also not a real opportunity to err if you roll an opening 3-1.

There need not be a dichotomy. Trivial moves and nontrivial moves may be part of a continuum. I am working on a measure of the complexity of a decision that would be able to detect not just how many real decisions you faced, but also how hard those were. My hope is to be able to compare how well someone played with “par” for the decisions faced.

Summary

Measuring errors by mwc makes it hard to compare errors. EMG is better for comparing errors.

The same conceptual error may result in a larger EMG error if a double is imminent than if the cube is already turned. The size of an error expressed in EMG may depend on the gammon price. To measure the size of a conceptual error in these situations, it is useful to consider the amount of cubeless money equity given up.

Error rates tend to be about half of normal at DMP, since the cube is out of play and gammons do not count. I propose a new normalization for errors which fixes the size of a standard error, playing an opening 3-1 8/4.

Gnu’s method of dividing by the number of unforced moves instead of total moves has some undesirable consequences.