This article originally appeared in the September 1999 issue of GammOnLine.
Thank you to Kit Woolsey for his kind permission to reproduce it here.
"Who's the best backgammon player in the world? How does my
game compare with the best players, other players on my internet
server, players in my local club?" These are questions backgammon
players often ask. Answering these inquiries for backgammon is even more
difficult than for other intellectual games like chess and contract
bridge. The reason is not surprising, since the answer is the same
for many backgammon questions: "the dice!" Keeping this in mind,
there are several techniques to rank players. This article is an
attempt to summarize those methods.
Before I begin, let me emphasize some oft-overlooked points. Any measurement (for example, temperature of the outside air, distance between two cities, survey of voters for a presidential election) has associated with it some hidden factors: assumptions and mathematical uncertainties. If the assumptions on which a measurement is made are invalid, so is the result. Also, measurements have associated statistical uncertainties (if there are random processes which affect the result, and there almost always are) and systematic uncertainties (for example, if a thermometer's calibration is off and it always reads 2 degrees high). Whenever you read about a measurement result and conclusion, you should ask "what were the assumptions, and what are the uncertainties?" A careful researcher will have included these for you. Unfortunately, all too often these details are lost along the way. I will try to list (and where possible, quantify) assumptions and uncertainties throughout this article.
A clarification of jargon is in order: ranking is a sequential ordering which indicates relative strength, but does not necessarily quantify the results. It says the 35th ranked player has not performed as well as the 34 ahead but better than all those of lower ranking. It does not necessarily indicate how much better or worse the player is. A ratings system actually indicates how much better player A (with rating a) is than player B (with lower rating b).
I. Kaufman Ratings System
This method is a modification of the successful ratings system used in chess. To my knowledge, the backgammon rating system in common use on today's online servers is a modification by Larry Kaufman of the chess ratings system. I refer the interested reader to an article by Kaufman in Inside Backgammon, vol. 1, #5 (Sept-Oct. 1991) and to a web page by Kevin Bastian (http://www.bkgm.com/articles/McCool/ratings.html) for more detail. I now list advantages and disadvantages of this system. I also assign a letter grade (A thru C from best to worst) to the qualities of the various systems.
The four biggest Kaufman Ratings databases are the three above-mentioned internet servers and Kent Goulding's International Backgammon Rating List which has unfortunately been mothballed since 1996. As most readers of this magazine have access to at least one of the online databases, I'll not cover them here. These systems have as their predecessor KG's list. This was compiled from non-weekly tournament results from medium to large events conducted around the world over several years. The last compilation occurred in July 1996. Only players active in the previous 3 years were included. To be listed in the top 100, a player had to have 1000 or more lifetime experience points. The top ten players from that final listing were
Rank  Name               Experience  Rating  Match win percentage
1.    Edward O'Laughlin  7741        1856    62
2.    Billy Horan        3501        1831    63
3.    Harry Zilli        4307        1790    61
4.    Matthias Pauen     1727        1786    63
5.    Mika Inkinen       1475        1783    71
6.    David Wells        1276        1774    66
7.    Mike Svobodny      4553        1772    55
8.    Ray Glaeser        4151        1770    57
9.    Hugh Sconyers      1804        1768    57
10.   Evert Van Eijck    1815        1760    63
Most of these names are easily recognized by those who play or read up on the results of big tournaments. We will come back to this list later when the Giant 32 system is discussed. Those familiar with the online server ratings will notice that the top player KG rating is 200-300 lower than the top online ratings. This is likely due to the tighter spread of skill levels for face-to-face tournaments.
Note that local clubs can (and some do) keep similar ratings
systems, so size isn't an impediment to setting up a Kaufman system.
Without the aid of automated scoring, however, the amount of work
required is significant.
II. Earnings System
This type of ranking/scoring system awards points for high finishes in tournaments. These are quite common in local clubs. Some grades:
One Earnings system worthy of note is Bill Davis's American
Backgammon Tour (ABT). Now in its 4th year and growing, this
competition is entered by attendance at several weekend (regional)
events around the US. Bill also keeps a lifetime list. You
can view the 1999 standings as well as the lifetime rankings at
the Chicago Point WWW page (http://www.chicagopoint.com/abt.html).
III. Surveys
This is similar to an election. Ballots are made available (sometimes to a restricted set of people) and players are ranked individually by each voter. Points are assigned based on those rankings, and a final ranking is tallied.
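To make the tallying step concrete, here is a minimal sketch of one common way to turn ranked ballots into a final ranking. The point schedule is an assumption for illustration only (first place on a ballot earns as many points as there are slots, second place one fewer, and so on); the article does not state the schedule any particular survey used. The function name, player names, and ballots are hypothetical.

```python
from collections import defaultdict

def tally_ballots(ballots, slots=32):
    """Borda-style tally (assumed schedule): rank 1 earns `slots` points,
    rank 2 earns slots - 1, ..., rank `slots` earns 1 point."""
    points = defaultdict(int)
    for ballot in ballots:
        # Ignore any names listed beyond the allowed number of slots.
        for position, player in enumerate(ballot[:slots], start=1):
            points[player] += slots - position + 1
    # Highest point total first; ties broken alphabetically for determinism.
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))

# Three hypothetical voters, each ranking three hypothetical players.
ballots = [
    ["Player A", "Player B", "Player C"],
    ["Player B", "Player A", "Player C"],
    ["Player A", "Player C", "Player B"],
]
final_ranking = tally_ballots(ballots, slots=3)
for rank, (player, pts) in enumerate(final_ranking, start=1):
    print(rank, player, pts)
```

Note that a tally like this yields a ranking, not a rating: the point totals depend on who returned ballots, so they say little about how much stronger one player is than another.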
A well known survey is the Giant 32 of Backgammon conducted by Yamin Yamin with behind the scenes help from Carol Joy Cole, John Stryker, Jake Jacobs, and Howard Ring. Ballots contain 32 slots and the voter is given a ranking of 2, no exceptions! This survey has been taken every other year, but apparently it is being discontinued as well. It has not gone unnoticed in the "Old World" that recent rankings have been highly biased towards US players. Again, this appears directly related to the distribution of ballots (or maybe fairly stated, to the return of ballots). The 1997 Giant 32 Top 10 (from Flint Area BackgammoNews #217, Mar/Apr 1998):
Rank  Player            Points  Ballots-1st place votes   KG Rating list
                                (out of 57 possible)      ranking (1996)
1.    Wilcox Snellings  1548    56-12                     12
2.    Mike Senkiewicz   1423    54-10                     15
3.    Nack Ballard      1211    50-7                      41
4.    Mike Svobodny     1170    53-0                      7
5.    Paul Magriel      1115    52-2                      >100
6.    Neil Kazaross     1082    49-2                      16
7.    Billy Horan       1077    49-5                      2
8.    Kit Woolsey       1048    51-1                      27
9.    Jerry Grandell    864     40-7                      72
10.   Bill Robertie     840     47-0                      97
Grandell was the only player of the first 14 from outside the US. In defense of the promoters, I point out that besides tournament success, voters were asked to include money play prowess and intangibles such as fear instilled in opponents.
IV. Play-by-Play Judging
This method is the newest of the systems I cover. It has only been available for a little over a year. The commercially available backgammon software, Snowie by Oasya, is not only a world-class player in its own right but also has the feature that it will analyze an entire match and quantitatively assign grades to all checker plays and cube decisions (including lack of cube decisions).
One big advantage of this system is that it quantifies luck, and removes it from the results. A player is judged solely on his/her performance with the actual dice rolls. If you play a poor roll well, that is much better than playing a joker with mediocrity.
Harald Johanni, editor of Backgammon Magazin, has recently
published a rating list based upon Snowie evaluation of match
play. This list has some inherent weaknesses. The main problem
is its (non-)universality which can be seen from the distribution
of matches used in making up the list. Part of this is likely
due to regionality: Johanni is European (as is most of his
readership?) and he receives results primarily from European
tournaments. Also, Johanni only accepts hand recorded match
transcripts (as opposed to online computer recordings). I'm not
sure of the reason but it is possibly due to an integrity
question. (For example, I record only the matches I do well in
and send those to Harald.) Anyway, I'm not sure but it is
possible that promoters/followers of European events are more
diligent in recording big matches, so these understandably
would get preferentially included in the system. In any case,
this system is likely to become the standard for the future and
improvements are likely.
A Case Study: Jellyfish v3.0 Level-7 vs. Chuck Bower
I have limited data (but hopefully that is better than none at all) which allows a cross-comparison of some of these systems. Recently I played a series of 54 matches to 7 points against Jellyfish version 3.0 at its strongest playing level. There are some tantalizing hints about ratings/rankings systems and backgammon competition in general that can be speculated upon based on this sample. However, more study is required to beef up the statistics before strong conclusions can be drawn.
Before giving detailed results of that study, I give some figures on how these two players rate/rank in some of the above systems:
I. FIBS Ratings (Kaufman System)
Player                     Ranking  Rating   Experience
Chuck Bower (c_ray)        656      1743.90  741
jellyfish (from Dec 1997)  3        2037.68  29,548
For a 7-point match, the FIBS ratings formula predicts JF should have a winning probability of 71%.
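The 71% figure can be reproduced with a short calculation. The sketch below assumes the commonly published form of the FIBS formula, in which the underdog's chance of winning an N-point match with rating difference D is 1 / (10^(D * sqrt(N) / 2000) + 1). The function name is hypothetical; the ratings are the two FIBS figures quoted above.

```python
import math

def fibs_favorite_win_probability(rating_a, rating_b, match_length):
    """Favorite's match-winning chance under the FIBS formula (assumed form):
    P(underdog) = 1 / (10 ** (D * sqrt(N) / 2000) + 1)."""
    diff = abs(rating_a - rating_b)
    underdog = 1.0 / (10 ** (diff * math.sqrt(match_length) / 2000.0) + 1.0)
    return 1.0 - underdog

# jellyfish (2037.68) vs. c_ray (1743.90) in a 7-point match.
p_jf = fibs_favorite_win_probability(2037.68, 1743.90, 7)
print(f"{p_jf:.2f}")  # prints 0.71
```

Because the rating difference is multiplied by the square root of the match length, the same two players would be closer to even in a 1-point match and further apart in a longer one.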
II. Earnings System
(There is no comparable data on me and The Fish for this kind of system.)
III. Surveys
Neither of us made the 1997 Giant 64 (the 1997 Giant 32 balloting results listing continued through the top 64 players). Jellyfish played 200-game money sessions with each of Mike Senkiewicz and Nack Ballard in a well-documented event conducted by Malcolm Davis a couple of summers ago. JF broke exactly even in the 400 games. Senkiewicz ranked 2nd and Ballard 3rd in the 1997 Giant 32. My lifetime record against the 1997 Giant 64 is 10 wins, 11 losses.
IV. Play-by-Play Judging
The only data I have on Jellyfish play analyzed by Snowie is for the 54 matches of this case study. For myself against other opponents, I have analyzed only 8 matches from FIBS tournament play. For this small sample, my overall error rate is 0.00605 and my checker play error rate is 0.00753. The checker play error rate is the average error per non-forced move in cubeless equity units. The overall error rate is similar but includes doubling cube errors as well as checker play errors. The overall error rate would put me in 26th place in Johanni's ranking system for issue 1996/II of Backgammon Magazin. (I don't for a minute believe I am even close to 26th best in the world. This is just for comparison's sake and will be discussed later.)
V. Head-to-Head Money Play
Over the past few years I have played Jellyfish several hundred games at money style play. A summary of those sessions:
Opponent         Total games played  Net ppg
vs. JF1 level-7  1620                -0.13
vs. JF2 level-7  820                 -0.40
vs. JF3 level-7  107                 -0.26
VII. Results of 54 Matches
My record in the 54 7-point matches was 26 wins and 28 losses. According to Snowie analysis, my overall play error rate was 0.00833 and my checker play error rate was 0.00941. Jellyfish played extremely well according to Snowie: overall error rate was 0.00205 with a checker play error rate of 0.00215. Also, Snowie said I had more luck at the rate 0.00150. (Again, the units here are cubeless equity per dice roll.) We can compare these error rating numbers to Johanni's rating list. Jellyfish would be rated number 1 in both overall error rate and checker play error rate. Furthermore, its competition isn't close. Best (in the 1999/II issue) was 0.00371 total error rate and 0.00449 checker play error rate. My performance would rank me 69th overall, and 70th (out of the 108 listed) in checker play.
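As a rough check on the statistical uncertainty of a 54-match sample, the sketch below compares Jellyfish's 28 wins with what the 71% FIBS prediction would imply. It assumes each match is an independent trial with the same fixed winning probability, which is a simplification and not something the data can confirm. The function name is hypothetical.

```python
import math

def binomial_summary(n_matches, win_prob, observed_wins):
    """Expected wins, standard deviation, and z-score for a simple
    binomial model (assumes independent matches at a fixed probability)."""
    expected = n_matches * win_prob
    sd = math.sqrt(n_matches * win_prob * (1.0 - win_prob))
    z = (observed_wins - expected) / sd
    return expected, sd, z

# Jellyfish: 28 wins in 54 matches, against a predicted 71% per match.
expected, sd, z = binomial_summary(54, 0.71, 28)
print(f"expected JF wins: {expected:.1f} +/- {sd:.1f}, observed 28, z = {z:.1f}")
```

Under these assumptions the prediction is about 38 Jellyfish wins with a standard deviation of a little over 3, so the observed 28 sits roughly three standard deviations below it. That is hard to attribute to statistical fluctuation alone, which suggests looking at the assumptions behind the ratings as well.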
There appear to be several inconsistencies in this study:
What is going on here? I have some ideas, but I'm not really sure. Here is some brainstorming:
The last point is one of the more interesting speculations. Does a player perform better (in Snowie's view) against weaker opposition? One way to think of this is: Do strong opponents create tougher decisions?
Finally, there is one more tidbit which I have yet to reveal. Of the 54 matches, the luckier player won 52 and lost 2! Of my 26 wins, I was luckier than JF in all of the matches. For JF's 28 wins, it was luckier in 26 and I was luckier in 2. So Jellyfish's considerable skill advantage only netted two match wins. Is this evidence that backgammon is mostly luck? Something to think about!