Ratings

ELO system

From:   seeker
Address:   seeker@accessone.com
Date:   5 November 1995
Subject:   Re: rating inflation
Forum:   rec.games.backgammon
Google:   47il89$4q3@news.accessone.com

Having been a longtime chess-player and local chess club office-holder, I
once wrote a ratings program for the Tacoma (Washington, USA) Chess Club.
This entailed a rather involved exploration of many rating and
ranking systems (e.g., Ingo, British Grading, Swiss) used by various
sporting associations and performance institutes.  As it happens, I
selected the Elo system for my program.  This is the same system that is
used by FIDE (World Chess Federation), USCF (US Chess Federation), and --
I believe -- FIBS.  The remarks that follow are based on my (strictly
informal and casual) research, the experience in writing the program, and
the impressions I gathered from speaking with Prof. Arpad Elo shortly
before his death several years ago.  Please consider them off-the-cuff,
and open to free criticism or revision by anyone with a better grasp of
the subject matter.  I am being this extensive only because it may help
those who are unfamiliar with Elo's remarkable system.  Others, please
excuse my self-indulgence in the next couple of paragraphs!

The Elo system is based wholly on statistical probability theory.  Simply
put, it is a universal system for estimating the outcomes of contests in
which opponents or teams face each other in direct competition.  (Those
interested might consult Elo's "The Rating of Chess Players, Past and
Present", or Elo's original monograph in the Journal of Gerontology,
1965).  The Elo system has been used by the USCF since 1960 and was
adopted by FIDE as the international standard in 1970.

Elo's concept was straightforward -- quite unlike the complex math it
involves: humans have good days and bad, and their performances in a
given situation will fluctuate accordingly.  Any two or more people who
compete in a given activity will, over time, yield a record of their
performances.  These recorded results can be used to build a
probabilistic table of winning expectancies.  Conversely, this can be used
to derive "ratings", or the probable winning expectancy of a given
competitor in a given field.  Elo combined the principles of standard
deviation and normal distribution (familiar to any insurance company)
with his numerical interval scale to develop a system in which the many
performances of an individual are treated as normally distributed.  That is to
say that the competitor would have a scoring probability that could be
converted to a rating differential.  What this rating means to us, of
course, is that a player who was rated 2000, when competing with an 1800,
should win about 75% of the time.  A cornerstone of the Elo theory is
that "class" or "category" intervals follow the statistical concept of
standard deviation in single games.  What this means to us is that, while
a player competing in a group consisting of those with similar ratings
would find relatively even competition, there is a quantifiable point at
which the difference in ratings would find him either being clearly
outclassed or outclassing his opponents.  Thus, for us chessplayers, the
"class prize" became statistically valid, allowing players in the same
numerical rating-group to compete with reasonable chances.  Thus all
"Class A" players (ratings from 1800-1999) might compete for the "A"
prize, while "Experts" (2000-2199) would fight for the "Expert" prize.
Even in competitions like open-Swiss system pairings, where the strength
of the opposition in individual games varies widely, the players would
compete for the highest score within their own class.
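
A minimal sketch of this winning-expectancy curve in Python (using the
logistic form adopted by FIDE and the USCF; Elo's original tables were
built from the normal distribution, but the two agree closely, and the
function name here is mine):

    def expected_score(rating_a, rating_b):
        """Probability that player A scores against player B."""
        return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

    # The 200-point class interval from the text: a 2000 facing an 1800.
    print(expected_score(2000, 1800))   # ~0.76 -- "about 75%" as above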

It is recognized that, given any general rating 'pool', the improvement
of young players (and others new to the competition) will be rapid, while
established competitors would see their rating stabilize for a long
period, then slowly decline.  For either group, however, ratings seldom
decline to the level at which the player entered the pool.  This tends
to "inflate" the rating pool, skewing the results, after a while, in the
direction of the drift.  This unpleasant situation would make
it impossible to compare a player from one era to another, or to see a
rating stabilize at a "true" level, for purposes of performance
evaluation.  Fortunately, there are a number of remedies, most (if not
all) of which have been employed by the USCF, among others:

1. Use of 'provisional' ratings to flag players whose youth or newness
to the game signals rapidly improving results.

2. Use of a multiplier to enhance or degrade a given performance
grouping.  This "K-coefficient" (as Elo describes it) can be used to
accelerate improving players to their proper location in the rating
field, while a lower "K" would be used for older or better-established
players.  (See the sketch after this list.)

3. Periodic adjustment of the entire pool, as the British Chess
Federation does (or did, anyway, when I was up on all this).

4. The award of bonus or feedback points, whose purpose would be to
compensate or handicap players who encounter those with unstabilized or
less-documented performance tracks.  (For example, giving a Master
player a 10% return in rating points if he were to be defeated in a
match by a new, talented player.)

5. Regardless of the above, establishing a Ratings Director, who would
monitor the systematic drift of the rating pool and maintain a 'control
group' of players, whose ratings would be used as a baseline for
implementing appropriate adjustments.  This method is standard for all
well-established rating systems.
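
Remedies 1 and 2 come down to varying a single parameter, K, in the
basic Elo update rule.  A minimal sketch of that rule in Python (the
function name and the K values here are mine, for illustration only):

    def update_rating(rating, opponent, score, k):
        """Elo update: score is 1 for a win, 0.5 for a draw, 0 for a loss."""
        expected = 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))
        return rating + k * (score - expected)

    # A provisional newcomer (high K) gains more from an upset win
    # than an established player (low K) would:
    print(update_rating(1500, 1900, 1, k=40))   # ~1536.4
    print(update_rating(1500, 1900, 1, k=10))   # ~1509.1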

Frankly, I know little of the Internet Backgammon Server, but I am sure
they already have their pool-inflation under control.  I tracked
undesired rating-pool inflation and wrote a compensating routine that
used the FIDE standard for K-coefficients and monitored all provisional
ratings for 30 games before issuing established ratings and modifying the
K-values.  Tested against grandmaster result-tables from Sarajevo,
Moscow, and the International Chess Olympiad, the final version
paralleled FIDE ratings within a single percentage point.  You know the funny
thing, Stephan?  I only wrote the thing to rate after-hours pinochle
games at the chess club! :)  I only add this to show that if I could do
that program on an old Commodore 64, they can do (and probably are
doing) wonders now, assuming they have a ratings director.
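
As a rough illustration of the provisional/established split described
above (a provisional rating for the first 30 games, then a reduced K),
here is a hedged sketch; the tiers loosely follow modern FIDE practice,
not the author's actual routine, which is not shown:

    def k_factor(games_played, rating):
        """Choose a K-coefficient from a player's track record."""
        if games_played < 30:   # still provisional: let the rating move fast
            return 40
        if rating >= 2400:      # established top-level player: move slowly
            return 10
        return 20               # ordinary established player

    print(k_factor(10, 1500))   # 40 -- still provisional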

Well, so much for all that.  I hope I haven't bored you with too many
non-essentials, and that the final segments answered your questions about
ratings-inflation.  In closing, I want to add that the ONLY reason I
posted this was that, after having my wife dig out all my old flowcharts
and notes and books, she said more than one person had better read it! :)

seeker@accessone.com
 

Ratings

Constructing a ratings system  (Matti Rinta-Nikkola, Dec 1998) 
Converting to points-per-game  (David Montgomery, Aug 1998)  [Recommended reading]
Cube error rates  (Joe Russell+, July 2009)  [Long message]
Different length matches  (Jim Williams+, Oct 1998) 
Different length matches  (Tom Keith, May 1998)  [Recommended reading]
ELO system  (seeker, Nov 1995) 
Effect of droppers on ratings  (Gary Wong+, Feb 1998) 
Empirical analysis  (Gary Wong, Oct 1998)
Error rates  (David Levy, July 2009) 
Experience required for accurate rating  (Jon Brown+, Nov 2002) 
FIBS rating distribution  (Gary Wong, Nov 2000) 
FIBS rating formula  (Patti Beadles, Dec 2003) 
FIBS vs. GamesGrid ratings  (Raccoon+, Mar 2006)  [GammOnLine forum]
Fastest way to improve your rating  (Backgammon Man+, May 2004) 
Field size and ratings spread  (Daniel Murphy+, June 2000)  [Long message]
Improving the rating system  (Matti Rinta-Nikkola, Nov 2000)  [Long message]
KG rating list  (Daniel Murphy, Feb 2006)  [GammOnLine forum]
KG rating list  (Tapio Palmroth, Oct 2002) 
MSN Zone ratings flaw  (Hank Youngerman, May 2004) 
No limit to ratings  (David desJardins+, Dec 1998) 
On different sites  (Bob Newell+, Apr 2004) 
Opponent's strength  (William Hill+, Apr 1998) 
Possible adjustments  (Christopher Yep+, Oct 1998) 
Rating versus error rate  (Douglas Zare, July 2006)  [GammOnLine forum]
Ratings and rankings  (Chuck Bower, Dec 1997)  [Long message]
Ratings and rankings  (Jim Wallace, Nov 1997) 
Ratings on Gamesgrid  (Gregg Cattanach, Dec 2001) 
Ratings variation  (Kevin Bastian+, Feb 1999) 
Ratings variation  (FLMaster39+, Aug 1997) 
Ratings variation  (Ed Rybak+, Sept 1994) 
Strange behavior with large rating difference  (Ron Karr, May 1996) 
Table of ratings changes  (Patti Beadles, Aug 1994) 
Table of win rates  (William C. Bitting, Aug 1995) 
Unbounded rating theorem  (David desJardins+, Dec 1998) 
What are rating points?  (Lou Poppler, Apr 1995) 
Why high ratings for one-point matches?  (David Montgomery, Sept 1995) 

Legend:   [GammOnLine forum] = from GammOnLine     [Long message] = long message     [Recommended reading] = recommended reading     [Recent addition] = recent addition
 

 

Return to:  Backgammon Galore : Forum Archive Main Page