Ratings

ELO system

From:   seeker
Address:   seeker@accessone.com
Date:   5 November 1995
Subject:   Re: rating inflation
Forum:   rec.games.backgammon
Google:   47il89$4q3@news.accessone.com

Having been a longtime chess-player and local chess club office-holder, I
once wrote a ratings program for the Tacoma (Washington, USA) Chess Club.
This entailed a rather involved exploration of many rating and
ranking systems (e.g., Ingo, British Grading, Swiss) used by various
sporting associations and performance institutes.  As it happens, I
selected the Elo system for my program.  This is the same system that is
used by FIDE (World Chess Federation), USCF (US Chess Federation), and --
I believe -- FIBS.  The remarks that follow are based on my (strictly
informal and casual) research, the experience in writing the program, and
the impressions I gathered from speaking with Prof. Arpad Elo shortly
before his death several years ago.  Please consider them off-the-cuff,
and open to free criticism or revision by anyone with a better grasp of
the subject matter.  I am being this extensive only because it may help
those who are unfamiliar with Elo's remarkable system.  Others, please
excuse my self-indulgence in the next couple of paragraphs!

The Elo system is based wholly on statistical probability theory.  Simply
put, it is a universal system for estimating the outcomes of contests in
which opponents or teams face each other in direct competition.  (Those
interested might consult Elo's "The Rating of Chess Players, Past and
Present", or Elo's original monograph in the Journal of Gerontology,
1965).  The Elo system has been used by the USCF since 1960 and was
adopted by FIDE as the international standard in 1970.

Elo's concept was straightforward -- quite unlike the complex math it
involves: humans have good days and bad, and their performances in a
given situation will fluctuate accordingly.  Any two or more people who
compete in a given activity will, over time, yield a record of their
performances.  These recorded results can be used to build a
probabilistic table of winning expectancies.  Conversely, this can be used
to derive "ratings", or the probable winning expectancy of a given
competitor in a given field.  Elo combined the principles of standard
deviation and normal distribution (familiar to any insurance company)
with his numerical interval scale to develop a system in which the many
performances of an individual are treated as normally distributed.  That is to
say that the competitor would have a scoring probability that could be
converted to a rating differential.  What this rating means to us, of
course, is that a player who was rated 2000, when competing with an 1800,
should win about 75% of the time.  A cornerstone of the Elo theory is
that "class" or "category" intervals follow the statistical concept of
standard deviation in single games.  What this means to us is that, while
a player competing in a group consisting of those with similar ratings
would find relatively even competition, there is a quantifiable point at
which the difference in ratings would find him either being clearly
outclassed or outclassing his opponents.  Thus, for us chessplayers, the
"class prize" became statistically valid, allowing players in the same
numerical rating-group to compete with reasonable chances.  Thus all
"Class A" players (ratings from 1800-1999) might compete for the "A"
prize, while "Experts" (2000-2199) would fight for the "Expert" prize.
Even in competitions like open-Swiss system pairings, where the strength
of the opposition in individual games varies widely, the players would
compete for the highest score within their own class.
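
A minimal sketch of this winning-expectancy curve in Python (using the
logistic form adopted by FIDE and the USCF; Elo's original tables were
built from the normal distribution, but the two agree closely, and the
function name here is mine):

    def expected_score(rating_a, rating_b):
        """Probability that player A scores against player B."""
        return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

    # The 200-point class interval from the text: a 2000 facing an 1800.
    print(expected_score(2000, 1800))   # ~0.76 -- "about 75%" as above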

It is recognized that, given any general rating 'pool', the improvement
of young players (and others new to the competition) will be rapid, while
established competitors would see their rating stabilize for a long
period, then slowly decline.  For either group, however, ratings seldom
decline to the level at which the player entered the pool.  This tends
to "inflate" the rating pool, skewing the results, after a while, in the
direction of the drift.  This unpleasant situation would make
it impossible to compare a player from one era to another, or to see a
rating stabilize at a "true" level, for purposes of performance
evaluation.  Fortunately, there are a number of remedies, most (if not
all) of which have been employed by the USCF, among others:

1. Use of 'provisional' ratings to flag players whose youth or newness
to the game signals rapidly improving results.

2. Use of a multiplier to enhance or degrade a given performance
grouping.  This "K-coefficient" (as Elo describes it) can be used to
accelerate improving players to their proper location in the rating
field, while a lower "K" would be used for older or better-established
players.  (See the sketch after this list.)

3. Periodic adjustment of the entire pool, as the British Chess
Federation does (or did, anyway, when I was up on all this).

4. The award of bonus or feedback points, whose purpose would be to
compensate or handicap players who encounter those with unstabilized or
less-documented performance tracks.  (For example, giving a Master
player a 10% return in rating points if he were to be defeated in a
match by a new, talented player.)

5. Regardless of the above, establishing a Ratings Director, who would
monitor the systematic drift of the rating pool and maintain a 'control
group' of players, whose ratings would be used as a baseline for
implementing appropriate adjustments.  This method is standard for all
well-established rating systems.
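
Remedies 1 and 2 come down to varying a single parameter, K, in the
basic Elo update rule.  A minimal sketch of that rule in Python (the
function name and the K values here are mine, for illustration only):

    def update_rating(rating, opponent, score, k):
        """Elo update: score is 1 for a win, 0.5 for a draw, 0 for a loss."""
        expected = 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))
        return rating + k * (score - expected)

    # A provisional newcomer (high K) gains more from an upset win
    # than an established player (low K) would:
    print(update_rating(1500, 1900, 1, k=40))   # ~1536.4
    print(update_rating(1500, 1900, 1, k=10))   # ~1509.1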

Frankly, I know little of the Internet Backgammon Server, but I am sure
they already have their pool-inflation under control.  I tracked
undesired rating-pool inflation and wrote a compensating routine that
used the FIDE standard for K-coefficients and monitored all provisional
ratings for 30 games before issuing established ratings and modifying the
K-values.  Tested against grandmaster result-tables from Sarajevo,
Moscow, and the International Chess Olympiad, the final version
paralleled FIDE ratings within a single percentage point.  You know the funny
thing, Stephan?  I only wrote the thing to rate after-hours pinochle
games at the chess club! :)  I only add this to show that if I could do
that program on an old Commodore 64, they can do (and probably are
doing) wonders now, assuming they have a ratings director.
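
As a rough illustration of the provisional/established split described
above (a provisional rating for the first 30 games, then a reduced K),
here is a hedged sketch; the tiers loosely follow modern FIDE practice,
not the author's actual routine, which is not shown:

    def k_factor(games_played, rating):
        """Choose a K-coefficient from a player's track record."""
        if games_played < 30:   # still provisional: let the rating move fast
            return 40
        if rating >= 2400:      # established top-level player: move slowly
            return 10
        return 20               # ordinary established player

    print(k_factor(10, 1500))   # 40 -- still provisional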

Well, so much for all that.  I hope I haven't bored you with too many
non-essentials, and that the final segments answered your questions about
ratings-inflation.  In closing, I want to add that the ONLY reason I
posted this was that, after having my wife dig out all my old flowcharts
and notes and books, she said more than one person had better read it! :)

seeker@accessone.com
 

Ratings

Constructing a ratings system  (Matti Rinta-Nikkola, Dec 1998) 
Converting to points-per-game  (David Montgomery, Aug 1998)  [Recommended reading]
Cube error rates  (Joe Russell+, July 2009)  [Long message]
Different length matches  (Jim Williams+, Oct 1998) 
Different length matches  (Tom Keith, May 1998)  [Recommended reading]
ELO system  (seeker, Nov 1995) 
Effect of droppers on ratings  (Gary Wong+, Feb 1998) 
Empirical analysis  (Gary Wong, Oct 1998)
Error rates  (David Levy, July 2009) 
Experience required for accurate rating  (Jon Brown+, Nov 2002) 
FIBS rating distribution  (Gary Wong, Nov 2000) 
FIBS rating formula  (Patti Beadles, Dec 2003) 
FIBS vs. GamesGrid ratings  (Raccoon+, Mar 2006)  [GammOnLine forum]
Fastest way to improve your rating  (Backgammon Man+, May 2004) 
Field size and ratings spread  (Daniel Murphy+, June 2000)  [Long message]
Improving the rating system  (Matti Rinta-Nikkola, Nov 2000)  [Long message]
KG rating list  (Daniel Murphy, Feb 2006)  [GammOnLine forum]
KG rating list  (Tapio Palmroth, Oct 2002) 
MSN Zone ratings flaw  (Hank Youngerman, May 2004) 
No limit to ratings  (David desJardins+, Dec 1998) 
On different sites  (Bob Newell+, Apr 2004) 
Opponent's strength  (William Hill+, Apr 1998) 
Possible adjustments  (Christopher Yep+, Oct 1998) 
Rating versus error rate  (Douglas Zare, July 2006)  [GammOnLine forum]
Ratings and rankings  (Chuck Bower, Dec 1997)  [Long message]
Ratings and rankings  (Jim Wallace, Nov 1997) 
Ratings on Gamesgrid  (Gregg Cattanach, Dec 2001) 
Ratings variation  (Kevin Bastian+, Feb 1999) 
Ratings variation  (FLMaster39+, Aug 1997) 
Ratings variation  (Ed Rybak+, Sept 1994) 
Strange behavior with large rating difference  (Ron Karr, May 1996) 
Table of ratings changes  (Patti Beadles, Aug 1994) 
Table of win rates  (William C. Bitting, Aug 1995) 
Unbounded rating theorem  (David desJardins+, Dec 1998) 
What are rating points?  (Lou Poppler, Apr 1995) 
Why high ratings for one-point matches?  (David Montgomery, Sept 1995) 

Legend:   [GammOnLine forum] = from GammOnLine     [Long message] = long message     [Recommended reading] = recommended reading     [Recent addition] = recent addition
 

 

Return to:  Backgammon Galore : Forum Archive Main Page