Having been a longtime chess-player and local chess club office-holder, I
once wrote a ratings program for the Tacoma (Washington, USA) Chess Club.
This entailed a rather involved exploration of many rating and
ranking systems (e.g., Ingo, British Grading, Swiss) used by various
sporting associations and performance institutes. As it happens, I
selected the Elo system for my program. This is the same system that is
used by FIDE (World Chess Federation), USCF (US Chess Federation), and --
I believe -- FIBS. The remarks that follow are based on my (strictly
informal and casual) research, the experience in writing the program, and
the impressions I gathered from speaking with Prof. Arpad Elo shortly
before his death several years ago. Please consider them off-the-cuff,
and open to free criticism or revision by anyone with a better grasp of
the subject matter. I go into this much detail only because it might
help those unfamiliar with Elo's remarkable system. Others, please excuse my
self-indulgence in the next couple of paragraphs!
The Elo system is based wholly on statistical probability theory. Simply
put, it is a universal system for predicting the outcomes of contests in
which opponents or teams face each other in direct competition. (Those
interested might consult Elo's "The Rating of Chess Players, Past and
Present", or Elo's original monograph in the Journal of Gerontology,
1965). The Elo system has been used by the USCF since 1960 and was
adopted by FIDE as the international standard in 1970.
Elo's concept was straightforward -- quite unlike the complex math it
involves: humans have good days and bad, and their performances in a
given situation will fluctuate accordingly. Any two or more people who
compete in a given activity will, over time, yield a record of
their performances. These recorded results can be used to build a
probabilistic table of winning expectancies. Inversely, this can be used
to derive "ratings", or the probable winning expectancy of a given
competitor in a given field. Elo combined the principles of standard
deviation and normal distribution (familiar to any insurance company)
with his numerical interval scale to develop a system by which many
performances of an individual would be normally distributed. That is to
say that the competitor would have a scoring probability that could be
converted to a rating differential. What this rating means to us, of
course, is that a player rated 2000, when competing with a player rated 1800,
should win about 75% of the time. A cornerstone of the Elo theory is
that "class" or "category" intervals follow the statistical concept of
standard deviation in single games. What this means to us is that, while
a player competing in a group consisting of those with similar ratings
would find relatively even competition, there is a quantifiable point at
which the difference in ratings would find him either being clearly
outclassed or outclassing his opponents. Thus, for us chessplayers, the
"class prize" became statistically valid, allowing players in the same
numerical rating-group to compete with reasonable chances. Thus all
"Class A" players (ratings from 1800-1999) might compete for the "A"
prize, while "Experts" (2000-2199) would fight for the "Expert" prize.
Even in competitions like open-Swiss system pairings, where the strength
of their opposition in individual games would be randomized, the players
would compete for the highest score within their own class.
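The winning expectancy mentioned above can be sketched in a few lines. This is a minimal illustration using the logistic form of the expectancy curve (the form FIDE and the USCF eventually adopted; Elo's original work used the normal distribution, which gives nearly identical numbers):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A scores against player B, logistic form.

    A 400-point rating difference corresponds to 10-to-1 odds.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# A 2000-rated player against an 1800 -- roughly the 75% figure above.
print(f"{expected_score(2000, 1800):.2f}")  # prints 0.76
```

Note that the expectancies of the two players always sum to 1, and equal ratings give each side exactly 0.5.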
It is recognized that, given any general rating 'pool', the improvement
of young players (and others new to the competition) will be rapid, while
established competitors will see their ratings stabilize for a long
period, then slowly decline. For either group, however, ratings seldom
fall back to where they were when the player entered the pool. This
tends to "inflate" the rating pool, gradually skewing the whole scale
upward. This unpleasant situation would make
it impossible to compare a player from one era to another, or to see a
rating stabilize at a "true" level, for purposes of performance
evaluation. Fortunately, there are a number of remedies, most (if not
all) of which have been employed by the USCF, among others:
1. Use of 'provisional' ratings to identify players whose youth or
newness to the game signals likely rapid improvement.
2. Use of a multiplier to amplify or damp the effect of a given set of
results. This "K-coefficient" (as Elo describes it) can be set high to
accelerate improving players toward their proper location in the rating
field, while a lower "K" would be used for older or better-established
players.
3. Periodic adjustment of the entire pool, as the British Chess
Federation does (or did, anyway, when I was up on all this).
4. The award of bonus or feedback points, whose purpose would be to
compensate or handicap players who encounter those with unstabilized or
less-documented performance tracks. (Like giving a Master player a 10%
return in rating points if he were to be defeated in a match by a new,
talented player).
5. Regardless of the above, establishment of a Ratings Director, who would
monitor the systematic drift of the rating pool and maintain a 'control
group' of players, whose ratings would be used as a baseline for
implementing appropriate adjustments. This method is standard for all
well-established rating systems.
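As a sketch of how remedies 1 and 2 fit together: the rating update itself is R' = R + K(W - We), with the K-coefficient chosen from the number of rated games played. The thresholds and K-values below are illustrative only, modeled loosely on FIDE practice, not any federation's exact rules:

```python
def expected_score(rating: float, opponent: float) -> float:
    # Logistic winning expectancy; a 400-point gap means 10-to-1 odds.
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def k_coefficient(games_played: int, rating: float) -> int:
    # Illustrative values: a high K while the rating is provisional
    # (first 30 rated games), a low K for strong established players.
    if games_played < 30:
        return 40
    return 10 if rating >= 2400 else 20

def update_rating(rating: float, opponent: float, score: float,
                  games_played: int) -> float:
    # Elo's update rule: R' = R + K * (W - We), where W is the actual
    # score (1 win, 0.5 draw, 0 loss) and We the expected score.
    k = k_coefficient(games_played, rating)
    return rating + k * (score - expected_score(rating, opponent))

# A provisional 1600 player who upsets an 1800 gains about 30 points.
print(round(update_rating(1600, 1800, 1.0, games_played=5), 1))  # prints 1630.4
```

The high provisional K lets a newcomer's rating move quickly toward its true level, while the low K for established players keeps one lucky result from distorting a well-documented rating.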
Frankly, I know little of the Internet Backgammon Server, but I am sure
they already have their pool-inflation under control. I tracked
undesired rating-pool inflation and wrote a compensating routine that
used the FIDE standard for K-coefficients and monitored all provisional
ratings for 30 games before issuing established ratings and modifying the
K-values. Testing it against grandmaster result-tables from Sarajevo,
Moscow, and the International Chess Olympiad, the final version
paralleled FIDE within a single percentage point. You know the funny
thing, Stephan? I only wrote the thing to rate after-hours pinochle
games at the chess club! :) I mention this only to show that if I could
write that program on an old Commodore-64, they can do (and probably
are doing) wonders now, assuming they have a ratings director.
Well, so much for all that. I hope I haven't bored you with too many
non-essentials, and that the final segments answered your questions about
ratings-inflation. In closing, I want to add that the ONLY reason I
posted this was that, after having my wife dig out all my old flowcharts
and notes and books, she said more than one person had better read it! :)
seeker@accessone.com