Field size and ratings spread
It seems commonsensical that the more players there are within a ratings
system, the larger the spread will tend to be from top to bottom.
For instance, oldtime FIBSters remember when 1800 really meant
something :)
Rating       Currently       Highest   Lowest
System       Rated Players   (difference from start rating)
Norway:           126         +256      -300
BIBA:             194         +288      -396
Sweden:           369         +356      -192
Denmark:         1042         +264      -373
GamesGrid:       2402         +543      -550
FIBS:            6769         +768*    -1170**
Netgammon           ?            ?         ?
How can this effect be quantified?
*2nd highest is +580
**2nd lowest is -845
Nis Jørgensen writes:
However commonsensical it might be, it is not in line with the
workings of the rating system. This system is designed to
guarantee that a difference of 200 points is a difference of 200
points - no matter the number of players. The numbers you give
for the internet servers are not really interesting - since the
lowest ranked players are certainly (ill-programmed) bots. A more
interesting number would be the average deviation of the
rankings. Please note that an internet server, and probably also
a large national system like the Danish one, will tend to
attract more weak players.
> For instance, oldtime FIBSters remember when 1800 really meant
> something :)
This has to do with inflation IMHO (the average rating rising),
not with the rating differences.
Gary Wong writes:
Well, I think you're both right here. Yes, the expected difference
between two players should (by and large) be equivalent regardless of
the number of players in the system, but it is perfectly normal to
find larger differences between the extremes with larger samples.
(After all, if you rate 10 randomly selected backgammon players,
chances are the best player of the 10 will be somewhat better than the
worst, but not by a huge amount. But if you rate every player in the
world, you'll be able to measure the difference between the world
champ and somebody who barely knows the rules, which we would expect
to be enormous.)
> The numbers you give for the internet servers are not really
> interesting - since the lowest ranked players are certainly
> (ill-programmed) bots. A more interesting number would be the average
> deviation of the rankings.
That's quite possibly true, although it depends what you mean by
"interesting" :-). A better metric for the "spread" of a distribution
would be the inter-quartile range (the difference between the 25th and
75th percentiles). If we are allowed to assume that the populations
we're sampling from are equivalent (e.g. that FIBS does not attract a
different type of player than those measured in the Norwegian
system), then the expected inter-quartile ranges between each rating
system ought to be the same. Of course this assumption is unlikely to
be reasonable in practice (FIBS attracts all kinds of players from the
casual to world-class professionals, whereas national ratings mostly
consist of regular tournament players; the tournament players are more
likely to be closely matched than the FIBS ones). But the important
thing about the inter-quartile range is that its expectation is
independent of sample size.
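For anyone who wants to see this numerically, here is a minimal Python sketch. It assumes ratings are drawn from a normal distribution with mean 1500 and standard deviation 150; both figures are illustrative assumptions, not measured values, and the function name is made up for the example. It compares the average inter-quartile range with the average top-to-bottom range for pools of different sizes.

```python
import random
import statistics


def spread_stats(n, trials=100, mean=1500.0, sd=150.0, seed=0):
    """Average IQR and average full range over repeated samples of size n.

    mean and sd are illustrative assumptions, not fitted to any real list.
    """
    rng = random.Random(seed)
    iqrs, ranges = [], []
    for _ in range(trials):
        sample = [rng.gauss(mean, sd) for _ in range(n)]
        # quantiles(..., n=4) returns the three quartile cut points
        q1, _median, q3 = statistics.quantiles(sample, n=4)
        iqrs.append(q3 - q1)
        ranges.append(max(sample) - min(sample))
    return statistics.mean(iqrs), statistics.mean(ranges)


for n in (100, 1000, 5000):
    iqr, full_range = spread_stats(n)
    print(f"n={n:5d}  mean IQR={iqr:6.1f}  mean range={full_range:7.1f}")
```

Under these assumptions the mean IQR should stay close to 1.35 standard deviations for every pool size, while the mean range keeps growing as the pool gets larger.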
> For instance, oldtime FIBSters remember when 1800 really meant
> something :)
Well, I think it still means more or less the same thing, depending on
how you interpret it; a rating of 1800 today means you are in the top
6% (or so) of FIBS players. When there were only 300 players, that would
put you in the top 20; now that there are going on 7000, it's only enough
to make it into the top 400.
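As a back-of-the-envelope check of those figures, taking the 6% estimate at face value (the helper name is hypothetical):

```python
def rank_for_top_fraction(players, top_fraction):
    """Approximate rank of a player sitting at the given top fraction of the field."""
    return round(players * top_fraction)


for players in (300, 7000):
    print(players, "players -> about rank", rank_for_top_fraction(players, 0.06))
```

This gives about rank 18 in a field of 300 and about rank 420 in a field of 7000, in line with the "top 20" and "top 400" above.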
> This has to do with inflation IMHO (the average rating rising),
> not with the rating differences.
Actually I believe the effect of inflation is rather small compared to the
other factors. The median FIBS rating at the moment is only 1528, after
8 years of FIBS -- inflation of 3 or 4 points a year doesn't seem like much
to me!
To get back to the original question ("How can this effect be
quantified?"), there has surely been plenty of work on the expected
maxima of samples of various distributions. I'm at work at the moment
and don't have any references handy, so I cheated and made a quick
simulation, which appears to show that the expected deviation of the
maximum of n samples from a normal distribution grows
slightly less than proportionally with log(n). I plotted a graph of
this expectation and superimposed Daniel's data on it; I had to assume
that backgammon ratings are normally distributed with std. dev. 150
points. I have no idea whether this assumption is reasonable or not;
in practice the FIBS/GG standard deviations are likely to be higher
than the national ratings, because they include a wider variety of
players, as described above. (Daniel, do you still have your original
samples available? It might be interesting to compute the inter-quartile
ranges and standard deviations to see how much they vary between pools.)
The graph is available in PostScript form at:
for those interested.
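A simulation along these lines can be sketched in a few lines of Python. This is not the original program, just a rough reconstruction under the same stated assumption (normally distributed ratings with std. dev. 150); the function name and the number of trials are arbitrary choices. The pool sizes are the player counts from the first table.

```python
import math
import random
import statistics


def expected_max_deviation(n, trials=100, sd=150.0, seed=1):
    """Monte Carlo estimate of the expected maximum of n draws from N(0, sd).

    sd=150 is the assumed rating standard deviation, not a measured one.
    """
    rng = random.Random(seed)
    return statistics.mean(
        max(rng.gauss(0.0, sd) for _ in range(n)) for _ in range(trials)
    )


sizes = (126, 194, 369, 1042, 2402, 6769)  # player counts from the first table
for n in sizes:
    dev = expected_max_deviation(n)
    print(f"n={n:5d}  E[max] ~ {dev:6.1f}  per ln(n): {dev / math.log(n):5.1f}")
```

The last column should shrink as n grows, which is what "less than proportionally with log(n)" looks like in practice. That fits the standard asymptotic result that the expected maximum of n normal samples grows roughly like sd * sqrt(2 ln n).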
Daniel Murphy writes:
Server rating extremes on either end are subject to uncompetitive
manipulation. The lowest rating on FIBS is 649.78, the 2nd lowest
701.50. The 3rd lowest (775.72) is most definitely a bona fide human.
The 4th lowest (also human) boasts a substantially more respectable
rating of 870.16. All the lowest rated GamesGrid players appear to be
bona fide human players.
See below, where the statistics given include exclusion of 1% of
players at each end of the ratings lists.
> I believe the effect of inflation is rather small compared to the
> other factors. The median FIBS rating at the moment is only 1528,
> after 8 years of FIBS -- inflation of 3 or 4 points a year doesn't
> seem like much to me!
Agreed, and an aside: it's been mentioned in other discussions that
average, not median rating, is a better indication of ratings
inflation. Danish median is 1502.86, the average 1516.6. Norway median
is 1526.50, average is 1523.13. Calculating averages for other systems
is beyond my endurance for tedium.
Can you use these statistics, Gary?
Group     #        #1   75%-ile            median    25%-ile            lowest
FIBS   6683   2273.74   1640.40 (+111.76)  1528.64   1430.50 (-69.50)   701.50
GG     2418   2068.09   1711.68 (+132.64)  1579.04   1491.28 (-87.76)   957.58
DK      992   1764.93   1555.22 (+ 52.36)  1502.86   1473.80 (-29.06)  1126.31
SE      388   1879.00   1605.00 (+ 99.00)  1506.00   1428.00 (-78.00)  1173.00
BIBA    217   1781.00   1597.00 (+ 92.00)  1505.00   1432.00 (-73.00)  1102.00
NO      130   1760.00   1618.00 (+ 91.50)  1526.50   1435.00 (-91.50)  1200.00
Group     #    1%-ile   75%-ile            median    25%-ile           99%-ile
FIBS   6683   1911.42   1640.40 (+111.76)  1528.64   1430.50 (-69.50)  1136.30
GG     2418   1960.75   1711.68 (+132.64)  1579.04   1491.28 (-87.76)  1234.71
DBgF    992   1696.22   1555.22 (+ 52.36)  1502.86   1473.80 (-29.06)  1373.77
SBgF    388   1827.00   1605.00 (+ 99.00)  1506.00   1428.00 (-78.00)  1266.00
BIBA    217   1772.00   1597.00 (+ 92.00)  1505.00   1432.00 (-73.00)  1187.00
NBgF    130   1741.00   1618.00 (+ 91.50)  1526.50   1435.00 (-91.50)  1272.00
FIBS: excludes unknown # of players with less than 50 TMP, and the
lowest-rated player
GG: excludes the non-player at bottom of list
DK: excludes 51 members with 0 TMP
Sweden: includes all rated players (qualifications unknown)
BIBA: includes all rated players (qualifications unknown)
Norway: includes all listed players (i.e., minimum 15 matches and at
least 1 match played in last year).
Danish system start point is 1000, not 1500; ratings adjusted by +500
Gary Wong writes:
> It's been mentioned in other discussions that average, not median
> rating, is a better indication of ratings inflation.
True -- I tried searching for the articles about inflation that had
been posted here in the past, but unfortunately now that we have only
a "precision buying service" instead of Deja News, things like that
aren't easy to find.
Luckily we still have Tom Keith's r.g.b. archive -- one relevant article
which does seem to indicate that a FIBS rating of 1800 has been reasonably
consistent at marking the 95th percentile in 1995, 1997 and 2000.
One other snippet -- Michael Klein's latest FIBS Ratings Report shows
the mean FIBS rating to be 1534, which is surprisingly close to the median.
Thanks for those data! (I believe that the "-69.50" figure in the FIBS
25%-ile should be "-98.14".)
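That correction is easy to check: each bracketed offset should equal the quartile minus the median. A minimal Python sketch, with the figures copied from Daniel's table; the function name and the tolerance are arbitrary choices for illustration:

```python
# (group, 75th percentile, quoted upper offset, median,
#  25th percentile, quoted lower offset), as posted in the table
rows = [
    ("FIBS", 1640.40, 111.76, 1528.64, 1430.50, -69.50),
    ("GG",   1711.68, 132.64, 1579.04, 1491.28, -87.76),
    ("DBgF", 1555.22,  52.36, 1502.86, 1473.80, -29.06),
    ("SBgF", 1605.00,  99.00, 1506.00, 1428.00, -78.00),
    ("BIBA", 1597.00,  92.00, 1505.00, 1432.00, -73.00),
    ("NBgF", 1618.00,  91.50, 1526.50, 1435.00, -91.50),
]


def mismatches(rows, tol=0.005):
    """List (group, column, recomputed offset) where the quoted offset is off."""
    bad = []
    for group, q3, up, med, q1, down in rows:
        if abs((q3 - med) - up) > tol:
            bad.append((group, "75%-ile", round(q3 - med, 2)))
        if abs((q1 - med) - down) > tol:
            bad.append((group, "25%-ile", round(q1 - med, 2)))
    return bad


print(mismatches(rows))  # [('FIBS', '25%-ile', -98.14)]
```

Only the FIBS lower-quartile offset is flagged; every other bracketed figure agrees with its quartile and median.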
A few random observations:
- The inter-quartile ranges of the online servers do seem to be
significantly higher than the national ratings (~210 vs. ~170), which
supports the hypothesis that the Internet servers attract a more
varied range of players than real-life tournaments.
The Danish range is much smaller than the others, though; I have no
idea why this would be the case (perhaps the results include a large
number of relatively new players? The other descriptions make it
sound as if they do or might exclude inexperienced players.)
- The Danish, Swedish and British medians show virtually no sign of
inflation. I suspect this may be because they "include all rated
players": the main cause of inflation is that weak players are more
likely to leave the system than strong players, and so weak ratings
are gradually deleted over time which effectively raises whatever is
left behind. The Norwegian ratings (which require at least 1 match
played in the last year) show comparable inflation to FIBS.
GamesGrid shows the most inflation of all. This might well be because
the financial cost increases the tendency of weak players to leave. I
understand that GG have added points to all players' ratings in the
past when a server crash lost the results of some games (I'm not sure
which is more disturbing -- that somebody thought this was a good idea,
or that users were apparently pacified by it!) which would certainly
add to this effect.
- The results show that the distributions tend to be skewed slightly to
the right (the upper quartile lies farther from the median than the
lower quartile does). One
explanation for this might be that weak players tend to improve faster
than strong players (hopefully nobody's getting significantly worse!)
which could shrink the left-hand tail somewhat.
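The inter-quartile ranges and the skew behind these observations can be recomputed from the quartiles in Daniel's table. A rough Python sketch follows; the function names are made up for illustration, and "skew" here is just the upper-quartile distance from the median minus the lower-quartile distance, not a formal skewness statistic.

```python
# group: (25th percentile, median, 75th percentile), copied from the table
quartiles = {
    "FIBS": (1430.50, 1528.64, 1640.40),
    "GG":   (1491.28, 1579.04, 1711.68),
    "DBgF": (1473.80, 1502.86, 1555.22),
    "SBgF": (1428.00, 1506.00, 1605.00),
    "BIBA": (1432.00, 1505.00, 1597.00),
    "NBgF": (1435.00, 1526.50, 1618.00),
}


def iqr(q1, med, q3):
    """Inter-quartile range: distance between the 25th and 75th percentiles."""
    return q3 - q1


def quartile_skew(q1, med, q3):
    """Positive when the upper quartile lies farther from the median."""
    return (q3 - med) - (med - q1)


for group, qs in quartiles.items():
    print(f"{group:5s} IQR={iqr(*qs):7.2f}  skew={quartile_skew(*qs):+7.2f}")
```

On these figures FIBS and GamesGrid come out at roughly 210 and 220, the Swedish, British and Norwegian lists between 165 and 183, and the Danish list near 81. No group has its lower quartile farther from the median than its upper quartile.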
Daniel Murphy writes:
The Danish rating list includes only current, paid-up members. Members
who neglect to renew their membership are dropped from the rankings.
Ditto for GamesGrid and NBgF and, I assume, for BIBA and SBgF. Not
only because seeing one's name in the ratings list is an incentive to
remain a member, but because (as is the case in Denmark) membership in
the national federation is mandatory for residents to participate in
Open or Intermediate flights of almost all tournaments.
But several factors do limit inflation in the national ratings. No one
can drop out and then rejoin under a different identity. No one ever
gets his rating "re-set" to par. The system never awards all players X
points. At least in Denmark, everyone new to the system starts out at
par regardless of real or estimated ability. And my impression is that
in Denmark, for example, there's a small but steady outflow of
higher-ranked players every year, as people move or give up real life
play for whatever reason -- I imagine this effect isn't so notable on
the online servers. Nis mentions another reason -- unlike all the
online systems, the Danish system has no accelerated ratings boost for
low-experience players. I believe he's correct that the Norwegian
system has adopted the exact FIBS formula, including the "boost" for
players with less than 400 TMP.
- Field size and ratings spread (Daniel Murphy+, June 2000)