Backgammon Ratings

Ratings

Field size and ratings spread

From:   Daniel Murphy
Address:   raccoon@best.com
Date:   30 June 2000
Subject:   Relation between field size and ratings spread?
Forum:   rec.games.backgammon
Google:   395c2ea2.45031313@news.cybercity.dk

It seems commonsensical that the more players there are within a ratings system, the larger the spread will tend to be from top to bottom. For instance, oldtime FIBSters remember when 1800 really meant something :)

Currently,

Rating        Currently        Highest    Lowest
System        Rated Players    (difference from start rating)

Norway:            126          +256       -300
BIBA:              194          +288       -396
Sweden:            369          +356       -192
Denmark:          1042          +264       -373
GamesGrid:        2402          +543       -550
FIBS:             6769          +768*     -1170**
Netgammon            ?             ?          ?

How can this effect be quantified?

 *2nd highest is +580
**2nd lowest is -845

Nis Jørgensen writes:

However commonsensical it might be, it is not in line with the workings of the rating system. This system is designed to guarantee that a difference of 200 points is a difference of 200 points - no matter the number of players.

The numbers you give for the internet servers are not really interesting - since the lowest ranked players are certainly (ill-programmed) bots. A more interesting number would be the average deviation of the rankings.

Please note that an internet server, and probably also a large national system like the Danish one, will probably attract more weak players.

> For instance, oldtime FIBSters remember when 1800 really meant
> something :)

This has to do with inflation IMHO (the average rating rising), not with the rating differences.
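[Editor's sketch] To illustrate the point that a rating difference has the same meaning whatever the size of the player pool, here is a minimal example using the commonly cited FIBS win-probability formula. The formula is not quoted anywhere in this thread, so treat it as an assumption; the function names are hypothetical.

```python
import math


def upset_probability(rating_diff, match_length):
    """Chance that the lower-rated player wins the match.

    Assumes the commonly cited FIBS formula
        P = 1 / (10 ** (D * sqrt(N) / 2000) + 1)
    where D is the rating difference and N the match length.
    Note that the number of players in the system does not appear.
    """
    d = abs(rating_diff)
    return 1.0 / (10.0 ** (d * math.sqrt(match_length) / 2000.0) + 1.0)


def favorite_probability(rating_diff, match_length):
    """Chance that the higher-rated player wins the match."""
    return 1.0 - upset_probability(rating_diff, match_length)


if __name__ == "__main__":
    # A 200-point gap at a few match lengths.
    for n in (1, 5, 9):
        print(n, round(favorite_probability(200, n), 3))
```

Under this assumption, the only inputs are the rating difference and the match length, which is the sense in which "200 points is 200 points".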

Gary Wong writes:

Well, I think you're both right here. Yes, the expected difference between two players should (by and large) be equivalent regardless of the number of players in the system, but it is perfectly normal to find larger differences between the extremes with larger samples. (After all, if you rate 10 randomly selected backgammon players, chances are the best player of the 10 will be somewhat better than the worst, but not by a huge amount. But if you rate every player in the world, you'll be able to measure the difference between the world champ and somebody who barely knows the rules, which we would expect to be enormous.)

> The numbers you give for the internet servers are not really
> interesting - since the lowest ranked players are certainly
> (ill-programmed) bots. A more interesting number would be the average
> deviation of the rankings.

That's quite possibly true, although it depends what you mean by "interesting" :-). A better metric for the "spread" of a distribution would be the inter-quartile range (the difference between the 25th and 75th percentiles). If we are allowed to assume that the populations we're sampling from are equivalent (e.g. that FIBS does not attract a different type of player than those measured in the Norwegian system), then the expected inter-quartile ranges of the rating systems ought to be the same. Of course this assumption is unlikely to be reasonable in practice (FIBS attracts all kinds of players from the casual to world-class professionals, whereas national ratings mostly consist of regular tournament players; the tournament players are more likely to be closely matched than the FIBS ones). But the important thing about the inter-quartile range is that its expectation is independent of sample size.

> For instance, oldtime FIBSters remember when 1800 really meant
> something :)

Well, I think it still means more or less the same thing, depending on how you interpret it; a rating of 1800 today means you are in the top 6% (or so) of FIBS players. When there were only 300 players, that would put you in the top 20; now that there are going on 7000, it's only enough to make it into the top 400.

> This has to do with inflation IMHO (the average rating rising),
> not with the rating differences.

Actually I believe the effect of inflation is rather small compared to the other factors. The median FIBS rating at the moment is only 1528, after 8 years of FIBS -- inflation of 3 or 4 points a year doesn't seem like much to me!

To get back to the original question ("How can this effect be quantified?"), there has surely been plenty of work on the expected maxima of samples of various distributions. I'm at work at the moment and don't have any references handy, so I cheated and made a quick simulation, which appears to show that the expected deviation of the maximum of n samples from a normal distribution grows slightly less than proportionally with log(n).

I plotted a graph of this expectation and superimposed Daniel's data on it; I had to assume that backgammon ratings are normally distributed with std. dev. 150 points. I have no idea whether this assumption is reasonable or not; in practice the FIBS/GG standard deviations are likely to be higher than the national ratings, because they include a wider variety of players, as described above. (Daniel, do you still have your original samples available? It might be interesting to compute the inter-quartile ranges and standard deviations to see how much they vary between pools of players.)

The graph is available in PostScript form at:

    http://www.cs.arizona.edu/~gary/backgammon/spread.ps

for those interested.
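[Editor's sketch] Gary's simulation code was not posted, so what follows is only a rough reconstruction of the idea, not his program. It assumes, as he did, normally distributed ratings with a standard deviation of 150 points, and estimates both the expected maximum deviation and the expected inter-quartile range for pools of various sizes. The function names, trial counts and seed are hypothetical choices.

```python
import random
import statistics


def expected_max_deviation(n, sd=150.0, trials=100, seed=0):
    """Monte Carlo estimate of E[max of n normal(0, sd) samples]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, sd) for _ in range(n))
    return total / trials


def expected_iqr(n, sd=150.0, trials=100, seed=0):
    """Monte Carlo estimate of the expected inter-quartile range."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        q1, _median, q3 = statistics.quantiles(sample, n=4)
        total += q3 - q1
    return total / trials


if __name__ == "__main__":
    # Pool sizes taken from Daniel's first table.
    for n in (126, 194, 369, 1042, 2402, 6769):
        print(n,
              round(expected_max_deviation(n)),
              round(expected_iqr(n)))
```

If the normality assumption holds, the maximum column should keep creeping upward with pool size while the inter-quartile range column stays close to 1.349 standard deviations, which is the theoretical value for a normal distribution.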

Daniel Murphy writes:

Server rating extremes on either end are subject to uncompetitive manipulation. The lowest rating on FIBS is 649.78, the 2nd lowest 701.50. The 3rd lowest (775.72) is most definitely a bona fide human. The 4th lowest (also human) boasts a substantially more respectable rating of 870.16. All the lowest rated GamesGrid players appear to be bona fide human players. See below, where the second table excludes 1% of players at each end of the ratings lists.

> I believe the effect of inflation is rather small compared to the
> other factors. The median FIBS rating at the moment is only 1528,
> after 8 years of FIBS -- inflation of 3 or 4 points a year doesn't
> seem like much to me!

Agreed, and an aside: it's been mentioned in other discussions that average, not median rating, is a better indication of ratings inflation. Danish median is 1502.86, the average 1516.6. Norway median is 1526.50, average is 1523.13. Calculating averages for other systems is beyond my endurance for tedium.

Can you use these statistics, Gary?

Group     #        #1   75%-ile              median   25%-ile             lowest
FIBS   6683   2273.74   1640.40 (+111.76)   1528.64   1430.50 (-69.50)    701.50
GG     2418   2068.09   1711.68 (+132.64)   1579.04   1491.28 (-87.76)    957.58
DKk     992   1764.93   1555.22 (+ 52.36)   1502.86   1473.80 (-29.06)   1126.31
SEn     388   1879.00   1605.00 (+ 99.00)   1506.00   1428.00 (-78.00)   1173.00
BIBA    217   1781.00   1597.00 (+ 92.00)   1505.00   1432.00 (-73.00)   1102.00
NO      130   1760.00   1618.00 (+ 91.50)   1526.50   1435.00 (-91.50)   1200.00

Group     #    1%-ile   75%-ile              median   25%-ile            99%-ile
FIBS   6683   1911.42   1640.40 (+111.76)   1528.64   1430.50 (-69.50)   1136.30
GG     2418   1960.75   1711.68 (+132.64)   1579.04   1491.28 (-87.76)   1234.71
DBgF    992   1696.22   1555.22 (+ 52.36)   1502.86   1473.80 (-29.06)   1373.77
SBgF    388   1827.00   1605.00 (+ 99.00)   1506.00   1428.00 (-78.00)   1266.00
BIBA    217   1772.00   1597.00 (+ 92.00)   1505.00   1432.00 (-73.00)   1187.00
NBgF    130   1741.00   1618.00 (+ 91.50)   1526.50   1435.00 (-91.50)   1272.00

FIBS:    excludes unknown # of players with less than 50 TMP, and lowest ranked "player."
GG:      excludes the non-player at bottom of list
DK:      excludes 51 members with 0 TMP
Sweden:  includes all rated players (qualifications unknown)
BIBA:    includes all rated players (qualifications unknown)
Norway:  includes all listed players (i.e., minimum 15 matches and at least 1 match played in last year).

Danish system start point is 1000, not 1500; ratings adjusted by +500 for comparison.

Gary Wong writes:

> It's been mentioned in other discussions that average, not median
> rating, is a better indication of ratings inflation.

True -- I tried searching for the articles about inflation that had been posted here in the past, but unfortunately now that we have only a "precision buying service" instead of Deja News, things like that aren't easy to find. Luckily we still have Tom Keith's r.g.b. archive -- one relevant article is:

    http://www.bkgm.com/rgb/rgb.cgi?view+416

which does seem to indicate that a FIBS rating of 1800 has been reasonably consistent at marking the 95th percentile in 1995, 1997 and 2000.

One other snippet -- Michael Klein's latest FIBS Ratings Report shows the mean FIBS rating to be 1534, which is surprisingly close to the median.

Thanks for those data! (I believe that the "-69.50" figure in the FIBS 25%-ile should be "-98.14".) A few random observations:

- The inter-quartile ranges of the online servers do seem to be significantly higher than the national ratings (~210 vs. ~170), which supports the hypothesis that the Internet servers attract a more varied range of players than real-life tournaments. The Danish range is much smaller than the others, though; I have no idea why this would be the case (perhaps the results include a large number of relatively new players? The other descriptions make it sound as if they do or might exclude inexperienced players.)

- The Danish, Swedish and British medians show virtually no sign of inflation. I suspect this may be because they "include all rated players": the main cause of inflation is that weak players are more likely to leave the system than strong players, and so weak ratings are gradually deleted over time, which effectively raises whatever is left behind. The Norwegian ratings (which require at least 1 match played in the last year) show comparable inflation to FIBS. GamesGrid shows the most inflation of all. This might well be because the financial cost increases the tendency of weak players to leave. I understand that GG have added points to all players' ratings in the past when a server crash lost the results of some games (I'm not sure which is more disturbing -- that somebody thought this was a good idea, or that users were apparently pacified by it!), which would certainly add to this effect.

- The results show that the distributions tend to be skewed slightly to the right (the upper quartile is further from the median than the lower quartile is). One explanation for this might be that weak players tend to improve faster than strong players (hopefully nobody's getting significantly worse!), which could shrink the left-hand tail somewhat.
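[Editor's sketch] The inter-quartile ranges and quartile offsets discussed above can be recomputed directly from the 75%-ile, median and 25%-ile columns of Daniel's table. This is only a convenience for checking the arithmetic; the dictionary layout and function name are hypothetical.

```python
# (75th percentile, median, 25th percentile), copied from Daniel's table.
# Danish figures already include his +500 adjustment.
QUARTILES = {
    "FIBS":    (1640.40, 1528.64, 1430.50),
    "GG":      (1711.68, 1579.04, 1491.28),
    "Denmark": (1555.22, 1502.86, 1473.80),
    "Sweden":  (1605.00, 1506.00, 1428.00),
    "BIBA":    (1597.00, 1505.00, 1432.00),
    "Norway":  (1618.00, 1526.50, 1435.00),
}


def summarize(q3, median, q1):
    """Return the inter-quartile range and each quartile's gap to the median."""
    return {
        "iqr": q3 - q1,          # spread of the middle half of the pool
        "upper": q3 - median,    # gap from median up to the 75th percentile
        "lower": median - q1,    # gap from median down to the 25th percentile
    }


if __name__ == "__main__":
    for group, (q3, median, q1) in QUARTILES.items():
        s = summarize(q3, median, q1)
        print(f"{group:8s} IQR {s['iqr']:7.2f}  "
              f"upper {s['upper']:7.2f}  lower {s['lower']:7.2f}")
```

For FIBS the lower gap works out to 1528.64 - 1430.50 = 98.14, which agrees with the correction Gary suggests for the "-69.50" entry.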

Daniel Murphy writes:

The Danish rating list includes only current, paid-up members. Members who neglect to renew their membership are dropped from the rankings. Ditto for GamesGrid and NBgF and, I assume, for BIBA and SBgF. Members have reason to stay listed, not only because seeing one's name in the ratings list is an incentive to remain a member, but because (as is the case in Denmark) membership in the national federation is mandatory for residents to participate in Open or Intermediate flights of almost all tournaments.

But several factors do limit inflation in the national ratings. No one can drop out and then rejoin under a different identity. No one ever gets his rating "re-set" to par. The system never awards all players X points. At least in Denmark, everyone new to the system starts out at par regardless of real or estimated ability. And my impression is that in Denmark, for example, there's a small but steady outflow of higher-ranked players every year, as people move or give up real life play for whatever reason -- I imagine this effect isn't so notable on the online servers.

Nis mentions another reason -- unlike all the online systems, the Danish system has no accelerated ratings boost for low-experience players. I believe he's correct that the Norwegian system has adopted the exact FIBS formula, including the "boost" for players with less than 400 TMP.


Ratings

Constructing a ratings system (Matti Rinta-Nikkola, Dec 1998)

Converting to points-per-game (David Montgomery, Aug 1998) [Recommended reading]

Cube error rates (Joe Russell+, July 2009) [Long message]

Different length matches (Jim Williams+, Oct 1998)

Different length matches (Tom Keith, May 1998) [Recommended reading]

ELO system (seeker, Nov 1995)

Effect of droppers on ratings (Gary Wong+, Feb 1998)

Emperical analysis (Gary Wong, Oct 1998)

Error rates (David Levy, July 2009)

Experience required for accurate rating (Jon Brown+, Nov 2002)

FIBS rating distribution (Gary Wong, Nov 2000)

FIBS rating formula (Patti Beadles, Dec 2003)

FIBS vs. GamesGrid ratings (Raccoon+, Mar 2006) [GammOnLine forum]

Fastest way to improve your rating (Backgammon Man+, May 2004)

Field size and ratings spread (Daniel Murphy+, June 2000) [Long message]

Improving the rating system (Matti Rinta-Nikkola, Nov 2000) [Long message]

KG rating list (Daniel Murphy, Feb 2006) [GammOnLine forum]

KG rating list (Tapio Palmroth, Oct 2002)

MSN Zone ratings flaw (Hank Youngerman, May 2004)

No limit to ratings (David desJardins+, Dec 1998)

On different sites (Bob Newell+, Apr 2004)

Opponent's strength (William Hill+, Apr 1998)

Possible adjustments (Christopher Yep+, Oct 1998)

Rating versus error rate (Douglas Zare, July 2006) [GammOnLine forum]

Ratings and rankings (Chuck Bower, Dec 1997) [Long message]

Ratings and rankings (Jim Wallace, Nov 1997)

Ratings on Gamesgrid (Gregg Cattanach, Dec 2001)

Ratings variation (Kevin Bastian+, Feb 1999)

Ratings variation (FLMaster39+, Aug 1997)

Ratings variation (Ed Rybak+, Sept 1994)

Strange behavior with large rating difference (Ron Karr, May 1996)

Table of ratings changes (Patti Beadles, Aug 1994)

Table of win rates (William C. Bitting, Aug 1995)

Unbounded rating theorem (David desJardins+, Dec 1998)

What are rating points? (Lou Poppler, Apr 1995)

Why high ratings for one-point matches? (David Montgomery, Sept 1995)

