Ratings

 Experience required for accurate rating

 From: Jon Brown
 Address: jonbrown@hotmail.com
 Date: 14 November 2002
 Subject: statistics
 Forum: rec.games.backgammon
 Google: NHNA9.117888$c51.35150418@twister.nyroc.rr.com

```I have a few questions related to sample size in statistics:

1. About how many experience points are necessary to get an accurate
measure of your rating? How does the answer change if you are only playing
the same computer opponent, with a fixed rating, who performs at the same
level (e.g. playing offline against Snowie), as opposed to playing opponents
of varying levels and ratings (e.g. playing on GamesGrid)?

A related question:

2. About how many matches are needed to determine with a reasonable level
of statistical certainty the stronger of two players who are only playing
each other in 1pt or only 7pt. or only 25 pt matches?
```

 Peter Schneider  writes:

```Well... what do you mean by accurate? I'd guess that more than 50% of
the time I'm more than 20 rating points away from the rating that
corresponds to my average skill. (Yes, also upwards! ;-) )

The interval spanned within the last months was 1740-1860. Kit Woolsey is
said to have been seen below 1800 on FIBS once.

So I think for me, after having played many thousands of games, I have a
95% probability of being within a range of 1760-1820.

The center of the oscillations gives a better number, but it takes so many
games to determine that I hopefully have improved in the meantime. (But
true, this could be detected by sophisticated statistical methods.)

I'd say that a rating snapshot at best gives an indication of the playing
class (like intermediate or advanced), with a significant probability of
being off by one class.

Last but not least, IMHO the online rating does not say much about the
play over the board.
```

 Douglas Zare  writes:

```I agree with Peter Schneider's comments, but I'll reply directly instead
of quoting them.

You can't get a very accurate estimate from your current rating. If
everyone else is correctly rated and the FIBS formula is accurate, then in
the long run, your rating will follow a stable distribution with standard
deviation roughly 42. The tails are slightly thicker than for a normal
distribution, as they drop off roughly exponentially rather than as
exp(c x^2). Now that I think about it again, maybe the drop off is more
like exp(c x ln x).

There are better indicators than the current rating. For large experience
levels (over perhaps 5000), the maximum rating varies less than the current
rating. This assumes that one's opponents and behavior do not change when
one is overrated, which seems unlikely.

You can find discussions of these and other issues related to the rating
system in the article by me and Adam Stocks, "Ratings - A Mathematical
Survey" in GammonVillage.com, which requires a subscription. You can also
find some information in the rec.games.backgammon archive
(http://www.bkgm.com/rgb/rgb.cgi?menu).

If your opponents online are often misrated, this widens the stable
distribution, and affects the maximum rating. This can seriously affect the
distribution if there are, for example, overrated players who only play you
when they see you are overrated. You need a lot of extra assumptions to
build that into a mathematical model.

> A related question:
>
> 2. About how many matches are needed to determine with a reasonable level
> of statistical certainty the stronger of two players who are only playing
> each other in 1pt or only 7pt. or only 25 pt matches?

It depends. If one player wins 98% of the matches, very few matches will be
needed to detect that the stronger player is really stronger. It would take
many matches to determine whether the right value is 98% rather than 97% or
99%, but few to realize that it isn't 50%.

If there is a 51-49 advantage, it will take many matches. If you are trying
to distinguish nearly equal players who differ by an advantage of x% (1%
corresponding to a 51-49 edge), then after about 7000/x^2 matches the
stronger player has a 95% chance of being ahead.

If someone is a 60-40 favorite, then about 70 matches are enough to make it
a significant surprise for the weaker player to be ahead. A 51-49 edge
would take about 7000 matches to reach the same level of confidence.

On the other hand, the score is not the most efficient way to determine who
is stronger. There are unbiased methods of variance reduction (by obviously
fair side bets on each roll) that can decrease the number of matches needed
by at least a factor of 10. See my article "Hedging Toward Skill" in
GammonVillage.com. A version of this is implemented by gnu, though it takes
some work to extract the unbiased skill estimates.

If the FIBS formula is correct (in estimating the relative advantage in 25
point matches from the advantage in 7 point matches), then the total
experience level needed is roughly the same for 7 point and for 25 point
matches.

Douglas Zare
```
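
The "about 7000/x^2 matches" rule of thumb in the reply above can be checked with a short calculation. What follows is only a sketch, not anything from the original post: it assumes that matches are independent, that the stronger player wins each one with fixed probability 0.5 + x/100, and that a normal approximation to the match lead is good enough. The function names `matches_needed` and `prob_ahead` are made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def matches_needed(edge_pct, confidence=0.95):
    """Approximate number of matches after which the stronger player is
    ahead with the given probability (normal approximation, assumed)."""
    edge = edge_pct / 100.0  # a 1% edge means winning 51% of matches
    z = NormalDist().inv_cdf(confidence)
    # Lead (wins minus losses) after n matches has mean 2*edge*n and
    # standard deviation close to sqrt(n) for nearly equal players.
    # Setting 2*edge*n = z*sqrt(n) and solving for n gives:
    return (z / (2.0 * edge)) ** 2


def prob_ahead(n_matches, edge_pct):
    """Approximate probability that the stronger player leads after
    n_matches, using the exact per-match variance 4*p*(1-p)."""
    p = 0.5 + edge_pct / 100.0
    mean_lead = n_matches * (2.0 * p - 1.0)
    sd_lead = sqrt(4.0 * n_matches * p * (1.0 - p))
    return NormalDist().cdf(mean_lead / sd_lead)


if __name__ == "__main__":
    for edge_pct in (1, 2, 5, 10):
        n = matches_needed(edge_pct)
        print(f"{edge_pct:>2}% edge: about {n:,.0f} matches; "
              f"7000/x^2 gives {7000 / edge_pct**2:,.0f}")
```

Under these assumptions the constant comes out a little under 7000, which is consistent with the rounded figure in the post, and `prob_ahead(70, 10)` and `prob_ahead(7000, 1)` both land close to 95%. The sketch ignores tied scores and the variance-reduction methods mentioned in the post, so it only describes the plain win/loss count.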

### Ratings

Constructing a ratings system  (Matti Rinta-Nikkola, Dec 1998)
Converting to points-per-game  (David Montgomery, Aug 1998)
Cube error rates  (Joe Russell+, July 2009)
Different length matches  (Jim Williams+, Oct 1998)
Different length matches  (Tom Keith, May 1998)
ELO system  (seeker, Nov 1995)
Effect of droppers on ratings  (Gary Wong+, Feb 1998)
Empirical analysis  (Gary Wong, Oct 1998)
Error rates  (David Levy, July 2009)
Experience required for accurate rating  (Jon Brown+, Nov 2002)
FIBS rating distribution  (Gary Wong, Nov 2000)
FIBS rating formula  (Patti Beadles, Dec 2003)
FIBS vs. GamesGrid ratings  (Raccoon+, Mar 2006)
Fastest way to improve your rating  (Backgammon Man+, May 2004)
Field size and ratings spread  (Daniel Murphy+, June 2000)
Improving the rating system  (Matti Rinta-Nikkola, Nov 2000)
KG rating list  (Daniel Murphy, Feb 2006)
KG rating list  (Tapio Palmroth, Oct 2002)
MSN Zone ratings flaw  (Hank Youngerman, May 2004)
No limit to ratings  (David desJardins+, Dec 1998)
On different sites  (Bob Newell+, Apr 2004)
Opponent's strength  (William Hill+, Apr 1998)
Possible adjustments  (Christopher Yep+, Oct 1998)
Rating versus error rate  (Douglas Zare, July 2006)
Ratings and rankings  (Chuck Bower, Dec 1997)
Ratings and rankings  (Jim Wallace, Nov 1997)
Ratings on Gamesgrid  (Gregg Cattanach, Dec 2001)
Ratings variation  (Kevin Bastian+, Feb 1999)
Ratings variation  (FLMaster39+, Aug 1997)
Ratings variation  (Ed Rybak+, Sept 1994)
Strange behavior with large rating difference  (Ron Karr, May 1996)
Table of ratings changes  (Patti Beadles, Aug 1994)
Table of win rates  (William C. Bitting, Aug 1995)
Unbounded rating theorem  (David desJardins+, Dec 1998)
What are rating points?  (Lou Poppler, Apr 1995)
Why high ratings for one-point matches?  (David Montgomery, Sept 1995)