Forum Archive: Ratings
Experience required for accurate rating

Jon Brown writes:
I have a few questions related to sample size in statistics:
1. About how many experience points are necessary to get an accurate
measure of your rating? (How does the answer change if you are only playing
the same computer opponent with a fixed rating, who performs at the same
level (e.g., playing offline against Snowie), as opposed to playing opponents
of varying levels and ratings (e.g., playing on GamesGrid)?)
A related question:
2. About how many matches are needed to determine, with a reasonable level
of statistical certainty, the stronger of two players who are only playing
each other in 1-point, 7-point, or 25-point matches?


Peter Schneider writes:
Well... what do you mean by "accurate"? I'd guess that more than 50% of
the time I'm more than 20 rating points away from the rating which
corresponds to my average skill. (Yes, upwards too! ;) ) The interval
spanned within the last few months was 1740-1860. Kit Woolsey is said to
have been seen below 1800 on FIBS once.
So I think that for me, after having played many thousands of games, there
is a 95% probability of being within the range 1760-1820.
The center of the oscillations gives a better number, but it takes so many
games to determine that I will hopefully have improved in the meantime.
(True, this could be detected by sophisticated statistical methods.)
I'd say that a rating snapshot at best gives an indication of one's playing
class (like intermediate or advanced), with a significant probability of
being off by one class.
Last but not least, in my opinion an online rating does not say much about
play over the board.


Douglas Zare writes:
I agree with Peter Schneider's comments, but I'll reply directly instead of
quoting them.
You can't get a very accurate estimate from your current rating. If
everyone else is correctly rated and the FIBS formula is accurate, then in
the long run your rating will follow a stable distribution with a standard
deviation of roughly 42 points. The tails are slightly thicker than for a
normal distribution: they drop off roughly exponentially rather than as
exp(-c x^2). (Now that I think about it again, maybe the drop-off is more
like exp(-c x ln x).)
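One way to see where the "roughly 42" comes from is to simulate the walk directly. The sketch below is my illustration, not from the thread; it assumes 1-point matches against correctly rated opponents of exactly equal skill, and the standard FIBS update in which the winner of an N-point match gains 4*sqrt(N)*(1 - p) and the loser drops 4*sqrt(N)*p, where p is the formula's win probability for the winner. Because the formula keys off the posted rating, an overrated player loses rating on average, which is what makes the distribution stable:

```python
import math
import random

def simulate_fibs_rating(n_matches=500_000, seed=1):
    """Walk of a 1500-strength player's FIBS rating in 1-point
    matches against correctly rated 1500-strength opponents.
    Every match is a true coin flip, but the FIBS update uses the
    *posted* rating gap, pulling an overrated player back down."""
    random.seed(seed)
    rating = 1500.0
    history = []
    for _ in range(n_matches):
        # Win probability the formula assigns to this player
        # (match length N = 1, so sqrt(N) = 1).
        p = 1.0 / (1.0 + 10.0 ** (-(rating - 1500.0) / 2000.0))
        if random.random() < 0.5:      # true 50% chance to win
            rating += 4.0 * (1.0 - p)
        else:
            rating -= 4.0 * p
        history.append(rating)
    return history

history = simulate_fibs_rating()
mean = sum(history) / len(history)
sd = math.sqrt(sum((r - mean) ** 2 for r in history) / len(history))
# sd comes out close to 42, matching the figure quoted above
```

The restoring drift is proportional to how far the posted rating strays from 1500, so the walk behaves like a discrete Ornstein-Uhlenbeck process with stationary standard deviation near 42.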
There are better indicators than the current rating. For large experience
levels (over perhaps 5000), the maximum rating varies less than the current
rating. This assumes that one's opponents and behavior do not change when
one is overrated, which seems unlikely.
You can find discussions of these and other issues related to the rating
system in the article by me and Adam Stocks, "Ratings: A Mathematical
Survey," on GammonVillage.com, which requires a subscription. You can also
find some information in the rec.games.backgammon archive at
http://www.bkgm.com/rgb/rgb.cgi?menu.
If your opponents online are often misrated, this widens the stable
distribution, and affects the maximum rating. This can seriously affect the
distribution if there are, for example, overrated players who only play you
when they see you are overrated. You need a lot of extra assumptions to
build that into a mathematical model.
> A related question:
>
> 2. About how many matches are needed to determine, with a reasonable level
> of statistical certainty, the stronger of two players who are only playing
> each other in 1-point, 7-point, or 25-point matches?
It depends. If one player wins 98% of the matches, very few matches will be
needed to detect that the stronger player is really stronger. It would take
many matches to determine whether the right value is 98% rather than 97% or
99%, but few to realize that it isn't 50%. If there is a 51-49 advantage,
it will take many matches.
If you are trying to distinguish nearly equal players who differ by an
advantage of x% (1% corresponding to a 51-49 edge), then after about
7000/x^2 matches the stronger player has a 95% chance of being ahead. If
someone is a 60-40 favorite, then about 70 matches are enough to make it a
significant surprise for the weaker player to be ahead. A 51-49 edge would
take about 7000 matches to reach the same level of confidence.
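The 7000/x^2 rule is easy to check numerically. The sketch below (my illustration; the function names are mine) computes the exact binomial probability of being strictly ahead for the 60-40 case, and a normal approximation for the 51-49 case, where the exact tail sum would underflow naive floating point:

```python
import math

def prob_ahead(n, p):
    """Exact P(strictly more wins than the opponent after n
    matches) for per-match win probability p; ties (possible
    when n is even) count as not ahead."""
    q = 1.0 - p
    pmf = q ** n                  # P(exactly 0 wins)
    total = 0.0
    for k in range(n + 1):
        if 2 * k > n:             # strictly more wins than losses
            total += pmf
        pmf *= (n - k) / (k + 1) * (p / q)   # step to P(k+1 wins)
    return total

def prob_ahead_normal(n, p):
    """Normal approximation for large n, where q**n underflows."""
    mean, sd = n * p, math.sqrt(n * p * (1.0 - p))
    return 0.5 * math.erfc((n / 2.0 - mean) / (sd * math.sqrt(2.0)))

p70 = prob_ahead(70, 0.60)             # 60-40 favorite, 7000/10^2 = 70 matches
p7000 = prob_ahead_normal(7000, 0.51)  # 51-49 favorite, 7000 matches
```

Both probabilities come out near the promised 95%.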
On the other hand, the score is not the most efficient way to determine who
is stronger. There are unbiased methods of variance reduction (by obviously
fair side bets on each roll) that can decrease the number of matches needed
by at least a factor of 10. See my article "Hedging Toward Skill" in
GammonVillage.com. A version of this is implemented by GNU Backgammon,
though it takes some work to extract the unbiased skill estimates.
If the FIBS formula is correct (in estimating the relative advantage in 25
point matches from the advantage in 7 point matches), then the total
experience level needed is roughly the same for 7 point and for 25 point
matches.
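This can be sanity-checked against the two formulas already quoted. Using the FIBS win probability p = 1/(1 + 10^(-D*sqrt(N)/2000)) for a D-point rating gap in an N-point match, together with the 7000/x^2 rule above, the required experience (match length times number of matches) comes out nearly the same for 7-point and 25-point matches. A sketch, with the 50-point gap chosen purely for illustration:

```python
import math

def fibs_win_prob(d, n):
    """FIBS probability that a player rated d points higher
    wins an n-point match."""
    return 1.0 / (1.0 + 10.0 ** (-d * math.sqrt(n) / 2000.0))

def matches_needed(d, n):
    """Matches for 95% confidence via the 7000/x^2 rule,
    where x is the advantage in percent."""
    x = (fibs_win_prob(d, n) - 0.5) * 100.0
    return 7000.0 / x ** 2

d = 50                              # illustrative rating gap
exp7 = 7 * matches_needed(d, 7)     # total experience, 7-point matches
exp25 = 25 * matches_needed(d, 25)  # total experience, 25-point matches
# The two experience totals agree to within a few percent, since the
# advantage x grows roughly like sqrt(N) and matches_needed like 1/x^2.
```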
Douglas Zare




Ratings
 Constructing a ratings system (Matti Rinta-Nikkola, Dec 1998)
 Converting to points-per-game (David Montgomery, Aug 1998)
 Cube error rates (Joe Russell+, July 2009)
 Different length matches (Jim Williams+, Oct 1998)
 Different length matches (Tom Keith, May 1998)
 ELO system (seeker, Nov 1995)
 Effect of droppers on ratings (Gary Wong+, Feb 1998)
 Empirical analysis (Gary Wong, Oct 1998)
 Error rates (David Levy, July 2009)
 Experience required for accurate rating (Jon Brown+, Nov 2002)
 FIBS rating distribution (Gary Wong, Nov 2000)
 FIBS rating formula (Patti Beadles, Dec 2003)
 FIBS vs. GamesGrid ratings (Raccoon+, Mar 2006)
 Fastest way to improve your rating (Backgammon Man+, May 2004)
 Field size and ratings spread (Daniel Murphy+, June 2000)
 Improving the rating system (Matti Rinta-Nikkola, Nov 2000)
 KG rating list (Daniel Murphy, Feb 2006)
 KG rating list (Tapio Palmroth, Oct 2002)
 MSN Zone ratings flaw (Hank Youngerman, May 2004)
 No limit to ratings (David desJardins+, Dec 1998)
 On different sites (Bob Newell+, Apr 2004)
 Opponent's strength (William Hill+, Apr 1998)
 Possible adjustments (Christopher Yep+, Oct 1998)
 Rating versus error rate (Douglas Zare, July 2006)
 Ratings and rankings (Chuck Bower, Dec 1997)
 Ratings and rankings (Jim Wallace, Nov 1997)
 Ratings on Gamesgrid (Gregg Cattanach, Dec 2001)
 Ratings variation (Kevin Bastian+, Feb 1999)
 Ratings variation (FLMaster39+, Aug 1997)
 Ratings variation (Ed Rybak+, Sept 1994)
 Strange behavior with large rating difference (Ron Karr, May 1996)
 Table of ratings changes (Patti Beadles, Aug 1994)
 Table of win rates (William C. Bitting, Aug 1995)
 Unbounded rating theorem (David desJardins+, Dec 1998)
 What are rating points? (Lou Poppler, Apr 1995)
 Why high ratings for one-point matches? (David Montgomery, Sept 1995)