Ratings

Experience required for accurate rating

From:   Jon Brown
Address:   jonbrown@hotmail.com
Date:   14 November 2002
Subject:   statistics
Forum:   rec.games.backgammon
Google:   NHNA9.117888$c51.35150418@twister.nyroc.rr.com

I have a few questions related to sample size in statistics:

1. About how many experience points are necessary to get an accurate
measure of your rating? (How does the answer change if you are only playing
the same computer opponent, with a fixed rating, who performs at the same
level (e.g. playing offline against Snowie), as opposed to playing opponents
of varying levels and ratings (e.g. playing on GamesGrid)?)

A related question:

2. About how many matches are needed to determine with a reasonable level
of statistical certainty the stronger of two players who are only playing
each other in only 1-point, only 7-point, or only 25-point matches?

Peter Schneider writes:

Well... what do you mean by accurate? I'd guess that more than 50% of the
time I'm more than 20 rating points away from the rating that corresponds
to my average skill. (Yes, also upwards! ;-) ) The interval spanned within
the last months was 1740-1860. Kit Woolsey is said to have been seen below
1800 on FIBS once.

So I think that, after having played many thousands of games, I have a 95%
probability of being within a range of 1760-1820.

The center of the oscillations gives a better number, but it takes so many
games to determine that I have hopefully improved in between. (But true,
this could be detected by sophisticated statistical methods.)

I'd say that a rating snapshot at best gives an indication of the playing
class (like intermediate or advanced), with a significant probability of
being off by one class.

Last but not least, IMHO the online rating does not say much about play
over the board.

Douglas Zare writes:

I agree with Peter Schneider's comments, but I'll reply directly instead of
quoting them.

You can't get a very accurate estimate from your current rating. If
everyone else is correctly rated and the FIBS formula is accurate, then in
the long run, your rating will follow a stable distribution with standard
deviation roughly 42. The tails are slightly thicker than for a normal
distribution, as they drop off roughly exponentially rather than as
exp(-c x^2). Now that I think about it again, maybe the drop-off is more
like exp(-c x ln x).
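
As a rough check on the "roughly 42" figure, here is a minimal simulation
sketch. It is not from the original posts. It assumes the commonly published
FIBS formula (the favorite's win chance is 1/(1 + 10^(-D*sqrt(N)/2000)) for
rating difference D and match length N, and the winner gains 4*sqrt(N) times
the loser's assumed win chance), an experience factor K of 1, and an opponent
who is correctly rated and exactly as strong as you are. The function names
and default parameters are made up for illustration.

```python
import math
import random
import statistics


def fibs_win_prob(rating, opp_rating, match_len):
    """Win chance the (assumed) FIBS formula gives `rating` vs `opp_rating`."""
    diff = rating - opp_rating
    return 1.0 / (1.0 + 10 ** (-diff * math.sqrt(match_len) / 2000.0))


def simulate_rating_spread(n_matches=300_000, match_len=5,
                           true_rating=1800.0, burn_in=5_000, seed=0):
    """Standard deviation of a displayed rating around a fixed true skill.

    Assumptions: every opponent is correctly rated at `true_rating`, so the
    real chance of winning each match is exactly 0.5, and K = 1.
    """
    rng = random.Random(seed)
    rating = true_rating
    stake = 4.0 * math.sqrt(match_len)
    samples = []
    for i in range(n_matches):
        p_formula = fibs_win_prob(rating, true_rating, match_len)
        if rng.random() < 0.5:
            rating += stake * (1.0 - p_formula)  # win: gain stake * P(opponent wins)
        else:
            rating -= stake * p_formula          # loss: lose stake * P(we win)
        if i >= burn_in:
            samples.append(rating)
    return statistics.pstdev(samples)


if __name__ == "__main__":
    print(round(simulate_rating_spread(), 1))
```

Under these assumptions the spread should land in the neighborhood of the
figure quoted above. The sketch says nothing about misrated opponents or
about opponents who choose when to play you.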

There are better indicators than the current rating. For large experience
levels (over perhaps 5000), the maximum rating varies less than the current
rating. This assumes that one's opponents and behavior do not change when
one is overrated, which seems unlikely.

You can find discussions of these and other issues related to the rating
system in the article by me and Adam Stocks, "Ratings - A Mathematical
Survey" in GammonVillage.com, which requires a subscription. You can also
find some information in the rec.games.backgammon archive at
http://www.bkgm.com/rgb/rgb.cgi?menu

If your opponents online are often misrated, this widens the stable
distribution, and affects the maximum rating. This can seriously affect the
distribution if there are, for example, overrated players who only play you
when they see you are overrated. You need a lot of extra assumptions to
build that into a mathematical model.

> A related question:
>
> 2. About how many matches are needed to determine with a reasonable level
> of statistical certainty the stronger of two players who are only playing
> each other in only 1-point, only 7-point, or only 25-point matches?

It depends. If one player wins 98% of the matches, very few matches will be
needed to detect that the stronger player is really stronger. It would take
many matches to determine whether the right value is 98% rather than 97% or
99%, but few to realize that it isn't 50%. If there is a 51-49 advantage,
it will take many matches.

If you are trying to distinguish nearly equal players who differ by an
advantage of x% (1% corresponding to a 51-49 edge), then after about
7000/x^2 matches the stronger player has a 95% chance of being ahead. If
someone is a 60-40 favorite, then about 70 matches are enough to make it a
significant surprise for the weaker player to be ahead. A 51-49 edge would
take about 7000 matches to reach the same level of confidence.
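
The 7000/x^2 rule of thumb can be reproduced with a normal approximation.
The sketch below is not from the original post. It assumes independent
matches with a fixed win probability of 0.5 + x/100 and treats the stronger
player's lead as normally distributed. The name `matches_needed` is
hypothetical.

```python
from statistics import NormalDist


def matches_needed(edge_pct, confidence=0.95):
    """Approximate matches until the favorite leads with the given confidence.

    edge_pct is x in the text: 1 means a 51-49 favorite, 10 means 60-40.
    The lead after n matches has mean n*(2p - 1) and variance 4*n*p*(1 - p);
    we solve mean = z * standard deviation for n.
    """
    p = 0.5 + edge_pct / 100.0
    z = NormalDist().inv_cdf(confidence)
    return z * z * 4.0 * p * (1.0 - p) / (2.0 * p - 1.0) ** 2


if __name__ == "__main__":
    for x in (1, 10):
        print(x, round(matches_needed(x)))
```

The results come out a little under the rounded 7000 and 70 quoted above,
which is what you would expect from a rule of thumb.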

On the other hand, the score is not the most efficient way to determine who
is stronger. There are unbiased methods of variance reduction (by obviously
fair side bets on each roll) that can decrease the number of matches needed
by at least a factor of 10. See my article "Hedging Toward Skill" in
GammonVillage.com. A version of this is implemented by GNU Backgammon,
though it takes some work to extract the unbiased skill estimates.
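
To illustrate the idea of fair side bets, here is a toy sketch. It is not the
method from "Hedging Toward Skill" and not what GNU Backgammon does. It is a
made-up dice game in which each roll moves the result by a known amount, the
stronger player gains a small fixed edge per turn, and an imperfect evaluator
prices each roll. All functions and constants are hypothetical. The side bet
on a roll pays the evaluator's value for that roll minus its average over all
six faces, so it has zero expectation however poor the evaluator is.

```python
import random
import statistics

FACES = range(1, 7)


def true_swing(roll):
    """Hypothetical: how much a roll really moves the result."""
    return 0.3 * (roll - 3.5)


def evaluator(roll):
    """Hypothetical imperfect evaluator: too flat, and it overrates a 6."""
    return 0.25 * (roll - 3.5) + (0.05 if roll == 6 else 0.0)


# Average evaluation over all faces; subtracting it makes each side bet fair.
EVAL_MEAN = sum(evaluator(r) for r in FACES) / 6


def play_match(rng, edge_per_turn=0.01, turns=20):
    """Return (raw result, hedged result) for one toy match."""
    raw = 0.0
    side_bets = 0.0
    for _ in range(turns):
        roll = rng.randint(1, 6)
        raw += true_swing(roll) + edge_per_turn
        side_bets += evaluator(roll) - EVAL_MEAN  # zero expectation per roll
    return raw, raw - side_bets


def compare(n_matches=20_000, seed=1):
    """Mean and variance of raw and hedged results over many toy matches."""
    rng = random.Random(seed)
    results = [play_match(rng) for _ in range(n_matches)]
    raw = [r for r, _ in results]
    hedged = [h for _, h in results]
    return (statistics.mean(raw), statistics.pvariance(raw),
            statistics.mean(hedged), statistics.pvariance(hedged))


if __name__ == "__main__":
    print(compare())
```

Both estimates center on the same skill edge (0.2 per match in this toy), but
the hedged results vary far less, so far fewer matches are needed to see the
edge. How large the reduction is in real backgammon depends on the quality of
the evaluator.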

If the FIBS formula is correct (in estimating the relative advantage in 25
point matches from the advantage in 7 point matches), then the total
experience level needed is roughly the same for 7 point and for 25 point
matches.

Douglas Zare
 

Ratings

Constructing a ratings system  (Matti Rinta-Nikkola, Dec 1998) 
Converting to points-per-game  (David Montgomery, Aug 1998)  [Recommended reading]
Cube error rates  (Joe Russell+, July 2009)  [Long message]
Different length matches  (Jim Williams+, Oct 1998) 
Different length matches  (Tom Keith, May 1998)  [Recommended reading]
ELO system  (seeker, Nov 1995) 
Effect of droppers on ratings  (Gary Wong+, Feb 1998) 
Emperical analysis  (Gary Wong, Oct 1998) 
Error rates  (David Levy, July 2009) 
Experience required for accurate rating  (Jon Brown+, Nov 2002) 
FIBS rating distribution  (Gary Wong, Nov 2000) 
FIBS rating formula  (Patti Beadles, Dec 2003) 
FIBS vs. GamesGrid ratings  (Raccoon+, Mar 2006)  [GammOnLine forum]
Fastest way to improve your rating  (Backgammon Man+, May 2004) 
Field size and ratings spread  (Daniel Murphy+, June 2000)  [Long message]
Improving the rating system  (Matti Rinta-Nikkola, Nov 2000)  [Long message]
KG rating list  (Daniel Murphy, Feb 2006)  [GammOnLine forum]
KG rating list  (Tapio Palmroth, Oct 2002) 
MSN Zone ratings flaw  (Hank Youngerman, May 2004) 
No limit to ratings  (David desJardins+, Dec 1998) 
On different sites  (Bob Newell+, Apr 2004) 
Opponent's strength  (William Hill+, Apr 1998) 
Possible adjustments  (Christopher Yep+, Oct 1998) 
Rating versus error rate  (Douglas Zare, July 2006)  [GammOnLine forum]
Ratings and rankings  (Chuck Bower, Dec 1997)  [Long message]
Ratings and rankings  (Jim Wallace, Nov 1997) 
Ratings on Gamesgrid  (Gregg Cattanach, Dec 2001) 
Ratings variation  (Kevin Bastian+, Feb 1999) 
Ratings variation  (FLMaster39+, Aug 1997) 
Ratings variation  (Ed Rybak+, Sept 1994) 
Strange behavior with large rating difference  (Ron Karr, May 1996) 
Table of ratings changes  (Patti Beadles, Aug 1994) 
Table of win rates  (William C. Bitting, Aug 1995) 
Unbounded rating theorem  (David desJardins+, Dec 1998) 
What are rating points?  (Lou Poppler, Apr 1995) 
Why high ratings for one-point matches?  (David Montgomery, Sept 1995) 

 
