Hank Youngerman wrote: [Paraphrasing]
> What is the effect of the "2000" in "SQR(n) * (rating difference) / 2000"
> (from the FIBS rating formula)? Is it empirically derived?
The 2000 is simply a constant (call it c). If c is changed to 2000 * t
(with t > 0), the average rating will remain the same (approx. 1500), but
each rating's distance from the population average will be scaled by a
factor of t, i.e. new_rating = avg_rating + t * (old_rating
- avg_rating).
For example, if avg_rating = 1500 and c = 400 (i.e. t = 0.2), then
new_rating = 1500 + 0.2 * (old_rating - 1500)
so 2000 under the old ratings corresponds to 1600 under the new ratings,
and 1000 under the old ratings corresponds to 1400. Old ratings between
1000 and 2000 map to new ratings between 1400 and 1600, which can be
obtained by linear interpolation.
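The mapping above is easy to try out. The following is a minimal Python sketch that assumes a population average of exactly 1500. `rescale` is a hypothetical helper name and is not part of FIBS:

```python
def rescale(old_rating: float, t: float, avg_rating: float = 1500.0) -> float:
    """Map a rating from the c = 2000 scale to the c = 2000 * t scale.

    The distance from the population average is multiplied by t;
    the average itself stays where it is.
    """
    return avg_rating + t * (old_rating - avg_rating)


# c = 400 corresponds to t = 400 / 2000 = 0.2
t = 400 / 2000
for old in (1000, 1250, 1500, 1750, 2000):
    print(old, "->", rescale(old, t))
```

Because the map is linear, the values between 1000 and 2000 land evenly between 1400 and 1600.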
Thus, the choice of c = 2000 affects only the spread of the ratings, not
the ordering of "true" ratings. That is, consider two players (player1 and
player2): if player1_true_rating > player2_true_rating for c = 2000, then
player1_true_rating > player2_true_rating for any other c > 0 (and vice
versa).
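The ordering and every predicted match outcome survive a change of c for one reason. P_upset depends on D and c only through D / c. Rescaling multiplies every rating difference D by t, and it also turns c into 2000 * t. The minimal Python sketch below checks this numerically. `p_upset` is a hypothetical helper written from the ELO-style formula given in the note later in this post, and the two ratings are made up:

```python
import math


def p_upset(d: float, n: int, c: float = 2000.0) -> float:
    """Probability that the lower-rated player wins an n-point match.

    d is the rating difference (higher minus lower); c is the constant
    under discussion (2000 on FIBS).
    """
    return 1.0 / (10 ** (d * math.sqrt(n) / c) + 1.0)


avg = 1500.0
r1, r2 = 1800.0, 1650.0  # hypothetical ratings on the c = 2000 scale
t = 0.2                  # i.e. c = 400

# Rescale both ratings around the average.
s1 = avg + t * (r1 - avg)
s2 = avg + t * (r2 - avg)

# The difference shrinks by t and c shrinks by t, so D / c is unchanged.
p_old = p_upset(r1 - r2, 7, c=2000.0)
p_new = p_upset(s1 - s2, 7, c=2000.0 * t)
print(p_old, p_new)  # equal up to floating-point rounding
```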
Note that the above discussion refers to one's "true" rating. For small
values of c there is a large amount of noise in the rating system, since
the adjustment factor, 4 * K * SQRT(n) * P, is independent of c; ratings
therefore move relatively more quickly with a small c than with a large c.
In an extreme case, if c = 1, the "true" ratings would likely range from
1499.75 to 1500.25, yet a 9-pt. match between two players of equal rating
(so P = 0.5) would raise the winner's rating by 4 * 1 * 3 * 0.5 = 6 pts.
(assuming K = 1) and lower the loser's rating by 6 pts. Obviously the
rating system would then be too unreliable to use: many players would be
massively (in a relative sense) over- or under-rated.
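The point about noise can be made concrete. This minimal sketch assumes that P is the probability that the actual winner would have lost (0.5 for equal ratings). It also assumes that the "true" ratings span 1000 to 2000 at c = 2000, as in the earlier example. `match_adjustment` and `true_spread` are hypothetical helper names. The step per match stays fixed, while the spread of true ratings shrinks in proportion to c:

```python
import math


def match_adjustment(n: int, p: float, k: float = 1.0) -> float:
    """Rating points moved from loser to winner: 4 * K * sqrt(n) * P.

    Assumption: p is the probability that the actual winner would have
    lost the match (0.5 when the two ratings are equal).
    """
    return 4.0 * k * math.sqrt(n) * p


def true_spread(c: float, low: float = 1000.0, high: float = 2000.0) -> float:
    """Spread of "true" ratings on the c scale.

    Assumes the true ratings span low..high when c = 2000; changing c to
    2000 * t multiplies every distance from the mean, hence the spread, by t.
    """
    return (high - low) * c / 2000.0


step = match_adjustment(9, 0.5)  # 9-pt. match, equal ratings, K = 1

for c in (1.0, 2000.0, 4000.0):
    spread = true_spread(c)
    # Size of one match's adjustment relative to the whole true-rating spread.
    print(f"c = {c:6.0f}: spread = {spread:7.2f}, step = {step}, "
          f"step / spread = {step / spread:.4f}")
```

With c = 1, a single match moves a rating by many times the entire spread of true ratings. That is the volatility described above.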
I assume that 2000 was chosen so that there would be a reasonable spread
between the high and low ratings.
[Note: below I define an "ELO-style rating formula" to be a rating formula
similar to the FIBS one, in which P_upset = 1 / (10^(D * sqrt(n) / c) + 1)
for c > 0, where D is the rating difference and n is the match length.]
Assuming that an ELO-style rating formula is appropriate (although there is
a lot of evidence to the contrary), c should be chosen so that the match
adjustment [4 * K * SQRT(n) * P] changes ratings at a relatively slow (but
not too slow) rate. If c is chosen too low, ratings will move too fast, i.e.
they will be too volatile and thus unreliable. If c is chosen too high, the
rating system will take a very long time to correct the ratings of those
who are significantly under- or over-rated.
Personally, I think that the FIBS rating system is a bit too volatile. I
would like to see c = 4000. Better yet, to avoid having to double everyone's
distance from the mean overnight (and thus alarming many new users), we
can equivalently just change the match-adjustment factor to
[2 * K * SQRT(n) * P].
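The claimed equivalence can be checked with a small sketch. It uses hypothetical `p_upset` and `update` helpers, takes P to be the probability that the winner would have lost, and uses made-up ratings. One match is rated two ways. The first uses c = 4000 and factor 4, on ratings whose distance from 1500 has been doubled. The second uses c = 2000 and factor 2, on today's ratings. Halving the first result's distance from the mean should reproduce the second:

```python
import math


def p_upset(d: float, n: int, c: float) -> float:
    """ELO-style upset probability, as in the note above."""
    return 1.0 / (10 ** (d * math.sqrt(n) / c) + 1.0)


def update(winner: float, loser: float, n: int, c: float, factor: float,
           k: float = 1.0) -> tuple[float, float]:
    """Return new (winner, loser) ratings after an n-point match.

    factor is the leading constant in factor * K * sqrt(n) * P (4 on FIBS).
    Assumption: P is the probability that the winner would have lost.
    """
    pu = p_upset(abs(winner - loser), n, c)
    p = pu if winner < loser else 1.0 - pu
    delta = factor * k * math.sqrt(n) * p
    return winner + delta, loser - delta


avg = 1500.0
w, l, n = 1620.0, 1710.0, 5  # hypothetical: the underdog wins a 5-pt. match

# Scheme A: c = 4000, factor 4, on ratings with doubled distance from 1500.
wa, la = update(avg + 2 * (w - avg), avg + 2 * (l - avg), n, c=4000.0, factor=4.0)
# Scheme B: keep c = 2000 and today's ratings, but use factor 2.
wb, lb = update(w, l, n, c=2000.0, factor=2.0)

# Halving scheme A's distance from the mean recovers scheme B.
print(avg + (wa - avg) / 2, wb)
print(avg + (la - avg) / 2, lb)
```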
As some of the empirical evidence referenced in Gary Wong's recent post
shows, an ELO-style rating formula is not robust across the possible match
lengths. Perhaps a better solution (requiring more housekeeping) would be
to have separate rating formulas for different match lengths. There could
be five different ratings: one for 1-pt. matches, one for 2-pt. matches,
one for 3-6 pt. matches, one for 7-16 pt. matches, and one for 17+ pt.
matches. A separate category for 2-pt. matches may be a little
controversial, since among expert players a 2-pt. match is virtually
identical to a 1-pt. match; however, among novices there is still room for
cube strategy. :-)
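Assigning a match to one of the five proposed rating groups is a simple lookup. This is a minimal sketch. `rating_group`, the group labels, and the per-player dictionary are hypothetical names chosen for illustration:

```python
GROUPS = ("1", "2", "3-6", "7-16", "17+")


def rating_group(match_length: int) -> str:
    """Return the proposed rating group for a match of the given length."""
    if match_length < 1:
        raise ValueError("match length must be at least 1")
    if match_length == 1:
        return "1"
    if match_length == 2:
        return "2"
    if match_length <= 6:
        return "3-6"
    if match_length <= 16:
        return "7-16"
    return "17+"


# Each player carries one rating per group; a match updates only its own group.
ratings = {g: 1500.0 for g in GROUPS}
group = rating_group(9)  # a 9-pt. match falls in the "7-16" group
```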
Even better, the value of t (the scaling factor; see the start of my post)
could be set empirically, differently for each of the 5 rating groups, so
that the spread (high rating - low rating) is approximately the same in
each of the 5 ratings. Perhaps a player could even be assigned an "overall
rating", which would be the average of the 5 ratings (or perhaps with the
1-pt. and 2-pt. ratings at half weight, i.e. overall_rating =
.125 r(1) + .125 r(2) + .25 r(3-6) + .25 r(7-16) + .25 r(17+)). This would
mean that a player has to be good at both short and long matches in order
to have a good overall rating.
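Both ideas in this paragraph fit in a few lines: a per-group t that sets a target spread, and the weighted overall rating. This is a minimal sketch. The population ratings, the target spread, and the "specialist" player are hypothetical numbers chosen for illustration, and `spread_scale` and `overall_rating` are hypothetical helper names:

```python
GROUPS = ("1", "2", "3-6", "7-16", "17+")

# Weights from the proposal: half weight on the 1-pt. and 2-pt. groups.
WEIGHTS = {"1": 0.125, "2": 0.125, "3-6": 0.25, "7-16": 0.25, "17+": 0.25}


def spread_scale(group_ratings: list[float], target_spread: float) -> float:
    """Scaling factor t that gives this group the target (high - low) spread."""
    spread = max(group_ratings) - min(group_ratings)
    return target_spread / spread


def overall_rating(player: dict[str, float]) -> float:
    """Weighted average of one player's five group ratings."""
    return sum(WEIGHTS[g] * player[g] for g in GROUPS)


# Hypothetical 1-pt. population with a spread of 900, squeezed to 600.
t_one_point = spread_scale([1200.0, 1650.0, 2100.0], 600.0)

# A hypothetical 1-pt. specialist: strong at 1-pt. matches, average elsewhere.
specialist = {"1": 2100.0, "2": 1500.0, "3-6": 1500.0, "7-16": 1500.0, "17+": 1500.0}
print(overall_rating(specialist))  # 1575.0
```

The specialist's 600-point edge in 1-pt. matches contributes only 75 points to the overall rating. That is the effect the next paragraph argues for.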
Under most backgammon rating systems that I've seen, a player who plays
perfectly in 1-pt. matches (and who plays only 1-pt. matches) can obtain an
extremely high rating even if he is awful at cube strategy, since he never
has to make a cube decision. My proposal would remedy this problem.
Just my $0.02
Chris