 Cube error rates

 From: Joe Russell
 Address: ez2bblue@aol.com
 Date: 21 July 2009
 Subject: Cube error rates
 Forum: BGonline.org Forums

```If you are like me you get dinged more than anything else for not doubling
when you should. Often I will miss a cube by .05 or so for several shakes
in a row. That has led me to consider the fairness of the grading of these
errors.

Say you had a position that was static for 25 rolls and for each roll you
made a .05 error by not doubling. Your cumulative error would be 1.25, but
you would have only been .05 better off if you had doubled at any time. Now
say in another 25-roll static position it was wrong to double by .05 and
you doubled on the first roll. Now your cumulative error is only .05,
twenty five times less than the other position, but the true cost of the
errors was identical.

I realize in one situation you made one bad decision and in the other you
made 25 bad decisions, but the cost in MWC of the 25 was truly no more than
the cost of the one.
```
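The arithmetic behind Joe's comparison can be written out as a small sketch. The variable names are illustrative only, and the numbers are the hypothetical ones from the post (25 rolls, a .05 error per decision), not output from any bot.

```python
from fractions import Fraction

ERROR_PER_DECISION = Fraction(5, 100)  # the .05 cube error from the post
ROLLS = 25                             # length of the hypothetical static stretch

# Position 1: the double is missed on every one of the 25 rolls,
# so the analysis reports one .05 error per roll.
reported_missed_doubles = ROLLS * ERROR_PER_DECISION

# Position 2: one premature double on the first roll, after which
# there are no further cube decisions to get wrong.
reported_early_double = 1 * ERROR_PER_DECISION

print(float(reported_missed_doubles))                    # 1.25
print(float(reported_early_double))                      # 0.05
print(reported_missed_doubles / reported_early_double)   # 25
```

The factor of 25 is the gap between the two reported totals that the post is questioning.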

 Maik Stiebler  writes:

```In completely static positions, doubling is always either wrong or
optional. So if you are dinged 25 times in a row for 0.050,

- your bot is bad at evaluating cube errors in static positions, or
- your position was not quite static, and you were unlucky to get an
  awful error rating for repeatedly getting into situations that you
  didn't understand. Can happen, but it averages out in the long term.

I think your statement, "... but you would have only been .05 better off
if you had doubled at any time," hinges on the position being completely
static. If not, I don't see in which sense it is true.
```

 Gregg Cattanach  writes:

```Your analysis of repeated missed doubles implies that each opportunity
you had to double was the same decision. It WAS NOT. Both players had
rolled and moved, creating a new position. Other than perhaps when one
player is closed out, the position is effectively different and requires
a new double/no double decision process. Thus you are making different
no-double errors each time, not the same one, and you should be dinged for
each one.
```

 Frank Berger  writes:

```That's right, but nevertheless IMHO Joe's point is still true. If I
repeatedly miss a .05 double I accumulate an unreasonable error rate. IMHO
it is totally irrelevant whether the situation is static or, due to some
fairy dust, just stays in that area. If I miss a 0.05 double 25 times I
lose more than a point in error rates, and that is ridiculous. And the
early cube is penalized just once.
```

 David Levy  writes:

```Let's assume the missed doubles were not consecutive, but in different
games in the same match. Is there a problem with multiple dinging? What if
they were in different matches?

I suspect Joe is thinking, "I am a Snowie ER 3 player, but these multiple
dings gave me an error rate of 6 and I don't believe it."

The Snowie error rate in any match reflects how well the player
understands the positions that came up in the match and how frequently
those positions came up. If a poorly-understood position comes up a lot
(cumulative missed doubles), the error rate is higher.

I have a great error rate in a match consisting only of five-anchor
holding games. But I know I'm not any better after seeing that low error
rate.

Moral: Don't expect too much of the Snowie error rate, particularly for a
single match.
```

 Matt Reklaitis  writes:

```My take on this situation is that a bot's evaluation represents values
related to its own play. So when calculating the size of any individual
error, inherent in that calculation is the assumption that your future
play is similar to the bot's. The more your play differs from the bot's,
the more the model breaks down.
```

 Matt Cohn-Geier  writes:

```Let's say your error here in not doubling is .05.

24 23 22 21 20 19 18 17 16 15 14 13
+---+---+---+---+---+---+---+---+---+---+---+---+---+
| O O O O O X | | O O O |
| O O O O | | O |
| | | |
| | | |
| | X | |
| | | | X on roll
| | O | |
| | | |
| | | |
| X X X X | | X |
| X X X X X O | | X X X |
+---+---+---+---+---+---+---+---+---+---+---+---+---+
1 2 3 4 5 6 7 8 9 10 11 12

The .05 figure assumes that after fan/fan you will cube. But after fan/fan
you won't cube...so your error is compounded more than .05 would indicate.
So regarding the error as just .05 isn't sufficient.

If, on the other hand, you doubled a position where the ND error was .05,
you would deprive yourself of a chance to make future mistakes.
```

 Maik Stiebler  writes:

```Yes, not doubling in this and all the repeated situations does cost more
than not doubling in this situation and reconsidering after fan/fan. The
latter cost is what the bot reports, and if it is 0.05, the former cost is
approx. 0.062. The difference here is not large, because an exact repeat
of the position only happens with a probability of 256/1296. In the
typical "static situation", the effect may be much larger.

If you reach this position, you will usually get dinged for 0.05 in total,
but on a very bad day you can be dinged for something like 0.50. Those
occasional high dings are needed for unbiased feedback, because the
typical 0.05 ding is too small. The average ding is just right.

On the other hand, I can see why it is regarded as unfair that you can be
dinged 0.05 or 0.50 for the same error. Douglas Zare discusses this in his
GV article "Unbiased Nonsense" and proposes a variance reduction method on
the error rate. I don't think that that would be accepted by the bg
community, but it is an interesting concept.
```
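One way to arrive at a figure close to the 0.062 quoted above is a geometric series: the same 0.05 error is made once, then again each time the position repeats exactly. The sketch below assumes the repeat probability of 256/1296 given in the post, that the error size is identical at every repeat, and that play matches the bot otherwise. The function name is illustrative.

```python
from fractions import Fraction

def cost_of_never_doubling(single_error, repeat_probability):
    """Total cost when the same missed double recurs at every exact repeat.

    Sums single_error * (1 + r + r**2 + ...) = single_error / (1 - r),
    where r is the probability that the position repeats.
    """
    return single_error / (1 - repeat_probability)

repeat = Fraction(256, 1296)      # repeat probability stated in the post
single = Fraction(5, 100)         # the 0.05 error the bot reports once

total = cost_of_never_doubling(single, repeat)
print(total)                      # 81/1300
print(round(float(total), 4))     # 0.0623
```

With a repeat probability closer to 1, as in a more static position, the same formula gives a much larger multiple of the single reported error.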

 Maik Stiebler  writes:

```I had prepared a puzzle involving a discrete random walk to post at some
point in this thread:

Players A and B play a game of StaticishRace. A game consists of tossing a
fair coin and changing the score by +1 or -1 respectively based on the
result of the coin toss. The starting score is 0. Player A wins one point
and the game ends when the score reaches 50. Player B wins one point and
the game ends when the score reaches -50.

Before each coin toss, ONLY Player A (to keep things simple) is given the
opportunity to double the stakes, after which Player B can either take or
drop (ending the game and losing a single point). A double is allowed only
once in a game.

1. What is the optimal strategy for both players?

2. What is the theoretical (assuming optimal play from both sides) equity
   of the starting position?

3. Assume Player A deviates from optimal strategy by doubling if and only
   if the current score is +30. What is the practical equity of the
   starting position then?

4. How much does Player A's deviation from perfect play cost him on
   average per game?

Now put yourself in the position of a bot that knows the optimal strategy
and observes the game, not knowing Player A's complete strategy, but
noting the wrong plays that follow from the strategy.

5. (a) At which points in the game does Player A, following the
       non-optimal strategy, blunder away theoretical equity by making a
       wrong play?
   (b) How much equity does each of these wrong plays lose?

6. How often will the opportunity for Player A to make a wrong (equity
   losing) play arise in a game? Compute both an average value (a) and a
   distribution (b).

7. Verify that the average number of blunder opportunities (6a) times the
   cost of a blunder (5b) equals the total cost of Player A's misguided
   strategy (4).
```
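The rules of StaticishRace are simple enough to simulate. The following is a minimal Monte Carlo sketch, not part of the original thread. The names `play_game` and `average_payoff` are made up for illustration, and the sketch assumes that Player A doubles exactly when the score first equals `double_at`, and that Player B passes whenever his winning chance (50 - score)/100 is 1/4 or less, including the borderline case.

```python
import random

def play_game(double_at, rng):
    """Play one game; return Player A's payoff in points.

    Assumption: A doubles the first time the score equals `double_at`.
    Assumption: B passes if the score is 25 or higher, otherwise takes.
    """
    score, stake, doubled = 0, 1, False
    while -50 < score < 50:
        # A's cube decision comes before each coin toss.
        if not doubled and score == double_at:
            if score >= 25:
                return 1          # B drops and loses a single point
            stake, doubled = 2, True
        score += 1 if rng.random() < 0.5 else -1
    return stake if score == 50 else -stake

def average_payoff(double_at, games=500, seed=0):
    """Estimate A's equity at the start for a given doubling score."""
    rng = random.Random(seed)
    return sum(play_game(double_at, rng) for _ in range(games)) / games

for threshold in (25, 30):
    print(threshold, average_payoff(threshold))
```

The estimates are noisy at 500 games; raising `games` tightens them at the cost of run time, since each game takes on the order of a thousand tosses.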

 Bob Koca  writes:

```> 1. What is the optimal strategy for both players?

Player A doubles at +25, at which point B has an optional take/pass. That
is because B then has the required 1/4 winning chance. A does not double
before then, as there is no market loss.

> 2. What is the theoretical (assuming optimal play from both sides) equity
> of the starting position?

A has +1/3 equity. Going from -50 to +25 is 75, and A starts 2/3 of the
way there. 2/3 - 1/3 = 1/3.

> 3. Assume Player A deviates from optimal strategy by doubling if and only
> if the current score is +30. What is the practical equity of the starting
> position then?

A wins 50/80 of the games for an equity of +1/4.

> 4. How much does Player A's deviation from perfect play cost him on
> average per game?

1/3 - 1/4 = 1/12

> Now put yourself in the position of a bot that knows the optimal strategy
> and observes the game, not knowing Player A's complete strategy, but
> noting the wrong plays that follow from the strategy.
>
> 5. (a) At which points in the game does Player A, following the
>        non-optimal strategy, blunder away theoretical equity by making a
>        wrong play?
>    (b) How much equity does each of these wrong plays lose?

Only when the game is at exactly +25 does A lose equity. Values below +25
are a no-double and values above +25 are an optional double, since the
cash cannot be lost on the next toss.

At +25 B has an optional take/pass. Let's suppose he would pass. Then the
equity lost by not doubling at +25 is 1/2 the equity lost by the game
reaching +24 (if it reaches +26 it is a cash anyway). At +24, A's
theoretical equity is 74/75 - 1/75 = 73/75, a loss of 2/75. So failing to
double at +25 costs 1/75 equity.

> 6. How often will the opportunity for Player A to make a wrong (equity
> losing) play arise in a game? Compute both an average value (a) and a
> distribution (b).

It happens at least once in 2/3 of the games.

Suppose that the game is at exactly +25. We need to find the probability
of that state occurring again. This is harder than the previous questions.
When you are at +25 you go to +26 half the time. From there you go to +25
before +30 with probability 4/5. The other half of the time you go to +24,
and then the chance of returning to +25 before -50 is 74/75.

(1/2)(4/5) + (1/2)(74/75) = 67/75 chance of a repeat visit to +25 if you
start from +25.

The number of visits, counted from the first visit up to and including the
one that is not repeated, has a geometric distribution with p = 8/75 and
an expected value of 1/(8/75) = 75/8. The expected total number of visits
is thus (2/3)(75/8) = 25/4.

The distribution is as follows:

Exactly 0 visits occurs with probability 1/3.
Exactly 1 visit occurs with probability (2/3)(8/75).
Exactly 2 visits occurs with probability (2/3)(67/75)(8/75).
Exactly 3 visits occurs with probability (2/3)(67/75)**2(8/75).
...
Exactly n visits occurs with probability (2/3)(67/75)**(n-1)(8/75).

(This was jointly done with Chris Yep.)

> 7. Verify that the average number of blunder opportunities (6a) times the
> cost of a blunder (5b) equals the total cost of Player A's misguided
> strategy (4).

(25/4)(1/75) = 1/12.
```
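The fractions in the answers above can be rechecked with exact arithmetic. This is a sketch under the same assumptions as the post (fair ±1 walk, B passes at +25 or above). The helper name `p_hit_upper_first` is made up here; it uses the standard result that a fair walk starting at `start` reaches `upper` before `lower` with probability (start - lower)/(upper - lower).

```python
from fractions import Fraction

def p_hit_upper_first(start, lower, upper):
    """Fair +/-1 walk: probability of reaching `upper` before `lower`."""
    return Fraction(start - lower, upper - lower)

half = Fraction(1, 2)

# Questions 2-4: A effectively cashes one point at +25 (optimal) or +30.
p_cash = p_hit_upper_first(0, -50, 25)              # 2/3
optimal_equity = 2 * p_cash - 1                     # 1/3
late_equity = 2 * p_hit_upper_first(0, -50, 30) - 1 # 1/4
strategy_cost = optimal_equity - late_equity        # 1/12

# Question 5b: cost of one missed double at +25 (equity is 1 at +26).
equity_at_24 = 2 * p_hit_upper_first(24, -50, 25) - 1        # 73/75
blunder_cost = 1 - (half * 1 + half * equity_at_24)          # 1/75

# Question 6a: chance of returning to +25, then expected number of visits.
p_return = (half * (1 - p_hit_upper_first(26, 25, 30))       # via +26
            + half * p_hit_upper_first(24, -50, 25))         # via +24
expected_visits = p_cash / (1 - p_return)                    # 25/4

# Question 7: visits times cost per visit equals the strategy's total cost.
print(strategy_cost, blunder_cost, p_return, expected_visits)
print(expected_visits * blunder_cost == strategy_cost)       # True
```

Using `Fraction` rather than floats keeps every intermediate value exact, so the final comparison is an equality check rather than a tolerance test.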

### Ratings

Constructing a ratings system  (Matti Rinta-Nikkola, Dec 1998)
Converting to points-per-game  (David Montgomery, Aug 1998)
Cube error rates  (Joe Russell+, July 2009)
Different length matches  (Jim Williams+, Oct 1998)
Different length matches  (Tom Keith, May 1998)
ELO system  (seeker, Nov 1995)
Effect of droppers on ratings  (Gary Wong+, Feb 1998)
Emperical analysis  (Gary Wong, Oct 1998)
Error rates  (David Levy, July 2009)
Experience required for accurate rating  (Jon Brown+, Nov 2002)
FIBS rating distribution  (Gary Wong, Nov 2000)
FIBS rating formula  (Patti Beadles, Dec 2003)
FIBS vs. GamesGrid ratings  (Raccoon+, Mar 2006)
Fastest way to improve your rating  (Backgammon Man+, May 2004)
Field size and ratings spread  (Daniel Murphy+, June 2000)
Improving the rating system  (Matti Rinta-Nikkola, Nov 2000)
KG rating list  (Daniel Murphy, Feb 2006)
KG rating list  (Tapio Palmroth, Oct 2002)
MSN Zone ratings flaw  (Hank Youngerman, May 2004)
No limit to ratings  (David desJardins+, Dec 1998)
On different sites  (Bob Newell+, Apr 2004)
Opponent's strength  (William Hill+, Apr 1998)
Possible adjustments  (Christopher Yep+, Oct 1998)
Rating versus error rate  (Douglas Zare, July 2006)
Ratings and rankings  (Chuck Bower, Dec 1997)
Ratings and rankings  (Jim Wallace, Nov 1997)
Ratings on Gamesgrid  (Gregg Cattanach, Dec 2001)
Ratings variation  (Kevin Bastian+, Feb 1999)
Ratings variation  (FLMaster39+, Aug 1997)
Ratings variation  (Ed Rybak+, Sept 1994)
Strange behavior with large rating difference  (Ron Karr, May 1996)
Table of ratings changes  (Patti Beadles, Aug 1994)
Table of win rates  (William C. Bitting, Aug 1995)
Unbounded rating theorem  (David desJardins+, Dec 1998)
What are rating points?  (Lou Poppler, Apr 1995)
Why high ratings for one-point matches?  (David Montgomery, Sept 1995)