Backgammon Rollouts

Standard error and JSD

From:   Stick
Address:   checkmugged@yahoo.com
Date:   15 October 2007
Subject:   Explaining (J)SD w/GNU & rollouts
Forum:   BGonline.org Forums

Pretend I know little to nothing about statistical significance (Chuck can vouch for that), standard deviations, joint standard deviations, etc. Can you explain it using the example rollout below? (Or do you know of a link where it has already been done? I can't find one with what I'm looking for.)

[Board diagram: points 13-24 across the top, 12-1 across the bottom. X to play.]

X rolls 1-1.

Rollout:

    Play                      Equity    Diff.     Std.Err.
    ----------------------    -------   -------   --------
 1. 8/7(2), 6/5(2)            +0.0773             0.0043
 2. 23/22, 11/10, 6/5(2)      +0.0615   -0.0158   0.0042

Bryce writes:

No one has touched this yet, so I guess I'll give it a shot. It's been about 5 years since my last stats course, so others should correct me if I say something stupid.

The cubeful equity of the first play is listed as .0773, with a standard error (SE) of .0043. The actual position after 8/7(2), 6/5(2) has some exact equity A, but we don't know what it is, so we run trials to try to estimate it. The best guess after these trials is .0773. The standard error is a description of how good our measurement is and lets us construct confidence intervals to tell us where the actual value A might be. 1 SE around our estimate corresponds to about 68%, 2 SE corresponds to 95%*, and 3 SE is about 99.7%. So for this first position we might say:

  There is a 68% chance that A lies in the interval (.0773 - 1*.0043, .0773 + 1*.0043) = (.0730, .0816)
  There is a 95% chance that A lies in the interval (.0773 - 2*.0043, .0773 + 2*.0043) = (.0687, .0859)
  There is a 99.7% chance that A lies in the interval (.0773 - 3*.0043, .0773 + 3*.0043) = (.0644, .0902)

More trials = smaller SE = smaller intervals = better idea of what the actual value is. One could make similar intervals for the second position.

When you're comparing two positions, what you actually want is the difference between the two equities: we'll call this A - B. Our best guess for this difference is, not surprisingly, the difference between the guesses for A and B: .0773 - .0615 = .0158. It turns out you can't just add the two errors of A and B to get the error for (A - B): it's given by

  SE(A - B) = sqrt(SE(A)^2 + SE(B)^2)**

(This is the joint standard error, or JSE.) In our case this comes out to sqrt(.0043^2 + .0042^2), or about .00601. Suppose we were to now make a 95% confidence interval for A - B: it would be

  (.0158 - 2*.0060, .0158 + 2*.0060) = (.0038, .0278)

The entire interval is greater than 0; that means we can be 95% sure the actual difference between the equities is positive. Actually, you can even say that we are about 97.5% sure by symmetry (there is a 2.5% chance it's below .0038, and a 2.5% chance it's higher than .0278--in the latter case it's still positive)***.

Instead of creating intervals, it's usually easier to go in the reverse direction: you have your estimate of A - B (.0158), you have SE(A - B) = .00601, so you can see how many (joint) standard errors you are away from 0. In this case we're at .0158/.00601 = 2.63 standard errors; you can convert this to a probability by doing an integral numerically**** to discover that there's a 99.6% chance that A is better than B, and a .4% chance that B is better than A.

Reasonably good estimates to remember:

  0 JSEs: 50% chance that top position is better (in other words: TCTC)
  1 JSE: 84.1% chance that top position is better
  2 JSEs: 97.7% chance that top position is better
  3 JSEs: 99.9% chance that top position is better

Of course, it is up to you what percentage you think is significant. You'd probably get eyed suspiciously claiming something is true with only 1 JSE, but with 3+ JSEs few would doubt the veracity of your claim.

I hope that helped?

--Bryce

* 95% is actually more like 1.96 standard errors.

** This is because, while standard errors do not add, the variances (which are the squares of the standard errors) do, provided the two estimates are independent: Var(A-B) = Var(A+B) = Var(A) + Var(B).

*** I might be confusing one- and two-tailed tests here, but it looks right to me.

**** Or just use this applet: http://psych.colorado.edu/~mcclella/java/normal/normz.html and enter "2.63" or "-2.63" into the z-score box.
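
The arithmetic in the reply can be checked with a few lines of Python. This is only a sketch, assuming the two rollout estimates are independent and roughly normally distributed. The function names are made up for illustration and are not part of GNU Backgammon; the only library call used is math.erf from the standard library, which stands in for the numerical integral.

```python
import math


def confidence_interval(estimate, se, k):
    """Return (low, high) for estimate +/- k standard errors."""
    return (estimate - k * se, estimate + k * se)


def joint_standard_error(se_a, se_b):
    """Standard error of A - B, assuming the two estimates are independent."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def prob_first_is_better(equity_a, se_a, equity_b, se_b):
    """Normal-approximation probability that play A's true equity exceeds B's."""
    # Number of joint standard errors the observed difference is away from 0.
    z = (equity_a - equity_b) / joint_standard_error(se_a, se_b)
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Figures from the rollout in the question.
    eq_a, se_a = 0.0773, 0.0043   # 8/7(2), 6/5(2)
    eq_b, se_b = 0.0615, 0.0042   # 23/22, 11/10, 6/5(2)

    for k in (1, 2, 3):
        low, high = confidence_interval(eq_a, se_a, k)
        print(f"{k} SE interval for A: ({low:.4f}, {high:.4f})")

    jse = joint_standard_error(se_a, se_b)
    print(f"JSE: {jse:.5f}")
    print(f"JSEs from zero: {(eq_a - eq_b) / jse:.2f}")
    print(f"P(A better than B): {prob_first_is_better(eq_a, se_a, eq_b, se_b):.3f}")
```

Run as a script, this prints the three intervals for the first play, the joint standard error, the number of JSEs separating the plays, and the probability that the top play is better, which can be compared with the figures quoted in the reply.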
