Stopping a rollout early based on JSD

How Does Stopping a Rollout Early Affect Results?

Performing a rollout can be time-consuming and it is often obvious early on which play is best. Gnu Backgammon has a feature where you can stop a rollout when the difference between two plays (measured in JSD) exceeds a certain value. But this JSD-cutoff feature has to be used with some care because the standard way of calculating the confidence assumes that the rollout is run to conclusion.

What we need to know is what affect using a JSD-cutoff has on the error rate of the rollout. So I wrote a simulation to find out.

Simulating a Rollout

A trial is simulated as a normally distributed random number about a given mean. (The mean is the value the rollout converges to.) A rollout, then, is a series of trials with their results averaged together. In the simulation, two plays are rolled out with the same standard deviation and slightly different means. The rollouts are run in parallel and the rollout is stopped when the estimated JSD reaches a certain value. At that point, or when the maximum number of trials is reached, it is noted whether the rollout is right or wrong. (A rollout is wrong if the play with the higher mean gets a lower result in the rollout.)

The simulation tabulates the number of times the rollout was right versus wrong and calculates an error rate for each of various configuations. I varied two parameters: (1) the distance between the plays, and (2) the JSD-cutoff point. The result is a table with "distance between plays" varying by row and "JSD cutoff" varying by column.

[FYI: When reading the tables below, the numbers in the row labels are the distance between the plays measured in terms of the JSD you'd expect to see if the rollout were run to completion.]

First Simulation: Max 1296 Trials, Min 324 Trials (ratio 4 to 1)

The output of the simulation is here. Here are some observations from this data:

If your maximum tolerable error rate is 0.05, then you can resolve plays down to a separation of:
- 1.65, if you use a cutoff of 3.0 jsd, which is as good as no cutoff at all
- 1.66, if you use a cutoff of 2.5 jsd
- 1.68, if you use a cutoff of 2.0 jsd
If your maximum tolerable error rate is 0.02, then you can resolve plays down to a separation of:
- 2.06, if you use a cutoff of 3.0 jsd, which is as good as no cutoff at all
- 2.07, if you use a cutoff of 2.5 jsd
- 2.10, if you use a cutoff of 2.0 jsd

So a cutoff of 3.0 jsd is just as good as no cutoff, and a cutoff of 2.5 jsd is almost as good.

Second Simulation: Max 1296 Trials, Min 108 Trials (ratio 12 to 1)

The output of the simulation is here. Here are some observations from this data:

If your maximum tolerable error rate is 0.05, then you can resolve plays down to a separation of:
- 1.65, if you use a cutoff of 3.5 jsd, which is as good as no cutoff at all
- 1.66, if you use a cutoff of 3.0 jsd
- 1.68, if you use a cutoff of 2.5 jsd
- 1.82, if you use a cutoff of 2.0 jsd
If your maximum tolerable error rate is 0.02, then you can resolve plays down to a separation of:
- 2.06, if you use a cutoff of 3.5 jsd, which is as good as no cutoff at all
- 2.07, if you use a cutoff of 3.0 jsd
- 2.13, if you use a cutoff of 2.5 jsd
- 2.39, if you use a cutoff of 2.0 jsd

So a cutoff of 3.5 jsd is just as good as no cutoff, and a cutoff of 3.0 jsd is almost as good.

Third Simulation: Max 1296 Trials, Min 36 Trials (ratio 36 to 1)

The output of the simulation is here. Here are some observations from this data:

If your maximum tolerable error rate is 0.05, then you can resolve plays down to a separation of:
- 1.65, if you use a cutoff of 4.0 jsd, which is as good as no cutoff at all
- 1.66, if you use a cutoff of 3.5 jsd
- 1.68, if you use a cutoff of 3.0 jsd
- 1.78, if you use a cutoff of 2.5 jsd
- 2.22, if you use a cutoff of 2.0 jsd
If your maximum tolerable error rate is 0.02, then you can resolve plays down to a separation of:
- 2.06, if you use a cutoff of 4.0 jsd, which is as good as no cutoff at all
- 2.07, if you use a cutoff of 3.5 jsd
- 2.12, if you use a cutoff of 3.0 jsd
- 2.33, if you use a cutoff of 2.5 jsd
- 2.81, if you use a cutoff of 2.0 jsd

So a cutoff of 4.0 jsd is just as good as no cutoff, and a cutoff of 3.5 jsd is almost as good.

Fourth Simulation: Max 5184 Trials, Min 144 Trials (ratio 36 to 1)

The output of the simulation is here. This simulation uses the same ratio of max-trials to min-trials as the previous simulation. I wanted to see what affect raising the number trials had without changing the ratio. Here are some observations:

If your maximum tolerable error rate is 0.05, then you can resolve plays down to a separation of:
- 1.65, if you use a cutoff of 4.0 jsd, which is as good as no cutoff at all
- 1.66, if you use a cutoff of 3.5 jsd
- 1.68, if you use a cutoff of 3.0 jsd
- 1.78, if you use a cutoff of 2.5 jsd
- 2.25, if you use a cutoff of 2.0 jsd
If your maximum tolerable error rate is 0.02, then you can resolve plays down to a separation of:
- 2.06, if you use a cutoff of 4.0 jsd, which is as good as no cutoff at all
- 2.07, if you use a cutoff of 3.5 jsd
- 2.11, if you use a cutoff of 3.0 jsd
- 2.32, if you use a cutoff of 2.5 jsd

These results are very close to Simulation 3. This suggests that the absolute number of trials does not make much difference. Rather, it is the ratio of the maximum to minimum number of trials that matters most when deciding what threshold to use.

Recommendations

Based on these simulations, here are some recommendations for using the JSD-cutoff feature in rollouts:

If the ratio between the maximum number of trials and the minimum number of trials is 4-to-1, you should use a cutoff of 3.0 JSD (best) or 2.5 JSD (almost as good).
If the ratio between the maximum number of trials and the minimum number of trials is 12-to-1, you should use a cutoff of 3.5 JSD (best) or 3.0 JSD (almost as good).
If the ratio between the maximum number of trials and the minimum number of trials is 36-to-1, you should use a cutoff of 4.0 JSD (best) or 3.5 JSD (almost as good).

Tom Keith, May 2011