Backgammon Rollouts

How reliable are rollouts?

From:   David Montgomery
Address:   monty@cs.umd.edu
Date:   2 August 1999
Subject:   Re: How Reliable are Rollouts? [Was: Difficult Crawford checker play]
Forum:   rec.games.backgammon
Google:   7o4esc$b2m$1@krackle.cs.umd.edu

> What's the basis for having any confidence in a rollout?

The basis is: if we do a rollout long enough, and we play in the rollout just the way we would play over the board, then eventually the rollout results will converge to be arbitrarily close to the expected values. (I'm going to ignore positions that diverge.) It's just statistics. A computer rollout is actually a completely reliable estimate of a position's equity -- assuming that both sides are played by that same computer program.

You can also look at the rollout as trying to simulate "perfect" play, rather than "actual" play. This doesn't change much. The rollouts are only a rough simulation of either one.

Doing a rollout long enough isn't hard anymore, because of the bots. However, we can never have complete assurance that a rollout is played the way that we would play over the board. Most players' play selections are somewhat random, and in any event each player is unique. There is no general theoretical assurance that rollout results will closely approximate results for any two human players. In fact, there are positions where the rollout results are known to be wildly different.

There is actually very little empirical evidence to support the idea that computer rollouts are generally close to the expected results for human players. This is because it takes too long to gather any decent data with human play. The best evidence is probably from some Jellyfish interactive rollouts, which use variance reduction to squeeze more information out of manual rollouts. But I don't know of any work where someone has tried to show that the bot rollouts actually reflect the results humans would get.

Despite this lack of solid evidence, there are very good reasons for trusting computer rollouts most of the time. Most importantly, we know that the computer programs play very well. Because they play well, we expect their results to closely reflect the results between two strong human players, most of the time.
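
The "it's just statistics" point can be illustrated with a small Python sketch. It does not play backgammon. It assumes a made-up distribution of game results (`OUTCOMES` and `PROBS` are invented numbers, and `rollout` is a hypothetical helper, not part of any bot) and only shows how the average of many trials closes in on the expected value, with a standard error that shrinks roughly as one over the square root of the number of games.

```python
import math
import random

# Hypothetical distribution of points won per game by the side on roll,
# assuming one program plays both sides. The numbers are invented.
OUTCOMES = [2, 1, -1, -2]
PROBS = [0.15, 0.40, 0.35, 0.10]
TRUE_EQUITY = sum(o * p for o, p in zip(OUTCOMES, PROBS))


def rollout(n_games, rng):
    """Return (estimated equity, standard error) from n_games simulated games."""
    results = rng.choices(OUTCOMES, weights=PROBS, k=n_games)
    mean = sum(results) / n_games
    # Unbiased sample variance of the per-game results.
    variance = sum((r - mean) ** 2 for r in results) / (n_games - 1)
    return mean, math.sqrt(variance / n_games)


rng = random.Random(1999)
for n in (100, 10_000, 100_000):
    estimate, std_err = rollout(n, rng)
    print(f"{n:>7} games: equity {estimate:+.4f} +/- {std_err:.4f} "
          f"(true {TRUE_EQUITY:+.4f})")
```

The estimate is only "reliable" for the distribution being sampled. In a real rollout that distribution is whatever the program produces when it plays both sides, which is exactly the caveat above.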

A second major reason is the nature of backgammon itself. Most backgammon positions quickly engender a large number of variations. After the first few rolls, there is a wide variety of different kinds of problems. A computer program's lack of complete understanding of a certain kind of problem that arises occasionally in a rollout won't necessarily destroy the validity of the rollout, because that problem is probably only a small fraction of the decisions that must be made. And although the program makes some mistakes on these problems, these are likely to be offset to some degree by mistakes made when playing for the opposing side.

> If you don't know which is the better strategy/tactic/move in a
> particular position, and you question Snowie's evaluation at 3-ply, you
> do a rollout. Snowie proceeds to play out the position numerous times.
> However, why should only the specific position that you're investigating
> pose a difficult question? On the very next roll, the very same, or a
> similar, strategical or tactical issue may be presented again.

This is true, and with these kinds of positions you should think more carefully about whether you trust the rollout results. If the same thematic idea gets tested over and over again, then if the bot doesn't understand it, the rollout is likely to be worthless.

> Or maybe an entirely different, but still difficult decision will have
> to be made.

This is less problematic, because once we have a variety of decisions, difficult or otherwise, it is less likely that the bot will botch them all.

The fact that there are difficult decisions, even (especially?) for the bot, means that errors will be made in the rollout. If the errors are small, it isn't of great concern unless many of them accumulate. Small errors in the play will make only a small difference in the rollout result. If the errors are large, that can be a problem.
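
A toy model can make the effect of play errors more concrete. The Python sketch below is only an illustration under loud assumptions: each side faces 10 decisions per game, botches each one with some probability, and every botched decision costs a fixed amount of equity that simply adds to or subtracts from the result. The names `error_cost` and `rollout_with_errors`, and all the numbers, are hypothetical.

```python
import random


def error_cost(n_decisions, p_error, cost, rng):
    """Equity one side gives away in a game; each decision is botched
    independently with probability p_error."""
    return sum(cost for _ in range(n_decisions) if rng.random() < p_error)


def rollout_with_errors(true_equity, side_a, side_b, n_games, rng, noise_sd=1.3):
    """Average result for side A when each side has its own (p_error, cost).

    Assumes error costs are additive, which is a simplification.
    """
    p_a, cost_a = side_a
    p_b, cost_b = side_b
    total = 0.0
    for _ in range(n_games):
        luck = rng.gauss(0.0, noise_sd)               # dice luck in this game
        lost_by_a = error_cost(10, p_a, cost_a, rng)  # hurts side A
        lost_by_b = error_cost(10, p_b, cost_b, rng)  # helps side A
        total += true_equity + luck - lost_by_a + lost_by_b
    return total / n_games


rng = random.Random(1999)
TRUE = 0.15  # invented "correct" equity for side A
scenarios = {
    "small errors, both sides": ((0.10, 0.02), (0.10, 0.02)),
    "big errors, both sides": ((0.10, 0.20), (0.10, 0.20)),
    "big errors, side A only": ((0.10, 0.20), (0.0, 0.0)),
}
for name, (a, b) in scenarios.items():
    result = rollout_with_errors(TRUE, a, b, 20_000, rng)
    print(f"{name:<26} rollout {result:+.3f} (true {TRUE:+.3f})")
```

In this model, errors made equally by both sides wash out on average, while errors made by one side shift the result by a fixed amount that more trials do not remove. Real positions are messier, since an error's cost depends on the position it occurs in.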

Many positions are of a nature that big errors are rare, simply because most plays are very close in equity. An example is bearing in and off against contact from the bar. Other positions are of a nature that although big errors are not so rare, they occur for both sides. In this case, they offset each other somewhat, and the overall effect on the rollout is not so severe.

The real problem is when big errors are not rare, and they occur predominantly for one side. In this case the rollouts won't be reliable. The most diagnosable situation like this is when one side often makes a big error on its first turn.

> If you're not sure what the "best" move is to start out with, and you
> don't know whether Snowie is making the best decisions in subsequent
> positions, what's the basis for your confidence in the rollout?

The hope is that the bot is making almost all "good" moves, where "good" may not necessarily be "best."

> In fact, in positions that involve anything more than racing, how do
> we *ever* have confidence that a rollout yields the "correct" play?

One important idea is that the bots are less likely to completely obscure the big errors. Let's take your example. Say it is a terrible mistake to run off the anchor, and yet the bot likes it. Now, if you roll out running and not-running, in the not-running variation the bot is likely to run on its next turn, obscuring the difference between the two thematic approaches. However, if running is a big enough error, then there will still tend to be some difference due to the first play. And in general, the bigger the error, the bigger the difference that will show up, other things being equal.

For most positions you can have a lot of faith in a rollout that produces a large difference. Rollouts that generate a small difference are much less reliable, but also less important.

> I don't think it's any answer that we can have confidence in the rollout
> because Snowie has proven over time that it's a good BG player. The
> same argument can be used to justify Snowie's decisions at 3-ply. Yet
> when we question a 3-ply decision by Snowie, we do a rollout on Snowie.
> It seems rather circular.

The difference is, if you do a long rollout, then you see what the equity *is* (to within some statistical uncertainty) *assuming that the bot plays the position*. With an evaluation, you just have the opinion of a very strong player. With a rollout you have the results of thousands of actual games.

The rollout is the answer to the question: "What is the equity in this position if the bot plays both sides?" The question you then have to ask yourself is whether the answer to this question is close enough to the answer to your real question, which is probably something like: "What is my equity in this position against the people I tend to play against?"

For strong players playing other strong players, these questions will usually have similar answers, so the expert can often rely on the rollout. However, experts generally look at rollout results with a somewhat critical eye, and if the results don't seem right, then they will consider reasons why the rollout might be wrong. (They will also consider reasons why their own understanding of the position might be wrong.)

> Frankly, I have the very same question about rollouts that are done by
> humans. If an expert is not sure of the correct strategy in a
> particular position, how can he do an effective rollout if subsequent
> positions keep presenting similar strategic decisions?

All the same problems occur with human rollouts. Humans have the advantage that they can learn. They have several disadvantages, too. The biggest is that they are too slow.

--
David Montgomery             Beltway Backgammon Club
davidmontgomery@netzero.net  Washington DC area BG Tournaments
monty on FIBS and GG         www.cs.umd.edu/~monty/bbc.htm
