From:   Nigel Gibbions
Date:   7 March 1998
Subject:   JF and New Ideas in BG

The following may be of interest to owners of Woolsey's book, "New Ideas in Backgammon".

Having recently upgraded my copy of Jellyfish from version 2.01 to version
3.0, I decided to see how many problems in the book each version got right
playing at levels 5, 6 and 7. The idea was to get some sort of a handle on
the following issues:

(1) The relative merits of the two versions of Jellyfish.

(2) A comparison of JF's standard of checker play with my own.

(3) A feel for the relative playing strengths of different levels within
the same version of JF.

Here are the results:

                         JF2.01                          JF3.0
                Wrong   Equity  Average         Wrong   Equity  Average
Level 5         40      3.058   0.029           41      3.006   0.029
Level 6         24      1.724   0.017           23      1.622   0.016
Level 7         18      1.388   0.013            8      0.508   0.005

The heading "Wrong" refers to the number of problems (out of a total of
104) each program got wrong, taking as correct the solutions given in the
book, which are based on extensive rollouts and expert opinion. "Equity"
refers to the total equity sacrificed as a result of these errors. And
"Average" refers to the average equity sacrificed per problem (i.e. total
equity sacrificed, divided by 104).
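For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the "Average" column from the "Equity" column. It uses only the figures in the table; the dictionary and function names are hypothetical, chosen for illustration. Note that the divisor is all 104 problems, not just the ones a program got wrong.

```python
# Total equity sacrificed over the 104 problems, copied from the table above.
TOTAL_EQUITY = {
    ("JF2.01", 5): 3.058, ("JF2.01", 6): 1.724, ("JF2.01", 7): 1.388,
    ("JF3.0", 5): 3.006, ("JF3.0", 6): 1.622, ("JF3.0", 7): 0.508,
}
NUM_PROBLEMS = 104  # every problem in the book, right or wrong


def average_error(total_equity: float, num_problems: int = NUM_PROBLEMS) -> float:
    """Average equity sacrificed per problem: total loss divided by problem count."""
    return total_equity / num_problems


for (version, level), total in TOTAL_EQUITY.items():
    # Rounded to three decimals, as in the "Average" column.
    print(f"{version} level {level}: {average_error(total):.3f}")
```

Rounded to three places, the printed values match the table's "Average" column.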

Some general conclusions from the above results:

(1) The improvement between levels within the same version of JF is quite
significant. In both versions, level 6 gets many problems right that level
5 gets wrong, and the average equity sacrificed over this series of 104
difficult problems almost halves as a consequence. The improvement between
levels 6 and 7 is not so pronounced in JF2.01, but in JF3.0 it is quite
marked.

(2) At levels 5 and 6, the two versions of JF do about equally well in terms
of the number of problems they get right, but version 3.0 has a slight edge,
in that its mistakes tend to be less costly than version 2.01's. Hence
JF3.0 sacrifices less equity overall than version 2.01 and has a slightly
lower average at these levels. On level 7, however, JF3.0 does much better
than the previous version.
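Because every average in the table shares the same divisor of 104, the level-to-level improvement can be checked directly on the equity totals. The Python sketch below (names hypothetical) computes what fraction of the lower level's equity loss remains at the next level up. With the table's figures, level 6 keeps a little over half of level 5's loss in both versions, while level 7 keeps about four fifths of level 6's loss in JF2.01 but under a third in JF3.0.

```python
# Total equity sacrificed, from the results table: {version: {level: total}}.
TOTALS = {
    "JF2.01": {5: 3.058, 6: 1.724, 7: 1.388},
    "JF3.0": {5: 3.006, 6: 1.622, 7: 0.508},
}


def level_ratio(version: str, lower: int, higher: int) -> float:
    """Fraction of the lower level's equity loss still lost at the higher level.

    Dividing totals gives the same ratio as dividing averages, since both
    averages are the totals divided by the same 104 problems.
    """
    return TOTALS[version][higher] / TOTALS[version][lower]


for version in TOTALS:
    print(f"{version}: level 6 vs 5 = {level_ratio(version, 5, 6):.2f}, "
          f"level 7 vs 6 = {level_ratio(version, 6, 7):.2f}")
```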

The results for JF3.0 playing on level 7 are well worth pondering. I
obtained them by setting the timing factor to 30 on a Pentium 100MHz
machine. Increasing the timing factor further did not alter any of JF's
plays on my machine. In each case, JF made its move in just a few seconds.
When I think about how I sweated and scratched my head over these problems,
and still got MANY of them wrong, I am impressed indeed that JF3.0 makes
just 8 mistakes. In his introduction to the book, Woolsey says that anyone
who averages an error of 0.03 or less over these problems must be playing
backgammon at a very high level. Either version of JF, at any of these
three levels, is therefore playing at a very high level, but the average of
0.005 for JF3.0 at level 7 seems truly frightening. I'm sure that there's a
lesson somewhere in all of this for those people who complain that JF

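Woolsey's 0.03 yardstick is perhaps easier to picture as an equity budget for the whole book: 0.03 times 104 = 3.12. The Python sketch below (names hypothetical; the figures are those in the results table) checks each result against that budget. On this measure JF2.01 at level 5, at 3.058, only just squeaks in, while JF3.0 at level 7 uses roughly a sixth of it.

```python
# Woolsey's yardstick from the book's introduction: an average error of
# 0.03 or less per problem indicates a very high standard of play.
BENCHMARK = 0.03
NUM_PROBLEMS = 104

# Total equity sacrificed by each (version, level), from the results table.
TOTALS = {
    ("JF2.01", 5): 3.058, ("JF2.01", 6): 1.724, ("JF2.01", 7): 1.388,
    ("JF3.0", 5): 3.006, ("JF3.0", 6): 1.622, ("JF3.0", 7): 0.508,
}


def equity_budget(benchmark: float = BENCHMARK, num_problems: int = NUM_PROBLEMS) -> float:
    """Largest total equity loss over the whole book that still meets the benchmark."""
    return benchmark * num_problems


def meets_benchmark(total_equity: float) -> bool:
    """True if the average loss per problem is at or below the benchmark."""
    return total_equity / NUM_PROBLEMS <= BENCHMARK


budget = equity_budget()
print(f"Budget for the whole book: {budget:.2f}")
for (version, level), total in TOTALS.items():
    share = total / budget  # fraction of the budget actually spent
    print(f"{version} level {level}: meets={meets_benchmark(total)}, "
          f"uses {share:.0%} of the budget")
```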
In case anyone is interested, the problems which JF3.0, level 7 gets wrong
are the following: 13, 49, 51, 63, 66, 93, 97, 103.

I did some further rollouts on these positions just to see if they agreed
with the results cited in the book. Although some of my results (using
JF3.0, level 5 and 6 rollouts) suggest that the differences between JF's
chosen play and the solution given in the book are not as great as
indicated (in the case of problem 13, the difference seems quite marginal),
I'm satisfied that the above 8 are indeed positions in which even JF3.0
level 7 makes an error.

Finally, just so people without the book don't feel totally left out, here
is what turns out to be JF3.0's biggest error on level 7, Problem 51:

| - - - - - - | - | - - - - - - |
| x   o     o |   |   o     o x |
| x   o     o |   |   o       x |
|           o |   |             |
|           o |   |             |
|           o |   |             |
|             |   |             |
|             |   |             |
|           x |   |             |
|           x |   | x         o |
|         x x |   | x x       o |
| o       x x |   | x x o     o |
| - - - - - - |   | - - - - - - |

Money Game, O owns cube. X to play 66

JF3.0 level 7 plays 24/18(2), 7/1*(2), the middle way, for an equity of
0.610. The solution, as supported by rollouts (including mine), is to go the
whole hog with 7/1*(2), 8/2(2). Those loose blots in the outfield just give
X too many gammon opportunities to make it worth worrying about defensive
structure with this roll. Still, JF3.0 level 7 is in good company: five out
of the eight experts assembled to vet the problems for the book made the
same mistake (and, for what it's worth, so did I).

I'd be interested to hear any comments on the above results, especially on
the strength of JF3.0, playing at its highest level.

Best Wishes

Nigel Gibbions.

Chuck Bower writes:

A nice study, IMHO.  A couple things I can add to Nigel's comments:

  a) It looks like JF hasn't improved much from v2.01 to v3.0 at level-5 and
     level-6.  However, at level-7 the improvement is quite noticeable
     in these problems.

  b) One should keep in mind that comparison of JF evaluations with JF
     rollouts is a biased study.  On the other hand, that is the only
     choice MOST of us humans have (as far as using robots is concerned)
     because the only mechanical competitors for JF are either outdated
     (like Expert Backgammon), don't have rollout/evaluation capabilities
     (like TD-Gammon), or are available only to a select few (e.g. their
     authors, like Snowie, M-Loner, Motif, and their cousins).  I'd make
     a plea for new, STRONG, commercially available robots, but it seems
     like someone has already done that on this newsgroup lately.    ;)

  c) The book (Woolsey and Heinrich's "New Ideas...") is not yet outdated!
     That's more than can be said about most BG books a couple years
     after their release.  (Many are outdated BEFORE release!)

        c_ray on FIBS

Kit Woolsey writes:

Thanks for the excellent analysis.  It should be noted that Jellyfish
probably isn't playing quite as well as these results would indicate.
The reason is that we are using Jellyfish's own rollouts to determine the
best play.  While Jellyfish plays well enough on any level that most
rollouts can be trusted (and Hal and I tried to avoid positions where we
thought there might be problems with the rollouts), there are bound to be
a few positions in the book which Jellyfish misplays badly enough that the
rollouts give erroneous results.  Since the same program is doing the
rollouts and choosing the moves the bias will be in the same direction,
so Jellyfish's opinions are likely to echo any false rollout results.

It should also be noted that the positions in the book are the types of
positions which expert human players have trouble with, while the neural
nets do well on these sorts of positions.  They are largely "judgment"
problems, where one has to weigh conflicting priorities and
come up with the right balance.  This is the area where the neural nets
are very strong.  If we were looking at different types of positions
which were of a more technical nature, human experts would outscore the
neural nets.

Despite all this, you are quite correct:  The program plays a damned good
game of backgammon.

