The 1st International Planning Competition, 1998

The original web page is included below.


AIPS98 Planning Competition Results

Five contestants participated in the First Planning Systems Competition: Blackbox, HSP, IPP, SGP, and STAN.

The competition seemed to succeed in its goals of generating excitement and establishing how well top-quality planning algorithms actually perform. It did not, however, succeed in finding a clear-cut winner. For details, see the README file in the compressed tar file, which contains all the problems, solutions, and scoring procedures. We encourage others to run their systems on the problems and compare the results.

Round 1 of the competition consisted of two tracks, labeled "ADL" and "Strips." The difference between the two is that ADL allows context-dependent action effects and quantified preconditions. (For details, see the PDDL manual.) When there are both an ADL version and a Strips version of a domain, there is usually a slight difference between them.

Here is a brief description of each domain (roughly in order of complexity). Within each domain, problems are numbered in approximate order of increasing complexity, although for artificially generated problems it is hard to guarantee that kind of ordering.

Two planners, IPP and SGP, competed in the ADL track. Four, Blackbox, HSP, IPP, and STAN, competed in the Strips track. Problems were drawn from domains Assembly (ADL only), Gripper, Logistics, Movie, Mystery, and Mprime.

The contestants were given two or three days to run their planners on these problems (depending on when they arrived in Pittsburgh). The idea was to allow them to do any last-minute tuning of their planners in Round 1, then do Round 2 without any further tuning. Round 1 ended at 5 PM on Monday, June 8.

After much discussion (but, fortunately, no fatalities), the Committee decided to declare IPP the winner of the ADL track and to focus Round 2 on Strips problems only, with all four Strips planners (Blackbox, HSP, IPP, and STAN) as finalists. In addition, we decided to compute statistics for all the systems, but to avoid assigning a single number and declaring a winner.

For Round 2, we used the Grid, Logistics, and Mprime domains, all in their Strips versions. Round 1 had 140 Strips problems, of which 52 could not be solved by any of the planners. Having established the range the systems could realistically strive for, we deliberately chose a smaller number of problems, closer to that range, for Round 2.

The results are given in the following files:

round1/results/adl-round1.results
round1/results/strips-round1.results
round2/results/round2.results

As explained in the README file, we were not entirely satisfied with the output, and have provided an alternative view of the data in:

scoring/uniform-adl-round1.results
scoring/uniform-strips-round1.results
scoring/uniform-round2.results

After this last iteration, it is really hard to declare a winner. In Round 2, IPP solved more problems than any other program and found shorter plans, while STAN ran faster than any other on the problems it solved. But HSP solved the most problems in Round 1, which used different domains, and Blackbox ran fastest in Round 1.

Data

Here are the "uniform" results for ADL round 1:

ADL Round 1

Planner   Av. Time  Solved  Fastest  Shortest   Score
IPP          21396      69       68        68  199.34
SGP          14343      38        5        35   45.02
The two planners solved 69 problems total; they were tied for fastest time on 4 problems.
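The tie count can be cross-checked against the table with a little arithmetic. The sketch below assumes (this is not stated on the page) that a planner's "Fastest" count includes problems on which it was tied for the fastest time; under that reading, each tied problem is counted once per planner, so subtracting the ties from the sum of the two counts should give the number of solved problems. The function name is hypothetical, and the formula only applies to a two-planner track.

```python
# Counts copied from the ADL Round 1 table.
fastest = {"IPP": 68, "SGP": 5}
tied_for_fastest = 4   # problems on which both planners had the fastest time
total_solved = 69      # problems solved by at least one planner


def problems_with_a_fastest_planner(fastest_counts, ties):
    """Inclusion-exclusion for two planners.

    Assumption: a tie counts toward both planners' "Fastest" columns,
    so each tied problem appears twice in the sum and is subtracted once.
    """
    return sum(fastest_counts.values()) - ties


covered = problems_with_a_fastest_planner(fastest, tied_for_fastest)
print(covered, covered == total_solved)
```

Under that assumption the counts are mutually consistent: 68 + 5 - 4 = 69.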

Here are the results for Strips round 1 and round 2:

Strips Round 1

Planner   Av. Time  Solved  Fastest  Shortest   Score
Blackbox      1498      63       16        55  163.83
HSP          35483      82       19        61  233.64
IPP           7408      63       29        49  158.78
STAN         55413      64       24        47  177.56

Round 2

Planner   Av. Time  Solved  Fastest  Shortest   Score
Blackbox      2464       8        3         6  172.28
HSP          25875       9        1         5  119.90
IPP          17375      11        3         8  271.28
STAN          1334       7        5         4  180.62

Note that the planners are sorted in alphabetical order.
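Because the tables are alphabetical rather than ranked, the per-criterion leaders have to be read off column by column. The following is a minimal sketch of that lookup, with the figures copied from the two Strips tables; the `Row` type and the `leader` helper are illustrative names, not part of the competition's scoring scripts. Note that "Av. Time" is presumably averaged over the problems each planner solved, so a low value is not directly comparable between planners that solved different problem sets.

```python
from typing import NamedTuple


class Row(NamedTuple):
    planner: str
    avg_time: int
    solved: int
    fastest: int
    shortest: int
    score: float


# Figures copied from the Strips Round 1 and Round 2 tables.
STRIPS_ROUND1 = [
    Row("Blackbox", 1498, 63, 16, 55, 163.83),
    Row("HSP", 35483, 82, 19, 61, 233.64),
    Row("IPP", 7408, 63, 29, 49, 158.78),
    Row("STAN", 55413, 64, 24, 47, 177.56),
]
ROUND2 = [
    Row("Blackbox", 2464, 8, 3, 6, 172.28),
    Row("HSP", 25875, 9, 1, 5, 119.90),
    Row("IPP", 17375, 11, 3, 8, 271.28),
    Row("STAN", 1334, 7, 5, 4, 180.62),
]


def leader(rows, column, lowest=False):
    """Return the planner with the best value in one column.

    Higher is better by default; pass lowest=True for time-like columns.
    """
    pick = min if lowest else max
    return pick(rows, key=lambda row: getattr(row, column)).planner


for name, rows in (("Round 1", STRIPS_ROUND1), ("Round 2", ROUND2)):
    print(name,
          "most solved:", leader(rows, "solved"),
          "| most shortest plans:", leader(rows, "shortest"),
          "| lowest av. time:", leader(rows, "avg_time", lowest=True))
```

Run against these tables, the lookup agrees with the summary in the text: HSP solved the most problems and Blackbox had the lowest average time in Round 1, while in Round 2 IPP leads on problems solved and shortest plans and STAN has the lowest average time.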

It is hard to draw any conclusion from these data, except to note that all of these planners performed very well, compared to the state of the art a few years ago. Many of the plans found were 30 or 40 steps long, and some were longer than 100 steps.