Five contestants participated in the First Planning Systems Competition: Blackbox, HSP, IPP, SGP, and STAN.
The competition seemed to succeed in its goals of generating excitement and establishing how well top-quality planning algorithms actually perform. It did not, however, succeed in finding a clear-cut winner. For details, see the README file in the compressed tar file. This file contains all the problems, solutions, and scoring procedures. We encourage others to run their systems on the problems and compare the results.
Round 1 of the competition consisted of two tracks, labeled "ADL" and "Strips." The difference between the two is that ADL allows context-dependent action effects and quantified preconditions. (For details, see the PDDL manual.) When there are both an ADL version and a Strips version of a domain, there is usually a slight difference between them.
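The difference between fixed Strips effects and context-dependent ADL effects can be illustrated with a small sketch. This is not PDDL syntax and is not taken from the competition files; the class names (`StripsAction`, `ConditionalEffect`, `AdlAction`) and the briefcase example are assumptions made purely for illustration, and quantified preconditions are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripsAction:
    """Strips-style action: the add and delete lists are always the same."""
    name: str
    preconditions: frozenset
    add: frozenset
    delete: frozenset

    def apply(self, state):
        if not self.preconditions <= state:
            raise ValueError(f"{self.name} is not applicable")
        return (state - self.delete) | self.add


@dataclass(frozen=True)
class ConditionalEffect:
    """An effect that fires only if `condition` holds before the action."""
    condition: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


@dataclass(frozen=True)
class AdlAction:
    """ADL-style action: which effects fire depends on the current state."""
    name: str
    preconditions: frozenset
    effects: tuple

    def apply(self, state):
        if not self.preconditions <= state:
            raise ValueError(f"{self.name} is not applicable")
        # Every condition is checked against the state before the action.
        fired = [e for e in self.effects if e.condition <= state]
        delete = frozenset().union(*(e.delete for e in fired))
        add = frozenset().union(*(e.add for e in fired))
        return (state - delete) | add


# Hypothetical example: moving a briefcase also moves the paycheck,
# but only when the paycheck is inside the briefcase.
move = AdlAction(
    name="move-briefcase home office",
    preconditions=frozenset({"at briefcase home"}),
    effects=(
        ConditionalEffect(  # empty condition: always fires
            condition=frozenset(),
            add=frozenset({"at briefcase office"}),
            delete=frozenset({"at briefcase home"}),
        ),
        ConditionalEffect(
            condition=frozenset({"in paycheck briefcase"}),
            add=frozenset({"at paycheck office"}),
            delete=frozenset({"at paycheck home"}),
        ),
    ),
)
```

A Strips encoding of the same behavior would typically need a separate action for each context (paycheck inside or not), which is one reason the two versions of a domain can differ slightly.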
Here is a brief description of each domain (roughly in order of complexity). Within each domain, problems are numbered in approximate order of increasing complexity, although for artificially generated problems it is hard to guarantee that kind of ordering.
Two planners, IPP and SGP, competed in the ADL track. Four planners, Blackbox, HSP, IPP, and STAN, competed in the Strips track. Problems were drawn from the domains Assembly (ADL only), Gripper, Logistics, Movie, Mystery, and Mprime.
The contestants were given two or three days to run their planners on these problems (depending on when they arrived in Pittsburgh). The idea was to allow them to do any last-minute tuning of their planners in Round 1, then do Round 2 without any further tuning. Round 1 ended at 5 PM on Monday, June 8.
After much discussion (but, fortunately, no fatalities), the Committee decided to declare IPP the winner of the ADL track and to focus Round 2 on Strips problems only, with all four Strips planners (Blackbox, HSP, IPP, and STAN) as finalists. In addition, we decided to compute statistics for all the systems, but to avoid assigning a single number and declaring a winner.
For Round 2, we used the Grid, Logistics, and Mprime domains, all in their Strips versions. In Round 1, we had 140 Strips problems, of which 52 could not be solved by any of the planners. Having established the range the systems could realistically strive for, we deliberately chose a smaller number of problems, closer to that range, for Round 2.
The results are given in the following files:
    round1/results/adl-round1.results
    round1/results/strips-round1.results
    round2/results/round2.results
As explained in the README file, we were not entirely satisfied with the output, and have provided an alternative view of the data in:
    scoring/uniform-adl-round1.results
    scoring/uniform-strips-round1.results
    scoring/uniform-round2.results
After this last iteration, it is really hard to declare a winner. IPP solved more problems than any other program and found shorter plans. STAN ran faster than any other on the problems it solved. But HSP solved the most problems in Round 1, which used a different set of domains. Blackbox ran fastest in Round 1.
Here are the results for Strips round 1 and round 2:
Note that the planners are sorted in alphabetical order.
It is hard to draw any conclusion from these data, except to note that all of these planners performed very well, compared to the state of the art a few years ago. Many of the plans found were 30 or 40 steps long, and some were longer than 100 steps.
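The idea of reporting several statistics per planner, rather than collapsing them into a single score, can be sketched as follows. This is only an illustration and not the scoring procedure described in the README: the record layout, the function names `summarize` and `unsolved_by_all`, and all planner names and numbers below are hypothetical placeholders, not competition data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (planner, problem, solved, seconds, plan_length).
# These values are made up for illustration only.
RESULTS = [
    ("planner_a", "prob01", True, 1.2, 12),
    ("planner_a", "prob02", False, None, None),
    ("planner_a", "prob03", False, None, None),
    ("planner_b", "prob01", True, 0.4, 14),
    ("planner_b", "prob02", True, 9.0, 31),
    ("planner_b", "prob03", False, None, None),
]


def summarize(results):
    """Return separate statistics per planner, in alphabetical order."""
    solved_runs = defaultdict(list)
    for planner, _problem, solved, seconds, length in results:
        runs = solved_runs[planner]  # planners with no solutions still appear
        if solved:
            runs.append((seconds, length))
    summary = {}
    for planner in sorted(solved_runs):
        runs = solved_runs[planner]
        summary[planner] = {
            "solved": len(runs),
            # Averages are taken over solved problems only.
            "mean_seconds": mean([s for s, _ in runs]) if runs else None,
            "mean_length": mean([n for _, n in runs]) if runs else None,
        }
    return summary


def unsolved_by_all(results):
    """Problems that no planner solved."""
    problems = {problem for _, problem, *_ in results}
    solved = {problem for _, problem, ok, *_ in results if ok}
    return sorted(problems - solved)


if __name__ == "__main__":
    for planner, stats in summarize(RESULTS).items():
        print(planner, stats)
    print("unsolved by all:", unsolved_by_all(RESULTS))
```

Because the averages cover only the problems each planner solved, a planner that solves few, easy problems can look fast; this is one reason such numbers resist being combined into a single ranking.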