Researchers from MIT CSAIL have developed PDDL-INSTRUCT, an instruction-tuning framework designed to improve the multi-step planning capabilities of large language models (LLMs). The method combines logical chain-of-thought reasoning with an external plan validator, steering models toward logically valid plans rather than plausible but incorrect outputs.
The framework trains models to recognize and explain why a candidate plan fails, whether through unsatisfied preconditions, incorrect effects, frame violations, or an unmet goal. This is paired with logical chain-of-thought prompts that guide the LLM through step-by-step inference over state and action transitions, producing traceable state→action→state sequences written as ⟨sᵢ, aᵢ₊₁, sᵢ₊₁⟩, as sketched in the example below.
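To make the reasoning-chain idea concrete, here is a minimal sketch of precondition/effect checking over a single STRIPS-style step. The class names and Blocksworld facts are illustrative assumptions, not the paper's actual data format:

```python
from dataclasses import dataclass

# Hypothetical STRIPS-style action; structure is illustrative only.
@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # facts that must hold before execution
    add_effects: frozenset    # facts made true by execution
    del_effects: frozenset    # facts made false by execution

def apply(state: frozenset, action: Action):
    """Return the successor state, or an explanation of why the step fails."""
    missing = action.preconditions - state
    if missing:
        # Unsatisfied precondition: one of the failure modes the tuned
        # model is trained to detect and explain.
        return None, f"unsatisfied preconditions: {sorted(missing)}"
    successor = (state - action.del_effects) | action.add_effects
    return successor, None

# A Blocksworld-style step: pick up block a from the table.
pickup_a = Action(
    name="pickup(a)",
    preconditions=frozenset({"clear(a)", "ontable(a)", "handempty"}),
    add_effects=frozenset({"holding(a)"}),
    del_effects=frozenset({"clear(a)", "ontable(a)", "handempty"}),
)

s0 = frozenset({"clear(a)", "ontable(a)", "handempty"})
s1, error = apply(s0, pickup_a)  # the ⟨s0, a1, s1⟩ triple's successor state
print(s1 or error)
```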
For external validation, PDDL-INSTRUCT integrates the VAL plan validator, which checks each step of a generated plan. VAL's feedback is either binary (valid/invalid) or detailed, with detailed feedback yielding superior performance. Training uses a two-stage optimization process: the first stage penalizes errors in the reasoning chains, and the second stage optimizes for final planning accuracy.
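The following sketch shows what such a validate-and-refine loop could look like, invoking VAL's Validate binary as a subprocess. The flag handling and output parsing are assumptions that may differ across VAL builds, and generate_plan is a hypothetical stand-in for the LLM call:

```python
import subprocess

def validate_plan(domain: str, problem: str, plan: str, detailed: bool = True):
    """Run the VAL 'Validate' binary on a plan file and return its feedback."""
    cmd = ["Validate"]
    if detailed:
        cmd.append("-v")  # verbose error reporting; flag may vary by build
    cmd += [domain, problem, plan]
    result = subprocess.run(cmd, capture_output=True, text=True)
    valid = "Plan valid" in result.stdout  # assumed success marker
    # Binary feedback: just the flag. Detailed feedback: VAL's diagnosis,
    # e.g., which precondition failed at which step.
    return valid, result.stdout if detailed else None

def refine(generate_plan, domain, problem, budget=5):
    """Feedback-driven loop: generate, validate, feed the diagnosis back."""
    feedback = None
    for _ in range(budget):
        plan_file = generate_plan(feedback)  # LLM call, not shown
        valid, feedback = validate_plan(domain, problem, plan_file)
        if valid:
            return plan_file
    return None  # budget exhausted without a valid plan
```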
The system was evaluated on the PlanBench benchmark, which includes planning domains known to challenge LLMs: Blocksworld, Mystery Blocksworld, and Logistics. In the Blocksworld domain, a tuned Llama-3-8B model generated valid plans 94% of the time. On Mystery Blocksworld, a domain whose predicate names are obfuscated to prevent pattern matching, previous models had near-zero validity; PDDL-INSTRUCT achieved up to a 64-fold improvement.
Significant performance gains were also recorded in the Logistics domain. Across all test domains, the framework delivered up to a 66% absolute improvement over untuned baseline models. The researchers also noted that performance improved with larger feedback budgets and more detailed validator output.
The current implementation of PDDL-INSTRUCT applies to classical PDDL domains and depends on the VAL validator as an external oracle. The results demonstrate a method for grounding LLM reasoning in formal semantics, applicable to agent systems that can consult a verifier during planning. Extending the framework to long-horizon, temporal, numeric, and cost-sensitive planning tasks remains an area for further work.