TUPLES Scientific Contribution - Publications
The first objective of TUPLES was to advance the scientific state of the art in Trustworthy AI, with a focus on designing and managing safe, robust, scalable, and explainable planning and scheduling decision-support systems and tools.
The body of scientific publications produced within TUPLES is not only extensive but also of high quality and internationally recognized, with contributions featured in some of the most respected journals and conferences in the field.
TUPLES focused on building trustworthy planning and scheduling (P&S) systems that are scalable, safe, robust, and explainable, bridging fundamental research and impactful applications.
To achieve these objectives, our scientific contributions rested on two cornerstones: (1) combining symbolic P&S methods with data-driven methods, to benefit from the scalability and modeling power of the latter while gaining the transparency, robustness, and safety of the former; and (2) developing rigorous explanation and verification approaches for ensuring the transparency, robustness, and safety of sequences of interacting machine-learned decisions.
From a practical standpoint, the project demonstrated and evaluated our novel and rigorous methods in a laboratory environment, on a range of use cases in manufacturing, aircraft operations, sport management, waste collection, and energy management.
During this period, from a scientific standpoint, we continued to:
- advance the state of the art with novel hybrid (model-based / data-driven) approaches to planning and scheduling that aim to increase the robustness and/or scalability of existing methods;
- develop new methods for verifying, testing, improving, or enforcing the safety and robustness of the multi-step decisions recommended by these approaches;
- devise interactive explanation approaches that allow users of planning and scheduling systems to understand why a solution was recommended over others, and to use this understanding to guide the system towards a solution satisfying their preferences.
Altogether, the project produced 68 scientific publications, over 50 of which were accepted at top-tier AI conferences and journals, with two receiving best paper awards. Many of these contributions were accompanied by publicly released code, ensuring accessibility and reusability by the broader community.
Build robust planning and scheduling systems
We designed novel methods for performing robustness verification for two model classes commonly used to represent action policies: tree ensembles and neural networks.
For tree ensembles, we made three important contributions:
(1) We proposed the first (and to date only) approach that correctly handles verification for multiclass problems in tree ensembles, which is a prerequisite for analyzing planning policies as there are (almost) always more than two actions.
(2) We established a framework for assessing the robustness of tree ensemble predictions at deployment time, identifying instances at heightened risk of misprediction, e.g., because they are adversarial examples or out of distribution.
(3) We developed a compression algorithm that yields more robust tree ensembles without harming predictive performance.
These were applied to the SciSports and Beluga use cases. For the OptIT Waste Collection use case, we built a simulator that allows for post-hoc robustness comparison of alternative solutions.
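To give a flavor of the robustness property at stake, the sketch below runs a naive, sampling-based robustness check on a stand-in multiclass tree-ensemble policy. It is only a sketch under made-up names (`policy`, `empirically_robust`, `epsilon`): unlike our Veritas-based verification, which reasons exactly over the ensemble and can prove robustness, sampling can merely fail to find a counterexample.

```python
# Illustrative only: a sampling-based robustness check for a multiclass
# tree-ensemble policy. Exact verification reasons over the ensemble
# structure instead of sampling perturbations.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in for a learned action policy: 5 state features -> 4 actions.
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 4, size=1000)
policy = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def empirically_robust(state, epsilon=0.1, n_samples=2000):
    """Is the recommended action stable under all sampled L-infinity
    perturbations of magnitude at most epsilon?"""
    action = policy.predict(state.reshape(1, -1))[0]
    noise = rng.uniform(-epsilon, epsilon, size=(n_samples, state.size))
    return bool(np.all(policy.predict(state + noise) == action))

print("empirically robust:", empirically_robust(X[0]))
```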
In the context of neural networks, we established worst-case complexity bounds for finding adversarial attacks and for training robust neural networks, and we then extended this work so that the training algorithm can enforce relational properties involving pairs of examples. We also developed a new and more robust version of constraint acquisition that guides the acquisition process using statistical and machine-learning methods; a statistical approach to learning planner preferences was adopted for the Airbus Manufacturing use case.
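To make the notion of an adversarial attack on a neural policy mentioned above concrete, here is a single textbook fast-gradient-sign (FGSM) step. This is a standard illustration, not one of the project's algorithms; the toy network and the `epsilon` value are placeholders.

```python
# One FGSM step: ascend the loss w.r.t. the input state to try to flip the
# recommended action. Textbook attack shown for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 4))  # toy policy
state = torch.randn(1, 5, requires_grad=True)

action = net(state).argmax(dim=1)                    # currently recommended action
loss = nn.functional.cross_entropy(net(state), action)
loss.backward()                                      # gradient w.r.t. the state

epsilon = 0.05
adv_state = state + epsilon * state.grad.sign()      # single FGSM ascent step
print("action changed:", bool(net(adv_state).argmax(dim=1) != action))
```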
For robust training of neural networks with respect to downstream scheduling and planning systems (decision-focused learning, DFL), we wrote a comprehensive survey and benchmark paper and investigated, for the first time, the use of DFL for learning to predict action costs in planning. We also devised novel methods based on black-box differentiation that can address two-stage and multi-stage optimization problems under uncertainty and are highly scalable at inference time; these were applied to the energy management use case. Finally, we began investigating techniques to improve the training-time scalability of these methods.
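The black-box differentiation idea can be sketched on a toy problem: gradients of a regret loss are pushed through a combinatorial solver by calling the solver a second time on perturbed costs (in the style of Vlastelica et al.). Everything below (the "pick the cheapest option" solver, the linear predictor, the value of `lam`) is an illustrative assumption; the project's methods handle far richer two-stage and multi-stage settings.

```python
# Minimal decision-focused learning sketch with black-box differentiation.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 3, 5                               # samples, features, options
W_true = rng.normal(size=(d, k))
X = rng.normal(size=(n, d))
C = X @ W_true + 0.1 * rng.normal(size=(n, k))    # true per-option costs

def solve(costs):
    """Toy combinatorial 'solver': one-hot indicator of the cheapest option."""
    y = np.zeros_like(costs)
    y[np.argmin(costs)] = 1.0
    return y

W = np.zeros((d, k))          # linear cost predictor, trained decision-focused
lam, lr = 10.0, 0.01
for _ in range(30):
    for x, c in zip(X, C):
        c_hat = x @ W                      # predicted costs
        y = solve(c_hat)                   # decision made on predictions
        # Black-box gradient: re-solve on costs perturbed in the direction
        # of dL/dy (= c, for the regret loss L = c . y).
        y_perturbed = solve(c_hat + lam * c)
        grad_c_hat = -(y - y_perturbed) / lam
        W -= lr * np.outer(x, grad_c_hat)  # chain rule through c_hat = x @ W

regret = np.mean([C[i] @ solve(X[i] @ W) - C[i].min() for i in range(n)])
print(f"mean regret after training: {regret:.4f}")
```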
Build safe planning and scheduling systems
We advanced safety analysis for the multi-step behavior of ML models representing action policies. We developed two algorithmic approaches to safety verification, based on predicate abstraction and on IC3, for both neural policies and policies represented as tree ensembles. Both approaches exploit the multiclass Veritas reasoning tool for tree ensembles that we developed.
We established policy testing methods, through new fuzzing methods geared at finding undesirable policy behavior and through a new test-oracle methodology able to identify such behavior without resorting to optimal planning. We established analysis methods that pinpoint faults in the policy, i.e., the specific action decisions responsible for undesirable behavior. We established fault-fixing methods for tree-ensemble policies, automatically modifying the policy into a new one that is guaranteed to fix a given set of faults. Finally, we established a debugging loop incorporating all of fuzzing, fault analysis, fault fixing, and safety verification, automatically improving and (where feasible) proving the safety of a given tree-ensemble policy.
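The overall shape of such a policy-fuzzing loop can be sketched as follows. All names here (`simulate` components `perturb`, `step`, `is_unsafe`, and the toy system) are placeholders; the TUPLES tools additionally localize the faulty decisions and repair tree-ensemble policies, which this sketch does not attempt.

```python
# Sketch of a policy-fuzzing loop: perturb initial states, roll the policy
# out in a simulator, and record runs that reach an unsafe state.
import random

def fuzz_policy(policy, initial_state, perturb, step, is_unsafe,
                n_runs=1000, horizon=50, seed=0):
    random.seed(seed)
    failures = []
    for _ in range(n_runs):
        state = perturb(initial_state)             # randomized start state
        trace = [state]
        for _ in range(horizon):
            state = step(state, policy(state))     # apply recommended action
            trace.append(state)
            if is_unsafe(state):
                failures.append(trace)             # keep the trace for analysis
                break
    return failures

# Toy instantiation: a 1-D system that must keep its value below 10.
failures = fuzz_policy(
    policy=lambda s: 1,                            # always "increase" (a bad policy)
    initial_state=0.0,
    perturb=lambda s: s + random.uniform(0, 5),
    step=lambda s, a: s + a,
    is_unsafe=lambda s: s >= 10,
)
print(f"{len(failures)} unsafe traces found")
```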
All algorithms for multi-step safety verification and testing combine technology and expertise across several TUPLES partners (ANITI, KUL, USaar). Our methods were applied successfully to the Beluga and Flight Diversion use cases, for policy verification on small-to-medium instances and for policy testing on realistic-size instances; in Beluga, for some instances, our debugging loop automatically turns highly unsafe input policies into output policies verified to be safe.
Build explainable planning and scheduling systems
We designed new interactive methods, as well as interactive and visual interfaces, based on minimal unsatisfiable subsets (MUS) of solution properties and on their dual, minimal correction sets (MCS). These help the user navigate the solution space of constraint satisfaction (including scheduling) and of classical, numeric, and probabilistic planning problems, in order to explain and resolve the lack of feasible solutions satisfying the user's constraints and preferences. These were applied to the three Airbus use cases, with (positive) end-user evaluations conducted for two of them.
Because large MUSes over systems of constraints can be incomprehensible to users, we developed a method that generates stepwise explanations of why the problem is unsatisfiable. Further, we showed how to exploit constraint symmetries to speed up the computation of MUSes. To provide contrastive explanations and help users decide among multiple alternative solutions, we developed techniques for enumerating Pareto-optimal solutions and exploring them interactively, as exemplified in the OptIT Waste use case. We also designed new methods for explaining decisions produced by black-box and grey-box planning policies, including those represented by graph neural networks.
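For readers unfamiliar with MUSes, the classic deletion-based algorithm conveys the core idea: drop a constraint, and if the remainder is still infeasible, that constraint was not needed for the conflict. The sketch below models constraints as predicates over a finite candidate space purely for illustration; it is the textbook baseline, not our stepwise-explanation or symmetry-exploiting methods.

```python
# Deletion-based computation of a minimal unsatisfiable subset (MUS).
def satisfiable(constraints, domain):
    return any(all(c(x) for c in constraints) for x in domain)

def deletion_mus(constraints, domain):
    core = list(constraints)
    for c in list(core):
        rest = [d for d in core if d is not c]
        if not satisfiable(rest, domain):
            core = rest                  # c is redundant for infeasibility
    return core

# Toy scheduling-style conflict over a single integer start time 0..9.
domain = range(10)
constraints = [
    lambda t: t >= 5,       # start no earlier than 5
    lambda t: t <= 3,       # start no later than 3 (conflicts with the first)
    lambda t: t % 2 == 0,   # even start times only (irrelevant to the conflict)
]
mus = deletion_mus(constraints, domain)
print(f"MUS has {len(mus)} constraints")   # -> 2
```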
Build scalable planning and scheduling systems
We have developed a range of new architectures for learning generalised policies and heuristics that scale to large planning and scheduling problems, along with novel ways of integrating them with search algorithms. This work scales significantly better than the state of the art in deep learning for planning, and it rivals or outperforms model-based planners in some domains.
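One common integration pattern is to let a learned heuristic order the frontier of a greedy best-first search. The sketch below is a generic illustration of that pattern under assumed names: `learned_h` stands in for a trained model (e.g., a GNN over the planning state) but is here a trivial placeholder, and the toy domain is made up.

```python
# Greedy best-first search guided by a (placeholder) learned heuristic.
import heapq
import itertools

def gbfs(initial, goal_test, successors, learned_h):
    counter = itertools.count()            # tie-breaker for the priority queue
    frontier = [(learned_h(initial), next(counter), initial, [])]
    seen = {initial}
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if goal_test(state):
            return plan
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(
                    frontier, (learned_h(nxt), next(counter), nxt, plan + [action]))
    return None

# Toy instantiation: reach 7 from 0 with +1 / +3 moves; the "learned"
# heuristic is simply the remaining distance.
plan = gbfs(
    initial=0,
    goal_test=lambda s: s == 7,
    successors=lambda s: [("+1", s + 1), ("+3", s + 3)],
    learned_h=lambda s: abs(7 - s),
)
print(plan)   # e.g. ['+3', '+3', '+1']
```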
Our work on ML-guided constraint acquisition requires far fewer user queries, allowing us to scale to larger problem instances without exhausting the user. The inference scalability of our hybrid and DFL methods for two-stage and multi-stage optimisation (see O1) is already considerably higher than that of widely employed methods, while their training scalability still requires improvement.
Besides hybrid methods, we have also improved existing CP-based scheduling solvers, planners, and heuristics, and built a number of new ones with improved scalability, including their use in the Airbus Manufacturing and OptIT Waste scheduling use cases.
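For context, the following is a minimal CP-based scheduling model in OR-Tools CP-SAT, shown only to illustrate the kind of solver technology involved; the project's solvers and models are substantially richer, and the task durations here are made up.

```python
# Minimal one-machine scheduling model: minimize the makespan of three
# non-overlapping tasks.
from ortools.sat.python import cp_model

durations = [4, 3, 5]
horizon = sum(durations)
model = cp_model.CpModel()

ends, intervals = [], []
for i, dur in enumerate(durations):
    start = model.NewIntVar(0, horizon, f"start_{i}")
    end = model.NewIntVar(0, horizon, f"end_{i}")
    intervals.append(model.NewIntervalVar(start, dur, end, f"task_{i}"))
    ends.append(end)

model.AddNoOverlap(intervals)          # one machine: tasks must not overlap
makespan = model.NewIntVar(0, horizon, "makespan")
model.AddMaxEquality(makespan, ends)
model.Minimize(makespan)

solver = cp_model.CpSolver()
if solver.Solve(model) == cp_model.OPTIMAL:
    print("makespan:", solver.Value(makespan))   # -> 12
```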
We have also improved the scalability of explanation generation for large classes of problems, through the automatic detection and exploitation of symmetries in MUS computation.
In the context of scalability for tree ensembles, we have made two important advances. First, we developed a novel approach that speeds up robustness checking by an order of magnitude by analyzing previously solved tasks. Second, our compression approach yields models that are one to two orders of magnitude faster to verify, have a much smaller memory footprint, and need less time to make predictions, without harming predictive performance.
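The general idea behind ensemble compression can be illustrated with a naive greedy baseline: drop trees whose removal does not hurt validation accuracy. This sketch is only a stand-in under assumed data and thresholds; the TUPLES compression algorithm is specifically designed so that the compressed ensemble is also much easier to verify, which this baseline does not guarantee.

```python
# Illustrative greedy "compression" of a tree ensemble on a toy dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

def accuracy(trees):
    """Validation accuracy of a majority vote over the given trees."""
    votes = np.stack([t.predict(X_val).astype(int) for t in trees])
    majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    return float(np.mean(majority == y_val))

kept = list(forest.estimators_)
baseline = accuracy(kept)
for tree in list(kept):
    rest = [t for t in kept if t is not tree]
    if rest and accuracy(rest) >= baseline:
        kept = rest                    # tree was redundant on validation data

print(f"kept {len(kept)} of {len(forest.estimators_)} trees")
```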