Counterfactual Rollouts for Planning: a 30-day deployment study
Sasha Petrov, Maya Quesada, Bilal Hossain
@techreport{petrov2024counterfactual,
title = {Counterfactual Rollouts for Planning: a 30-day deployment study},
author = {Petrov, Sasha and Quesada, Maya and Hossain, Bilal},
year = {2024},
number = {Internal release — alphabell index 24/22},
institution = {alphabell},
month = dec,
doi = {10.48550/arXiv.2412.93805},
url = {https://dev.alphabell.com/publications/counterfactual-rollouts-for-planning}
}
Abstract
Counterfactual rollouts (what-if simulations from a world model) are widely used in planning, but their accuracy degrades sharply outside the data manifold. We instrumented the production planners of two cells with counterfactual confidence estimates. Over 30 days and 11k plan invocations, confidence-weighted rollouts yielded a 19% reduction in plan-execution failure compared with uniformly weighted counterfactuals.
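The contrast between uniform and confidence-weighted rollouts can be illustrated with a minimal sketch. This index entry does not describe the report's actual weighting scheme, so everything below is an assumption made for illustration: the `Rollout` structure, the `score_plans` and `best_plan` helpers, the confidence range of [0, 1], and the toy numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rollout:
    """One counterfactual rollout of a candidate plan (hypothetical structure)."""
    plan_id: str
    predicted_return: float  # return predicted by the world model
    confidence: float        # assumed in [0, 1]; higher = closer to the data manifold


def score_plans(rollouts: List[Rollout], use_confidence: bool = True) -> Dict[str, float]:
    """Average predicted return per plan, optionally weighted by confidence."""
    totals: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for r in rollouts:
        # Uniform weighting treats every rollout as equally trustworthy.
        w = r.confidence if use_confidence else 1.0
        totals[r.plan_id] = totals.get(r.plan_id, 0.0) + w * r.predicted_return
        weights[r.plan_id] = weights.get(r.plan_id, 0.0) + w
    # A plan whose rollouts all carry zero weight scores -inf so it is never preferred.
    return {
        p: (totals[p] / weights[p] if weights[p] > 0 else float("-inf"))
        for p in totals
    }


def best_plan(rollouts: List[Rollout], use_confidence: bool = True) -> str:
    """Return the plan id with the highest (weighted) average predicted return."""
    scores = score_plans(rollouts, use_confidence)
    return max(scores, key=scores.get)


# Toy data: plan A has one optimistic rollout that the model is unsure about.
rollouts = [
    Rollout("A", 10.0, 0.1),
    Rollout("A", 2.0, 0.9),
    Rollout("B", 5.0, 0.8),
    Rollout("B", 5.0, 0.8),
]

print(best_plan(rollouts, use_confidence=False))
print(best_plan(rollouts, use_confidence=True))
```

In this toy data, uniform weighting lets the single low-confidence optimistic rollout dominate plan A's average, while confidence weighting discounts it. A production planner would need a calibrated confidence estimate, which this sketch takes as given.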
Index metadata
- Cell: voronoi-19
- Compute: 26 H100-days
- Status: Open release
- DOI: 10.48550/arXiv.2412.93805
- arXiv: arXiv:2412.93805
What this paper is part of
This index entry is part of the World models research axis. The producing cell, voronoi-19, collaborates with adjacent cells listed in the cell directory. The paired interpretability cell (where applicable) is identified in the metadata above; its disagreement reports, if any, accompany the public release.
How to read this
If you want to use the result: the code (where available) is at https://github.com/alphabell-labs/ab-counterf; no dataset location has been announced yet (TBD until one is released). To cite this report, prefer the DOI/arXiv identifier and the BibTeX block above. To discuss the report with the producing cell, contact the lab and quote the index entry slug counterfactual-rollouts-for-planning.
Limitations
Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.
Sasha Petrov, Maya Quesada, Bilal Hossain. Counterfactual Rollouts for Planning: a 30-day deployment study. Internal release — alphabell index 24/22, Dec 2024. arXiv:2412.93805. doi:10.48550/arXiv.2412.93805.