Counterfactual Trajectory Replay for Off-Policy Agent Debugging
Mira Holloway, Priya Anand, Dineth Karunaratne, Akoss Vidor
@misc{holloway2025counterfactual,
title = {Counterfactual Trajectory Replay for Off-Policy Agent Debugging},
author = {Holloway, Mira and Anand, Priya and Karunaratne, Dineth and Vidor, Akoss},
year = {2025},
howpublished = {alphabell index 25/19 · arXiv 2512.00417},
month = {dec},
doi = {10.48550/arXiv.2512.00417},
url = {https://dev.alphabell.com/publications/counterfactual-trajectory-replay}
}
Abstract
Debugging long-horizon agents is hard precisely because the failure that motivates the debug session may have arisen tens of thousands of steps before the visible symptom. We introduce trajectory-replay, a debugging mode built on the alphabell substrate's content-addressed traces, in which a deviation at any prior step can be simulated forward under the same model weights and substrate state. We report on six cell-internal debugging sessions where the technique uncovered root causes that the producing cell had been unable to find through prompt-level inspection, and discuss the operational requirements for replay to be both fast and faithful.
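The abstract describes replay in terms of content-addressed traces: a deviation is injected at some prior step and simulated forward under unchanged weights and substrate state. The sketch below shows that idea under strong simplifying assumptions. States and actions are plain integers. One deterministic Python function stands in for the fixed model weights and another stands in for the substrate. Each step's address is a SHA-256 hash chained to its parent's address. All names (TraceStep, record, verify, replay_with_deviation) are hypothetical and are not the ab-replay API, and the sketch does not model sampling randomness.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int   # hypothetical: substrate state reduced to an integer
Action = int  # hypothetical: agent action reduced to an integer
Policy = Callable[[State], Action]      # stands in for fixed model weights
Env = Callable[[State, Action], State]  # stands in for the substrate


@dataclass(frozen=True)
class TraceStep:
    parent: str    # content address of the previous step ("" for the root)
    state: State   # state observed before acting
    action: Action
    digest: str    # content address of this step


def address(parent: str, state: State, action: Action) -> str:
    """Hash the step contents together with the parent's address (a hash chain)."""
    payload = json.dumps([parent, state, action])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def _step(trace: List[TraceStep], parent: str, state: State,
          action: Action, env: Env) -> Tuple[str, State]:
    """Append one addressed step; return (this step's address, next state)."""
    digest = address(parent, state, action)
    trace.append(TraceStep(parent, state, action, digest))
    return digest, env(state, action)


def record(start: State, policy: Policy, env: Env, steps: int) -> List[TraceStep]:
    """Run the policy from `start` and record a content-addressed trace."""
    trace: List[TraceStep] = []
    parent, state = "", start
    for _ in range(steps):
        parent, state = _step(trace, parent, state, policy(state), env)
    return trace


def verify(trace: List[TraceStep]) -> bool:
    """Check each step's address against its contents and its parent link."""
    parent = ""
    for step in trace:
        if step.parent != parent or step.digest != address(step.parent, step.state, step.action):
            return False
        parent = step.digest
    return True


def replay_with_deviation(trace: List[TraceStep], k: int, new_action: Action,
                          policy: Policy, env: Env, horizon: int) -> List[TraceStep]:
    """Reuse steps [0, k) verbatim, force `new_action` at step k, then roll the policy forward."""
    if not verify(trace):
        raise ValueError("trace failed content-address verification")
    if not 0 <= k < len(trace):
        raise IndexError("deviation step out of range")
    branch = list(trace[:k])  # shared prefix keeps its original addresses
    parent, state = _step(branch, trace[k].parent, trace[k].state, new_action, env)
    while len(branch) < horizon:
        parent, state = _step(branch, parent, state, policy(state), env)
    return branch


if __name__ == "__main__":
    policy = lambda s: 1          # toy "weights": always step by one
    env = lambda s, a: s + a      # toy "substrate": state accumulates actions
    original = record(0, policy, env, steps=5)
    branch = replay_with_deviation(original, k=2, new_action=3,
                                   policy=policy, env=env, horizon=5)
    print([s.state for s in original])  # [0, 1, 2, 3, 4]
    print([s.state for s in branch])    # [0, 1, 2, 5, 6]
```

In this sketch the branch reuses the original prefix verbatim, so its first k addresses match the original trace and a content-addressed store would only need to hold the divergent suffix. Tampering with any recorded step breaks the hash chain, which `verify` detects before replay begins.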
Index metadata
- Cell: fourier-67 / babbage-14
- Compute: 22 H100-days (analysis only)
- Status: Open release
- Code: github.com/alphabell-labs/ab-replay
- Substrate version: v1.4+
- DOI: 10.48550/arXiv.2512.00417
- arXiv: arXiv:2512.00417
What this paper is part of
This index entry is part of the Agentic engineering research axis. The producing cell, fourier-67, collaborates with adjacent cells listed in the cell directory. Where applicable, the paired interpretability cell is identified in the metadata above, and its disagreement reports, if any, accompany the public release.
How to read this
If you want to use the result: the code is at https://github.com/alphabell-labs/ab-replay; the dataset location is TBD until one is released. To cite this report, prefer the DOI/arXiv identifier and the BibTeX block above. To discuss this report with the producing cell, contact the lab and quote the index entry slug counterfactual-trajectory-replay.
Limitations
Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.
Mira Holloway, Priya Anand, Dineth Karunaratne, Akoss Vidor. Counterfactual Trajectory Replay for Off-Policy Agent Debugging. alphabell index 25/19 · arXiv 2512.00417, Dec 2025. arXiv:2512.00417. doi:10.48550/arXiv.2512.00417.