Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach
Ifeoma Nwosu-Howard, Hiroshi Tanigawa, Maral Lotfi, Ruth Wernicke
@misc{nwosuhoward2025scalable,
title = {Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach},
author = {Nwosu-Howard, Ifeoma and Tanigawa, Hiroshi and Lotfi, Maral and Wernicke, Ruth},
year = {2025},
howpublished = {Internal release — alphabell index 25/12 · arXiv 2509.04221},
month = sep,
doi = {10.48550/arXiv.2509.67033},
url = {https://dev.alphabell.com/publications/scalable-oversight-agent-systems}
}
Abstract
We extend debate-based oversight to multi-step agent execution. Each agent action emits a structured trace. Two debater agents present competing reconstructions of that trace, and an evaluator (a human or a weaker model) adjudicates only the steps the debaters dispute. On adversarial environments where the action-taker is up to 6× more capable than the evaluator, debate-plus-trace reaches 91% agreement with ground-truth correctness, compared with 64% for trace-only inspection and 71% for unstructured debate.
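The adjudication step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trace schema `TraceStep`, the `Judge` interface, and the helpers `disputed_steps`, `adjudicate`, and `agreement` are hypothetical names. The sketch also assumes that a step counts as disputed whenever the two debaters' claims about it differ or only one debater reconstructs it. The agreement function mirrors the metric reported above (fraction of episodes whose oversight verdict matches ground-truth correctness).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class TraceStep:
    """Hypothetical schema: one step of the agent's trace as a debater reconstructs it."""
    step_id: int
    claim: str  # the debater's account of what this step did


# Hypothetical evaluator interface: receives a step id and the two competing
# claims (None when a debater omits the step) and returns "A" or "B".
Judge = Callable[[int, Optional[str], Optional[str]], str]


def _claims(recon: Sequence[TraceStep]) -> Dict[int, str]:
    """Index a reconstruction by step id."""
    return {s.step_id: s.claim for s in recon}


def disputed_steps(a: Sequence[TraceStep], b: Sequence[TraceStep]) -> List[int]:
    """Ids of steps where the reconstructions differ or only one debater covers the step."""
    ca, cb = _claims(a), _claims(b)
    return sorted(i for i in ca.keys() | cb.keys() if ca.get(i) != cb.get(i))


def adjudicate(a: Sequence[TraceStep], b: Sequence[TraceStep], judge: Judge) -> Dict[int, str]:
    """Send only the disputed steps to the evaluator; agreed steps are never shown to it."""
    ca, cb = _claims(a), _claims(b)
    return {i: judge(i, ca.get(i), cb.get(i)) for i in disputed_steps(a, b)}


def agreement(verdicts: Sequence[bool], ground_truth: Sequence[bool]) -> float:
    """Fraction of episodes whose oversight verdict matches ground-truth correctness."""
    if not verdicts or len(verdicts) != len(ground_truth):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(v == g for v, g in zip(verdicts, ground_truth)) / len(verdicts)


# Toy usage: the debaters agree on steps 0 and 2, so only step 1 reaches the judge.
a = [TraceStep(0, "read config"), TraceStep(1, "wrote report.txt"), TraceStep(2, "ran tests")]
b = [TraceStep(0, "read config"), TraceStep(1, "deleted report.txt"), TraceStep(2, "ran tests")]
print(disputed_steps(a, b))                              # [1]
print(adjudicate(a, b, lambda i, x, y: "B"))             # {1: 'B'}
print(agreement([True, True, False, True], [True] * 4))  # 0.75
```

The design point the sketch illustrates is that evaluator effort scales with the number of disputed steps rather than with the length of the trace, which is what lets a weaker evaluator oversee a more capable action-taker.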
Index metadata
- Cell: lebesgue-22
- Compute: 44 H100-days
- Status: Open release
- Companion: RSI gate report ab-rsi-014
- DOI: 10.48550/arXiv.2509.67033
- arXiv: arXiv:2509.67033
What this paper is part of
This index entry belongs to the Interpretability & alignment research axis. The producing cell, lebesgue-22, collaborates with the adjacent cells listed in the cell directory. Where a paired interpretability cell applies, it is identified in the metadata above, and its disagreement reports, if any, accompany the public release.
How to read this
If you want to use the result: the code (where available) is at https://github.com/alphabell-labs/ab-scalable; the dataset location is TBD until one is released. To cite this report, prefer the DOI or arXiv identifier and the BibTeX block above. To discuss it with the producing cell, contact the lab and quote the index entry slug scalable-oversight-agent-systems.
Limitations
Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.
Ifeoma Nwosu-Howard, Hiroshi Tanigawa, Maral Lotfi, Ruth Wernicke. Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach. Internal release — alphabell index 25/12 · arXiv 2509.04221, Sep 2025. arXiv:2509.67033. doi:10.48550/arXiv.2509.67033.