Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach

Ifeoma Nwosu-Howard, Hiroshi Tanigawa, Maral Lotfi, Ruth Wernicke

Axis: Interpretability & alignment
Cell: lebesgue-22
Published: Sep 2025
Venue: Internal release — alphabell index 25/12 · arXiv 2509.04221
Tags: interp.

Abstract

We extend debate-based oversight to multi-step agent execution. Each agent action emits a structured trace; two debater agents present competing reconstructions of the trace, and an evaluator (human or weaker model) adjudicates only over the disputed steps. On adversarial environments where the action-taker is up to 6× more capable than the evaluator, debate-plus-trace achieves 91% agreement with ground-truth correctness, vs. 64% for trace-only inspection and 71% for unstructured debate.
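
The restriction of adjudication to disputed steps can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its released code: the `TraceStep` schema, the function names, and the toy evaluator are all hypothetical. The sketch only shows the control flow the abstract describes: two reconstructions of the same trace are compared step by step, and the evaluator is consulted only where they differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class TraceStep:
    """One step of a structured trace (hypothetical schema, not the paper's)."""
    index: int
    action: str
    claimed_effect: str


# Hypothetical evaluator interface: given each debater's version of one step
# (None if that debater omitted the step), return "A" or "B".
Evaluator = Callable[[Optional[TraceStep], Optional[TraceStep]], str]


def disputed_indices(recon_a: List[TraceStep], recon_b: List[TraceStep]) -> List[int]:
    """Return the sorted step indices where the two reconstructions differ."""
    a = {s.index: s for s in recon_a}
    b = {s.index: s for s in recon_b}
    return [i for i in sorted(set(a) | set(b)) if a.get(i) != b.get(i)]


def adjudicate(
    recon_a: List[TraceStep],
    recon_b: List[TraceStep],
    evaluator: Evaluator,
) -> Dict[int, str]:
    """Ask the evaluator to rule on disputed steps only; agreed steps are skipped."""
    a = {s.index: s for s in recon_a}
    b = {s.index: s for s in recon_b}
    verdicts: Dict[int, str] = {}
    for i in disputed_indices(recon_a, recon_b):
        verdict = evaluator(a.get(i), b.get(i))
        if verdict not in ("A", "B"):
            raise ValueError(f"evaluator returned {verdict!r} for step {i}")
        verdicts[i] = verdict
    return verdicts


# Toy data: the debaters agree on steps 0 and 2 and disagree on step 1.
recon_a = [
    TraceStep(0, "read_file", "loaded config"),
    TraceStep(1, "run_tests", "all tests passed"),
    TraceStep(2, "commit", "committed changes"),
]
recon_b = [
    TraceStep(0, "read_file", "loaded config"),
    TraceStep(1, "run_tests", "two tests were skipped"),
    TraceStep(2, "commit", "committed changes"),
]


def toy_evaluator(step_a: Optional[TraceStep], step_b: Optional[TraceStep]) -> str:
    """Stand-in for a human or weaker model; sides with B if it flags skipped tests."""
    if step_b is not None and "skipped" in step_b.claimed_effect:
        return "B"
    return "A"


verdicts = adjudicate(recon_a, recon_b, toy_evaluator)
print(disputed_indices(recon_a, recon_b), verdicts)
```

In this toy run the evaluator is called once, for step 1, rather than three times. That reduction in evaluator workload is the point of the sketch; how the paper actually defines disputes, traces, and verdicts should be taken from the linked PDF and repository.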

Index metadata

Cell: lebesgue-22
Compute: 44 H100-days
Status: Open release
Companion: RSI gate report ab-rsi-014
DOI: 10.48550/arXiv.2509.67033
arXiv: arXiv:2509.67033

What this paper is part of

This index entry is part of the Interpretability & alignment research axis. The producing cell, lebesgue-22, collaborates with adjacent cells listed in the cell directory. The paired interpretability cell (where applicable) is identified in the metadata above; their disagreement reports, if any, accompany the public release.

How to read this

If you want to use the result: the code (where available) is at https://github.com/alphabell-labs/ab-scalable; the dataset location is TBD and will be listed once a dataset is released. To cite this report, prefer the DOI/arXiv identifier and the entry in the Citation section below. To discuss this report with the producing cell, contact the lab and quote the index entry slug scalable-oversight-agent-systems.

Limitations

Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.

Citation

Ifeoma Nwosu-Howard, Hiroshi Tanigawa, Maral Lotfi, Ruth Wernicke. Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach. Internal release — alphabell index 25/12 · arXiv 2509.04221, Sep 2025. arXiv:2509.67033. doi:10.48550/arXiv.2509.67033.