Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach

Ifeoma Nwosu-Howard, Hiroshi Tanigawa, Maral Lotfi, Ruth Wernicke

Axis Interpretability & alignment

Cell lebesgue-22

Published Sep 2025

Venue Internal release — alphabell index 25/12 · arXiv 2509.04221

Tags interp.

PDF · arXiv:2509.67033 · DOI · Code

BibTeX

@misc{nwosuhoward2025scalable,
  title        = {Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach},
  author       = {Nwosu-Howard, Ifeoma and Tanigawa, Hiroshi and Lotfi, Maral and Wernicke, Ruth},
  year         = {2025},
  howpublished = {Internal release — alphabell index 25/12 · arXiv 2509.04221},
  month        = sep,
  doi          = {10.48550/arXiv.2509.67033},
  url          = {https://dev.alphabell.com/publications/scalable-oversight-agent-systems}
}

Abstract

We extend debate-based oversight to multi-step agent execution. Each agent action emits a structured trace; two debater agents present competing reconstructions of the trace, and an evaluator (a human or a weaker model) adjudicates only the disputed steps. In adversarial environments where the action-taker is up to 6× more capable than the evaluator, debate-plus-trace achieves 91% agreement with ground-truth correctness, versus 64% for trace-only inspection and 71% for unstructured debate.
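
The adjudication flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the TraceStep schema, the function names disputed_steps and adjudicate, and the evaluator signature are all hypothetical, and the two reconstructions are assumed to be aligned step by step. The sketch shows only the core idea that the evaluator is consulted on the steps where the debaters disagree, while undisputed steps pass through without review.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class TraceStep:
    """One structured trace record (hypothetical schema, not from the paper)."""
    index: int
    action: str
    claimed_effect: str


# Hypothetical evaluator interface: given the two competing versions of a
# disputed step, return "a" or "b" to say which reconstruction is accepted.
Evaluator = Callable[[TraceStep, TraceStep], str]


def disputed_steps(recon_a: Sequence[TraceStep],
                   recon_b: Sequence[TraceStep]) -> List[int]:
    """Return the positions at which the two reconstructions differ."""
    if len(recon_a) != len(recon_b):
        raise ValueError("reconstructions must cover the same number of steps")
    return [i for i, (a, b) in enumerate(zip(recon_a, recon_b)) if a != b]


def adjudicate(recon_a: Sequence[TraceStep],
               recon_b: Sequence[TraceStep],
               evaluator: Evaluator) -> List[TraceStep]:
    """Build the accepted trace, calling the evaluator only on disputed steps."""
    disputed = set(disputed_steps(recon_a, recon_b))
    accepted: List[TraceStep] = []
    for i, (a, b) in enumerate(zip(recon_a, recon_b)):
        if i not in disputed:
            accepted.append(a)  # both debaters agree, no review needed
            continue
        choice = evaluator(a, b)
        if choice not in ("a", "b"):
            raise ValueError("evaluator must return 'a' or 'b'")
        accepted.append(a if choice == "a" else b)
    return accepted


# Illustrative data only: three steps, with a disagreement at position 1.
recon_a = [
    TraceStep(0, "read_file", "loaded config"),
    TraceStep(1, "write_file", "saved report"),
    TraceStep(2, "send", "sent report"),
]
recon_b = [
    TraceStep(0, "read_file", "loaded config"),
    TraceStep(1, "write_file", "overwrote config"),
    TraceStep(2, "send", "sent report"),
]

calls = []  # records which steps the evaluator was actually asked about


def toy_evaluator(a: TraceStep, b: TraceStep) -> str:
    calls.append(a.index)
    return "b"  # stand-in decision; a real evaluator would inspect evidence


accepted = adjudicate(recon_a, recon_b, toy_evaluator)
print(disputed_steps(recon_a, recon_b), len(calls))
```

In this toy run the evaluator is consulted once for a three-step trace, which is the property that lets a weaker evaluator's effort scale with the number of disagreements rather than with the trace length.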

Index metadata

Cell: lebesgue-22
Compute: 44 H100-days
Status: Open release
Companion: RSI gate report ab-rsi-014
DOI: 10.48550/arXiv.2509.67033
arXiv: arXiv:2509.67033

What this paper is part of

This index entry belongs to the Interpretability & alignment research axis. The producing cell, lebesgue-22, collaborates with the adjacent cells listed in the cell directory. Where a paired interpretability cell applies, it is identified in the metadata above, and any disagreement reports from that cell accompany the public release.

How to read this

To use the result: the code (where available) is at https://github.com/alphabell-labs/ab-scalable; the dataset location is TBD and will be listed when a dataset is released. To cite this report, use the DOI or arXiv identifier and the BibTeX block above. To discuss the report with the producing cell, contact the lab and quote the index entry slug scalable-oversight-agent-systems.

Limitations

Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.

Citation

Ifeoma Nwosu-Howard, Hiroshi Tanigawa, Maral Lotfi, Ruth Wernicke. Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach. Internal release — alphabell index 25/12 · arXiv 2509.04221, Sep 2025. arXiv:2509.67033. doi:10.48550/arXiv.2509.67033.