Latent Trajectory Surgery: Editing Agent Plans in Mid-Run
Helena Salgueiro, Gita Sundaram, Catriona MacLeod
@inproceedings{salgueiro2024surgery,
  title     = {Latent Trajectory Surgery: Editing Agent Plans in Mid-Run},
  author    = {Salgueiro, Helena and Sundaram, Gita and MacLeod, Catriona},
  year      = {2024},
  booktitle = {NeurIPS 2024 · alphabell index 24/21},
  month     = dec,
  doi       = {10.48550/arXiv.2412.04881},
  url       = {https://dev.alphabell.com/publications/latent-trajectory-surgery}
}
Abstract
When a paired interpretability cell wants to test whether a particular sub-plan is load-bearing for an agent's behaviour, the cleanest experiment is to surgically remove or replace that sub-plan and re-execute. We make this practical: a substrate-integrated surgery tool that can edit a substrate-hosted agent's plan at a specific causal joint, while preserving the upstream state. We demonstrate the tool's use on three case-study halts called by paired interpretability cells in 2024.
Index metadata
- Cell: cantor-18
- Compute: 16 H100-days
- Status: Open release
- Code: github.com/alphabell-labs/ab-surgery
- DOI: 10.48550/arXiv.2412.04881
- arXiv: arXiv:2412.04881
What this paper is part of
This index entry is part of the Interpretability & alignment research axis. The producing cell, cantor-18, collaborates with adjacent cells listed in the cell directory. Where a paired interpretability cell applies, it is identified in the metadata above; its disagreement reports, if any, accompany the public release.
How to read this
To use the result: the code is at https://github.com/alphabell-labs/ab-surgery; no dataset location is listed yet (TBD until one is released). To cite this report, prefer the DOI or arXiv identifier and the BibTeX block above. To discuss it with the producing cell, contact the lab and quote the index entry slug latent-trajectory-surgery.
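As a minimal sketch of the citation step, the BibTeX block above can be used from a LaTeX document roughly as follows. The citation key salgueiro2024surgery comes from that block; the file name references.bib and the plain bibliography style are assumptions chosen for illustration, not something this index prescribes.

```latex
% Assumes the BibTeX entry above has been saved to references.bib
% (hypothetical file name) in the same directory as this file.
\documentclass{article}

\begin{document}

Mid-run plan edits at a causal joint are described in~\cite{salgueiro2024surgery}.

% "plain" is only an example style; use whatever your venue requires.
\bibliographystyle{plain}
\bibliography{references}

\end{document}
```

Compiling with latex, then bibtex, then latex twice resolves the reference. Whether the DOI and URL fields are printed depends on the bibliography style; the plain style ignores them.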
Limitations
Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.
Helena Salgueiro, Gita Sundaram, Catriona MacLeod. Latent Trajectory Surgery: Editing Agent Plans in Mid-Run. NeurIPS 2024 · alphabell index 24/21, Dec 2024. arXiv:2412.04881. doi:10.48550/arXiv.2412.04881.