Sandboxed Self-Modification: a confinement specification and implementation
Liora Sabatini, Cheung Wai-Lin, Marek Holub
@techreport{sabatini2024sandboxed,
title = {Sandboxed Self-Modification: a confinement specification and implementation},
author = {Sabatini, Liora and Cheung, Wai-Lin and Holub, Marek},
year = {2024},
type = {Internal release},
number = {alphabell index 24/19},
note = {Delayed release},
institution = {alphabell},
month = nov,
doi = {10.48550/arXiv.2411.13633},
url = {https://dev.alphabell.com/publications/sandboxed-self-modification}
}
Abstract
Self-modification of an agent's code, tool catalogue, or training procedure is the most consequential operation we permit agents to perform. We specify a confinement profile under which such operations may proceed — including jurisdictional segregation of artefacts, mandatory dual-cell sign-off, and rolling read-access for paired interpretability cells — and an implementation in the alphabell substrate. The specification is informed by three internal incidents redacted in the public release; full incident reports are available to long-tenured contributors and external audit partners.
Index metadata
- Cell: godel-02
- Compute: redacted
- Status: Delayed release (180-day delay; classified appendices not released)
- Companion: Internal incident logs ab-inc-19-{a,b,c}
- DOI: 10.48550/arXiv.2411.13633
- arXiv: arXiv:2411.13633
What this paper is part of
This index entry belongs to the Agentic engineering research axis. The producing cell, godel-02, collaborates with adjacent cells listed in the cell directory. Where a paired interpretability cell applies, it is identified in the metadata above, and any disagreement reports it files accompany the public release.
How to read this
If you want to use the result: the code (where available) is at https://github.com/alphabell-labs/ab-sandboxe; the dataset location is TBD and will be listed here if a dataset is released. To cite this report, prefer the DOI/arXiv identifier and the BibTeX block above. To discuss the report with the producing cell, contact the lab and quote the index entry slug sandboxed-self-modification.
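As a minimal sketch of citing the report from LaTeX, assuming the BibTeX block above has been saved to a file named references.bib (a hypothetical filename) and that the classic BibTeX workflow with the plain style is in use:

```latex
\documentclass{article}
\begin{document}

% The citation key comes from the BibTeX block above.
Confinement profiles for agent self-modification are specified
in~\cite{sabatini2024sandboxed}.

\bibliographystyle{plain}
% Assumption: the entry is stored in references.bib next to this file.
\bibliography{references}

\end{document}
```

Building this typically takes one pdflatex pass, one bibtex pass, then two more pdflatex passes. The plain style does not print the doi or url fields, so a style that supports them may be preferable if the identifier should appear in the reference list.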
Limitations
Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.
Liora Sabatini, Cheung Wai-Lin, Marek Holub. Sandboxed Self-Modification: a confinement specification and implementation. Internal release — alphabell index 24/19 · delayed release, Nov 2024. arXiv:2411.13633. doi:10.48550/arXiv.2411.13633.