Pre-Registered Capability Evaluations for Internal Releases
Liora Sabatini, Aravind Periyasamy, Eitan Berkovich
@techreport{sabatini2025prereg,
title = {Pre-Registered Capability Evaluations for Internal Releases},
author = {Sabatini, Liora and Periyasamy, Aravind and Berkovich, Eitan},
year = {2025},
type = {Methodology Document},
number = {25-M-04},
institution = {alphabell},
month = oct,
doi = {10.48550/arXiv.2510.01166},
url = {https://dev.alphabell.com/publications/pre-registered-capability-evaluations}
}
Abstract
The MUR protocol (25/05) requires every RSI-axis run to pre-register its stopping conditions; this document specifies how those pre-registrations should be structured, signed, and reviewed before a run begins. We describe the standard pre-registration template, the review obligations within the producing cell, the disagreement procedure, and auditable storage of the pre-registered conditions in the lab's content-addressed trace store. This methodology document is canonical and is referenced by every RSI-axis run report.
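One reason content-addressed storage supports auditing is that a record's storage key is a hash of its contents, so any edit made after registration produces a different key. The Python sketch below illustrates that idea only. It is a minimal sketch under stated assumptions: the `PreRegistration` fields, the `TraceStore` class, and the choice of SHA-256 over sorted-key JSON are all hypothetical and are not taken from the 25-M-04 template or the ab-prereg repository. Signing and review are not modeled.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class PreRegistration:
    """Hypothetical pre-registration record.

    Field names are illustrative only; they are not the 25-M-04 template.
    """
    run_id: str
    stopping_conditions: tuple[str, ...]
    reviewers: tuple[str, ...]


def canonical_bytes(record: PreRegistration) -> bytes:
    # Sorted keys and fixed separators give one byte string per logical record,
    # so the same record always hashes to the same address.
    payload = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return payload.encode("utf-8")


def content_address(record: PreRegistration) -> str:
    # Assumed scheme: the address is the hex SHA-256 digest of the canonical bytes.
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


class TraceStore:
    """In-memory stand-in for a content-addressed store (hypothetical)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, record: PreRegistration) -> str:
        # Store the canonical bytes under their own digest and return the address.
        address = content_address(record)
        self._blobs[address] = canonical_bytes(record)
        return address

    def verify(self, address: str) -> bool:
        # Recompute the digest: a blob altered after registration no longer
        # matches the address it was filed under, and unknown addresses fail.
        blob = self._blobs.get(address)
        return blob is not None and hashlib.sha256(blob).hexdigest() == address


if __name__ == "__main__":
    prereg = PreRegistration(
        run_id="example-run",
        stopping_conditions=("stop if metric M exceeds threshold T",),
        reviewers=("reviewer-a", "reviewer-b"),
    )
    store = TraceStore()
    address = store.put(prereg)
    print(address, store.verify(address))
    # Editing a stopping condition after the fact yields a different address,
    # so the edit cannot silently replace the registered record.
    edited = replace(prereg, stopping_conditions=("stop if metric M exceeds 2T",))
    print(content_address(edited) != address)
```

The appeal of this design is that integrity checking needs no trusted index: anyone holding an address can recompute the digest. A signature step, which the protocol also requires, could plausibly be layered on top (for example, reviewers signing the address), but that is an assumption and is left out here.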
Index metadata
- Cell: turing-11
- Compute: 6 H100-days (analysis only)
- Status: Methodology document, open release
- Code: github.com/alphabell-labs/ab-prereg
- DOI: 10.48550/arXiv.2510.01166
- arXiv: arXiv:2510.01166
What this paper is part of
This index entry is part of the recursive self-improvement (RSI) research axis. The producing cell, turing-11, collaborates with adjacent cells listed in the cell directory. Where a paired interpretability cell applies, it is identified in the metadata above, and its disagreement reports, if any, accompany the public release.
How to read this
If you want to use the result: the code is at https://github.com/alphabell-labs/ab-prereg; the dataset location is TBD and will be given if one is released. To cite this report, prefer the DOI or arXiv identifier and the BibTeX block above. To discuss it with the producing cell, contact the lab and quote the index entry slug pre-registered-capability-evaluations.
Limitations
Every report published by a cell carries an explicit limitations section in the internal index; we do not paraphrase it here. Before downstream use, read the linked PDF, particularly its limitations and threats-to-validity sections.
Liora Sabatini, Aravind Periyasamy, Eitan Berkovich. Pre-Registered Capability Evaluations for Internal Releases. alphabell methodology document 25-M-04, Oct 2025. arXiv:2510.01166. doi:10.48550/arXiv.2510.01166.