Public delayed-release: 24-13 RSI run report

The 90-day delay window has elapsed on RSI run 24-13. The report — including the safety analysis prepared with the paired interpretability cell — is now public.

By Liora Sabatini · 2024-11-12 · 3 min read · RSI safety delayed-release

The delayed-release report on RSI run 24-13 is now public. It describes a recursive-self-improvement run conducted by cell godel-02 between July and October 2024. The run triggered the pre-registered stopping condition on the second-order capability metric during the fourth epoch and was halted per the MUR protocol. The report, including the safety analysis prepared with paired interpretability cell hilbert-13, is released alongside this note.

The substantive content of 24-13: the run proposed a change to the candidate model's own training-data curation procedure, specifically a self-amended weighting scheme over a previously fixed pretraining distribution. The methodology pre-registered three stopping conditions; the second-order capability metric exceeded its threshold at the start of epoch four. The run was halted, the candidate checkpoint was quarantined to the isolated compute enclave, and the paired interpretability cell was activated for full read-access review.

What we learned. First: the specific failure mode, in which the curation procedure produced a feedback loop that subtly amplified the model's preferred topics in the next training batch, is mechanistically traceable in the ab-circuits analysis hilbert-13 produced. The failure was not a surprise in its general shape, since it had been hypothesised in pre-registration, but it manifested faster than we had modelled.
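As a rough illustration of how a self-amended weighting scheme can compound, the toy simulation below reweights a four-topic distribution once per epoch. It is a sketch under stated assumptions and not the procedure from the report: the update rule, the topic names, the preference scores, and the `strength` parameter are all hypothetical.

```python
def reweight(weights, preference, strength):
    """One hypothetical curation step.

    Each topic weight is scaled by (1 + strength * preference) and the
    result is renormalised so the weights sum to 1. This update rule is
    an assumption made for illustration only.
    """
    scaled = {t: w * (1.0 + strength * preference[t]) for t, w in weights.items()}
    total = sum(scaled.values())
    return {t: w / total for t, w in scaled.items()}


def simulate(weights, preference, strength, epochs):
    """Apply the curation step once per epoch and keep every distribution."""
    history = [weights]
    for _ in range(epochs):
        weights = reweight(weights, preference, strength)
        history.append(weights)
    return history


# Illustrative numbers only: a uniform starting distribution and a model
# that mildly prefers one topic.
initial = {"topic_a": 0.25, "topic_b": 0.25, "topic_c": 0.25, "topic_d": 0.25}
preference = {"topic_a": 1.0, "topic_b": 0.0, "topic_c": 0.0, "topic_d": 0.0}

history = simulate(initial, preference, strength=0.2, epochs=4)
for epoch, dist in enumerate(history):
    print(epoch, round(dist["topic_a"], 3))
```

Because each epoch's output becomes the next epoch's input, a small per-step preference compounds multiplicatively, which is one way a time-to-trip estimate could come out too slow.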

Second: the trace-store machinery worked. Every relevant artefact for the post-mortem was reconstructable from the content-addressed trace store; nothing had to be retrieved out-of-band. This is the third halt in the protocol's history and the second one for which we can say this; the first halt (22-04) predated the v3 trace store and required substantially more manual reconstruction.
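For readers unfamiliar with the idea, a content-addressed store keys each artefact by a hash of its bytes, so any reference to an artefact also verifies it. The sketch below is a minimal in-memory version. The `TraceStore` class, its methods, and the choice of SHA-256 are assumptions for illustration and do not describe the v3 trace store itself.

```python
import hashlib


class TraceStore:
    """Minimal in-memory content-addressed store (hypothetical sketch)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store an artefact and return its address (SHA-256 hex digest)."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch an artefact and check that it still matches its address."""
        data = self._blobs[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("artefact does not match its address")
        return data


store = TraceStore()
# Placeholder content; real artefacts would be logs, configs, checkpoints.
address = store.put(b"epoch-4 metric log (placeholder content)")
assert store.get(address) == b"epoch-4 metric log (placeholder content)"
```

Storing the same bytes twice yields the same address, which is what lets a post-mortem cite artefacts by address and have them reconstructed without out-of-band retrieval.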

Third: the disagreement procedure between godel-02 and hilbert-13 functioned without escalation. The producing cell did not contest the halt. The paired cell did not over-extend its read-access scope. These operational details are not particularly newsworthy in themselves, but a failure of either would have been, and we want to record that both worked.

What we have changed. The MUR protocol's v2 amendment, which has been in review since the 24-13 halt, incorporates two specific learnings: a stronger pre-registration requirement on the time-to-trip estimate for second-order capability metrics, and a new soft-stopping-condition tier between flag-only and full-halt.
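One way to picture a tier between flag-only and full-halt is as three ordered thresholds on a single metric. The sketch below is an assumption about the shape of such a check, not the amendment's text: the `Action` names, the `evaluate` function, the threshold values, and the use of `>=` for tripping are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    FLAG = "flag-only"
    SOFT_STOP = "soft-stop"
    FULL_HALT = "full-halt"


def evaluate(metric, flag_at, soft_stop_at, halt_at):
    """Map a metric reading to the most severe tier it has reached.

    Thresholds must be ordered flag <= soft-stop <= halt. Treating a
    reading equal to a threshold as tripping it is an assumption.
    """
    if not (flag_at <= soft_stop_at <= halt_at):
        raise ValueError("thresholds must be ordered flag <= soft-stop <= halt")
    if metric >= halt_at:
        return Action.FULL_HALT
    if metric >= soft_stop_at:
        return Action.SOFT_STOP
    if metric >= flag_at:
        return Action.FLAG
    return Action.CONTINUE


# Illustrative thresholds only.
for reading in (0.2, 0.6, 0.8, 1.1):
    print(reading, evaluate(reading, flag_at=0.5, soft_stop_at=0.75, halt_at=1.0).value)
```

Checking the most severe tier first means a reading that jumps past several thresholds in one step still resolves to the strictest applicable action.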

What we have not changed. The structural commitment to pre-registered stopping conditions is unaltered. The structural commitment to interpretability-cell pairing is unaltered. The structural commitment to candidate-checkpoint quarantine is unaltered. We have considered whether any of these warrant amendment and concluded that they do not.

For external readers: the candidate checkpoint from 24-13 has not been released and will not be. The training-data curation procedure that produced the halt has been published in the report's appendix C, with redactions agreed jointly with hilbert-13. The redactions cover specific weighting schemes that we believe would be inadvisable to release in isolation.

A note on the post-mortem timeline. The halt was called on day 47 of the run. The paired cell completed its read-access review by day 71. The cross-axis methodology review pool reviewed the post-mortem in two sessions (days 96 and 124). The publication-policy reviewer approved the delayed-release tier on day 142. The 90-day delay window started from the publication-policy decision date and elapsed today. We document this timeline because the operational tempo of a halt-and-postmortem matters to anyone considering how to structure their own halt protocols; ours is not fast, but it is bounded and predictable.
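The intervals implied by the timeline above can be worked out directly. The sketch below assumes that every day number counts from the start of the run, as the halt's does; the variable and function names are illustrative.

```python
# Milestones as (label, day number), taken from the timeline above.
milestones = [
    ("halt called", 47),
    ("paired-cell review complete", 71),
    ("methodology review, session 1", 96),
    ("methodology review, session 2", 124),
    ("delayed-release tier approved", 142),
]
DELAY_WINDOW_DAYS = 90


def gaps(events):
    """Days elapsed between each milestone and the one before it."""
    return [
        (later_label, later_day - earlier_day)
        for (_, earlier_day), (later_label, later_day) in zip(events, events[1:])
    ]


halt_to_approval = milestones[-1][1] - milestones[0][1]
window_end_day = milestones[-1][1] + DELAY_WINDOW_DAYS

for label, days in gaps(milestones):
    print(f"{label}: +{days} days")
print("halt to approval:", halt_to_approval, "days")
print("delay window ends on day", window_end_day)
```

Under that assumption no single stage takes longer than 28 days, the halt-to-approval span is 95 days, and the delay window closes on day 232.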

This note is signed by the RSI axis steward. Questions about the run, the halt, or the protocol amendments should be directed to axis-rsi@alphabell.com. We respond individually.

For the protocol details behind anything mentioned above, see /governance and /charter. For the structural commitments, /about.
