α · Safety

What we actually do — in practice.

Safety at alphabell is structural, not aspirational. We do not have a separate safety team. The structural commitments — paired interpretability cells, MUR protocol, isolated compute enclave, indefinite-hold provision, charter Article 3 — are the safety practices. They are how the lab works, not how it presents.

What is structural

Paired interpretability cells. Every cell working on dual-use capabilities is paired with an interpretability cell that has rolling read-access to its checkpoints and training logs. The paired cell may call a halt; the halt is honoured pending review. See interpretability-alignment.
The MUR protocol. Any candidate model proposing a change to its own training procedure, architecture, or evaluation criteria operates under the modification-under-review (MUR) protocol, which requires pre-registered stopping conditions, decoupled phases, and mandatory paired-cell sign-off. See 25/05.
The isolated compute enclave. RSI-axis runs operate inside a contractually segregated portion of the federated pool, with separate accelerators, networking, and trace storage. See /compute.
The indefinite-hold provision. The publication policy permits indefinite hold of results that the paired interpretability cell believes should not be released. Eight pieces of work are currently held. See /publication-policy.
Charter Article 3. Every contributor signs the charter, which makes the pairing requirement and the dual-use definition contributor-binding. See /charter.
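
The halt rule described above (the paired cell may call a halt, and the halt is honoured pending review) can be sketched as a small state machine. This is an illustrative sketch only, not the lab's tooling: the class `PairedRun`, its fields, and the idea that a third party lifts the halt once review is complete are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    HALTED = "halted"  # halt called, review pending


@dataclass
class PairedRun:
    """Hypothetical record of a run watched by a paired interpretability cell."""
    run_id: str
    producing_cell: str
    paired_cell: str
    state: RunState = RunState.RUNNING
    log: list = field(default_factory=list)

    def call_halt(self, caller: str, reason: str) -> None:
        # In this sketch only the paired interpretability cell may call a halt.
        if caller != self.paired_cell:
            raise PermissionError(f"{caller} is not the paired cell for {self.run_id}")
        self.state = RunState.HALTED
        self.log.append(("halt", caller, reason))

    def resume(self, caller: str, review_complete: bool) -> None:
        # The halt is honoured pending review: the producing cell cannot lift it,
        # and nobody can resume before the review has finished.
        if self.state is not RunState.HALTED:
            raise RuntimeError("run is not halted")
        if caller == self.producing_cell:
            raise PermissionError("producing cell cannot override a halt")
        if not review_complete:
            raise RuntimeError("review still pending")
        self.state = RunState.RUNNING
        self.log.append(("resume", caller, "review complete"))
```

The design point the sketch tries to capture is that an override by the producing cell is not an available code path at all, rather than a discouraged one.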

What is procedural

Pre-registered stopping conditions for every RSI-axis run.
Quarterly cross-cell review pools, including the cross-axis methodology review pool that reviews RSI-axis capability-evaluation methodology updates.
Annual review of indefinite-hold work, with formal re-vote on continued hold.
Counterparty due diligence on all sovereign and commercial funders, reviewed annually.
Interconnect agreements with enclave compliance clauses for all compute operators handling RSI-axis runs.
Membership in the External Evaluation Cooperative as a full participant, including reciprocal evaluator capacity.
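
One way to picture pre-registered stopping conditions is as a set of thresholds that is fixed, by digest, before a run starts and checked unchanged during the run. The sketch below assumes exactly that; the metric names, the threshold semantics, and the use of a SHA-256 digest are illustrative assumptions, not a description of how the lab records its conditions.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StoppingCondition:
    metric: str       # hypothetical metric name
    threshold: float  # stop when the observed value reaches or exceeds this


def register(conditions):
    """Return a digest that fixes the set of conditions before the run starts."""
    payload = json.dumps(sorted((c.metric, c.threshold) for c in conditions))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_stop(conditions, registered_digest, observed):
    """Return the tripped metric names; refuse to evaluate altered conditions."""
    if register(conditions) != registered_digest:
        raise ValueError("stopping conditions differ from the pre-registered set")
    return [
        c.metric
        for c in conditions
        if observed.get(c.metric, float("-inf")) >= c.threshold
    ]
```

A run loop would call `should_stop` at each checkpoint and stop as soon as the returned list is non-empty. Because the digest is taken before the run, loosening a threshold mid-run makes the check fail loudly instead of passing quietly.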

Halts called

In protocol history, paired interpretability cells have called five halts. Three were on RSI-axis runs; two were on agentic-axis sandboxed self-modification experiments. In all five cases the halt was honoured without override. The halts are documented in delayed-release reports: three are published (23/02, 24/13, 25/06) and two are pending (25/19, due late 2025, and 25/20, due 2026-Q1).

Position

If a paired interpretability cell calls a halt and the producing cell overrules it, the structural commitments of the lab have failed. We have not allowed that to happen, and we treat the question of how we would respond if it did happen as a load-bearing governance discussion. That discussion is not hypothetical.

What we do not claim

We do not claim that alphabell has solved AI safety. We do not claim that the protocols described above are sufficient for arbitrarily capable systems. We do not claim that paired interpretability is a substitute for the broader project of mechanistic understanding of frontier-scale systems. We claim only that we have built a set of structural commitments that we believe are more robust than the alternatives we surveyed when designing the lab — and that we are willing to be held to them.

The protocols are not perfect. They have known failure modes, discussed in /governance, and we expect to revise them. In our view, revision does not invalidate the protocols. That they have been triggered, with halts called and honoured, is the single most informative signal that the protocols are real.