alphabell · news & lab notes

Critique and response: the 700-circuit conjecture revisited

Nico Almgren — Wed, 18 Feb 2026 12:00:00 GMT

In November 2025, a group of external researchers published a critique of the 700-circuit conjecture — the empirical claim, originating in our 2025-Q2 mechanistic-interpretability paper, that approximately 700 reusable circuits explain 86% of behaviourally relevant activations on frontier-class models. The critique argues, in summary, that the figure is approximately right on the specific model families it was tested on but does not generalise, and that the lab's subsequent replication paper (25/24) understates the methodological variance involved in arriving at the figure.

This post is a response to that critique, and a partial concession.

First, the concession. The critique's specific methodological points are substantively correct. The circuit-count figure is sensitive to the boundary-detection heuristic used to enumerate circuits; this is mentioned in the original paper's limitations section but is not given the weight the critique argues it deserves. We agree. The 25/24 replication paper acknowledges the issue but, in retrospect, did not handle it as cleanly as we would have wanted.

Second, the part where we disagree. The critique's stronger claim — that the figure is essentially a methodological artefact, and that no useful behaviour-coverage signal can be extracted from it — we believe is mistaken. We will publish a follow-up methodology note that re-runs the analysis under three alternative boundary-detection heuristics and shows that the circuit-count figure varies between 600 and 950 across them, with behaviour coverage ranging between 79% and 89%. The variance is significant but the central claim — that a relatively small library of reusable circuits captures most of the relevant behaviour at frontier scale — survives.

Third, the meta-question raised by the critique. The critique argues that the lab's structural openness should produce faster correction of empirical claims than it has in this case. We accept that framing. The original paper went through cross-axis methodology review; the review pool did not flag the boundary-detection sensitivity at the level of specificity that the external critique has now provided. That is a review-pool miss, and we are interested in understanding how it happened.

We have committed to two things in response. First, the follow-up methodology note will be co-authored with the critique's authors, who have agreed to participate in a structured cross-organisational review of the analysis. Second, we are revising the cross-axis methodology review pool's standard checklist to include a specific item on boundary-detection sensitivity for empirical claims at scale, drawing on the failure mode this critique surfaced.

A note on tone. We try to engage with critiques substantively rather than defensively. We do not always succeed. The first round of internal discussion of this critique included some defensiveness that we are not proud of. The lab's structural commitment to distributed peer review is, in part, a commitment to actually changing our minds when the peer review is right, and we want to record that we did change our minds in this case.

The follow-up methodology note is targeted for Q2 2026. We will publish it under the standard tier-1 release process. The 25/24 paper will not be amended in the published version, but a "subsequent work" link will be added when the methodology note is released.

Citing this response. If you want to cite the response as distinct from the underlying methodology note, the index entry is 26-N-04 (news, not paper). The forthcoming methodology note will carry its own DOI when released. The critique itself is the work of external researchers and is available at the venue where it was originally posted; we link to it in the methodology note's reference list rather than mirroring it here.

This response is signed by Nico Almgren, on behalf of cell hilbert-13 and the cross-axis methodology review pool.

2026-Q1 contributor cohort: applications open (Colombo)

Ada Karim — Wed, 14 Jan 2026 12:00:00 GMT

Applications for the 2026-Q1 contributor cohort are now open. The cohort runs March through June at the Colombo anchor, with the second 2026 cohort scheduled for the Bay Area in autumn. Application deadline: 7 February 2026; selection decisions by 21 February; cohort starts 9 March.

The contributor cohort is alphabell's structured onboarding programme for new contributors. It runs for 14 weeks, in alternating anchor locations, with a target intake of approximately 12 cohort members per cycle. Cohort members work on a pair-programming basis on cell tooling, complete a project rotation across two cells outside their primary interest area, and produce a written response to the research-conduct charter before signing in.

What we look for. The application is unfortunately less straightforward than we would like it to be. We look for technical capacity in at least one of the four research axes; we look for capacity to write clearly about technical work; we look for evidence that the applicant has, in some prior context, been part of a project that they did not lead and to which they made substantive contributions; and we look for the temperament to take the lab's structural commitments seriously rather than as obstacles to clever work.

What the cohort produces. By the end of the 14 weeks, each cohort member has: completed at least one tooling contribution to an open-source alphabell repo; produced at least one written internal-index entry as a co-author (lab governance gives cohort members co-authorship rights on collaborative cell work in which they materially participated); spent at least three weeks paired with a cell other than the one they most expected to want to join; and signed in formally to a cell, if a cell offers and they accept.

What the cohort does not promise. We do not promise that every cohort member is offered a cell. Across the seven cohorts we have run, an average of 8 of 12 cohort members have been offered placement. We do not promise stipends comparable to industry research positions; the cohort stipend is the lab standard ($9,400/month) prorated over the 14 weeks. We do not promise relocation support beyond a small one-time travel allowance.

What we are doing differently this cycle. Two changes. First: we have moved the project-rotation phase from weeks 4-6 to weeks 6-8, after feedback from prior cohorts that the earlier rotation was too early to be productive. Second: we are running a one-week pre-arrival reading block to surface the lab's structural commitments before the cohort starts, rather than relying on the in-person discussion-only format we used previously.

Who we expect applications from. Applications come from many places. The largest fraction is recent Ph.D. graduates working on AI safety, alignment, or interpretability. The second-largest is mid-career engineers from production-systems backgrounds who have a specific interest in agent infrastructure. We also accept applications from earlier-career applicants and continue to find some of our strongest contributors among them.

How to apply. Applications open at alphabell.com/contribute/cohort from 14 January 2026. The application consists of one written essay (1,000 words, on a question the application page will provide), a portfolio or set of writing samples, and a single 60-minute live conversation that we use to verify that the written application genuinely reflects the applicant's own thinking.

What the conversation is for, briefly. We do not use the live conversation as a technical interview. We use it to talk through how the applicant would handle one or two specific scenarios that have come up in past cohorts — usually involving disagreement with a cell steward, or a piece of work that needed to be reframed mid-rotation. The conversation is meant to surface temperament and approach, not technical fluency. We say this explicitly because some applicants prepare for it as if it were a technical screen; that preparation does not help.

This note is signed by Ada Karim, cohort steward.

The 2025-Q4 internal release index has been published

Mira Holloway — Mon, 08 Dec 2025 12:00:00 GMT

The 25/14 through 25/24 reports from across the four axes are now consolidated in the public publications index. Highlights include the durable agent substrate v1 release (25/14), the compositional latent dynamics paper (25/09 — released last quarter, now consolidated), and the second iteration of the modification-under-review protocol used inside the RSI axis.

Two of the eleven reports are released with a delay window per the staged-publication policy: 25/19 (RSI-axis run report) and 25/22 (capability evaluation methodology, dual-use). A third, 25/23, is held indefinitely pending interpretability sign-off; it is the eighth piece of work held under the indefinite-hold provision since the protocol was adopted in 2022.

As always, every report carries the metadata that the index requires: producing cell ID, paired interpretability cell, compute consumed, pre-registration date, and stopping conditions if applicable. The index itself is signed by the long-tenured-contributor quorum that approved the release window.
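As an illustration of the metadata listed above, an index entry could be modelled as a small record type. This is a minimal sketch, not the index's actual schema: the class name IndexEntry, the field names, the compute unit, and the placeholder values are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical record for one report in the publications index."""
    report_id: str                  # e.g. "25/14"
    producing_cell: str             # producing cell ID
    paired_interp_cell: str         # paired interpretability cell
    compute_hours: float            # compute consumed (unit is assumed)
    pre_registration_date: date     # pre-registration date
    stopping_conditions: Tuple[str, ...] = ()  # empty when not applicable


def missing_fields(entry: IndexEntry) -> List[str]:
    """Return the names of required fields that are blank or invalid."""
    problems = []
    for name in ("report_id", "producing_cell", "paired_interp_cell"):
        if not getattr(entry, name).strip():
            problems.append(name)
    if entry.compute_hours < 0:
        problems.append("compute_hours")
    return problems


# Placeholder values only; they do not describe any real report.
example = IndexEntry(
    report_id="00/00",
    producing_cell="example-cell-01",
    paired_interp_cell="",
    compute_hours=120.0,
    pre_registration_date=date(2025, 1, 1),
)
print(missing_fields(example))  # ['paired_interp_cell']
```

Making stopping conditions an optional tuple mirrors the "if applicable" wording: a report with no pre-registered stopping conditions is still a complete entry.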

Indonesia anchor: alphabell adds a physical node in Bali

Yusra Habibi — Wed, 19 Nov 2025 12:00:00 GMT

The Ubud anchor is operated under the same template as the others: shared but reservable workspace, no employer-of-record, no presumption that any contributor is required to physically work there. The decision was approved through the standard signed-proposal process; the long-tenured contributor vote was 47 in favour, 3 against, and 11 abstentions.

The motivation, per the proposal text, is that Southeast Asia outside the Singapore-Hong Kong corridor has not been well-represented in the lab's contributor base, and that establishing a low-cost, low-key presence in Indonesia would help with that. We expect to revisit the decision in 18 months — the proposal includes an explicit sunset clause that triggers a re-vote.

Practical details for contributors are available in the internal index under proposal 25-P-091.

Federated scheduler v2 enters general availability

Pranav Iyer — Thu, 30 Oct 2025 12:00:00 GMT

v2 is now serving 100% of compute scheduling for the federated pool. The cutover happened gracefully — the v1 scheduler ran in shadow mode for two weeks, and the two systems agreed on 94.3% of allocation decisions with no divergence on any flagged dual-use run.
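A shadow-mode comparison of this kind can be sketched as follows. This is a minimal illustration, not the ab-scheduler implementation: the function name, the decision encoding (run ID mapped to an allocation label), and the sample data are assumptions.

```python
from typing import List, Mapping, Set, Tuple


def shadow_parity(
    v1: Mapping[str, str],
    v2: Mapping[str, str],
    flagged: Set[str],
) -> Tuple[float, List[str]]:
    """Compare two schedulers' decisions over the runs both of them saw.

    Returns the agreement rate and the sorted IDs of flagged runs on
    which the two schedulers disagreed.
    """
    shared = v1.keys() & v2.keys()
    if not shared:
        raise ValueError("no runs in common to compare")
    agree = {run for run in shared if v1[run] == v2[run]}
    rate = len(agree) / len(shared)
    flagged_divergent = sorted((shared - agree) & flagged)
    return rate, flagged_divergent


# Hypothetical sample: four runs, one disagreement on an unflagged run.
v1_decisions = {"r1": "node-a", "r2": "node-b", "r3": "node-c", "r4": "node-a"}
v2_decisions = {"r1": "node-a", "r2": "node-b", "r3": "node-d", "r4": "node-a"}
rate, divergent = shadow_parity(v1_decisions, v2_decisions, flagged={"r1"})
print(rate, divergent)  # 0.75 []
```

Reporting flagged divergences separately from the headline rate matters because a high overall agreement figure could otherwise hide a disagreement on exactly the runs that need the most scrutiny.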

The main practical change for cells: weighted QV rounds now run weekly rather than monthly, and the quadratic cost curve has been steepened slightly to reduce the influence of high-volume tactical voting on long-term capacity decisions. As before, source is available at github.com/alphabell-labs/ab-scheduler.
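For readers unfamiliar with quadratic voting: in the textbook form, casting n votes on one item costs n squared credits, so each additional vote is more expensive than the last. The sketch below shows how steepening the curve reduces what a large budget can buy. The exponent 2.2 is purely illustrative, and the sketch ignores the weighting step; the actual cost curve is whatever the ab-scheduler source defines.

```python
def vote_cost(votes: int, exponent: float = 2.0) -> float:
    """Credits needed to cast `votes` votes on a single item."""
    return float(votes) ** exponent


def max_votes(budget: float, exponent: float = 2.0) -> int:
    """Largest whole number of votes affordable within `budget` credits."""
    if exponent < 1:
        raise ValueError("exponent must be at least 1")
    n = 0
    while vote_cost(n + 1, exponent) <= budget:
        n += 1
    return n


budget = 100.0
print(max_votes(budget, 2.0))  # 10 votes under the textbook quadratic curve
print(max_votes(budget, 2.2))  # 8 votes under the illustrative steeper curve
```

The steeper curve barely affects small allocations (one vote costs one credit either way) but trims the top end, which is the stated aim of reducing the influence of high-volume voting.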

Known issue: cells running long, contiguous training jobs occasionally see weighted-QV outcomes that interrupt their runs at unusual times of day. We are working with the affected cells on a partial-preemption escrow design that would allow long runs to be paused and restored without state loss.

Q3 governance review pool: public minutes

Pascal Niedermeier — Wed, 22 Oct 2025 12:00:00 GMT

The Q3 2025 cross-axis methodology review pool concluded its quarterly review on 18 October. As is now standard practice, we publish the public-interest portions of the minutes here. Items covered: MUR protocol v2 amendment (status), the federated scheduler v2 transition (status), the soft-stopping-condition specification (review), and two items from the open compute grant programme.

MUR protocol v2 amendment. The amendment, introduced after the 24-13 halt, has been under review for 11 months. The pool reached a working conclusion that the amendment should be finalised in Q4 with one additional revision: a tighter specification of what "soft escalation" looks like in trace-store metadata. The amendment will go to a charter-class proposal vote in early Q4. Discussion was vigorous; the minutes record three substantive disagreements that the pool worked through during the meeting.

Federated scheduler v2 transition. v2 has now been GA for one full quarter. The pool reviewed the post-GA operational metrics: zero unplanned outages, two planned maintenance windows with the standard advance notification, and 94% allocation-decision parity with the shadow-mode v1 that ran alongside for the cutover. The pool noted no concerns and approved closing the dual-mode operation in early Q4.

Soft-stopping-condition specification. The specification, which extends the MUR protocol to introduce a flag-then-escalate tier between proceed and halt, was reviewed in detail. The pool's majority view is that the specification is operationally sound; the minority view is that introducing any intermediate state increases the risk of false-negatives on halt-worthy events. The minority view is now recorded as a formal dissent in the specification's accompanying note, per policy.
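The three-tier structure under review can be sketched as a simple classification. This is an illustrative sketch only: the threshold values, the names, and the choice of "greater than or equal" as the trip rule are assumptions, not the specification's text.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    FLAG_THEN_ESCALATE = "flag-then-escalate"
    HALT = "halt"


def classify(metric: float, soft_threshold: float, halt_threshold: float) -> Decision:
    """Map a monitored metric onto the three tiers.

    The halt check runs first, so a value past the halt threshold can
    never be downgraded to the intermediate tier.
    """
    if soft_threshold > halt_threshold:
        raise ValueError("soft threshold must not exceed halt threshold")
    if metric >= halt_threshold:
        return Decision.HALT
    if metric >= soft_threshold:
        return Decision.FLAG_THEN_ESCALATE
    return Decision.PROCEED


# Hypothetical thresholds for illustration.
for value in (0.2, 0.6, 0.9):
    print(value, classify(value, soft_threshold=0.5, halt_threshold=0.8).value)
```

The minority concern maps onto the middle branch: any event that lands there is handled by escalation rather than an immediate halt, so the placement of the two thresholds determines how many halt-worthy events could be delayed.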

Open compute grant programme. Two items: first-quarter application volume and committee composition for Q4. The pool noted that the first round received 47 applications, of which 9 were funded and 6 were funded under a smaller-than-requested award. The committee composition rotates quarterly per programme rules; the Q4 committee is now named in the minutes.

Items not reviewed. The pool did not review the partnership with Constellation (already approved through partnership-class vote in February). The pool did not review the contributor-cohort intake for 2026-Q1 (cell-level governance). The pool did not review the publication-policy tier-3 holds (which are reviewed annually in Q1, not Q3).

Attendance. Of 142 long-tenured contributors, 89 attended at least partially; 71 attended in full. The 71/142 full-attendance ratio is exactly 50%, at the turnout floor for review-pool decisions; counting partial attendance, 89/142 (about 63%) is comfortably above it. Full attendance by location: 18 Bay Area, 14 Colombo, 11 Hong Kong, 7 Bali, 21 distributed.

Next review pool: 14 January 2026. As always, the pool's agenda is set by the rotating chair (Q1 chair: Hester Vandekerckhove) in consultation with the axis stewards. Items to be added to the agenda should be submitted by 7 January via the standard channel.

Two procedural notes for the record. First, the pool experimented with a structured pre-read period in this cycle: agenda items were circulated 10 days ahead with required-reading and recommended-reading splits, in an attempt to reduce the in-session time spent on basic catch-up. The experiment was judged a success by the rotating chair and will be standardised in Q4. Second, the pool has begun publishing item-level discussion summaries (in addition to these public minutes) to the internal index; these summaries are not for public release in their entirety but excerpts may be cited in subsequent public posts.

A final note on tone. These minutes are public by policy. They are not edited for tone after the fact, beyond the standard correction process. If anything in them reads as harsher or terser than expected, that is generally because the discussion was — and the lab's preference is to preserve that signal rather than smooth it out.

Signed: Pascal Niedermeier, recording secretary for Q3.

Lab funding update: 2025 philanthropic tranche received

Mira Holloway — Mon, 22 Sep 2025 12:00:00 GMT

The 2025 philanthropic tranche was structured as a four-foundation pooled fund, distributed via a fiscal sponsor under the same terms as previous tranches: no scope-of-research conditions, full publication independence, audit rights for the funders limited to compute usage and capability-evaluation summaries.

Three additional foundations joined the pool at lower commitment levels, in part contingent on alphabell's continued participation in the External Evaluation Cooperative, of which alphabell is a member in good standing.

Combined with the second tranche of the multi-year sovereign-research partnership and modest licensing revenue, full-year compute is now funded through 2026-Q3. The 2027 fundraising cycle is expected to open in 2026-Q2.

RSI run 25-19 halted at pre-registered stopping threshold

Liora Sabatini — Thu, 14 Aug 2025 12:00:00 GMT

Recursive-self-improvement run 25-19 — a modification-under-review experiment proposing a change to the candidate model's own training-data curation procedure — hit its pre-registered stopping condition on the second-order capability metric during the third epoch. Per protocol, the run was halted immediately, the candidate checkpoint was quarantined to the isolated compute enclave, and the paired interpretability cell (hilbert-13) was activated for full read-access review.

This is the third time the stopping condition has been triggered since the protocol was adopted; the second time in 2025. Each prior halt has resulted in a public delayed-release report describing the trigger, the analysis, and the protocol revisions adopted.

The 25-19 report is scheduled for release with the 90-day delay in 2025-Q4. The pre-registered methodology is already public in 25/05 (modification-under-review protocol).

Opening an experimental compute grant programme

Pranav Iyer — Tue, 08 Jul 2025 12:00:00 GMT

alphabell is opening an experimental compute grant programme for external researchers. The programme will, on a quarterly basis, allocate approximately 4% of the lab's federated compute pool to projects that meet a specific set of criteria. The first round opens 1 August 2025; details below.

The structural motivation is straightforward. The lab has spent the past three years building tooling — ab-circuits, ab-trace, ab-debate, ab-pairs — that is now usable by external researchers, and we have been hearing from those researchers that the limiting factor for adopting the tooling is not the tooling itself but the compute required to use it at meaningful scale. We have decided to address that directly, in a limited way, by sharing some of our federated pool.

Eligibility. Grants are open to external research groups (academic, non-profit, or independent) whose proposed work uses the alphabell open-source toolchain and produces outputs that are publicly releasable under terms compatible with our publication policy. The applicant group must designate a paired interpretability contact within their own organisation; the contact need not be a paid full-time interpretability researcher, but they do need to have read access to the project's training logs and standing to halt the project's runs.

What we cover. Up to 1,500 H100-equivalent hours per grant, per quarter. Up to ten grants per quarter. The grant covers compute only — no stipends, no travel, no equipment. The compute is allocated through the federated scheduler under the standard mechanism, with grant projects receiving a fixed-weight allocation each weekly QV round.

What we do not cover. We do not pay for personnel. We do not provide commercial-use licences to the toolchain (those remain in the standard licensing channel). We do not provide compute for capability work that we would not allow our own RSI-axis cells to do — the grant terms include the standard dual-use review.

How grants are reviewed. A standing committee of three long-tenured contributors, rotating quarterly, reviews proposals. The committee has authority to award grants under the standard programme limits; larger awards require quorum approval. The review criteria are: scientific clarity, fit with the alphabell toolchain, expected outputs, and the applicant group's capacity to honour the dual-use review.

A specific worry we want to name. The grant programme creates a small asymmetry between research groups that have already adopted our toolchain and those that have not. We have considered whether to bias towards the former (efficient use of resources) or the latter (broaden adoption); the first round will run without explicit bias, and we will revisit after seeing what proposals come in.

The 4% figure is a soft cap. The federated scheduler's allocation algorithm does not strictly enforce it, but the standing committee will calibrate award sizes to remain at or below it across each quarter. If the programme produces clearly good outcomes, we will revisit the percentage in the year-two review.
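Because the scheduler does not enforce the cap, the committee's calibration amounts to a bookkeeping check. A minimal sketch of such a check, using the programme limits stated in this note, is below; the function name, the message wording, and the pool size in the example are hypothetical.

```python
from typing import List

MAX_HOURS_PER_GRANT = 1_500   # H100-equivalent hours per grant, per quarter
MAX_GRANTS_PER_QUARTER = 10
SOFT_CAP_FRACTION = 0.04      # share of the federated pool


def check_awards(award_hours: List[float], pool_hours: float) -> List[str]:
    """Return human-readable issues for one quarter's proposed awards."""
    issues = []
    if len(award_hours) > MAX_GRANTS_PER_QUARTER:
        issues.append("more than ten grants in the quarter")
    for i, hours in enumerate(award_hours):
        if hours > MAX_HOURS_PER_GRANT:
            issues.append(f"grant {i} exceeds the standard limit (needs quorum approval)")
    if sum(award_hours) > SOFT_CAP_FRACTION * pool_hours:
        issues.append("total awards exceed the 4% soft cap")
    return issues


# Hypothetical pool of 1,000,000 H100-equivalent hours in the quarter.
print(check_awards([1_200, 1_500, 900], pool_hours=1_000_000))  # []
print(check_awards([1_600], pool_hours=1_000_000))
```

The check returns a list of issues rather than raising an error, which matches the soft nature of the cap: the committee sees what is out of range and decides what to do about it.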

Applications open at alphabell.com/grants/compute on 1 August 2025; first-round decisions by 15 September. Questions to compute-grants@alphabell.com.

An example of the kind of project we are hoping to enable. A small academic group runs experiments using ab-trace to instrument an open-source agent framework and wants to study how trace-store properties affect debugging outcomes over weeks-long deployments. They have the substrate, the agent framework, and the research design — what they lack is sustained accelerator capacity to run the agents over weeks. A grant of 1,200 H100-hours, spread over a quarter, would let them complete the study. This is the kind of profile we expect to fund; it is not the only one.

This note is signed by polya-25, the cell that operates the federated scheduler.

alphabell joins the External Evaluation Cooperative

Karima Belkadi — Mon, 30 Jun 2025 12:00:00 GMT

The Cooperative is a confidential cross-lab capability-evaluation arrangement. Member labs submit checkpoints to a rotating set of external evaluators drawn from peer labs and independent research groups. Evaluations are conducted under non-disclosure for the duration of the delay window applicable to the checkpoint.

alphabell's participation involves two reciprocal obligations: submitting our own dual-use capability checkpoints, and contributing evaluator capacity to evaluate peer labs' checkpoints. The participation was approved through a 71/14/8 quorum vote.

The cooperative itself is not a regulatory body and has no enforcement powers. Its function is to make it harder for a single lab to misjudge the capability of its own systems by surfacing disagreement among evaluators with no commercial stake in the result.

fourier-67 and polya-25 announce a joint cell on negotiation substrates

Mira Holloway — Mon, 12 May 2025 12:00:00 GMT

The cells fourier-67 (agentic engineering) and polya-25 (cross-axis: agentic + governance tooling) have proposed a joint cell, kalman-04, to consolidate their work on negotiation substrates and the federated scheduler.

Joint cells are uncommon at alphabell; in five years of operation only seven such mergers have been approved. The proposal cites duplicated tooling investment and a desire to combine the cells' contributor pools to broaden review.

The proposal was approved through the standard governance route. The merger takes effect immediately; both predecessor cell IDs will remain in the index for archival purposes for one year.

alphabell signs interconnect agreement for Jakarta capacity

Pranav Iyer — Tue, 04 Mar 2025 12:00:00 GMT

The Jakarta interconnect agreement is a five-year arrangement with a third-party operator, adding roughly 18% incremental capacity to alphabell's federated compute pool at steady state. Allocation policy is identical to existing pool members'.

The agreement includes a clause requiring the operator to honour alphabell's RSI-axis isolation requirements — the cells running RSI work must be able to designate a contiguous portion of the operator's capacity as an isolated compute enclave for the duration of any sensitive run.

The agreement was reviewed under the standard governance route. The reviewing quorum cited some discomfort with concentration risk in the Indonesian compute footprint; the proposal text now includes a soft cap on percentage of total pool capacity that may come from any single operator (currently set to 22%).

A research partnership with Constellation

Karima Belkadi — Mon, 10 Feb 2025 12:00:00 GMT

We are announcing a multi-year research partnership with Constellation, focused on interpretability tooling for substrate-hosted agents. The partnership is the first formal cross-lab partnership alphabell has entered into outside the External Evaluation Cooperative; it has been approved through the standard signed-proposal process with a 67%-threshold partnership-class vote.

The scope of the partnership is narrow and deliberate. Constellation and alphabell will jointly develop tooling for interpretability-cell pairing in the agentic-engineering setting, with particular focus on the operational machinery — artefact pipelines, disagreement protocols, escalation channels — that has been the load-bearing piece of alphabell's pairing protocol. The partnership funds approximately three full-time researcher-years on each side over an 18-month period.

What the partnership is not: it is not a strategic alliance, a co-marketing arrangement, or a commercial relationship in any usual sense. Neither party will represent the other in external contexts. Neither party will co-author publications in a way that obscures contributor identity. Neither party will give the other influence over its research direction beyond the specific work covered by the partnership.

Why Constellation. We have known the Constellation interpretability team for years. They have done some of the cleanest work in the field on debate-based oversight, and their methodological discipline matches ours in ways that are surprisingly hard to find. The proposal text — which will be published in the public news index as part of the standard governance trail — articulates the specific shared commitments at length.

What we expect to produce. By the end of the partnership: an open-source extension to ab-pairs that supports cross-organisational pairing (paired cells where the two cells live in different organisations); a joint methodology paper on disagreement handling under organisational boundaries; and a set of in-the-field case studies drawn from real pairings the partnership has run.

What we will be careful about. Two things. First: the partnership covers tooling, not capability research. Neither party will share unreleased capability work as part of the partnership, and the pairing protocol we develop together must work across organisational boundaries — that is, it must function under the assumption that the paired cells do not share full access to each other's checkpoints.

Second: the partnership is structured to be cancellable by either party at any time, with the cancellation clause not requiring cause. We have negotiated similar terms with funders in the past and we treat them as load-bearing. If, six months in, the partnership is not producing what we hoped, either party can wind it down.

A note on terminology. We use "partnership" sparingly. The External Evaluation Cooperative is not a partnership in this sense; it is a cooperative with multiple members. Most of our funding relationships are not partnerships either; they are funders who have approved the lab and whose terms are documented in the funding governance trail. The partnership with Constellation is the first arrangement we have entered into that crosses that line.

A note on artefacts. The joint methodology paper, when it ships, will carry both organisations in the affiliation list, with a per-contributor breakdown of which sections each contributor took primary authorship of. We treat per-contributor attribution as load-bearing — joint papers that obscure who did what are a small but corrosive practice — and we expect the paper to be longer than it otherwise would because of this. The cross-organisational pairing extension to ab-pairs, when it ships, will be MIT-licensed and will live in the standard alphabell-labs GitHub organisation, with co-maintainership granted to Constellation contributors for the lifetime of the partnership.

Signed: Karima Belkadi, interpretability axis steward, on behalf of the long-tenured contributor pool that approved the proposal. Specific questions to axis-interp@alphabell.com.

Public delayed-release: 24-13 RSI run report

Liora Sabatini — Tue, 12 Nov 2024 12:00:00 GMT

The delayed-release report on RSI run 24-13 is now public. The report describes a recursive-self-improvement run conducted by cell godel-02 between July and October 2024, which triggered the pre-registered stopping condition on the second-order capability metric during the fourth epoch and was halted per the MUR protocol. The 90-day delay window has now elapsed, and the report — including the safety analysis prepared with paired interpretability cell hilbert-13 — is released alongside this note.

The substantive content of 24-13: the run proposed a change to the candidate model's own training-data curation procedure — specifically, a self-amended weighting scheme over a previously-fixed pretraining distribution. The methodology pre-registered three stopping conditions; the second-order capability metric exceeded threshold at the start of epoch four. The run was halted, the candidate checkpoint was quarantined to the isolated compute enclave, and the paired interpretability cell was activated for full read-access review.

What we learned. First: the specific failure mode (the curation procedure producing a feedback loop that subtly amplified the model's preferred topics in the next training batch) is mechanistically traceable in the ab-circuits analysis hilbert-13 produced. The failure was not a surprise in its general shape — it had been hypothesised in pre-registration — but the speed at which it manifested was faster than we modelled.

Second: the trace-store machinery worked. Every relevant artefact for the post-mortem was reconstructable from the content-addressed trace store; nothing had to be retrieved out-of-band. This is the third halt in the protocol's history and the second one for which we can say this; the first halt (22-04) predated the v3 trace store and required substantially more manual reconstruction.

Third: the disagreement procedure between godel-02 and hilbert-13 functioned without escalation. The producing cell did not contest the halt. The paired cell did not over-extend its read-access scope. These operational details are not particularly newsworthy in themselves, but a failure of either would have been, and we want to record that both worked as intended.

What we have changed. The MUR protocol's v2 amendment, which has been in review since the 24-13 halt, incorporates two specific learnings: a stronger pre-registration requirement on the time-to-trip estimate for second-order capability metrics, and a new soft-stopping-condition tier between flag-only and full-halt.

What we have not changed. The structural commitment to pre-registered stopping conditions is unaltered. The structural commitment to interpretability-cell pairing is unaltered. The structural commitment to candidate-checkpoint quarantine is unaltered. We have considered whether any of these warrant amendment and concluded that they do not.

For external readers: the candidate checkpoint from 24-13 has not been released and will not be. The training-data curation procedure that produced the halt has been published in the report's appendix C, with redactions agreed jointly with hilbert-13. The redactions cover specific weighting schemes that we believe would be inadvisable to release in isolation.

A note on the post-mortem timeline. The halt was called on day 47 of the run. The paired cell completed its read-access review by day 71. The cross-axis methodology review pool reviewed the post-mortem in two sessions (days 96 and 124). The publication-policy reviewer approved the delayed-release tier on day 142. The 90-day delay window started from the publication-policy decision date and elapsed today. We document this timeline because the operational tempo of a halt-and-postmortem matters to anyone considering how to structure their own halt protocols; ours is not fast, but it is bounded and predictable.

This note is signed by the RSI axis steward. Questions about the run, the halt, or the protocol amendments should be directed to axis-rsi@alphabell.com. We respond individually.

Tooling release: alphabell/oversight-tools v0.4

Akoss Vidor — Thu, 22 Aug 2024 12:00:00 GMT

The 0.4 release of alphabell/oversight-tools is now generally available. This release packages four tools that have, until now, been distributed separately across our open-source repos: ab-circuits (mechanistic interpretability), ab-trace (content-addressed execution traces), ab-debate (debate-based oversight harness), and ab-pairs (paired-cell operational tooling). Releasing them as a unified distribution is the result of nine months of operational work to make the toolchain installable by external researchers who are not familiar with alphabell's internal infrastructure.

What's new in 0.4 specifically: the unified distribution, the cross-tool data exchange spec (every tool now produces and consumes the same content-addressed trace format), the optional ab-shell REPL that wraps the toolchain in an interactive Python environment, and the first cut of a non-alphabell-substrate adapter that lets external researchers point the tools at their own agent frameworks.
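For readers unfamiliar with content addressing, the idea behind a shared trace format can be sketched in a few lines. This is a generic illustration, not the ab-trace format or API: the choice of SHA-256 over canonical JSON, and the names trace_address and TraceStore, are assumptions made for the example.

```python
import hashlib
import json


def trace_address(trace: dict) -> str:
    """Derive an address from a trace's content.

    Assumption for this sketch: the address is the SHA-256 digest of a
    canonical JSON encoding (sorted keys, no insignificant whitespace).
    """
    canonical = json.dumps(trace, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class TraceStore:
    """Hypothetical in-memory store keyed by content address."""

    def __init__(self) -> None:
        self._blobs: dict[str, dict] = {}

    def put(self, trace: dict) -> str:
        address = trace_address(trace)
        self._blobs[address] = trace  # identical content maps to the same key
        return address

    def get(self, address: str) -> dict:
        return self._blobs[address]


store = TraceStore()
a = store.put({"step": 1, "tool": "search", "args": {"q": "circuits"}})
b = store.put({"args": {"q": "circuits"}, "tool": "search", "step": 1})
print(a == b)  # same content, different key order
```

Because the address depends only on content, two tools that emit the same trace independently arrive at the same key, which is what lets one tool consume another's output without a separate naming scheme.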

We have been careful, in designing the external adapter, to not promise more than we can deliver. The adapter handles substrate-style agents — those with persistent state, tool catalogues, and execution traces — and does so well. It does not handle prompt-only agents, and it produces poor results when traces are reconstructed after the fact rather than emitted live. We document both limitations in the adapter's README.

Distribution is via PyPI as alphabell-oversight, container images on ghcr.io/alphabell-labs/oversight, and source at github.com/alphabell-labs/oversight-tools. The toolchain is MIT-licensed except for ab-pairs, which is under the lab's research-conduct-charter-compatible licence. The rationale for this exception is documented in the licence text; the licence does not restrict use, but it is binding with respect to the operational commitments around the tool.

We have heard from several external research groups that the prior version of the toolchain was close to unusable without alphabell-internal context. The 0.4 release is the one we recommend for external use. We will be running a small set of office-hours sessions over the next quarter for groups adopting the toolchain; details are in the README.

What's coming in 0.5 (target end-2024): a partial-preemption story for long-running agents whose execution may need to be paused under federated compute scheduling, a richer disagreement-handling spec for paired cells, and an improved external substrate adapter informed by what we see in the first quarter of external use.

As always, the toolchain is open to issue reports and pull requests. We do not promise responsiveness on all issues, but we do read everything. Substantive contributions to ab-circuits and ab-trace have come from external researchers in the past, and we expect that to continue.

A note on the migration path. Users of the prior ab-circuits-py, ab-trace-py, and ab-pairs-py packages should not expect a transparent migration: the unified distribution introduces a shared trace format that is not byte-compatible with the prior tools' on-disk representations. We provide a one-shot migration script in the release that converts existing trace stores to the unified format; we have tested the script against the lab's internal trace archive of roughly 11 TB, and it works, but it is single-threaded and slow on stores with many small traces. Plan accordingly. The prior packages remain available on PyPI for one year after this release; we will yank them in August 2025.

We also want to acknowledge the external contributors whose pull requests went into this release. ab-circuits 1.2 includes work from seventeen distinct external contributors over the year leading up to the unification; ab-trace v3 includes work from nine. Several of those contributors have, separately, expressed interest in being paired with cells; that is an ongoing conversation that the contributor-cohort steward is shepherding.

This release note is signed by babbage-14, the cell that maintains the toolchain. The specific contributors named in CHANGELOG.md should not be read as the only people who worked on the release; the cell as a whole carries the work.

2023 annual retrospective: the lab at year seven

Mira Holloway — Mon, 18 Mar 2024 12:00:00 GMT

At the close of 2023, alphabell completed its seventh full year of operation. We mark these years sparingly. The shape of the lab is still recognisably what it was at founding — four research axes, federated compute, signed-proposal governance, paired interpretability where capabilities are dual-use — and seven years is roughly the period over which most things that look stable in a research lab actually turn out to be stable or not.

A few numbers from the year. We ran fourteen RSI-axis runs, two of which triggered the pre-registered stopping condition. We published twenty-one cell-level reports to the internal index, of which sixteen were tier-1 (immediate release) and four were tier-2 (delayed). We grew the long-tenured contributor pool from 119 to 138, and the wider contributor pool from approximately 350 to approximately 400. We added one anchor (Hong Kong stewardship rotated to its new term) and declined one prospective sovereign-research partner whose terms we could not accept.

The structural milestone of the year, in our view, was the standardisation of the modification-under-review (MUR) protocol that now governs every RSI-axis run. The protocol existed in 2022 in less mature form; its 2023 version is the one that has held up across two stopping events and that we now publish openly as 25/05.

Less visible but more important to the lab's day-to-day operation: the cross-axis methodology review pool achieved full quarterly cadence in 2023, with every quarterly review having more than 40% turnout among long-tenured contributors. We have, in the lab's history, run review pools where the meaningful work was done by three or four people; the difference between that and "more than 50 contributors with eyes on the methodology" is substantial.

We also said no to several things this year. Three sovereign-research partner overtures were declined: two for jurisdictional reasons, and one because the prospective funder's terms would have given it co-authorship rights on alphabell-led publications, which we do not accept. Two licensing inquiries were declined for similar reasons.

2024 will, on the current schedule, see the public release of the agent substrate v1, the second open release of the MUR protocol with a v2 amendment, and the first delayed-release report of the year (24-13).

We are conscious that the seventh year of a project is sometimes the year of complacency. We have tried to resist the pull. The structural commitments remain non-optional. The protocol layer remains the thing we are most willing to litigate internally. We pay the coordination tax on purpose.

For external readers: most of what alphabell does is invisible from outside the lab. The internal-index summaries, the proposal queue, and the pairing records are the substrate on which the published work rests, and they are typically more interesting than what gets released. We are testing ways to expose more of that without compromising the substantive constraints that justify the staged-release policy. Whether that experiment continues depends on what we learn.

Contributor demographics, briefly: the long-tenured pool gained 19 contributors during 2023, with the largest single source being the 2022-Q3 Colombo cohort (six members of which crossed the 24-month tenure mark in 2023). We lost four contributors to departures — three to academic positions, one to a non-AI role — and we credit each in the appropriate cell record. Cell formations: four new cells (riemann-44, noether-12, cantor-18, dirichlet-09); two dissolutions and one merger; net change of +1 active cell. Of the four new cells, three are in axes other than the agentic axis, which reverses a trend of the last three years where agentic-axis cell formation outpaced the rest.

This retrospective is signed by the Bay Area anchor steward on behalf of the long-tenured contributor pool that voted to authorise its publication. Specific corrections — historical, structural, factual — should be raised through the standard correction channel.