Critique and response: the 700-circuit conjecture revisited

An external critique of the 700-circuit conjecture is substantively right on methodology and partially right on the implications. A response, a partial concession, and a planned follow-up methodology note co-authored with the critics.

By Nico Almgren · 2026-02-18 · 3 min read · interp critique research

In November 2025, a group of external researchers published a critique of the 700-circuit conjecture — the empirical claim, originating in our 2025-Q2 mechanistic-interpretability paper, that approximately 700 reusable circuits explain 86% of behaviourally relevant activations on frontier-class models. The critique argues, in summary, that the figure is approximately right on the specific model families it was tested on but does not generalise, and that the lab's subsequent replication paper (25/24) understates the methodological variance involved in arriving at the figure.

This post is a response to that critique, and a partial concession.

First, the concession. The critique's specific methodological points are substantively correct. The circuit-count figure is sensitive to the boundary-detection heuristic used to enumerate circuits; this is mentioned in the original paper's limitations section but is not given the weight the critique argues it deserves. We agree. The 25/24 replication paper acknowledges the issue but, in retrospect, did not handle it as cleanly as we would have wanted.

Second, the part where we disagree. The critique's stronger claim, that the figure is essentially a methodological artefact and that no useful behaviour-coverage signal can be extracted from it, is in our view mistaken. We will publish a follow-up methodology note that re-runs the analysis under three alternative boundary-detection heuristics. Across them, the circuit count varies between 600 and 950, and behaviour coverage ranges between 79% and 89%. That variance is substantial, but the central claim survives: a relatively small library of reusable circuits captures most of the relevant behaviour at frontier scale.
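As a rough illustration of the scale of that variance, the quoted ranges can be set against the headline figures of 700 circuits and 86% coverage. The sketch below uses only numbers stated in this post. The constant and function names are illustrative assumptions, not taken from the lab's analysis code, and it does not reproduce the per-heuristic results, which belong to the forthcoming methodology note.

```python
# Headline figures and ranges as quoted in this post (no other data is assumed).
HEADLINE_CIRCUITS = 700
HEADLINE_COVERAGE_PCT = 86.0
CIRCUIT_RANGE = (600, 950)          # across three alternative heuristics
COVERAGE_RANGE_PCT = (79.0, 89.0)   # behaviour coverage, in percent


def relative_deviation_pct(value: float, baseline: float) -> float:
    """Signed deviation of `value` from `baseline`, as a percentage of baseline."""
    return (value - baseline) / baseline * 100.0


def summarise() -> dict:
    """Compare the quoted ranges with the headline figures."""
    low_c, high_c = CIRCUIT_RANGE
    low_p, high_p = COVERAGE_RANGE_PCT
    return {
        # Circuit count: relative change against the 700 headline.
        "circuits_low_pct": relative_deviation_pct(low_c, HEADLINE_CIRCUITS),
        "circuits_high_pct": relative_deviation_pct(high_c, HEADLINE_CIRCUITS),
        # Coverage: difference in percentage points against the 86% headline.
        "coverage_low_points": low_p - HEADLINE_COVERAGE_PCT,
        "coverage_high_points": high_p - HEADLINE_COVERAGE_PCT,
    }


if __name__ == "__main__":
    for name, value in summarise().items():
        print(f"{name}: {value:+.1f}")
```

Read this way, the circuit count moves by roughly a seventh below and a third above the headline figure, while coverage stays within seven percentage points below and three above it. That asymmetry is the sense in which the count is heuristic-sensitive while the coverage claim is comparatively stable.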

Third, the meta-question raised by the critique. The critique argues that the lab's structural openness should have produced faster correction of empirical claims than it did in this case. We accept that framing. The original paper went through cross-axis methodology review, but the review pool did not flag the boundary-detection sensitivity with the specificity that the external critique now has. That is a review-pool miss, and we want to understand how it happened.

We have committed to two things in response. First, the follow-up methodology note will be co-authored with the critique's authors, who have agreed to participate in a structured cross-organisational review of the analysis. Second, we are revising the cross-axis methodology review pool's standard checklist to include a specific item on boundary-detection sensitivity for empirical claims at scale, drawing on the failure mode this critique surfaced.

A note on tone. We try to engage with critiques substantively rather than defensively. We do not always succeed. The first round of internal discussion of this critique included some defensiveness that we are not proud of. The lab's structural commitment to distributed peer review is, in part, a commitment to actually changing our minds when the peer review is right, and we want to record that we did change our minds in this case.

The follow-up methodology note is targeted for Q2 2026 and will be published under the standard tier-1 release process. The published version of the 25/24 paper will not be amended, but a "subsequent work" link will be added to it when the methodology note is released.

Citing this response. If you want to cite the response as distinct from the underlying methodology note, the index entry is 26-N-04 (news, not paper). The forthcoming methodology note will carry its own DOI when released. The critique itself is the work of external researchers and is available at the venue where it was originally posted; we link to it in the methodology note's reference list rather than mirroring it here.

This response is signed by Nico Almgren, on behalf of cell hilbert-13 and the cross-axis methodology review pool.

For the protocol details behind anything mentioned above, see /governance and /charter. For the structural commitments, /about.
