Mechanistic Circuit Analysis at Frontier Scale: cells as a unit of interpretability
Jiang Yifei, Nico Almgren, Karima Belkadi, Hester Vandekerckhove
@misc{yifei2025mechanistic,
title = {Mechanistic Circuit Analysis at Frontier Scale: cells as a unit of interpretability},
author = {Yifei, Jiang and Almgren, Nico and Belkadi, Karima and Vandekerckhove, Hester},
year = {2025},
howpublished = {Internal release — alphabell index 25/03 · arXiv 2505.18831},
month = {may},
doi = {10.48550/arXiv.2505.89369},
url = {https://dev.alphabell.com/publications/mechanistic-circuits-frontier}
}
Abstract
We adapt mechanistic interpretability tooling to operate at the parameter scale of frontier-class models without quadratic costs in attention-head enumeration. Our core observation: features cluster into ~700 reusable circuits whose composition explains 86% of behaviourally relevant activations on the benchmarks we tested. The methodology is now used as a precondition for the closed-loop interpretability cells paired with every RSI run.
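The quadratic cost mentioned above can be illustrated with a short count. This is a minimal sketch, not the report's method: it assumes that "attention-head enumeration" means listing every (earlier head, later head) pair across distinct layers. The function name `head_pair_count` and the model shapes are hypothetical and chosen only for illustration.

```python
def head_pair_count(n_layers: int, heads_per_layer: int) -> int:
    """Count (earlier head, later head) pairs across distinct layers.

    Assumption: a head can only compose with heads in strictly later
    layers, so the count is C(n_layers, 2) * heads_per_layer ** 2.
    """
    layer_pairs = n_layers * (n_layers - 1) // 2
    return layer_pairs * heads_per_layer ** 2


if __name__ == "__main__":
    # Hypothetical model shapes, not taken from the report.
    for n_layers, heads in [(12, 12), (48, 32), (96, 96)]:
        total_heads = n_layers * heads
        pairs = head_pair_count(n_layers, heads)
        print(f"{total_heads} heads -> {pairs} head pairs")
```

Under this assumption the pair count grows roughly with the square of the total number of heads, whereas the circuit count reported in the abstract (~700) does not depend on it.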
Index metadata
- Cell: hilbert-13
- Compute: 92 H100-days
- Status: Open release
- Tool: ab-circuits open-source release
- DOI: 10.48550/arXiv.2505.89369
- arXiv: arXiv:2505.89369
What this paper is part of
This index entry is part of the Interpretability & alignment research axis. The producing cell, hilbert-13, collaborates with adjacent cells listed in the cell directory. The paired interpretability cell, where applicable, is identified in the metadata above; any disagreement reports accompany the public release.
How to read this
If you want to use the result: the code (where available) is at https://github.com/alphabell-labs/ab-mechanis; the dataset location is TBD until one is released. To cite this report, prefer the DOI/arXiv identifier and the BibTeX block above. To discuss this report with the producing cell, contact the lab and quote the index entry slug mechanistic-circuits-frontier.
Limitations
Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.
Jiang Yifei, Nico Almgren, Karima Belkadi, Hester Vandekerckhove. Mechanistic Circuit Analysis at Frontier Scale: cells as a unit of interpretability. Internal release — alphabell index 25/03 · arXiv 2505.18831, May 2025. arXiv:2505.89369. doi:10.48550/arXiv.2505.89369.