Research
Karima is the interpretability axis steward and the principal author of the mechanistic-circuit-analysis tooling that operates on frontier-class models without quadratic enumeration costs. The result that ~700 reusable circuits explain 86% of behaviourally relevant activations on the lab's benchmark suite is hers; it is one of the more cited results out of the lab in the last two years.
Beyond the headline circuits result, she is the load-bearing author of the operational pairing protocol — the trust model, the artefact pipeline, the disagreement procedure, the escalation channels — that makes interpretability-cell pairing actually function as a check rather than a ceremony.
She convenes the interpretability axis's monthly review pool from Paris. Her axis-stewardship term ends 2026-Q2. She has been with alphabell since 2019 and is widely credited as the person who made the lab's structural commitment to pairing actually operable.
Background
Ph.D. computer science, Sorbonne Université, 2014. M.Sc. École Polytechnique.
Prior to alphabell: École Polytechnique; Halcyon Safety; Cantor Initiative.
Selected publications
- May 2025 · ab-mechanistic-ci · Mechanistic Circuit Analysis at Frontier Scale: cells as a unit of interpretability · Jiang Yifei, Nico Almgren, Karima Belkadi, Hester Vandekerckhove
- Sep 2024 · ab-interpretabili · Interpretability Cell Pairing: how every dual-use capability run gets a watchful sibling · Karima Belkadi, Hester Vandekerckhove, Yuki Cho
- Sep 2025 · ab-scalable-overs · Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach · Ifeoma Nwosu-Howard, Hiroshi Tanigawa, Maral Lotfi, Ruth Wernicke
- Mar 2025 · ab-verifiable-pol · Toward Formal Verification of Learned Policies in Bounded Environments · Aviva Stern, Sun Kyung-min, Felipe Avelar
- Nov 2024 · ab-sandboxed-self · Sandboxed Self-Modification: a confinement specification and implementation · Liora Sabatini, Cheung Wai-Lin, Marek Holub
Recent talks
- Circuits at frontier scale, ICLR 2025 (oral)
- What pairing actually requires, AISI Pairing Workshop 2025
- The 700-circuit conjecture, NeurIPS 2024
Karima is currently part of node-cell hilbert-13, working under the Interpretability & alignment research axis. The cell is open to substantive correspondence from researchers working on adjacent problems; route requests through hilbert-13@alphabell.com or directly to Karima at karima-belkadi@alphabell.com.
Contact
- Email: karima-belkadi@alphabell.com
- ORCID: 0000-6288-5798-4092
- X: @karimabelkadi
- Bluesky: karima-belkadi.bsky.social
- GitHub: @karimabelkadi
Cross-references