Research Scientist — Interpretability

Join Karima Belkadi and hilbert-13 on the lab's load-bearing mechanistic interpretability work — circuits at frontier scale, the ab-circuits library, and paired-cell operational machinery.

Interpretability & alignment · Boston (research outpost) · Full-time · cell hilbert-13

Quick facts

Salary band: USD 220,000-320,000/year
Department: Interpretability & alignment
Location: Boston (research outpost)
Type: Full-time
Cell: hilbert-13
Posted: 8 January 2026
Deadline: 28 February 2026
Apply via email: apply@alphabell.com

Compensation

USD 220,000-320,000 base, plus sabbatical accrual and a research-dividend equivalent of 1,200 H100-hours per quarter. The Boston research outpost is hosted via our US academic-cluster partner; remuneration is structured as a research fellowship under the partner's HR umbrella.

What you'll do

  • Extend the ab-circuits library to cover behaviours the current frontier-scale tooling does not yet enumerate cleanly — particularly compositional circuits in multi-agent settings, and circuits whose causal structure spans multiple residual-stream layers.
  • Push the 700-circuit conjecture forward: refine the boundary-detection heuristics, replicate across additional model families, and engage substantively with the external critique that drove our 2026-Q1 response.
  • Lead a paired-interpretability relationship with at least one agentic-axis or RSI-axis cell, with the read-access scope and disagreement procedure documented per the standard pairing-record template.
  • Co-author 2-4 cell-published reports per year to the internal index, of which some fraction will be tier-2 (delayed release) with accompanying safety analyses.
  • Represent the interpretability axis at the cross-axis methodology review pool quarterly and at the External Evaluation Cooperative when assigned.
  • Mentor contributor-cohort members rotating through hilbert-13 and supervise PhD students based at the Lisbon cantor-18 group as their interpretability-side advisor.
  • Contribute to scalable-oversight work (debate-plus-trace v2) and verifiable-policies work (lebesgue-22 collaboration) — the interpretability axis is structurally cross-cutting and you are expected to move across cell boundaries.

What we look for

Must-haves

  • PhD in computer science, ML, statistics, mathematics, or a closely related field, or equivalent demonstrated research output.
  • Strong publication record in interpretability, alignment, or a closely adjacent area — at minimum two first-author results that the field cites substantively.
  • Deep familiarity with mechanistic interpretability tooling (activation patching, attention head analysis, sparse autoencoders, circuit-level analysis).
  • Capacity to engage substantively with critiques of your own work, including in public. The 700-circuit response post on /news is the kind of engagement we mean.
  • Comfort with the structural commitments described in the research conduct charter, particularly Article 3 on dual-use and pairing.
  • Capacity to lead a paired-cell relationship — this is operational work, not purely intellectual, and a halt called by you must be one a producing cell honours.

Nice-to-haves

  • Public Alignment Forum or LessWrong writing.
  • Prior contributions to ab-circuits or to a peer interpretability library (TransformerLens, Captum, etc.).
  • Experience with formal-methods approaches to verification of learned policies (lebesgue-22 collaboration).
  • Comfort presenting beyond the core ML venues: ICLR / NeurIPS plus the occasional CAV or POPL talk is the kind of cross-community footprint we encourage.
  • Familiarity with the Boston-area academic AI safety community.
  • A track record of mentoring early-career researchers (PhD students or contributor-cohort members).

What we offer

  • Relocation support — for roles that involve moving anchor or country, the lab covers reasonable relocation costs (movers, short-term accommodation up to 8 weeks, and a one-time settlement allowance). The standard relocation package is documented and rate-equalised; we do not negotiate it individually.
  • Visa sponsorship — where the residency country and role warrant it, the lab sponsors work visas and (for long-tenured roles) supports the pathway to permanent residency. Visa work is shepherded by the relevant anchor steward, not outsourced to a third-party HR firm.
  • Parental leave — 26 weeks fully paid for birthing parents, 16 weeks fully paid for non-birthing parents, with a phased return-to-work option. This is independent of country of residence and is one of the lab-wide compensation policies that is not negotiable.
  • Sabbatical — every contributor accrues sabbatical time from year one; full-time contributors at the 36-month tenure mark are eligible for a 12-week paid sabbatical, with a longer 24-week option at 72 months. Sabbaticals are taken; we monitor.
  • Compute allowance — every full-time research role carries an associated research-dividend H100-hour quarterly allowance, allocated through the federated scheduler under the standard mechanism. The allowance is in addition to your cell's pooled compute and is yours to direct.
  • Conference and travel budget — USD 12,000-18,000 per year (role-dependent) for attendance at NeurIPS / ICLR / ICML / AAMAS / CAV / POPL / AISI workshops, plus an additional invited-talks allowance covered independently.
  • Sustainable workload (explicitly anti-burnout): the lab tracks contributor working hours (self-reported, monthly) and treats sustained overage as a structural problem to be addressed at the cell-steward or anchor-steward level rather than a virtue to be celebrated. We have data on which cells push their members hardest and we act on it. Vote-of-record contributors have the standing to flag burnout patterns in their own or adjacent cells.
  • Equipment — laptop of your choice, an ergonomic workstation, and a contributor allowance for the secondary equipment you'll inevitably need.

How to apply

Send your application to apply@alphabell.com. We respond to every application; we do not use applicant-tracking automation that ignores submissions.

Required:

  • A CV or résumé covering your research and engineering trajectory, including specific roles and the substantive technical work in each.
  • Two short pieces of public writing in the style of an Alignment Forum post or an alphabell internal-index report (800-1,500 words each). These should be on AI safety topics; we are looking for evidence of substantive thinking under conditions where the answer is not yet clear. If you have prior public writing that fits, send links instead of new writing.
  • A 1-page research proposal scoped to the role you are applying for, naming the cell(s) you would want to work with and the specific question you would want to take on in the first 12 months. Proposals do not need to be definitive — we revise these together once we start talking — but they need to be concrete.

For PhD and postdoctoral roles, also include two references; we will reach out to them after the second interview round.

We typically respond within four weeks. Our interview process for non-PhD roles has three stages: a written exchange about the proposal, a 90-minute substantive conversation with the host cell, and a final meeting with the relevant axis steward. We do not whiteboard-code; we do not do LeetCode-style interviews; we do not run timed take-home tests.

Equal opportunity

alphabell is committed to a contributor pool that is broad in background, country of residence, training history, and structural perspective. We do not discriminate on the basis of race, colour, national or ethnic origin, religion, sex, gender identity, sexual orientation, age, disability, veteran status, marital status, pregnancy or parenthood status, or any other protected characteristic under applicable law. We actively seek applicants from communities under-represented in AI safety research, and we treat the structural commitments described in our research conduct charter (including those bearing on dual-use research and paired interpretability) as applying equally to every contributor regardless of background. The lab's anchor stewards and the long-tenured contributor pool review hiring and onboarding patterns quarterly; substantive disparities are treated as governance items rather than as HR statistics. If you require accommodation in the application or interview process, please indicate this in your initial email; the request will be handled by the anchor steward responsible for the role rather than by an outsourced HR function.

One more thing

If you are not sure whether your background fits this role, write to us anyway at apply@alphabell.com. Several of our strongest contributors held career paths that no template would have predicted; the lab is structurally biased toward considering applications it does not initially expect to receive.

