Liora Sabatini

Long-tenured contributor · RSI gatekeeper

Based in Tel Aviv

Node-cell godel-02

ORCID 0000-7520-4629-9852

Research

Liora is the RSI axis steward and the principal author of the modification-under-review (MUR) protocol that governs every recursive-self-improvement run at alphabell. Her work centres on what a stopping condition has to do — what behaviour, what trace property, what eval metric should function as a tripwire — and on the operational question of how to design pre-registrations that the run can't quietly route around.

She holds the unusual distinction of having co-authored every public delayed-release report on an RSI run halt that the lab has produced. The 22-04, 24-13, and 25-06 reports all carry her name. She insists this is a consequence of being the axis steward at the time, not a personal record.

Liora moved to Tel Aviv in 2022 to be closer to the regional academic-cluster collaborators. She speaks at length, when asked, about the difference between 'safety-aware capability work' and 'capability work with safety appendices'; the difference, in her view, is whether the stopping condition can actually halt the run.

Background

Ph.D. in logic and computer science, Hebrew University of Jerusalem, 2013. Postdoc at Oxford (FHI logic group), 2014-2016.

Prior to alphabell: Oxford FHI; Cantor Initiative; Helios Safety Group.

Selected publications

Jun 2025 · ab-recursive-modi
Modification-Under-Review: protocols for safe self-modification of training procedures
Liora Sabatini, Yuki Cho, Aravind Periyasamy

Nov 2024 · ab-sandboxed-self
Sandboxed Self-Modification: a confinement specification and implementation
Liora Sabatini, Cheung Wai-Lin, Marek Holub

Sep 2024 · ab-interpretabili
Interpretability Cell Pairing: how every dual-use capability run gets a watchful sibling
Karima Belkadi, Hester Vandekerckhove, Yuki Cho

May 2025 · ab-mechanistic-ci
Mechanistic Circuit Analysis at Frontier Scale: cells as a unit of interpretability
Jiang Yifei, Nico Almgren, Karima Belkadi, Hester Vandekerckhove

Sep 2025 · ab-scalable-overs
Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach
Ifeoma Nwosu-Howard, Hiroshi Tanigawa, Maral Lotfi, Ruth Wernicke

Recent talks

Modification-under-review — three halts, ML Safety Workshop, NeurIPS 2024
What a stopping condition has to do, AISI public series 2025
Pre-registration as governance, EA Global SF 2024

Working with

Liora is currently part of node-cell godel-02, working under the Recursive self-improvement research axis. The cell is open to substantive correspondence from researchers working on adjacent problems; route requests through godel-02@alphabell.com or directly to Liora at liora-sabatini@alphabell.com.
