Research
Liora is the RSI axis steward and the principal author of the modification-under-review (MUR) protocol that governs every recursive-self-improvement run at alphabell. Her work centres on what a stopping condition has to do — what behaviour, what trace property, what eval metric should function as a tripwire — and on the operational question of how to design pre-registrations that the run can't quietly route around.
She holds the unusual distinction of having co-authored every public delayed-release report on an RSI run halt that the lab has produced. The 22-04, 24-13, and 25-06 reports all carry her name. She insists this is a consequence of being the axis steward at the time, not a personal record.
Liora moved to Tel Aviv in 2022 to be closer to the regional academic-cluster collaborators. She speaks at length, when asked, about the difference between 'safety-aware capability work' and 'capability work with safety appendices'; the difference, in her view, is whether the stopping condition can actually halt the run.
Background
Ph.D. in logic and computer science, Hebrew University of Jerusalem, 2013. Postdoc at Oxford (FHI logic group), 2014-2016.
Prior to alphabell: Oxford FHI; Cantor Initiative; Helios Safety Group.
Selected publications
- Jun 2025 · ab-recursive-modi · Modification-Under-Review: protocols for safe self-modification of training procedures · Liora Sabatini, Yuki Cho, Aravind Periyasamy
- Nov 2024 · ab-sandboxed-self · Sandboxed Self-Modification: a confinement specification and implementation · Liora Sabatini, Cheung Wai-Lin, Marek Holub
- Sep 2024 · ab-interpretabili · Interpretability Cell Pairing: how every dual-use capability run gets a watchful sibling · Karima Belkadi, Hester Vandekerckhove, Yuki Cho
- May 2025 · ab-mechanistic-ci · Mechanistic Circuit Analysis at Frontier Scale: cells as a unit of interpretability · Jiang Yifei, Nico Almgren, Karima Belkadi, Hester Vandekerckhove
- Sep 2025 · ab-scalable-overs · Scalable Oversight for Multi-Step Agent Systems: a Debate-Plus-Trace Approach · Ifeoma Nwosu-Howard, Hiroshi Tanigawa, Maral Lotfi, Ruth Wernicke
Recent talks
- Modification-under-review — three halts, ML Safety Workshop, NeurIPS 2024
- What a stopping condition has to do, AISI public series 2025
- Pre-registration as governance, EA Global SF 2024
Liora is currently part of node-cell godel-02, working under the Recursive self-improvement research axis. The cell is open to substantive correspondence from researchers working on adjacent problems; route requests through godel-02@alphabell.com or directly to Liora at liora-sabatini@alphabell.com.
Contact
- EMAIL: liora-sabatini@alphabell.com
- ORCID: 0000-7520-4629-9852
- X: @liorasabatini
- BLUESKY: liora-sabatini.bsky.social
- GITHUB: @liorasabatini
Cross-references