
Adversarial Robustness of Goal-Conditioned World Models

Sasha Petrov, Jonas Bremer, Tomoko Niwa, Maya Quesada

Axis World models
Cell voronoi-19
Published Dec 2024
Venue NeurIPS 2024 · alphabell index 24/20
Tags world models

Abstract

Goal-conditioned world models are increasingly used as planners, but a planner is only as robust as its world model: it inherits any vulnerability to inputs the world model itself has not been certified against. We construct an adversarial test suite of perturbed goal embeddings drawn from the same distribution as benign goals, and show that several state-of-the-art world models drop plan-success rates by 47–62% under perturbations that each lie within 0.02 of the L2 ball used during training. We propose a goal-injection penalty that recovers most of the lost performance, and we discuss the implications for downstream planning.
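The abstract states the perturbation budget only in words. Below is a minimal sketch of one reading of it, assuming that "within 0.02 of the L2 ball used during training" means each perturbation delta satisfies norm(delta) <= train_radius + 0.02, where train_radius is a hypothetical training radius that this entry does not report, and assuming the 47–62% figure is a relative drop in success rate. The perturbation direction here is arbitrary rather than optimized, so the sketch illustrates only the budget constraint and the success-rate arithmetic; it is not the paper's attack or its goal-injection penalty. All function names are hypothetical.

```python
import math
import random


def l2_norm(v):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def project_to_l2_ball(delta, radius):
    """Euclidean projection of `delta` onto the L2 ball of `radius` at the origin.

    Vectors already inside the ball are returned unchanged; longer vectors
    are rescaled so their norm equals `radius`.
    """
    norm = l2_norm(delta)
    if norm <= radius:
        return list(delta)
    scale = radius / norm
    return [x * scale for x in delta]


def perturb_goal(goal, direction, train_radius, margin=0.02):
    """Add a perturbation with ||delta||_2 <= train_radius + margin to `goal`.

    ASSUMPTION: `train_radius` is a hypothetical stand-in for the training-time
    L2 ball; `margin` mirrors the 0.02 quoted in the abstract.
    """
    delta = project_to_l2_ball(direction, train_radius + margin)
    return [g + d for g, d in zip(goal, delta)]


def relative_drop(benign_success, adversarial_success):
    """Drop in plan-success rate as a fraction of the benign rate.

    ASSUMPTION: the abstract's 47-62% is read as a relative drop.
    """
    if benign_success <= 0:
        raise ValueError("benign success rate must be positive")
    return (benign_success - adversarial_success) / benign_success


if __name__ == "__main__":
    rng = random.Random(0)
    goal = [rng.gauss(0.0, 1.0) for _ in range(8)]       # toy 8-d goal embedding
    direction = [rng.gauss(0.0, 1.0) for _ in range(8)]  # arbitrary, NOT optimized
    adv_goal = perturb_goal(goal, direction, train_radius=0.1)
    delta = [a - g for a, g in zip(adv_goal, goal)]
    print(l2_norm(delta) <= 0.12 + 1e-9)  # perturbation respects the budget
    # Toy success rates for illustration, not results from the paper.
    print(relative_drop(0.9, 0.45))
```

Rescaling onto the ball is the standard Euclidean projection for a ball centered at the origin; an optimized attack would typically apply the same projection after each gradient step, as in projected gradient descent.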

Index metadata

Cell
voronoi-19
Compute
84 H100-days
Status
Open release
Code
github.com/alphabell-labs/ab-robust-worlds
DOI
10.48550/arXiv.2412.03998
arXiv
arXiv:2412.03998

What this paper is part of

This index entry is part of the World models research axis. The producing cell, voronoi-19, collaborates with adjacent cells listed in the cell directory. Where a paired interpretability cell applies, it is identified in the metadata above, and any disagreement reports it files accompany the public release.

How to read this

If you want to use the result: the code is at https://github.com/alphabell-labs/ab-robust-worlds; a dataset link is TBD and will be added if one is released. To cite this report, prefer the DOI/arXiv identifier and the citation below. To discuss this report with the producing cell, contact the lab and quote the index entry slug adversarial-robustness-goal-conditioned.

Limitations

Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Before any downstream use, read the linked PDF, particularly its limitations and threats-to-validity sections.

Citation

Sasha Petrov, Jonas Bremer, Tomoko Niwa, Maya Quesada. Adversarial Robustness of Goal-Conditioned World Models. NeurIPS 2024 · alphabell index 24/20, Dec 2024. arXiv:2412.03998. doi:10.48550/arXiv.2412.03998.