Adversarial Robustness of Goal-Conditioned World Models
Sasha Petrov, Jonas Bremer, Tomoko Niwa, Maya Quesada
@inproceedings{petrov2024adversarial,
title = {Adversarial Robustness of Goal-Conditioned World Models},
author = {Petrov, Sasha and Bremer, Jonas and Niwa, Tomoko and Quesada, Maya},
year = {2024},
booktitle = {NeurIPS 2024 · alphabell index 24/20},
month = dec,
doi = {10.48550/arXiv.2412.03998},
url = {https://dev.alphabell.com/publications/adversarial-robustness-goal-conditioned}
}
Abstract
Goal-conditioned world models are increasingly used as planners, but a planner is only as robust as the world model it relies on: it inherits the model's weaknesses on any inputs the model has not been certified against. We construct an adversarial test suite of perturbed goal embeddings drawn from the same distribution as benign goals, and show that several state-of-the-art world models suffer 47–62% drops in plan-success rate under perturbations that are individually within 0.02 of the L2 ball used during training. We propose a goal-injection penalty that recovers most of the lost performance, and we discuss the implications for downstream planning.
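The abstract does not say how the perturbed goals are found. One standard way to search for norm-bounded perturbations is L2 projected gradient descent (PGD): step against the gradient of a score, then project the accumulated perturbation back onto an L2 ball of radius `eps`. The sketch below is a minimal pure-Python illustration under stated assumptions. The goal embedding is a plain list of floats, `score_grad` is a hypothetical callable returning the gradient of some plan-quality score with respect to the goal, and the toy linear score, `eps`, `alpha`, and `steps` values are made up. It is not the paper's attack or an API of ab-robust-worlds, and how `eps` relates to the 0.02 figure in the abstract is not specified there.

```python
import math
from typing import Callable, List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector stored as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def project_to_l2_ball(delta: Vector, eps: float) -> Vector:
    """Rescale delta so that ||delta||_2 <= eps; return a copy unchanged if already inside."""
    norm = l2_norm(delta)
    if norm <= eps:
        return list(delta)
    return [x * (eps / norm) for x in delta]


def l2_pgd_goal_attack(
    goal: Vector,
    score_grad: Callable[[Vector], Vector],
    eps: float,
    alpha: float,
    steps: int,
) -> Vector:
    """Hypothetical sketch (not the paper's method): find goal + delta with
    ||delta||_2 <= eps that lowers a score whose gradient is score_grad."""
    delta = [0.0] * len(goal)
    for _ in range(steps):
        perturbed = [g + d for g, d in zip(goal, delta)]
        grad = score_grad(perturbed)
        gnorm = l2_norm(grad)
        if gnorm == 0.0:
            break  # flat score: no descent direction to follow
        # Normalized step against the gradient (lowers the score) ...
        delta = [d - alpha * gi / gnorm for d, gi in zip(delta, grad)]
        # ... then project back so the perturbation stays inside the L2 ball.
        delta = project_to_l2_ball(delta, eps)
    return [g + d for g, d in zip(goal, delta)]


# Toy usage: a linear "plan score" s(g) = w . g, whose gradient is w everywhere.
w = [3.0, 4.0]
goal = [0.5, -0.2]
adv_goal = l2_pgd_goal_attack(goal, lambda g: w, eps=0.02, alpha=0.005, steps=20)
shift = [a - g for a, g in zip(adv_goal, goal)]
print(round(l2_norm(shift), 6))  # 0.02
```

Normalizing the gradient before each step is the usual L2 variant of PGD (the L∞ variant steps along the sign of the gradient instead); the projection is what guarantees every perturbed goal stays within the chosen radius. For the linear toy score, the attack converges to the ball's boundary in the direction of -w.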
Index metadata
- Cell: voronoi-19
- Compute: 84 H100-days
- Status: Open release
- Code: github.com/alphabell-labs/ab-robust-worlds
- DOI: 10.48550/arXiv.2412.03998
- arXiv: arXiv:2412.03998
What this paper is part of
This index entry is part of the World models research axis. The producing cell, voronoi-19, collaborates with adjacent cells listed in the cell directory. A paired interpretability cell, where applicable, is identified in the metadata above (none is listed for this entry); its disagreement reports, if any, accompany the public release.
How to read this
If you want to use the result: the code is at https://github.com/alphabell-labs/ab-robust-worlds; the dataset location is TBD until one is released. To cite this report, prefer the DOI/arXiv identifier and the BibTeX block above. To discuss this work with the producing cell, contact the lab and quote the index entry slug adversarial-robustness-goal-conditioned.
Limitations
Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.
Sasha Petrov, Jonas Bremer, Tomoko Niwa, Maya Quesada. Adversarial Robustness of Goal-Conditioned World Models. NeurIPS 2024 · alphabell index 24/20, Dec 2024. arXiv:2412.03998. doi:10.48550/arXiv.2412.03998.