Compositional Generalisation in Mixed-Modality World Models

Wen Shao, Søren Almqvist, Tomoko Niwa, Jonas Bremer

Axis: World models

Cell: hadamard-08

Published: Jul 2025

Venue: ICML 2025 · alphabell index 25/12b

Tags: world models

BibTeX

@inproceedings{shao2025compositional,
  title        = {Compositional Generalisation in Mixed-Modality World Models},
  author       = {Shao, Wen and Almqvist, Søren and Niwa, Tomoko and Bremer, Jonas},
  year         = {2025},
  booktitle    = {ICML 2025 · alphabell index 25/12b},
  month        = {jul},
  doi          = {10.48550/arXiv.2507.05432},
  url          = {https://dev.alphabell.com/publications/compositional-generalisation-mixed-modality}
}

Abstract

Mixed-modality world models — those whose latent state must capture vision, language, and symbolic dynamics simultaneously — succeed or fail at compositional generalisation on novel mode-mixings that none of the unimodal models in their substrate ever encountered. We characterise the regimes in which our cross-modal latent unification approach generalises, identify a specific failure mode in language-symbolic mixings that we trace back to a quirk in the unified tokenizer, and propose a substrate-level change that resolves it.

Index metadata

Cell: hadamard-08
Compute: 92 H100-days
Status: Open release
Code: github.com/alphabell-labs/ab-mixed
DOI: 10.48550/arXiv.2507.05432
arXiv: arXiv:2507.05432

What this paper is part of

This index entry is part of the World models research axis. The producing cell, hadamard-08, collaborates with adjacent cells listed in the cell directory. The paired interpretability cell, where applicable, is identified in the metadata above; its disagreement reports, if any, accompany the public release.

How to read this

If you want to use the result: the code (where available) is at https://github.com/alphabell-labs/ab-mixed; the dataset, when one is released, is at https://huggingface.co/datasets/alphabell/mixed-modality-2025. To cite this report, prefer the DOI or arXiv identifier and the BibTeX block above. To discuss the report with the producing cell, contact the lab and quote the index entry slug compositional-generalisation-mixed-modality.

Limitations

Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.

Citation

Wen Shao, Søren Almqvist, Tomoko Niwa, Jonas Bremer. Compositional Generalisation in Mixed-Modality World Models. ICML 2025 · alphabell index 25/12b, Jul 2025. arXiv:2507.05432. doi:10.48550/arXiv.2507.05432.