Tokenizer Bias in Agentic Decision-Making
Iben Lykke, Mira Holloway, Cheung Wai-Lin
@inproceedings{lykke2025tokenizer,
title = {Tokenizer Bias in Agentic Decision-Making},
author = {Lykke, Iben and Holloway, Mira and Cheung, Wai-Lin},
year = {2025},
booktitle = {ACL 2025 · alphabell index 25/13},
month = {jul},
doi = {10.48550/arXiv.2507.10402},
url = {https://dev.alphabell.com/publications/tokenizer-bias-agentic-decisions}
}
Abstract
The tokenizer used by a substrate-hosted agent's underlying language model is not a neutral preprocessing step — it systematically biases which tool calls the agent prefers, which observations the agent treats as similar, and which plans the agent finds tractable to express. We characterise the bias on five widely used tokenizers, show that it survives across model scales, and propose a compositional vocabulary layer at the substrate boundary that partially mitigates it without retraining the underlying model.
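As a minimal illustration of why tokenization need not be neutral for tool calls, the sketch below segments two tool names with a toy greedy longest-match tokenizer. The vocabulary, the tool names, and the `greedy_tokenize` helper are all hypothetical and are not taken from the paper or from any of the five tokenizers it studies. The sketch only shows that names of similar length can split into very different numbers of pieces; it does not reproduce the paper's measurements or its proposed vocabulary layer.

```python
from typing import List, Set

# Hypothetical toy vocabulary, chosen only for illustration.
TOY_VOCAB: Set[str] = {"search", "web", "doc", "_", "re", "tr", "ie", "ve"}


def greedy_tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Segment text by greedy longest match, falling back to single characters."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry starts here: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    # Hypothetical tool names an agent might choose between.
    for name in ["search_web", "retrieve_doc"]:
        pieces = greedy_tokenize(name, TOY_VOCAB)
        print(f"{name}: {len(pieces)} tokens {pieces}")
```

With this toy vocabulary, `search_web` splits into three pieces and `retrieve_doc` into six, even though the two names differ in length by only two characters. Whether and how such differences shift an agent's preferences is what the paper characterises; the sketch makes no claim about the size or direction of the effect.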
Index metadata
- Cell: ramanujan-07
- Compute: 37 H100-days
- Status: Open release
- Code: github.com/alphabell-labs/ab-tokens
- DOI: 10.48550/arXiv.2507.10402
- arXiv: arXiv:2507.10402
What this paper is part of
This index entry is part of the Agentic engineering research axis. The producing cell, ramanujan-07, collaborates with adjacent cells listed in the cell directory. Where a paired interpretability cell exists, it is identified in the metadata above, and its disagreement reports, if any, accompany the public release.
How to read this
To use the result: the code (where available) is at https://github.com/alphabell-labs/ab-tokens, and the dataset, when one is released, will be at https://huggingface.co/datasets/alphabell/tokenbias-2025. To cite this report, use the DOI or arXiv identifier and the BibTeX block above. To discuss it with the producing cell, contact the lab and quote the index entry slug tokenizer-bias-agentic-decisions.
Limitations
Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.
Iben Lykke, Mira Holloway, Cheung Wai-Lin. Tokenizer Bias in Agentic Decision-Making. ACL 2025 · alphabell index 25/13, Jul 2025. arXiv:2507.10402. doi:10.48550/arXiv.2507.10402.