Tokenizer Bias in Agentic Decision-Making

Iben Lykke, Mira Holloway, Cheung Wai-Lin

Axis: Agentic engineering
Cell: ramanujan-07
Published: Jul 2025
Venue: ACL 2025 · alphabell index 25/13
Tags: agentic

PDF · arXiv:2507.10402 · DOI · Code · Data

BibTeX

@inproceedings{lykke2025tokenizer,
  title        = {Tokenizer Bias in Agentic Decision-Making},
  author       = {Lykke, Iben and Holloway, Mira and Cheung, Wai-Lin},
  year         = {2025},
  booktitle    = {ACL 2025},
  note         = {alphabell index 25/13},
  month        = {jul},
  doi          = {10.48550/arXiv.2507.10402},
  url          = {https://dev.alphabell.com/publications/tokenizer-bias-agentic-decisions}
}

Abstract

The tokenizer used by a substrate-hosted agent's underlying language model is not a neutral preprocessing step — it systematically biases which tool calls the agent prefers, which observations the agent treats as similar, and which plans the agent finds tractable to express. We characterise the bias on five widely used tokenizers, show that it survives across model scales, and propose a compositional vocabulary layer at the substrate boundary that partially mitigates it without retraining the underlying model.

Index metadata

Cell: ramanujan-07
Compute: 37 H100-days
Status: Open release
Code: github.com/alphabell-labs/ab-tokens
DOI: 10.48550/arXiv.2507.10402
arXiv: arXiv:2507.10402

What this paper is part of

This index entry belongs to the Agentic engineering research axis. The producing cell, ramanujan-07, collaborates with the adjacent cells listed in the cell directory. Where a paired interpretability cell exists, it is identified in the metadata above; its disagreement reports, if any, accompany the public release.

How to read this

To use the result: the code (where available) is at https://github.com/alphabell-labs/ab-tokens, and the dataset, when one is released, is at https://huggingface.co/datasets/alphabell/tokenbias-2025. To cite this report, prefer the DOI or arXiv identifier and the BibTeX block above. To discuss the report with the producing cell, contact the lab and quote the index entry slug tokenizer-bias-agentic-decisions.

Limitations

Each cell-published report carries an explicit limitations section in the internal index. We do not paraphrase it here. Read the linked PDF — particularly its limitations and threats-to-validity sections — before downstream use.

Citation

Iben Lykke, Mira Holloway, Cheung Wai-Lin. Tokenizer Bias in Agentic Decision-Making. ACL 2025 · alphabell index 25/13, Jul 2025. arXiv:2507.10402. doi:10.48550/arXiv.2507.10402.