---
library_name: transformers
tags: []
---

This tokenizer is part of the Turkish tokenizer collection from the research work [Optimal Turkish Subword Strategies at Scale: Systematic Evaluation of Data, Vocabulary, Morphology Interplay](https://arxiv.org/abs/2602.06942). The tokenizers are gathered in the [Turkish Subwords Research](https://huggingface.co/collections/turkish-nlp-suite/turkish-subwords-research) collection. This one has a 2K vocabulary, is cased, and was trained on the minimal-sized corpus.

Corpora come in three sizes: Minimal, Medium, and Alldata. All tokenizers in the collection are named `wordpiece_{vocab-size}k_{corpus size}`.

For more information, please refer to the research paper.
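
The naming pattern and a possible way to load a tokenizer can be sketched as follows. This is a minimal sketch, not an official usage snippet: the helper names `tokenizer_name` and `load_tokenizer` are hypothetical, the lowercase corpus-size spelling `minimal` is an assumption, and so is the repository id in the commented-out lines. Check the collection page for the exact repository names before loading.

```python
def tokenizer_name(vocab_size_k: int, corpus_size: str) -> str:
    """Build a name following the `wordpiece_{vocab-size}k_{corpus size}` pattern.

    Hypothetical helper: the exact spelling of the corpus-size part
    (for example "minimal") is an assumption, not confirmed by this card.
    """
    return f"wordpiece_{vocab_size_k}k_{corpus_size}"


def load_tokenizer(repo_id: str):
    """Load a tokenizer from the Hugging Face Hub by repository id."""
    # Imported lazily so the naming helper works without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


# Name for a 2K-vocabulary tokenizer trained on the minimal corpus.
name = tokenizer_name(2, "minimal")
print(name)

# Assumed repository id (organization taken from the collection URL);
# uncomment once the exact id has been verified on the Hub.
# tokenizer = load_tokenizer(f"turkish-nlp-suite/{name}")
# print(tokenizer.tokenize("Merhaba dünya"))
```

Loading is kept in a separate function so that the name can be built and inspected without network access or the `transformers` package.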