This tokenizer is part of the Turkish tokenizer collection released with the research paper "Optimal Turkish Subword Strategies at Scale: Systematic Evaluation of Data, Vocabulary, Morphology Interplay".

The Turkish Subwords Research collection contains tokenizers trained with different vocabulary and corpus sizes. This tokenizer has a 2K vocabulary, is cased, and was trained on the Minimal corpus. Corpora come in three sizes: Minimal, Medium and Alldata. The tokenizers in the collection are named following the pattern wordpiece_{vocab-size}k_{corpus-size}; this one, for example, is wordpiece_2k_cased_minimal. For more information, please refer to the research paper.
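
A minimal sketch of how the naming scheme could be used to build a repository ID and load a tokenizer from the collection with Hugging Face Transformers. It assumes the organisation prefix turkish-nlp-suite and the casing component (cased) seen in this model's ID, and that the repository files can be read by AutoTokenizer. The helpers repo_id and load_tokenizer are hypothetical, and an uncased variant is not confirmed by this card.

```python
# Corpus sizes listed in the model card.
CORPUS_SIZES = ("minimal", "medium", "alldata")


def repo_id(vocab_k: int, corpus: str, cased: bool = True,
            org: str = "turkish-nlp-suite") -> str:
    """Hypothetical helper: build a Hub repo ID, assuming the pattern
    wordpiece_{vocab-size}k_{cased|uncased}_{corpus-size} inferred from
    turkish-nlp-suite/wordpiece_2k_cased_minimal."""
    corpus = corpus.lower()
    if corpus not in CORPUS_SIZES:
        raise ValueError(f"unknown corpus size: {corpus!r}")
    # Assumption: "uncased" variants exist; this card only shows "cased".
    casing = "cased" if cased else "uncased"
    return f"{org}/wordpiece_{vocab_k}k_{casing}_{corpus}"


def load_tokenizer(vocab_k: int = 2, corpus: str = "minimal", cased: bool = True):
    """Hypothetical helper: download and load the tokenizer (needs network access)."""
    # Imported lazily so repo_id() works without transformers installed.
    from transformers import AutoTokenizer
    return AutoTokenizer.from_pretrained(repo_id(vocab_k, corpus, cased))


if __name__ == "__main__":
    # Builds the ID of this model without touching the network.
    print(repo_id(2, "Minimal"))  # turkish-nlp-suite/wordpiece_2k_cased_minimal
```

With network access, load_tokenizer() followed by a call such as tok.tokenize("İstanbul'a gidiyorum") would show how the 2K-vocabulary model splits a Turkish word into subword pieces.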


Model ID: turkish-nlp-suite/wordpiece_2k_cased_minimal