---
license: cc-by-sa-4.0
language:
- tr
tags:
- dependency-parsing
- combo
- universal-dependencies
datasets:
- universal_dependencies
model-name: Combo Nlp Xlm Roberta Base Turkish Atis Ud2.17
pipeline_tag: token-classification
---

# COMBO-NLP Model for Turkish

## Model Description

This is a Turkish-language model based on [COMBO-NLP](https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp), an open-source natural language preprocessing system. It performs:

- sentence segmentation (via [LAMBO](https://gitlab.clarin-pl.eu/syntactic-tools/lambo))
- tokenisation (via [LAMBO](https://gitlab.clarin-pl.eu/syntactic-tools/lambo))
- part-of-speech tagging
- morphological analysis
- lemmatisation
- dependency parsing

The Turkish model uses `FacebookAI/xlm-roberta-base` as its base encoder and is trained on [UD_Turkish-Atis](https://github.com/UniversalDependencies/UD_Turkish-Atis) (UD v2.17).

## Evaluation

Evaluation was performed on the UD_Turkish-Atis test split using the standard [CoNLL 2018 eval script](https://universaldependencies.org/conll18/conll18_ud_eval.py).

Two evaluation rows are reported:

- **Full-text (F1)**: raw text is segmented by [LAMBO](https://gitlab.clarin-pl.eu/syntactic-tools/lambo), then parsed and compared against gold. This measures end-to-end pipeline performance, including segmentation quality.
- **Aligned accuracy**: accuracy on correctly segmented (aligned) tokens. This measures parsing quality on tokens that were correctly identified by the segmenter.

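
The difference between the two rows can be illustrated with a minimal sketch. It assumes the score definitions commonly associated with the CoNLL 2018 eval script: F1 combines precision (over system items) and recall (over gold items), while aligned accuracy is computed only over system words that could be aligned to gold words. The `Score` class and the counts below are hypothetical and are not part of COMBO-NLP or of the reported results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Counts for one metric column (hypothetical helper, not part of COMBO-NLP)."""

    gold_total: int     # items in the gold standard
    system_total: int   # items in the system output
    correct: int        # items that match between gold and system
    aligned_total: int  # system words aligned to a gold word

    @property
    def precision(self) -> float:
        return self.correct / self.system_total if self.system_total else 0.0

    @property
    def recall(self) -> float:
        return self.correct / self.gold_total if self.gold_total else 0.0

    @property
    def f1(self) -> float:
        # Algebraically equal to 2 * P * R / (P + R).
        denom = self.gold_total + self.system_total
        return 2 * self.correct / denom if denom else 0.0

    @property
    def aligned_accuracy(self) -> float:
        # Only words the segmenter got right enter the denominator.
        return self.correct / self.aligned_total if self.aligned_total else 0.0


# Made-up counts for illustration only; they are not taken from the tables in this card.
toy_upos = Score(gold_total=1000, system_total=1002, correct=980, aligned_total=998)

print(f"Full-text F1:     {100 * toy_upos.f1:.2f}")
print(f"Aligned accuracy: {100 * toy_upos.aligned_accuracy:.2f}")
```

Under these assumed definitions, aligned accuracy excludes words lost to segmentation mismatches, so it is at or above the corresponding full-text F1.
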
### Morphosyntactic Tagging

| Metric | Tokens | Sentences | Words | UPOS | XPOS | UFeats | AllTags | Lemmas |
| ------ | ------ | --------- | ----- | ---- | ---- | ------ | ------- | ------ |
| Full-text (F1) | 99.88 | 87.32 | 99.88 | 98.24 | 99.88 | 97.59 | 97.43 | 98.80 |
| Aligned accuracy | 0.00 | 0.00 | 0.00 | 98.36 | 100.00 | 97.71 | 97.55 | 98.92 |

### Dependency Parsing

| Metric | UAS | LAS | CLAS | MLAS | BLEX |
| ------ | --- | --- | ---- | ---- | ---- |
| Full-text (F1) | 89.00 | 87.34 | 86.86 | 84.42 | 86.04 |
| Aligned accuracy | 89.11 | 87.45 | 87.00 | 84.56 | 86.18 |

## Usage

Install the library from PyPI (assuming you have already created a virtual environment):

```bash
pip install combo-nlp
```

Install the LAMBO segmenter (only needed when passing raw text strings to COMBO):

```bash
pip install --index-url https://pypi.clarin-pl.eu/ lambo
```

```python
from combo import COMBO

# Load a pre-trained model with the corresponding LAMBO segmenter
nlp = COMBO("Turkish")

# Parse raw text (handles sentence splitting + tokenization)
result = nlp("Çevik kahverengi tilki tembel köpeğin üzerinden atlar.")

# Inspect results: one line per token with form, lemma, UPOS, head index and relation
for sentence in result:
    for token in sentence:
        print(f"{token.form:<15} {token.lemma:<15} {token.upos:<8} head={token.head} {token.deprel}")
```

For further installation and usage instructions, refer to the COMBO-NLP and LAMBO documentation:

- [https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp](https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp)
- [https://gitlab.clarin-pl.eu/syntactic-tools/lambo](https://gitlab.clarin-pl.eu/syntactic-tools/lambo)

## License

The license (cc-by-sa-4.0) is derived from the license of the training data, a Universal Dependencies treebank.
For the full license terms of the treebank, please refer to the `LICENSE.txt` file in the treebank repository:

- [UD_Turkish-Atis LICENSE.txt](https://github.com/UniversalDependencies/UD_Turkish-Atis/blob/master/LICENSE.txt)

## Citation

If you use this model, please cite:

Ulewicz, M., Jabłońska, M., Klimaszewski, M., Przybyła, P., Pszenny, Ł., Rybak, P., Wiącek, M., & Wróblewska, A. (2026). *COMBO-NLP Models Trained on UD v2.17*. Zenodo. https://doi.org/10.5281/zenodo.19650523

```bibtex
@software{combo_nlp_2026,
  author    = {Ulewicz, Michał and Jabłońska, Maja and Klimaszewski, Mateusz and Przybyła, Piotr and Pszenny, Łukasz and Rybak, Piotr and Wiącek, Martyna and Wróblewska, Alina},
  title     = {{COMBO-NLP} Models Trained on {UD} v2.17},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19650523},
  url       = {https://doi.org/10.5281/zenodo.19650523}
}
```

## Resources

- COMBO-NLP: [https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp](https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp)
- LAMBO: [https://gitlab.clarin-pl.eu/syntactic-tools/lambo](https://gitlab.clarin-pl.eu/syntactic-tools/lambo)
- UD_Turkish-Atis: [https://github.com/UniversalDependencies/UD_Turkish-Atis](https://github.com/UniversalDependencies/UD_Turkish-Atis)