---
language:
  - it
tags:
  - dependency-parsing
  - combo
  - universal-dependencies
datasets:
  - universal_dependencies
model-name: Combo Nlp Xlm Roberta Base Italian Isdt Ud2.17
pipeline_tag: token-classification
---

# COMBO-NLP Model for Italian

## Model Description

This is an Italian-language model based on [COMBO-NLP](https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp), an open-source natural language preprocessing system. It performs:

- sentence segmentation (via [LAMBO](https://gitlab.clarin-pl.eu/syntactic-tools/lambo))
- tokenisation (via [LAMBO](https://gitlab.clarin-pl.eu/syntactic-tools/lambo))
- part-of-speech tagging
- morphological analysis
- lemmatisation
- dependency parsing

The Italian model uses ``FacebookAI/xlm-roberta-base`` as its base encoder and is trained on [UD_Italian-ISDT](https://github.com/UniversalDependencies/UD_Italian-ISDT) (UD v2.17).

## Evaluation

Evaluation was performed on the UD_Italian-ISDT test split using the standard [CoNLL 2018 eval script](https://universaldependencies.org/conll18/conll18_ud_eval.py).

Two evaluation rows are reported:
- **Full-text (F1)**: raw text is segmented by [LAMBO](https://gitlab.clarin-pl.eu/syntactic-tools/lambo), then parsed and compared against the gold annotation. This measures end-to-end pipeline performance, including segmentation quality.
- **Aligned accuracy**: accuracy computed only over predicted words that align with gold words. This measures tagging and parsing quality on the tokens the segmenter identified correctly.
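
The difference between the two rows can be illustrated with a minimal sketch. It assumes the usual CoNLL 2018 definitions (precision over predicted words, recall over gold words, aligned accuracy over aligned words only). The function name and the counts are hypothetical and are not taken from the ISDT evaluation.

```python
def f1_and_aligned_accuracy(gold_total, system_total, aligned_total, correct):
    """Return (precision, recall, F1, aligned accuracy) as percentages.

    gold_total    -- number of words in the gold annotation
    system_total  -- number of words predicted by the pipeline
    aligned_total -- number of predicted words that align with a gold word
    correct       -- number of aligned words whose label matches gold
    """
    precision = correct / system_total if system_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    total = gold_total + system_total
    f1 = 2 * correct / total if total else 0.0
    aligned_accuracy = correct / aligned_total if aligned_total else 0.0
    return tuple(round(100 * x, 2) for x in (precision, recall, f1, aligned_accuracy))


# Hypothetical counts: 1000 gold words, 1002 predicted words,
# 996 of them aligned with gold, 985 carrying the correct label.
scores = f1_and_aligned_accuracy(1000, 1002, 996, 985)
print(scores)
```

Because segmentation errors reduce the number of aligned words, the full-text F1 is bounded by segmentation quality, while aligned accuracy is not.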

### Morphosyntactic Tagging

| Metric | Tokens | Sentences | Words | UPOS | XPOS | UFeats | AllTags | Lemmas |
| ------ | ------ | --------- | ----- | ---- | ---- | ------ | ------- | ------ |
| Full-text (F1) | 99.77 | 99.07 | 99.64 | 98.46 | 98.38 | 97.89 | 97.47 | 98.41 |
| Aligned accuracy | n/a | n/a | n/a | 98.82 | 98.73 | 98.24 | 97.81 | 98.76 |

### Dependency Parsing

| Metric | UAS | LAS | CLAS | MLAS | BLEX |
| ------ | --- | --- | ---- | ---- | ---- |
| Full-text (F1) | 95.02 | 93.74 | 90.89 | 87.63 | 88.96 |
| Aligned accuracy | 95.36 | 94.08 | 91.17 | 87.90 | 89.23 |


## Usage

Install the library from PyPI (assuming you have a virtual environment created):

```bash
pip install combo-nlp
```

Install the LAMBO segmenter, which is only needed when passing raw text strings to COMBO:

```bash
pip install --index-url https://pypi.clarin-pl.eu/ lambo
```

Load the model and parse raw text:

```python
from combo import COMBO

# Load the pre-trained Italian model together with its LAMBO segmenter
nlp = COMBO("Italian")

# Parse raw text (handles sentence splitting + tokenisation)
result = nlp("La veloce volpe marrone salta sopra il cane pigro.")

# Inspect results
for sentence in result:
    for token in sentence:
        print(f"{token.form:<15} {token.lemma:<15} {token.upos:<8} head={token.head}  {token.deprel}")
```
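
The `head` and `deprel` values printed above can be turned into readable dependency edges. The following is a minimal sketch, not part of the COMBO-NLP API: `DemoToken` and `dependency_edges` are hypothetical names, and the code assumes that `head` is a 1-based index into the sentence with `0` marking the root (the CoNLL-U convention) and that the sentence contains only syntactic words.

```python
from dataclasses import dataclass


@dataclass
class DemoToken:
    """Hypothetical stand-in exposing attributes used in the snippet above."""
    form: str
    head: int
    deprel: str


def dependency_edges(sentence):
    """Return (dependent, relation, head) triples for one sentence.

    Assumption: `head` is 1-based and 0 denotes the root, as in CoNLL-U.
    """
    tokens = list(sentence)
    edges = []
    for token in tokens:
        head_form = "ROOT" if token.head == 0 else tokens[token.head - 1].form
        edges.append((token.form, token.deprel, head_form))
    return edges


# Hand-written demo sentence, not actual model output
demo = [
    DemoToken("La", 2, "det"),
    DemoToken("volpe", 3, "nsubj"),
    DemoToken("salta", 0, "root"),
]
for dependent, relation, head in dependency_edges(demo):
    print(f"{dependent} --{relation}--> {head}")
```

If the assumption about `head` holds, the same function should work on each sentence in `result`, since it relies only on the `form`, `head`, and `deprel` attributes.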

Refer to the COMBO-NLP and LAMBO repositories for full installation and usage instructions:

- [https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp](https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp)
- [https://gitlab.clarin-pl.eu/syntactic-tools/lambo](https://gitlab.clarin-pl.eu/syntactic-tools/lambo)


## Citation

If you use this model, please cite:

Ulewicz, M., Jabłońska, M., Klimaszewski, M., Przybyła, P., Pszenny, Ł., Rybak, P., Wiącek, M., & Wróblewska, A. (2026). *COMBO-NLP Models Trained on UD v2.17*. Zenodo. https://doi.org/10.5281/zenodo.19650523

```bibtex
@software{combo_nlp_2026,
  author    = {Ulewicz, Michał and Jabłońska, Maja and Klimaszewski, Mateusz and Przybyła, Piotr and Pszenny, Łukasz and Rybak, Piotr and Wiącek, Martyna and Wróblewska, Alina},
  title     = {{COMBO-NLP} Models Trained on {UD} v2.17},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19650523},
  url       = {https://doi.org/10.5281/zenodo.19650523}
}
```


## Resources

- COMBO-NLP: [https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp](https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp)
- LAMBO: [https://gitlab.clarin-pl.eu/syntactic-tools/lambo](https://gitlab.clarin-pl.eu/syntactic-tools/lambo)
- UD_Italian-ISDT: [https://github.com/UniversalDependencies/UD_Italian-ISDT](https://github.com/UniversalDependencies/UD_Italian-ISDT)