---
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- fr
- de
- it
- es
library_name: gliner
pipeline_tag: token-classification
tags:
- gliner
- ner
- named-entity-recognition
- zero-shot
- token-classification
- pii
- privacy
- biomedical
- multilingual
- lfm2.5
- bidirectional
- sauerkrautlm
- vago-solutions
---

[![SauerkrautLM-LFM2.5-GLiNER](https://www.vago-solutions.ai/webp/203_2000_sauerkraut-bild.webp "SauerkrautLM-LFM2.5-GLiNER")](https://vago-solutions.ai)

## VAGO solutions SauerkrautLM-LFM2.5-GLiNER

**Zero-Shot NER Model** – *Bidirectional GLiNER on an LFM2.5-350M backbone — strong multilingual, PII and biomedical entity extraction*

Introducing **SauerkrautLM-LFM2.5-GLiNER** – our zero-shot Named Entity Recognition model built on the
**LFM2.5-350M** backbone, converted from causal to **bidirectional attention** and fine-tuned
for the GLiNER span–label matching task.

* Zero-shot extraction of **arbitrary entity types** provided as labels at inference — no retraining
* **Multilingual**: English, French, German, Italian, Spanish
* Best **PII / privacy** extraction among the compared GLiNER models across all five languages (avg **79.5** F1)
* Large margin on **biomedical NER** over the compared GLiNER models (BioNLP-CG: **54.6** F1 vs. 36.3 for the next best)
* Compact **350M** backbone — efficient to deploy

## Table of Contents

1. [Overview of all SauerkrautLM-LFM2.5-GLiNER Models](#all-sauerkrautlm-lfm25-gliner)
2. [Model Details](#model-details)
   * [Architecture](#architecture)
   * [Why Bidirectional Attention?](#why-bidirectional-attention)
   * [Training Procedure](#training-procedure)
3. [Capabilities](#capabilities)
4. [Evaluation](#evaluation)
5. [Usage](#usage)
6. [Disclaimer](#disclaimer)
7. [Contact](#contact)
8. [Collaborations](#collaborations)
9. [Citation](#citation)
10. [Acknowledgement](#acknowledgement)

## All SauerkrautLM-LFM2.5-GLiNER

| Model | HF | ONNX | GGUF |
| --- | --- | --- | --- |
| SauerkrautLM-LFM2.5-GLiNER | [Link](https://huggingface.co/VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER) | coming soon | coming soon |

## Model Details

* **Model Name:** SauerkrautLM-LFM2.5-GLiNER
* **Backbone:** [LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M) (causal → bidirectional)
* **Task:** Zero-shot Named Entity Recognition (GLiNER span–label matching)
* **Language(s):** English, French, German, Italian, Spanish
* **License:** *lfm1.0*
* **Contact:** [VAGO solutions](https://vago-solutions.ai)

### Architecture

* **Backbone:** LFM2.5-350M, converted from causal to **bidirectional attention** (full
  self-attention + symmetric, center-padded convolutions) so that every token attends to the
  full context in both directions.
* **Head:** standard GLiNER scoring head. Text spans and entity-type labels pass through the
  *same* encoder; entities are predicted via the dot product between span representations and
  label representations in a shared latent space. Because labels are free-form phrases supplied
  at inference, the model performs **open-vocabulary, zero-shot** extraction.
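
The span–label matching described above can be illustrated with a small NumPy sketch. It is not the model's actual head: the vectors are made-up stand-ins for encoder outputs, and the names `score_spans`, `span_reps`, and `label_reps` are hypothetical. It only shows how a dot product followed by a sigmoid turns span and label vectors in a shared space into per-pair scores that can be thresholded.

```python
import numpy as np

def score_spans(span_reps, label_reps):
    """Return sigmoid(span . label) for every (span, label) pair."""
    logits = span_reps @ label_reps.T          # shape: (num_spans, num_labels)
    return 1.0 / (1.0 + np.exp(-logits))

# Toy vectors standing in for encoder outputs (illustrative values only).
span_reps = np.array([[ 4.0, -1.0],    # span 0: e.g. "Siemens"
                      [-1.0,  4.0],    # span 1: e.g. "München"
                      [-3.0, -3.0]])   # span 2: a non-entity span
label_reps = np.array([[1.0, 0.0],     # label 0: "organization"
                       [0.0, 1.0]])    # label 1: "location"

scores = score_spans(span_reps, label_reps)
# Keep every (span index, label index) pair whose score clears the threshold.
matches = [(int(s), int(l)) for s, l in np.argwhere(scores > 0.5)]
print(matches)
```

Because the label vectors come from the same encoder as the text, adding a new entity type only adds a row to `label_reps`; nothing in the scoring step is tied to a fixed label set.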

### Why Bidirectional Attention?

The backbone was pretrained causally — each token only sees the tokens before it. NER, however,
is not a left-to-right generation task: deciding *what* a token is and *where* an entity starts
and ends frequently depends on context that appears **after** the token. Converting the encoder
to bidirectional attention (and replacing the causal, left-padded convolutions with symmetric,
center-padded ones) lets every token condition on the full sentence.
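
A minimal NumPy sketch of the two changes named above, under stated assumptions: it does not reproduce the LFM2.5 implementation, the helper names `attention_mask` and `conv1d` are hypothetical, and the kernel size of 3 is chosen only for illustration. It shows which positions a token can see under a causal versus a full attention mask, and how left padding versus centered padding changes what a convolution output depends on.

```python
import numpy as np

def attention_mask(seq_len, causal):
    """mask[i, j] is True when token i may attend to token j."""
    full = np.ones((seq_len, seq_len), dtype=bool)
    return np.tril(full) if causal else full

def conv1d(x, kernel, causal):
    """Same-length 1D cross-correlation with left (causal) or centered zero padding."""
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    padded = np.pad(x, (left, k - 1 - left))
    return np.array([padded[i:i + k] @ kernel for i in range(len(x))])

tokens = ["Apple", "slipped", "3%"]
causal_mask = attention_mask(len(tokens), causal=True)
bidir_mask = attention_mask(len(tokens), causal=False)
print(causal_mask[0])  # what "Apple" can see under causal attention
print(bidir_mask[0])   # what "Apple" can see after the conversion

# An impulse at the last position shows which outputs it can influence.
impulse = np.array([0.0, 0.0, 1.0])
kernel = np.ones(3)
print(conv1d(impulse, kernel, causal=True))   # left-padded: only the last output reacts
print(conv1d(impulse, kernel, causal=False))  # center-padded: the previous output reacts too
```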

Cases where right-hand context is decisive:

* **Type disambiguation** — In *"Apple slipped 3% after the iPhone launch"* vs *"Apple is rich
  in fiber"*, the token *Apple* is an ORG in the first but not an entity in the second. The
  distinguishing evidence (*slipped 3%* / *rich in fiber*) comes later — a causal encoder cannot
  see it when encoding *Apple*.
* **Span boundaries** — In *"Bank of America"*, whether *Bank* opens a multi-token organization
  span only becomes clear from the *of America* that follows.
* **Structured / PII formats** — Emails, phone numbers, IBANs and IDs are defined by their *whole
  surface form*. Recognizing such a span requires seeing the trailing characters (domain, check
  digits, separators), which a left-to-right view truncates — a likely cause of the weak PII
  recall observed with the causal backbone.

This is also why a **masked-language-modeling adaptation stage** precedes task training:
bidirectional MLM is what teaches the converted encoder to actually *use* right-hand context and
to build the surface-form sensitivity that structured-entity recall depends on. The measured
outcome is consistent with this — strong PII results across all five languages and a large
biomedical-NER margin.

### Training Procedure

The model is produced in three sequential stages.

[![SauerkrautLM-LFM2.5-GLiNER Training Pipeline](imgs/training_pipeline.png "SauerkrautLM-LFM2.5-GLiNER Training Pipeline")](imgs/training_pipeline.png)

**Stage 1 — Bidirectional MLM adaptation.**
The causal backbone is converted to bidirectional attention and adapted with a masked-language-
modeling objective. This teaches the model to use both left and right context and builds the
surface-form / format sensitivity that NER (especially structured entities) depends on.
*Output: a dense bidirectional encoder checkpoint.*

* Data: ≈3.8M documents (multilingual), 2 epochs
* Sequence length 512, 15% token masking
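
As a rough illustration of the masking step, the sketch below masks about 15% of a 512-token sequence. Only the sequence length and masking rate come from this card. The function name `mask_tokens`, the `mask_token_id` value, the dummy token ids, and the use of `-100` as an "ignore this position" label (a common PyTorch convention) are assumptions, and the actual training code may select and replace tokens differently.

```python
import numpy as np

def mask_tokens(token_ids, mask_token_id, mask_prob=0.15, seed=0):
    """Mask a random subset of tokens; labels are -100 where no loss is computed."""
    rng = np.random.default_rng(seed)
    token_ids = np.asarray(token_ids)
    is_masked = rng.random(token_ids.shape) < mask_prob
    inputs = np.where(is_masked, mask_token_id, token_ids)  # model input
    labels = np.where(is_masked, token_ids, -100)           # prediction targets
    return inputs, labels

token_ids = np.arange(1000, 1512)   # a dummy sequence of length 512
inputs, labels = mask_tokens(token_ids, mask_token_id=4)  # hypothetical mask id
print(f"masked {np.mean(labels != -100):.1%} of {len(token_ids)} tokens")
```

With bidirectional attention, the model can use tokens on both sides of each masked position to recover it, which is the behavior this stage is meant to train.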

**Stage 2 — GLiNER task training.**
The adapted encoder is fine-tuned on the GLiNER NER objective (BCE over span–label scores),
establishing the shared latent space between spans and labels and the zero-shot capability.

* Data: ≈772k annotated examples (multilingual)
* ≈110k distinct entity-type labels — free-form phrases, enabling open-vocabulary zero-shot extraction
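
The objective can be sketched as a binary cross-entropy over a matrix of span–label logits, where a target of 1 marks a span that truly carries that label. This is a simplified NumPy illustration with made-up values and a hypothetical function name `bce_loss`; it leaves out details of the real training setup that this card does not describe, such as negative sampling or loss weighting.

```python
import numpy as np

def bce_loss(logits, targets):
    """Mean binary cross-entropy over all (span, label) pairs."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # guards against log(0)
    losses = -(targets * np.log(probs + eps) + (1 - targets) * np.log(1 - probs + eps))
    return float(losses.mean())

# Rows are candidate spans, columns are labels; the last span is a non-entity.
targets = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
good_logits = np.array([[4.0, -4.0], [-4.0, 4.0], [-4.0, -4.0]])
bad_logits = -good_logits  # confidently wrong on every pair

print(bce_loss(good_logits, targets), bce_loss(bad_logits, targets))
```

Each span–label pair is an independent yes/no decision, so the label set can change freely between examples.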

**Stage 3 — Refinement on higher-quality data.**
A second fine-tuning pass on a smaller, cleaned, higher-quality set sharpens precision and recall.
This stage delivers the main quality gains over the task-trained model.

* Data: ≈79k high-quality examples (multilingual)
* ≈96k distinct entity-type labels

## Capabilities

* **Zero-shot NER** — extract arbitrary entity types provided as labels at inference time, no retraining required.
* **Multilingual** — English, French, German, Italian, Spanish.
* Strong on **general NER** (CrossNER), **privacy / PII** entities, and **domain** benchmarks (biomedical).

## Evaluation

All models were evaluated with a **single shared benchmark harness**; scores are F1 × 100.
Absolute numbers may therefore differ from results published with other evaluation pipelines,
but because every model below was scored the same way, the relative differences remain consistent.
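
For readers unfamiliar with the metric, here is a minimal sketch of an entity-level F1 score scaled by 100. The matching rule is an assumption: this card does not specify how the harness matches predictions to gold entities, so the sketch uses exact matches on `(start, end, label)` tuples and a micro average. The function name `entity_f1` and the sample offsets are illustrative.

```python
def entity_f1(gold, predicted):
    """Micro F1 (x100) over sets of (start, end, label) tuples, exact match only."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 100 * 2 * precision * recall / (precision + recall)

gold = [(0, 13, "person"), (27, 34, "organization"), (38, 45, "location")]
# One correct entity, one right span with the wrong label, one missed entity.
predicted = [(0, 13, "person"), (27, 34, "location")]
print(round(entity_f1(gold, predicted), 1))  # 40.0
```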

**Capability Overview — Final Checkpoint**

[![SauerkrautLM-LFM2.5-GLiNER Overview](imgs/benchmark_overview.png "SauerkrautLM-LFM2.5-GLiNER Capability Overview")](imgs/benchmark_overview.png)

| Benchmark | F1 |
| --- | --- |
| CrossNER — English (avg) | 78.4 |
| CrossNER — multilingual (avg) | 72.5 |
| Privacy / PII — multilingual (avg) | 79.5 |
| Biomedical NER (BioNLP-CG) | 54.6 |

**CrossNER — Multilingual Zero-Shot NER**

[![CrossNER](imgs/benchmark_crossner-3.png "CrossNER per language")](imgs/benchmark_crossner-3.png)

| Model | EN | FR | DE | IT | ES | avg |
| --- | --- | --- | --- | --- | --- | --- |
| **SauerkrautLM-LFM2.5-GLiNER (ours)** | **78.4** | **71.4** | **69.0** | 71.2 | **72.4** | **72.5** |
| SauerkrautLM-GLiNER | 73.8 | 71.2 | 68.7 | **71.3** | 72.0 | 71.4 |
| urchade/gliner_large-v2.1 | 71.9 | 57.3 | 55.8 | 58.1 | 58.6 | 60.3 |
| urchade/gliner_multi-v2.1 | 72.2 | 46.7 | 46.8 | 48.1 | 48.9 | 52.5 |

**Privacy / PII — Multilingual Entity Extraction**

[![PII](imgs/benchmark_pii.png "Privacy / PII per language")](imgs/benchmark_pii.png)

| Model | EN | FR | DE | IT | ES | avg |
| --- | --- | --- | --- | --- | --- | --- |
| **SauerkrautLM-LFM2.5-GLiNER (ours)** | **78.7** | **81.8** | **76.5** | **79.4** | **81.4** | **79.5** |
| urchade/gliner_large-v2.1 | 72.0 | 76.1 | 70.3 | 68.9 | 72.2 | 71.9 |
| urchade/gliner_multi-v2.1 | 51.1 | 62.2 | 58.6 | 57.6 | 58.0 | 57.5 |
| SauerkrautLM-GLiNER | 65.8 | 52.9 | 57.8 | 53.6 | 46.2 | 55.2 |

**Biomedical NER — BioNLP-CG (EN)**

[![Biomedical NER](imgs/benchmark_biomedical.png "Biomedical NER BioNLP-CG")](imgs/benchmark_biomedical.png)

| Model | F1 |
| --- | --- |
| **SauerkrautLM-LFM2.5-GLiNER (ours)** | **54.6** |
| SauerkrautLM-GLiNER | 36.3 |
| urchade/gliner_large-v2.1 | 35.5 |
| urchade/gliner_multi-v2.1 | 29.4 |

## Usage

This is a [GLiNER](https://github.com/urchade/GLiNER) model. Install the library and provide the
entity types you want to extract as labels at inference time:

```bash
pip install gliner
```

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER")

text = "Maria Schmidt arbeitet bei Siemens in München, E-Mail: maria.schmidt@siemens.com"

# free-form labels — change them per request, no retraining needed
labels = ["person", "organization", "location", "email"]

entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(f"{entity['text']} => {entity['label']}  ({entity['score']:.2f})")
```
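
For PII use cases, a common follow-up step is to replace the detected spans with placeholders. The sketch below is one possible way to do this, assuming each result dict carries `start` and `end` character offsets alongside `text`, `label`, and `score`, and that the spans do not overlap. The helpers `redact` and `make_entity` are hypothetical, and the entities are written by hand so the example runs without downloading the model.

```python
def redact(text, entities):
    """Replace each entity span with a [LABEL] placeholder."""
    # Work right to left so earlier character offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{entity['label'].upper()}]"
        text = text[:entity["start"]] + placeholder + text[entity["end"]:]
    return text

def make_entity(text, surface, label):
    """Build a hand-written stand-in for one `predict_entities` result."""
    start = text.index(surface)
    return {"start": start, "end": start + len(surface),
            "text": surface, "label": label, "score": 1.0}

text = "Maria Schmidt arbeitet bei Siemens in München, E-Mail: maria.schmidt@siemens.com"
# Hand-written entities for illustration, not actual model output.
entities = [
    make_entity(text, "Maria Schmidt", "person"),
    make_entity(text, "Siemens", "organization"),
    make_entity(text, "maria.schmidt@siemens.com", "email"),
]
print(redact(text, entities))
```

In practice, `entities` would be the list returned by `model.predict_entities(text, labels, threshold=0.5)`.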


## Disclaimer

Despite our best efforts in data cleansing, we cannot entirely rule out that uncensored or
incorrect content slips through, and we cannot guarantee consistently appropriate behavior. If you
encounter any issues or inappropriate content, please inform us through the contact information
provided. Please also note that the licensing of these models does not constitute legal advice,
and that we cannot be held responsible for the actions of third parties who use our models.

## Contact

If you are interested in customized LLMs or NER/PII extraction solutions for business
applications, please contact us via our [website](https://vago-solutions.ai). We are
also grateful for your feedback and suggestions.

## Collaborations

We are seeking support and investment for our startup, VAGO solutions, where we continuously
develop robust language models for a diverse range of purposes and requirements. If you would
like to tackle future challenges together with us, we warmly invite you to reach out to us at
[VAGO solutions](https://vago-solutions.ai).

## Citation

If you use SauerkrautLM-LFM2.5-GLiNER in your research or applications, please cite:

```bibtex
@misc{SauerkrautLM-LFM2.5-GLiNER,
  title={SauerkrautLM-LFM2.5-GLiNER},
  author={Michele Montebovi},
  organization={VAGO Solutions},
  url={https://huggingface.co/VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER},
  year={2026}
}
```

## Acknowledgement

Many thanks to [Liquid AI](https://huggingface.co/LiquidAI) for the LFM2.5 base model, to
[urchade](https://huggingface.co/urchade) for the GLiNER framework, and to our community for their
continued support and engagement.