---
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- fr
- de
- it
- es
library_name: gliner
pipeline_tag: token-classification
tags:
- gliner
- ner
- named-entity-recognition
- zero-shot
- token-classification
- pii
- privacy
- biomedical
- multilingual
- lfm2.5
- bidirectional
- sauerkrautlm
- vago-solutions
---

[![SauerkrautLM-LFM2.5-GLiNER](https://www.vago-solutions.ai/webp/203_2000_sauerkraut-bild.webp "SauerkrautLM-LFM2.5-GLiNER")](https://vago-solutions.ai)

## VAGO solutions SauerkrautLM-LFM2.5-GLiNER

**Zero-Shot NER Model** – *Bidirectional GLiNER on an LFM2.5-350M backbone — strong multilingual, PII and biomedical entity extraction*

Introducing **SauerkrautLM-LFM2.5-GLiNER** – our zero-shot Named Entity Recognition model built on the **LFM2.5-350M** backbone, converted from causal to **bidirectional attention** and fine-tuned for the GLiNER span–label matching task.

* Zero-shot extraction of **arbitrary entity types** provided as labels at inference — no retraining
* **Multilingual**: English, French, German, Italian, Spanish
* Best **PII / privacy** F1 among the compared models in all five languages (avg **79.5** F1)
* Large margin on **biomedical NER** (BioNLP-CG: **54.6** F1, vs. 36.3 for the next-best compared model)
* Compact **350M** backbone — efficient to deploy

# Table of Contents

1. [Overview of all SauerkrautLM-LFM2.5-GLiNER Models](#all-sauerkrautlm-lfm25-gliner)
2. [Model Details](#model-details)
   * [Architecture](#architecture)
   * [Why Bidirectional Attention?](#why-bidirectional-attention)
   * [Training Procedure](#training-procedure)
3. [Capabilities](#capabilities)
4. [Evaluation](#evaluation)
5. [Usage](#usage)
6. [Disclaimer](#disclaimer)
7. [Contact](#contact)
8. [Collaborations](#collaborations)
9. [Citation](#citation)
10. [Acknowledgement](#acknowledgement)

## All SauerkrautLM-LFM2.5-GLiNER

| Model | HF | ONNX | GGUF |
| --- | --- | --- | --- |
| SauerkrautLM-LFM2.5-GLiNER | [Link](https://huggingface.co/VAGOsolutions) | coming soon | coming soon |

## Model Details

* **Model Name:** SauerkrautLM-LFM2.5-GLiNER
* **Backbone:** [LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M) (causal → bidirectional)
* **Task:** Zero-shot Named Entity Recognition (GLiNER span–label matching)
* **Language(s):** English, French, German, Italian, Spanish
* **License:** *lfm1.0*
* **Contact:** [VAGO solutions](https://vago-solutions.ai)

### Architecture

* **Backbone:** LFM2.5-350M, converted from causal to **bidirectional attention** (full self-attention + symmetric, center-padded convolutions) so that every token attends to the full context in both directions.
* **Head:** standard GLiNER scoring head. Text spans and entity-type labels pass through the *same* encoder; entities are predicted via the dot product between span representations and label representations in a shared latent space. Because labels are free-form phrases supplied at inference, the model performs **open-vocabulary, zero-shot** extraction.

### Why Bidirectional Attention?

The backbone was pretrained causally — each token only sees the tokens before it. NER, however, is not a left-to-right generation task: deciding *what* a token is and *where* an entity starts and ends frequently depends on context that appears **after** the token. Converting the encoder to bidirectional attention (and replacing the causal, left-padded convolutions with symmetric, center-padded ones) lets every token condition on the full sentence.

Cases where right-hand context is decisive:

* **Type disambiguation** — In *"Apple slipped 3% after the iPhone launch"* vs *"Apple is rich in fiber"*, the token *Apple* is an ORG in the first but not an entity in the second.
The distinguishing evidence (*slipped 3%* / *rich in fiber*) comes later — a causal encoder cannot see it when encoding *Apple*.
* **Span boundaries** — In *"Bank of America"*, whether *Bank* opens a multi-token organization span only becomes clear from the *of America* that follows.
* **Structured / PII formats** — Emails, phone numbers, IBANs and IDs are defined by their *whole surface form*. Recognizing such a span requires seeing the trailing characters (domain, check digits, separators), which a left-to-right view truncates — a likely cause of the weak PII recall observed with the causal backbone.

This is also why a **masked-language-modeling adaptation stage** precedes task training: bidirectional MLM is what teaches the converted encoder to actually *use* right-hand context and to build the surface-form sensitivity that structured-entity recall depends on. The measured outcome is consistent with this — strong PII results across all five languages and a large biomedical-NER margin.

### Training Procedure

The model is produced in three sequential stages.

[![SauerkrautLM-LFM2.5-GLiNER Training Pipeline](imgs/training_pipeline.png "SauerkrautLM-LFM2.5-GLiNER Training Pipeline")](imgs/training_pipeline.png)

**Stage 1 — Bidirectional MLM adaptation.** The causal backbone is converted to bidirectional attention and adapted with a masked-language-modeling objective. This teaches the model to use both left and right context and builds the surface-form / format sensitivity that NER (especially structured entities) depends on. *Output: a dense bidirectional encoder checkpoint.*

* Data: ≈3.8M documents (multilingual), 2 epochs
* Sequence length 512, 15% token masking

**Stage 2 — GLiNER task training.** The adapted encoder is fine-tuned on the GLiNER NER objective (BCE over span–label scores), establishing the shared latent space between spans and labels and the zero-shot capability.
* Data: ≈772k annotated examples (multilingual)
* ≈110k distinct entity-type labels — free-form phrases, enabling open-vocabulary zero-shot extraction

**Stage 3 — Refinement on higher-quality data.** A second fine-tuning pass on a smaller, cleaned, higher-quality set sharpens precision and recall. This stage delivers the main quality gains over the task-trained model.

* Data: ≈79k high-quality examples (multilingual)
* ≈96k distinct entity-type labels

## Capabilities

* **Zero-shot NER** — extract arbitrary entity types provided as labels at inference time, no retraining required.
* **Multilingual** — English, French, German, Italian, Spanish.
* Strong on **general NER** (CrossNER), **privacy / PII** entities, and **domain** benchmarks (biomedical).

## Evaluation

All models were evaluated under a **single shared benchmark harness** (F1 ×100). Please note that benchmark results in absolute numbers may differ from other published pipelines; the relative differences remain consistent.

**Capability Overview — Final Checkpoint**

[![SauerkrautLM-LFM2.5-GLiNER Overview](imgs/benchmark_overview.png "SauerkrautLM-LFM2.5-GLiNER Capability Overview")](imgs/benchmark_overview.png)

| Benchmark | F1 |
| --- | --- |
| CrossNER — English (avg) | 78.4 |
| CrossNER — multilingual (avg) | 72.5 |
| Privacy / PII — multilingual (avg) | 79.5 |
| Biomedical NER (BioNLP-CG) | 54.6 |

**CrossNER — Multilingual Zero-Shot NER**

[![CrossNER](imgs/benchmark_crossner-3.png "CrossNER per language")](imgs/benchmark_crossner-3.png)

| Model | EN | FR | DE | IT | ES | avg |
| --- | --- | --- | --- | --- | --- | --- |
| **SauerkrautLM-LFM2.5-GLiNER (ours)** | **78.4** | **71.4** | **69.0** | **71.2** | **72.4** | **72.5** |
| SauerkrautLM-GLiNER | 73.8 | 71.2 | 68.7 | 71.3 | 72.0 | 71.4 |
| urchade/gliner_large-v2.1 | 71.9 | 57.3 | 55.8 | 58.1 | 58.6 | 60.3 |
| urchade/gliner_multi-v2.1 | 72.2 | 46.7 | 46.8 | 48.1 | 48.9 | 52.5 |

**Privacy / PII — Multilingual Entity Extraction**
[![PII](imgs/benchmark_pii.png "Privacy / PII per language")](imgs/benchmark_pii.png)

| Model | EN | FR | DE | IT | ES | avg |
| --- | --- | --- | --- | --- | --- | --- |
| **SauerkrautLM-LFM2.5-GLiNER (ours)** | **78.7** | **81.8** | **76.5** | **79.4** | **81.4** | **79.5** |
| urchade/gliner_large-v2.1 | 72.0 | 76.1 | 70.3 | 68.9 | 72.2 | 71.9 |
| urchade/gliner_multi-v2.1 | 51.1 | 62.2 | 58.6 | 57.6 | 58.0 | 57.5 |
| SauerkrautLM-GLiNER | 65.8 | 52.9 | 57.8 | 53.6 | 46.2 | 55.2 |

**Biomedical NER — BioNLP-CG (EN)**

[![Biomedical NER](imgs/benchmark_biomedical.png "Biomedical NER BioNLP-CG")](imgs/benchmark_biomedical.png)

| Model | F1 |
| --- | --- |
| **SauerkrautLM-LFM2.5-GLiNER (ours)** | **54.6** |
| SauerkrautLM-GLiNER | 36.3 |
| urchade/gliner_large-v2.1 | 35.5 |
| urchade/gliner_multi-v2.1 | 29.4 |

## Usage

This is a [GLiNER](https://github.com/urchade/GLiNER) model. Install the library and provide the entity types you want to extract as labels at inference time:

```bash
pip install gliner
```

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER")

text = "Maria Schmidt arbeitet bei Siemens in München, E-Mail: maria.schmidt@siemens.com"

# free-form labels — change them per request, no retraining needed
labels = ["person", "organization", "location", "email"]

entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(f"{entity['text']} => {entity['label']} ({entity['score']:.2f})")
```

## Disclaimer

We must inform users that despite our best efforts in data cleansing, the possibility of uncensored or incorrect content slipping through cannot be entirely ruled out. We cannot guarantee consistently appropriate behavior. Therefore, if you encounter any issues or come across inappropriate content, we kindly request that you inform us through the contact information provided.
Additionally, it is essential to understand that the licensing of these models does not constitute legal advice. We are not responsible for the actions of third parties who use our models.

## Contact

If you are interested in customized LLMs or NER/PII extraction solutions for business applications, please contact us via our [website](https://vago-solutions.ai). We are also grateful for your feedback and suggestions.

## Collaborations

We are also seeking support and investment for our startup, VAGO solutions, where we continuously advance the development of robust language models designed to address a diverse range of purposes and requirements. If the prospect of collaboratively navigating future challenges excites you, we warmly invite you to reach out to us at [VAGO solutions](https://vago-solutions.ai).

## Citation

If you use SauerkrautLM-LFM2.5-GLiNER in your research or applications, please cite:

```bibtex
@misc{SauerkrautLM-LFM2.5-GLiNER,
  title={SauerkrautLM-LFM2.5-GLiNER},
  author={Michele Montebovi},
  organization={VAGO Solutions},
  url={https://huggingface.co/VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER},
  year={2026}
}
```

## Acknowledgement

Many thanks to [Liquid AI](https://huggingface.co/LiquidAI) for the LFM2.5-350M base model, to [urchade](https://huggingface.co/urchade) for the GLiNER framework, and to our community for their continued support and engagement.
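
As a practical follow-up to the Usage section, here is a minimal sketch of PII redaction on top of the predicted entities. It assumes each entity dict carries character offsets under `start` and `end` alongside `label`, and that spans do not overlap. The helpers `make_span` and `redact` are hypothetical names introduced for this illustration, and the sample spans are written by hand rather than taken from the model.

```python
from typing import Dict, List


def make_span(text: str, surface: str, label: str) -> Dict:
    """Build an entity dict for the first occurrence of `surface` in `text`."""
    start = text.index(surface)
    return {"start": start, "end": start + len(surface), "text": surface, "label": label}


def redact(text: str, entities: List[Dict]) -> str:
    """Replace every entity span with an upper-cased [LABEL] placeholder."""
    redacted = text
    # Go right to left so the offsets of spans further left stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['label'].upper()}]"
        redacted = redacted[: ent["start"]] + placeholder + redacted[ent["end"]:]
    return redacted


text = "Maria Schmidt arbeitet bei Siemens in München, E-Mail: maria.schmidt@siemens.com"

# Hand-written spans for illustration only; these are not real model predictions.
sample_entities = [
    make_span(text, "Maria Schmidt", "person"),
    make_span(text, "maria.schmidt@siemens.com", "email"),
]

print(redact(text, sample_entities))
```

In practice the entity list would come from `model.predict_entities(text, labels)` with only the PII labels you want masked. Overlapping spans are not handled in this sketch and would need to be merged or filtered first.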