---
license: apache-2.0
language:
  - ca
  - es
  - en
base_model: gplsi/Aitana-2B-S-Instruct
tags:
  - valencian
  - spanish
  - english
  - text-generation
  - instruct
  - dpo
  - alignment
  - alia
  - gplsi
datasets:
  - nvidia/HelpSteer3
  - OpenAssistant/oasst1
  - OpenAssistant/oasst2
  - Open-Orca/OpenOrca
library_name: transformers
pipeline_tag: text-generation
---
# Aitana-2B-S-Instruct-Aligned

**Aitana-2B-S-Instruct-Aligned** is a DPO-aligned instruction-tuned generative language model from the **Aitana family**, developed by the [GPLSI (Language and Information Systems Group)](https://gplsi.dlsi.ua.es/) at the University of Alicante. Built on [gplsi/Aitana-2B-S-Instruct](https://huggingface.co/gplsi/Aitana-2B-S-Instruct), this model has been further aligned using Direct Preference Optimization (DPO) to improve response quality and alignment with human preferences across Valencian, Spanish, and English.

## Table of Contents

- [Model Description](#model-description)
- [Alignment Details](#alignment-details)
- [Training Data](#training-data)
- [Intended Uses](#intended-uses)
- [How to Use](#how-to-use)
- [Evaluation](#evaluation)
- [Additional Information](#additional-information)

## Model Description

| Property | Value |
|----------|-------|
| **Base Model** | [gplsi/Aitana-2B-S-Instruct](https://huggingface.co/gplsi/Aitana-2B-S-Instruct) |
| **Architecture** | Transformer decoder-only |
| **Parameters** | ~2.25B |
| **Languages** | Valencian, Spanish, English |
| **License** | Apache 2.0 |

Aitana-2B-S-Instruct-Aligned extends the Aitana-2B-S-Instruct instruction-tuned model with Direct Preference Optimization (DPO) alignment. This additional training stage improves the model's ability to generate helpful, high-quality responses that better align with human preferences while maintaining its strong multilingual capabilities.

## Alignment Details

The model was aligned using Direct Preference Optimization (DPO) with the following configuration:

| Hyperparameter | Value |
|----------------|-------|
| **Method** | DPO (Direct Preference Optimization) |
| **Learning rate** | 5e-6 |
| **Epochs** | 1 |
| **Beta** | 0.1 |
| **LR Scheduler** | Linear |
| **Total Samples** | 146,180 |
| **English Samples** | 80,308 |
| **Spanish Samples** | 30,072 |
| **Valencian Samples** | 35,800 |
| **Languages** | Spanish, Valencian, English |

The DPO alignment was performed using curated preference pairs that teach the model to prefer more helpful, accurate, and well-structured responses.

## Training Data

The base instruction model was trained on the ALIA Instruction/v12 dataset. This DPO-aligned variant was further aligned using the Alignment/v8 dataset, composed of the following preference data:

| Dataset ID | Name | Languages | Source |
|------------|------|-----------|--------|
| al1 | HelpSteer3 | EN, ES | [nvidia/HelpSteer3](https://huggingface.co/datasets/nvidia/HelpSteer3) |
| al2 | OpenAssistant1 (OASST1) | EN, ES, RU (+32 more) | [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) |
| al3 | OpenAssistant2 (OASST2) | EN, ES, RU (+32 more) | [OpenAssistant/oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2) |
| al4 | OpenOrca | EN | [Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) |
| al5 | OASST2 Valenciano | VA | — |

The alignment data focused on English, Spanish, and Valencian preference pairs, with the distribution: 80,308 English, 30,072 Spanish, and 35,800 Valencian samples.

## Intended Uses

This model can be used for:

- **Instruction following** in Valencian, Spanish, and English with improved alignment to human preferences
- **Chat and conversational applications** requiring high-quality multilingual responses
- **Text generation** with task-specific prompting and improved output quality
- **Domain-specific applications** in administrative, legal, or tourism contexts

> **Note**: As an aligned instruction-tuned model, it is designed to follow user prompts and generate helpful, safe responses. It is not intended for use as a factual knowledge base. The DPO alignment improves response quality and preference alignment.

## How to Use

### Transformers

```python
import torch
from transformers import pipeline, AutoTokenizer

model_id = "gplsi/Aitana-2B-S-Instruct-Aligned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Valencian example
text = "Explica què són les Corts Valencianes i quina funció tenen."
result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
print(result[0]['generated_text'])
# Spanish example
text = "Describe las principales funciones del gobierno autonómico valenciano."
result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
print(result[0]['generated_text'])
# English example
text = "Explain the role of tourism in the Valencian Community economy."
result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
print(result[0]['generated_text'])
```

## Evaluation

In the following tables, we present the results obtained with different benchmarks from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) in comparison with [Salamandra-2B-Instruct](https://huggingface.co/BSC-LT/Salamandra-2B-Instruct). The results reflect the DPO-aligned instruction-tuned performance.

### Valencian

#### Classification Benchmarks

| Dataset                      |  Lang. |          Task              | Metric      | Salamandra-2B-Instruct |  Aitana-2B-S-Instruct-Aligned (v0.1) |
|------------------------------|--------|----------------------------|-------------|---------------|-----------------------|
| XNLI                         |   va   |Natural Language Inference  |     acc     |   **0.520**   |         0.514         |

#### Generation Benchmarks

| Dataset                      |  Lang. |          Task              | Metric      | Salamandra-2B-Instruct |  Aitana-2B-S-Instruct-Aligned (v0.1) |
|------------------------------|--------|----------------------------|-------------|---------------|-----------------------|
| Cocoteros                    |   va   |Reading Comprehension       |     bleu    |   2.796       |         **3.612**     |
| Phrases ca-va                |  va-ca |Translation - Adaptation    |     bleu    |   58.425      |         **74.538**    |
| Phrases va-ca                |  va-ca |Translation - Adaptation    |     bleu    |   70.660      |         **71.691**    |
| Phrases va-es                |  va-es |Translation                 |     bleu    |   65.427      |         **72.097**    |
| Phrases es-va                |  es-va |Translation                 |     bleu    |   45.688      |         **56.012**    |
| Truthfulqa_va                |  va |  Truthfulness                 |     bleu_acc    |   **0.409**   |         0.394     |

### Catalan

#### Classification Benchmarks

| Dataset                      |  Lang. |          Task             | Metric      | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|------------------------------|--------|---------------------------|-------------|---------------|-----------------------|
| Belebele Cat_latn            |   ca   | Reading Comprehension     |     acc     |   **0.287**   |       0.248       |
| COPA                         |   ca   | Commonsense Reasoning     |     acc     |   0.708       |       **0.726** |
| XStoryCloze                  |   ca   | Commonsense Reasoning     |     acc     |   0.616       |       **0.629** |
| OpenBookQA                   |   ca   | Question Answering        |     acc     |   **0.296**   |       **0.296** |
| PAWS                         |   ca   | Paraphrasing              |     acc     |   **0.602**   |       0.598       |
| PiQA                         |   ca   | Question Answering        |     acc     |   0.638       |       **0.655** |
| ARC Easy                     |   ca   | Question Answering        |     acc     |   0.516       |       **0.524** |
| ARC Challenge                |   ca   | Question Answering        |     acc     |   0.298       |       **0.314** |
| XNLI                         |   ca   | Natural Language Inference|     acc     |   0.513       |       **0.515** |
| Teca                         |   ca   | Natural Language Inference|     acc     |   0.486       |       **0.500** |
| WNLI                         |   ca   | Natural Language Inference|     acc     |   **0.563**   |       0.437       |
| Catcola                      |   ca   | Linguistic Acceptability  |     acc     |   0.492       |       **0.713** |
| Catcola                      |   ca   | Linguistic Acceptability  |     mcc     |   **0.097**   |       -0.040       |
| Catalanqa                    |   ca   | Question Answering        |      F1     |   **0.516**   |       0.384       |
| Mgsm direct                  |   ca   | Math                      | exact match |   0.000       |       **0.012** |
| Catalanqa                    |   ca   | Question Answering        | exact match |   **0.182**   |       0.011       |
| Xquad                        |   ca   | Question Answering        | exact match |   **0.103**   |       0.014       |
| Xquad                        |   ca   | Question Answering        |      F1     |   **0.394**   |       0.287       |

#### Generation Benchmarks

| Dataset                      |  Lang. |          Task            | Metric |  Salamandra-2B-Instruct |  Aitana-2B-S-Instruct-Aligned (v0.1) |
|------------------------------|--------|--------------------------|--------|----------------|-----------------------|
| Cabreu abstractive           |  ca    | Summarization            |  bleu  |    7.610       |         **7.703**     |
| Cabreu extractive            |  ca    | Summarization            |  bleu  |    **38.002**  |         19.876         |
| Cabreu extreme               |  ca    | Summarization            |  bleu  |    2.733       |         **3.245**     |

### Spanish

#### Classification Benchmarks

| Dataset                      |  Lang. |          Task             |    Metric   | Salamandra-2B-Instruct |  Aitana-2B-S-Instruct-Aligned (v0.1) |
|------------------------------|--------|---------------------------|-------------|---------------|-----------------------|
| Belebele                     |   es   | Reading Comprehension     |     acc     | **0.268** | 0.244 |
| PAWS                         |   es   | Paraphrasing              |     acc     | 0.566 | **0.618** |
| XNLI                         |   es   | Natural Language Inference|     acc     | **0.463** | 0.439 |
| WNLI                         |   es   | Natural Language Inference|     acc     | 0.479 | **0.535** |
| XStoryCloze                  |   es   | Commonsense Reasoning     |     acc     | 0.617 | **0.628** |
| Escola                       |   es   | Linguistic Acceptability  |     acc     | 0.293 | **0.708** |
| Escola                       |   es   | Linguistic Acceptability  |     mcc     | **0.020** | 0.000 |
| OpenbookQA                   |   es   | Question Answering        |     acc     | 0.286 | **0.338** |
| MGSM Direct                  |   es   | Math                      | exact match | 0.020 | **0.024** |
| XQUAD                        |   es   | Question Answering        | exact match | **0.066** | 0.026 |
| XQUAD                        |   es   | Question Answering        |      F1     | **0.355** | 0.293 |

#### Generation Benchmarks

| Dataset                      |  Lang. |      Task           | Metric  |  Salamandra-2B-Instruct |  Aitana-2B-S-Instruct-Aligned (v0.1) |
|------------------------------|--------|---------------------|---------|----------------|-----------------------|
| Cocoteros                    |   es   |Reading Comprehension|  bleu   | **3.308** | 3.141 |
| XLSum                        |   es   | Summarization       |  bleu   | 1.695 | **1.737** |

### English

#### Classification Benchmarks

| Dataset                      |  Lang. |          Task              | Metric      | Salamandra-2B-Instruct |  Aitana-2B-S-Instruct-Aligned (v0.1) |
|------------------------------|--------|----------------------------|-------------|---------------|-----------------------|
| Arc Challenge                |   en   | Question Answering         |     acc     | 0.354 | **0.363** |
| Arc Easy                     |   en   | Question Answering         |     acc     | 0.681 | **0.709** |
| Belebele                     |   en   |   Reading Comprehension    |     acc     | 0.260 | **0.293** |
| PAWS                         |   en   |      Paraphrasing          |     acc     | **0.597** | 0.594 |
| XNLI                         |   en   | Natural Language Inference |     acc     | 0.512 | **0.553** |
| XStoryCloze                  |   en   |   Commonsense Reasoning    |     acc     | 0.662 | **0.680** |
| OpenBookQA                   |   en   |    Question Answering      |     acc     | 0.298 | **0.338** |
| PiQA                         |   en   |    Question Answering      |     acc     | 0.715 | **0.717** |
| Social iqa                   |   en   |    Question Answering      |     acc     | **0.453** | 0.451 |
| WNLI                         |   en   | Natural Language Inference |     acc     | **0.535** | 0.465 |
| MGSM Direct                  |   en   |           Math             | exact match | 0.008 | **0.052** |
| TriviaQA                     |   en   |     Question Answering     | exact match | 0.076 | **0.147** |

### Judge Evaluation

The model was also evaluated using an LLM-as-judge approach across different task categories. The scores below represent the average rating (1-5 scale, 5 being best) and standard deviation for each task category, comparing Aitana-2B-S-Instruct-Aligned-v0.1 against Salamandra-2B-Instruct.

| Task Category | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|---------------|------------------------|-----------------------------------|
| CommonSense reasoning | 2.277 / 1.151 | **2.737 / 1.140** |
| Maths | 1.060 / 0.124 | **1.123 / 0.249** |
| Paraphrasing | **3.518 / 1.308** | 3.460 / 1.088 |
| Reading comprehension | **2.966 / 1.111** | 2.894 / 1.311 |
| Summarization | 2.217 / 1.068 | **2.261 / 0.820** |
| Translation | **3.557 / 0.760** | 3.418 / 0.999 |
| **Overall Avg** | 2.599 / 0.920 | **2.649 / 0.935** |

The DPO-aligned model shows a notable improvement in overall average score (2.649 vs 2.599) compared to Salamandra-2B-Instruct, with particular gains in CommonSense reasoning and Maths. The aligned model also shows tighter standard deviations in several categories, indicating more consistent quality responses.

## Additional Information

### Author

The model has been developed by the **Language and [Information Systems Group (GPLSI)](https://gplsi.dlsi.ua.es/)** and the **[Centro de Inteligencia Digital (CENID)](https://cenid.es)**, both part of the **[University of Alicante (UA)](https://www.ua.es/es/)**, as part of their ongoing research in **Natural Language Processing (NLP)**.


### Funding

This work is funded by the **Ministerio para la Transformación Digital y de la Función Pública**, co-financed by the **EU – NextGenerationEU**, within the framework of the project *Desarrollo de Modelos ALIA*. This work has also been partially supported by Project HEART-NLP (PID2024-156263OB-C22).

### Acknowledgments

We would like to express our gratitude to all individuals and institutions that have contributed to the development of this work.
Special thanks to:
- [Language Technologies Laboratory at Barcelona Supercomputing Center](https://www.bsc.es/es/discover-bsc/organisation/research-structure/language-technologies-laboratory)
- [Centro Vasco de Tecnología de la Lengua (HiTZ)](https://www.hitz.eus/es)
- [Centro Singular de Investigación en Tecnologías Inteligentes (CiTIUS)](https://citius.gal/)
- [Sistemas Inteligentes de Acceso a la Información (SINAI)](https://www.ujaen.es/investigacion-y-transferencia/grupos-de-investigacion/sistemas-inteligentes-de-acceso-la-informacion-sinai)
- [Instituto Universitario de Investigación Informática (IUII)](https://web.ua.es/es/iuii/)
- [Leonardo HPC System](https://leonardo-supercomputer.cineca.eu/)
- [European supercomputing ecosystem (EUROHPC)](https://www.eurohpc-ju.europa.eu/)

We also acknowledge the financial, technical, and scientific support of the **Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Desarrollo de Modelos ALIA**, whose contribution has been essential to the completion of this research.

### License

[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Disclaimer

This model is intended for general purposes and is available under a permissive Apache License 2.0. Be aware that the model may have biases and/or undesirable outputs. Users deploying systems based on this model are responsible for mitigating risks and complying with applicable AI regulations.

### Reference

```bibtex
@misc{gplsi-aitana-2B-S-Instruct-Aligned,
  author       = {Galiano, Santiago and Sepúlveda-Torres, Robiert and Martínez-Murillo, Iván and Grande, Eduardo and Estevanell-Valladares, Ernesto L. and Consuegra-Ayala, Juan Pablo and Miró Maestre, María and Canal-Esteve, Miquel and Bonora, Mar and Gutierrez, Yoan and Abreu Salas, José Ignacio and Lloret, Elena and Montoyo, Andrés and Muñoz-Guillena, Rafael and Palomar, Manuel},
  title        = {Aitana 2B Instruct Aligned: DPO-aligned instruction-tuned model for Valencian, Spanish and English},
  year         = {2026},
  institution  = {Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA)},
  howpublished = {\url{https://huggingface.co/gplsi/Aitana-2B-S-Instruct-Aligned}},
  note         = {Accessed: 2026-05-11}
}
```

---

**Copyright © 2026 Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA). Distributed under the Apache License 2.0.**