---
license: apache-2.0
language:
- ca
- es
- en
base_model: gplsi/Aitana-2B-S-Instruct
tags:
- valencian
- spanish
- english
- text-generation
- instruct
- dpo
- alignment
- alia
- gplsi
datasets:
- nvidia/HelpSteer3
- OpenAssistant/oasst1
- OpenAssistant/oasst2
- Open-Orca/OpenOrca
library_name: transformers
pipeline_tag: text-generation
---

# Aitana-2B-S-Instruct-Aligned

**Aitana-2B-S-Instruct-Aligned** is a DPO-aligned instruction-tuned generative language model from the **Aitana family**, developed by the [GPLSI (Language and Information Systems Group)](https://gplsi.dlsi.ua.es/) at the University of Alicante. Built on [gplsi/Aitana-2B-S-Instruct](https://huggingface.co/gplsi/Aitana-2B-S-Instruct), this model has been further aligned using Direct Preference Optimization (DPO) to improve response quality and alignment with human preferences across Valencian, Spanish, and English.

## Table of Contents

- [Model Description](#model-description)
- [Alignment Details](#alignment-details)
- [Training Data](#training-data)
- [Intended Uses](#intended-uses)
- [How to Use](#how-to-use)
- [Evaluation](#evaluation)
- [Additional Information](#additional-information)

## Model Description

| Property | Value |
|----------|-------|
| **Base Model** | [gplsi/Aitana-2B-S-Instruct](https://huggingface.co/gplsi/Aitana-2B-S-Instruct) |
| **Architecture** | Transformer decoder-only |
| **Parameters** | ~2.25B |
| **Languages** | Valencian, Spanish, English |
| **License** | Apache 2.0 |

Aitana-2B-S-Instruct-Aligned extends the Aitana-2B-S-Instruct instruction-tuned model with Direct Preference Optimization (DPO) alignment. This additional training stage improves the model's ability to generate helpful, high-quality responses that better align with human preferences while maintaining its strong multilingual capabilities.

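As a rough sizing aid, the parameter count in the table above can be turned into an estimate of the memory needed just to hold the weights. This is a back-of-the-envelope sketch, not a measured figure: it assumes exactly 2.25e9 parameters, counts weights only (no activations, KV cache, or framework overhead), and the helper name `weight_memory_gib` is purely illustrative.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Approximate parameter count taken from the Model Description table.
NUM_PARAMS = 2.25e9

# float32 stores 4 bytes per parameter, bfloat16 stores 2.
for dtype_name, nbytes in [("float32", 4), ("bfloat16", 2)]:
    print(f"{dtype_name}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the weights take roughly 4.2 GiB in bfloat16 and about twice that in float32; actual usage during generation will be higher.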
## Alignment Details

The model was aligned using Direct Preference Optimization (DPO) with the following configuration:

| Hyperparameter | Value |
|----------------|-------|
| **Method** | DPO (Direct Preference Optimization) |
| **Learning rate** | 5e-6 |
| **Epochs** | 1 |
| **Beta** | 0.1 |
| **LR Scheduler** | Linear |
| **Total Samples** | 146,180 |
| **English Samples** | 80,308 |
| **Spanish Samples** | 30,072 |
| **Valencian Samples** | 35,800 |
| **Languages** | Spanish, Valencian, English |

The DPO alignment was performed using curated preference pairs that teach the model to prefer more helpful, accurate, and well-structured responses.

## Training Data

The base instruction model was trained on the ALIA Instruction/v12 dataset. This DPO-aligned variant was further aligned using the Alignment/v8 dataset, composed of the following preference data:

| Dataset ID | Name | Languages | Source |
|------------|------|-----------|--------|
| al1 | HelpSteer3 | EN, ES | [nvidia/HelpSteer3](https://huggingface.co/datasets/nvidia/HelpSteer3) |
| al2 | OpenAssistant1 (OASST1) | EN, ES, RU (+32 more) | [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) |
| al3 | OpenAssistant2 (OASST2) | EN, ES, RU (+32 more) | [OpenAssistant/oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2) |
| al4 | OpenOrca | EN | [Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) |
| al5 | OASST2 Valenciano | VA | n/a |

The alignment data focused on English, Spanish, and Valencian preference pairs, with the distribution: 80,308 English, 30,072 Spanish, and 35,800 Valencian samples.

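For readers unfamiliar with the **Beta** hyperparameter listed under Alignment Details, the sketch below shows where it enters the standard DPO objective for a single preference pair. It is an illustration only: the card does not publish the training code, the function names are hypothetical, and the log-probabilities are made-up numbers rather than values from this model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under the frozen reference model. beta=0.1 matches the table above.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# Hypothetical log-probabilities, for illustration only.
at_init = dpo_loss(-11.0, -11.0, -11.0, -11.0)    # policy equals reference
after_step = dpo_loss(-10.0, -12.0, -11.0, -11.0)  # policy now favors the chosen response
print(at_init, after_step)
```

When the policy and the reference agree, the loss equals log 2; it decreases as the policy raises the chosen response's likelihood relative to the rejected one. A smaller beta makes the loss less sensitive to that log-ratio gap, which keeps the policy closer to the reference model.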
## Intended Uses

This model can be used for:

- **Instruction following** in Valencian, Spanish, and English with improved alignment to human preferences
- **Chat and conversational applications** requiring high-quality multilingual responses
- **Text generation** with task-specific prompting and improved output quality
- **Domain-specific applications** in administrative, legal, or tourism contexts

> **Note**: As an aligned instruction-tuned model, it is designed to follow user prompts and generate helpful, safe responses. It is not intended for use as a factual knowledge base. The DPO alignment improves response quality and preference alignment.

## How to Use

### Transformers

```python
import torch
from transformers import pipeline, AutoTokenizer

model_id = "gplsi/Aitana-2B-S-Instruct-Aligned"
tokenizer = AutoTokenizer.from_pretrained(model_id)

generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Valencian example
text = "Explica què són les Corts Valencianes i quina funció tenen."
result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
print(result[0]['generated_text'])

# Spanish example
text = "Describe las principales funciones del gobierno autonómico valenciano."
result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
print(result[0]['generated_text'])

# English example
text = "Explain the role of tourism in the Valencian Community economy."
result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
print(result[0]['generated_text'])
```

## Evaluation

The following tables present results on benchmarks from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), compared with [Salamandra-2B-Instruct](https://huggingface.co/BSC-LT/Salamandra-2B-Instruct). The Aitana results are for the DPO-aligned instruction-tuned model. The better score in each row is shown in bold.

### Valencian

#### Classification Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|
| XNLI | va | Natural Language Inference | acc | **0.520** | 0.514 |

#### Generation Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|
| Cocoteros | va | Reading Comprehension | bleu | 2.796 | **3.612** |
| Phrases ca-va | va-ca | Translation - Adaptation | bleu | 58.425 | **74.538** |
| Phrases va-ca | va-ca | Translation - Adaptation | bleu | 70.660 | **71.691** |
| Phrases va-es | va-es | Translation | bleu | 65.427 | **72.097** |
| Phrases es-va | es-va | Translation | bleu | 45.688 | **56.012** |
| Truthfulqa_va | va | Truthfulness | bleu_acc | **0.409** | 0.394 |

### Catalan

#### Classification Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|
| Belebele Cat_latn | ca | Reading Comprehension | acc | **0.287** | 0.248 |
| COPA | ca | Commonsense Reasoning | acc | 0.708 | **0.726** |
| XStoryCloze | ca | Commonsense Reasoning | acc | 0.616 | **0.629** |
| OpenBookQA | ca | Question Answering | acc | **0.296** | **0.296** |
| PAWS | ca | Paraphrasing | acc | **0.602** | 0.598 |
| PiQA | ca | Question Answering | acc | 0.638 | **0.655** |
| ARC Easy | ca | Question Answering | acc | 0.516 | **0.524** |
| ARC Challenge | ca | Question Answering | acc | 0.298 | **0.314** |
| XNLI | ca | Natural Language Inference | acc | 0.513 | **0.515** |
| Teca | ca | Natural Language Inference | acc | 0.486 | **0.500** |
| WNLI | ca | Natural Language Inference | acc | **0.563** | 0.437 |
| Catcola | ca | Linguistic Acceptability | acc | 0.492 | **0.713** |
| Catcola | ca | Linguistic Acceptability | mcc | **0.097** | -0.040 |
| Catalanqa | ca | Question Answering | F1 | **0.516** | 0.384 |
| Mgsm direct | ca | Math | exact match | 0.000 | **0.012** |
| Catalanqa | ca | Question Answering | exact match | **0.182** | 0.011 |
| Xquad | ca | Question Answering | exact match | **0.103** | 0.014 |
| Xquad | ca | Question Answering | F1 | **0.394** | 0.287 |

#### Generation Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|
| Cabreu abstractive | ca | Summarization | bleu | 7.610 | **7.703** |
| Cabreu extractive | ca | Summarization | bleu | **38.002** | 19.876 |
| Cabreu extreme | ca | Summarization | bleu | 2.733 | **3.245** |

### Spanish

#### Classification Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|
| Belebele | es | Reading Comprehension | acc | **0.268** | 0.244 |
| PAWS | es | Paraphrasing | acc | 0.566 | **0.618** |
| XNLI | es | Natural Language Inference | acc | **0.463** | 0.439 |
| WNLI | es | Natural Language Inference | acc | 0.479 | **0.535** |
| XStoryCloze | es | Commonsense Reasoning | acc | 0.617 | **0.628** |
| Escola | es | Linguistic Acceptability | acc | 0.293 | **0.708** |
| Escola | es | Linguistic Acceptability | mcc | **0.020** | 0.000 |
| OpenbookQA | es | Question Answering | acc | 0.286 | **0.338** |
| MGSM Direct | es | Math | exact match | 0.020 | **0.024** |
| XQUAD | es | Question Answering | exact match | **0.066** | 0.026 |
| XQUAD | es | Question Answering | F1 | **0.355** | 0.293 |

#### Generation Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|
| Cocoteros | es | Reading Comprehension | bleu | **3.308** | 3.141 |
| XLSum | es | Summarization | bleu | 1.695 | **1.737** |

### English

#### Classification Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|
| Arc Challenge | en | Question Answering | acc | 0.354 | **0.363** |
| Arc Easy | en | Question Answering | acc | 0.681 | **0.709** |
| Belebele | en | Reading Comprehension | acc | 0.260 | **0.293** |
| PAWS | en | Paraphrasing | acc | **0.597** | 0.594 |
| XNLI | en | Natural Language Inference | acc | 0.512 | **0.553** |
| XStoryCloze | en | Commonsense Reasoning | acc | 0.662 | **0.680** |
| OpenBookQA | en | Question Answering | acc | 0.298 | **0.338** |
| PiQA | en | Question Answering | acc | 0.715 | **0.717** |
| Social iqa | en | Question Answering | acc | **0.453** | 0.451 |
| WNLI | en | Natural Language Inference | acc | **0.535** | 0.465 |
| MGSM Direct | en | Math | exact match | 0.008 | **0.052** |
| TriviaQA | en | Question Answering | exact match | 0.076 | **0.147** |

### Judge Evaluation

The model was also evaluated using an LLM-as-judge approach across different task categories. The scores below represent the average rating (1-5 scale, 5 being best) and standard deviation for each task category, comparing Aitana-2B-S-Instruct-Aligned-v0.1 against Salamandra-2B-Instruct.

| Task Category | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) |
|---------------|------------------------|-------------------------------------|
| CommonSense reasoning | 2.277 / 1.151 | **2.737 / 1.140** |
| Maths | 1.060 / 0.124 | **1.123 / 0.249** |
| Paraphrasing | **3.518 / 1.308** | 3.460 / 1.088 |
| Reading comprehension | **2.966 / 1.111** | 2.894 / 1.311 |
| Summarization | 2.217 / 1.068 | **2.261 / 0.820** |
| Translation | **3.557 / 0.760** | 3.418 / 0.999 |
| **Overall Avg** | 2.599 / 0.920 | **2.649 / 0.935** |

The DPO-aligned model scores slightly higher than Salamandra-2B-Instruct on the overall average (2.649 vs 2.599), a difference well within one standard deviation. It scores higher in CommonSense reasoning, Maths, and Summarization, while Salamandra-2B-Instruct scores higher in Paraphrasing, Reading comprehension, and Translation. The aligned model also shows lower standard deviations in CommonSense reasoning, Paraphrasing, and Summarization, indicating more consistent ratings in those categories.

## Additional Information

### Author

The model has been developed by the **[Language and Information Systems Group (GPLSI)](https://gplsi.dlsi.ua.es/)** and the **[Centro de Inteligencia Digital (CENID)](https://cenid.es)**, both part of the **[University of Alicante (UA)](https://www.ua.es/es/)**, as part of their ongoing research in **Natural Language Processing (NLP)**.

### Funding

This work is funded by the **Ministerio para la Transformación Digital y de la Función Pública**, co-financed by the **EU – NextGenerationEU**, within the framework of the project *Desarrollo de Modelos ALIA*. This work has also been partially supported by Project HEART-NLP (PID2024-156263OB-C22).

### Acknowledgments

We would like to express our gratitude to all individuals and institutions that have contributed to the development of this work.

Special thanks to:

- [Language Technologies Laboratory at Barcelona Supercomputing Center](https://www.bsc.es/es/discover-bsc/organisation/research-structure/language-technologies-laboratory)
- [Centro Vasco de Tecnología de la Lengua (HiTZ)](https://www.hitz.eus/es)
- [Centro Singular de Investigación en Tecnologías Inteligentes (CiTIUS)](https://citius.gal/)
- [Sistemas Inteligentes de Acceso a la Información (SINAI)](https://www.ujaen.es/investigacion-y-transferencia/grupos-de-investigacion/sistemas-inteligentes-de-acceso-la-informacion-sinai)
- [Instituto Universitario de Investigación Informática (IUII)](https://web.ua.es/es/iuii/)
- [Leonardo HPC System](https://leonardo-supercomputer.cineca.eu/)
- [European supercomputing ecosystem (EUROHPC)](https://www.eurohpc-ju.europa.eu/)

We also acknowledge the financial, technical, and scientific support of the **Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Desarrollo de Modelos ALIA**, whose contribution has been essential to the completion of this research.

### License

[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Disclaimer

This model is intended for general purposes and is available under the permissive Apache License 2.0. Be aware that the model may have biases and/or undesirable outputs. Users deploying systems based on this model are responsible for mitigating risks and complying with applicable AI regulations.

### Reference

```bibtex
@misc{gplsi-aitana-2B-S-Instruct-Aligned,
  author       = {Galiano, Santiago and Sepúlveda-Torres, Robiert and Martínez-Murillo, Iván and Grande, Eduardo and Estevanell-Valladares, Ernesto L. and Consuegra-Ayala, Juan Pablo and Miró Maestre, María and Canal-Esteve, Miquel and Bonora, Mar and Gutierrez, Yoan and Abreu Salas, José Ignacio and Lloret, Elena and Montoyo, Andrés and Muñoz-Guillena, Rafael and Palomar, Manuel},
  title        = {Aitana 2B Instruct Aligned: DPO-aligned instruction-tuned model for Valencian, Spanish and English},
  year         = {2026},
  institution  = {Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA)},
  howpublished = {\url{https://huggingface.co/gplsi/Aitana-2B-S-Instruct-Aligned}},
  note         = {Accessed: 2026-05-11}
}
```

---

**Copyright © 2026 Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA). Distributed under the Apache License 2.0.**