---
language:
- en
license: apache-2.0
base_model: swiss-ai/Apertus-8B-Instruct-2509
tags:
- academic-project
- pruning
- vocabulary-pruning
- nlp
- llm
- ml optimization
---

# Model Card for Apertus-8B_pruned-english-ds

## Model Summary

This model is a vocabulary-pruned, English-only version of [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509).
It was created as part of an academic machine learning project to
investigate how vocabulary reduction affects model size and performance.


**Base Model:** [Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509)  
**Developer (Base Model):** Swiss AI Initiative (ETH Zurich, EPFL, CSCS)  
**Pruning Method:** Vocabulary Pruning (see details below)

## Vocabulary Pruning Details

The pruned vocabulary was obtained by collecting all tokens that occur in an
English-only dataset. We used the following dataset:
[BNC Consortium, “British National Corpus 1994,” 2007,
Literary and Linguistic Data Service.](http://hdl.handle.net/20.500.14106/2554)
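
The card does not describe the pruning implementation, so the following is only a minimal sketch of the general idea, under stated assumptions: tokenize the corpus, record which token IDs occur, always keep a set of special tokens, and keep only the matching rows of the embedding matrix. The names `collect_used_ids` and `prune_rows`, the toy vocabulary, and the whitespace "tokenizer" are all hypothetical stand-ins. In a real pipeline `encode` would be the base tokenizer's encoding function, the same row selection would presumably apply to both the input embeddings and the output projection, and the tokenizer would have to be rebuilt with the remapped IDs.

```python
from typing import Callable, Iterable


def collect_used_ids(
    texts: Iterable[str],
    encode: Callable[[str], list[int]],
    always_keep: Iterable[int] = (),
) -> list[int]:
    """Return the sorted token IDs that occur in `texts`, plus `always_keep`."""
    used = set(always_keep)  # e.g. special tokens that must survive pruning
    for text in texts:
        used.update(encode(text))
    return sorted(used)


def prune_rows(matrix: list[list[float]], kept_ids: list[int]):
    """Keep only the rows for `kept_ids` and return the old -> new ID mapping."""
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    pruned = [matrix[old] for old in kept_ids]
    return pruned, old_to_new


# Toy stand-ins (hypothetical): a 6-token vocabulary and a whitespace tokenizer.
toy_vocab = {"<s>": 0, "the": 1, "cat": 2, "chat": 3, "sat": 4, "Katze": 5}


def toy_encode(text: str) -> list[int]:
    return [toy_vocab[word] for word in text.split()]


# One 2-dimensional "embedding" row per token in the toy vocabulary.
toy_embeddings = [[float(i), float(i) + 0.5] for i in range(len(toy_vocab))]

# Tokens 3 ("chat") and 5 ("Katze") never occur in the corpus and are dropped.
kept = collect_used_ids(["the cat sat", "the cat"], toy_encode, always_keep=[0])
pruned_embeddings, old_to_new = prune_rows(toy_embeddings, kept)
```

Keeping the surviving IDs in sorted order preserves the relative order of the original vocabulary, which makes the old-to-new mapping easy to audit.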


## Intended Use

This model is intended for **academic research and educational purposes**, specifically to study:
- The impact of language restriction on multilingual LLM performance.
- Efficiency gains in memory usage and inference speed.
- Comparative analysis between full-scale and pruned models.
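
As a rough illustration of where the memory savings come from, the sketch below estimates how many bytes are saved in the embedding-related weights when the vocabulary shrinks. The function names and the numbers in the example are hypothetical and are not measurements of this model; the real values can be read from `model.config.vocab_size` and `model.config.hidden_size`. It assumes 2 bytes per parameter (bfloat16) and, unless `tied=True`, separate input-embedding and output-projection matrices.

```python
def embedding_param_count(vocab_size: int, hidden_size: int, tied: bool = False) -> int:
    """Parameters in the input embedding plus the output projection."""
    matrices = 1 if tied else 2  # tied weights share a single matrix
    return matrices * vocab_size * hidden_size


def saved_bytes(
    full_vocab: int,
    pruned_vocab: int,
    hidden_size: int,
    bytes_per_param: int = 2,  # assumption: bfloat16 weights
    tied: bool = False,
) -> int:
    """Bytes saved in vocabulary-dependent weights after pruning."""
    before = embedding_param_count(full_vocab, hidden_size, tied)
    after = embedding_param_count(pruned_vocab, hidden_size, tied)
    return (before - after) * bytes_per_param


# Illustrative numbers only (hypothetical, not taken from this model).
saving = saved_bytes(full_vocab=100_000, pruned_vocab=60_000, hidden_size=4096)
print(f"{saving / 1e6:.1f} MB saved")
```

Only the vocabulary-dependent matrices shrink; the transformer layers are unchanged, so the relative saving depends on how large those matrices are compared with the rest of the model.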

For general-purpose instruction following or production use, we recommend using the original [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509).

## How to Use

You can load this model with the `transformers` library. Use a recent release: the base model card states that the Apertus modeling code is available in `transformers` v4.56.0 and later.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "epfl-ml-ytf/apertus-8b-pruned-english-ds-63159"

# Load the pruned tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)

# Example generation
messages = [
    {"role": "user", "content": "Explain the concept of vocabulary pruning in one sentence."}
]

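# Build the prompt with the model's chat template; return_dict=True yields
# input_ids together with the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))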