---
language:
- en
license: apache-2.0
base_model: swiss-ai/Apertus-8B-Instruct-2509
tags:
- academic-project
- pruning
- vocabulary-pruning
- nlp
- llm
- ml optimization
---

# Model Card for Apertus-8B_pruned-english-ds

## Model Summary

This model is a vocabulary-pruned, English-only version of [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509). It was created as part of an academic Machine Learning project to investigate the effects of vocabulary reduction on model size and performance.

- **Base Model:** [Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509)
- **Developer (Base Model):** Swiss AI Initiative (ETH Zurich, EPFL, CSCS)
- **Pruning Method:** Vocabulary Pruning (see details below)

## Vocabulary Pruning Details

The pruned vocabulary was obtained by collecting all the tokens found in an English-only dataset. We used the following dataset:

[B. Consortium, “British National Corpus 1994,” 2007, Literary and Linguistic Data Service.](http://hdl.handle.net/20.500.14106/2554)

## Intended Use

This model is intended for **academic research and educational purposes**, specifically to study:

- The impact of language restriction on multilingual LLM performance.
- Efficiency gains in memory usage and inference speed.
- Comparative analysis between full-scale and pruned models.

For general-purpose instruction following or production use, we recommend using the original [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509).

## How to Use

You can load this model using the `transformers` library. Ensure you are using a recent version of `transformers`.
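Since the card only asks for a "recent" version, a quick check before loading may save a confusing error. The sketch below compares the installed `transformers` release against a minimum of `4.56.0`. That threshold is an assumption based on the release in which Apertus support is understood to have landed, not something this card states, and the helper names are illustrative only.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: minimum release with Apertus support; adjust if the base model card says otherwise.
MIN_TRANSFORMERS = "4.56.0"


def release_tuple(version_string):
    """Turn '4.56.0.dev0' into (4, 56, 0), ignoring pre-release suffixes."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions so that '5.0' compares as (5, 0, 0).
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_recent_enough(installed, minimum=MIN_TRANSFORMERS):
    """Compare release numbers only; pre-release tags are not distinguished."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_transformers():
    """Return a short human-readable status for the installed transformers package."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if is_recent_enough(installed):
        return f"transformers {installed} looks recent enough"
    return f"transformers {installed} is older than {MIN_TRANSFORMERS}; consider upgrading"


print(check_transformers())
```

With a suitable version in place, the model and its pruned tokenizer load as follows: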
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "epfl-ml-ytf/apertus-8b-pruned-english-ds-63159"

# Load the pruned tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example generation
messages = [
    {"role": "user", "content": "Explain the concept of vocabulary pruning in one sentence."}
]

# return_dict=True yields input_ids and attention_mask together,
# so the call behaves the same regardless of the library's default return type.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)

# The decoded text contains the prompt followed by the generated reply
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```