Tokenizer mismatch
Hi,
The chat template is missing three \n characters.
Was the fine-tune done with the wrong template?
Also, why not trim whitespace?
Thanks for the feedback! The model was fine-tuned using exactly this chat template, so it is internally consistent: the template reflects the actual format used during training. Using a different template at inference time (e.g. one with extra \n) may lead to slightly inconsistent behavior at turn boundaries. Regarding whitespace, "add_prefix_space: true" is inherited from the LLaMA SentencePiece tokenizer and is a tokenizer-level setting that does not directly affect the chat template output. Have you actually observed leading spaces in the decoded outputs?
Thanks for answering my question.
Good to know that chat_template.jinja is correct and that the template in tokenizer_config.json is the one that is wrong:
"chat_template": "{{- bos_token }}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
By 'trim whitespace' I don't mean the prefix space; sorry for the unclear explanation. I'm referring to the chat template.
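For reference, the chat_template string quoted above from tokenizer_config.json can be mirrored in plain Python to see exactly where each \n lands. This is only a sketch: `render_chat` is a hypothetical helper, not part of any library, and `<s>` is an assumed stand-in for bos_token that should be checked against the real tokenizer.

```python
# Plain-Python mirror of the chat_template string quoted above.
# Assumption: "<s>" stands in for bos_token; check the real tokenizer.
def render_chat(messages, bos_token="<s>", add_generation_prompt=True):
    parts = [bos_token]
    for message in messages:
        # Each turn: <|im_start|>role\ncontent<|im_end|>\n
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>" + "\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "¿Cuál es la constitución española?"}]
rendered = render_chat(messages)
# repr() makes every newline visible as \n.
print(repr(rendered))
```

Printing with `repr` makes it easy to compare the result character by character against whatever `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` returns.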
Regarding leading spaces in the output, yes, that's what prompted me to check the tokenizer.
You can reproduce it by running a few generations with this prompt:
¿Cuál es la constitución española?
I checked the datasets and found some examples, but I'm not sure if there are enough to bias the model like that.
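To put a number on the leading-space behavior across several generations, something like the following could be used. It is a rough sketch: `leading_space_rate` is a hypothetical helper, and the sample strings are made up for illustration; they are not real outputs from this model.

```python
def leading_space_rate(generations):
    """Fraction of non-empty generations whose first character is whitespace."""
    non_empty = [g for g in generations if g]
    if not non_empty:
        return 0.0
    return sum(g[0].isspace() for g in non_empty) / len(non_empty)


# Made-up strings for illustration only; these are NOT real model outputs.
samples = [" texto uno", "texto dos", "\ntexto tres", "texto cuatro"]
rate = leading_space_rate(samples)
print(rate)
```

In practice the list would hold only the newly generated text of each sampled completion, decoded without the prompt.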
Hi again! Thanks for the clarification about whitespace trimming.
We have now updated the chat_template in tokenizer_config.json to match the chat_template.jinja used during fine-tuning. This should fix the leading spaces in the output.
Regarding whitespace trimming in the template, the chat_template.jinja already uses {%- and -%} tags where appropriate to avoid unwanted whitespace. Now that both templates are in sync, this should be consistent.
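For anyone who wants to verify that two template strings are in sync, a small character-level diff is usually enough to spot a missing \n. The sketch below uses only the standard library; `template_differences` is a hypothetical helper, and the two fragments are illustrative, not the repository's actual templates.

```python
import difflib


def template_differences(a, b):
    """Return (tag, a_fragment, b_fragment) for every span where a and b differ."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical fragments, not the repository's actual templates.
with_newline = "<|im_start|>assistant\n"
without_newline = "<|im_start|>assistant"
diffs = template_differences(with_newline, without_newline)
# repr() keeps whitespace differences visible.
print(repr(diffs))
```

An empty list means the two strings are identical. To compare the real files, the inputs would be the text of chat_template.jinja and the "chat_template" value loaded from tokenizer_config.json.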
Thanks again for taking the time to investigate and report this so thoroughly!
