Instructions for using norallm/normistral-7b-warm with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use norallm/normistral-7b-warm with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="norallm/normistral-7b-warm")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("norallm/normistral-7b-warm")
model = AutoModelForCausalLM.from_pretrained("norallm/normistral-7b-warm")
- llama-cpp-python
How to use norallm/normistral-7b-warm with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="norallm/normistral-7b-warm",
    filename="normistral-7b-warm.Q3_K_M.gguf",
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
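`llm(...)` returns an OpenAI-style completion dict, so `print(output)` dumps the whole structure. With `echo=True`, the generated text also repeats the prompt. The sketch below shows how to pull out just the text. It also covers the streaming variant (`stream=True`), in which each chunk carries a text fragment under `choices[0]["text"]`. The helper names `completion_text` and `join_stream` are hypothetical and chosen for illustration.

```python
from typing import Any, Iterable, Mapping


def completion_text(output: Mapping[str, Any]) -> str:
    """Return the generated text from a llama-cpp-python completion dict."""
    # Completions follow the OpenAI layout: a list of choices, each with a "text" field
    return output["choices"][0]["text"]


def join_stream(chunks: Iterable[Mapping[str, Any]]) -> str:
    """Concatenate the text fragments yielded by llm(..., stream=True)."""
    return "".join(chunk["choices"][0]["text"] for chunk in chunks)


# Usage with the `llm` object created above (not run here, since it needs the GGUF file):
# text = completion_text(llm("Once upon a time,", max_tokens=64))
# for chunk in llm("Once upon a time,", max_tokens=64, stream=True):
#     print(chunk["choices"][0]["text"], end="", flush=True)
```

Streaming is handy in a terminal because tokens appear as they are generated instead of after the full `max_tokens` budget.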
- Inference
- Local Apps
- llama.cpp
How to use norallm/normistral-7b-warm with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf norallm/normistral-7b-warm:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf norallm/normistral-7b-warm:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf norallm/normistral-7b-warm:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf norallm/normistral-7b-warm:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf norallm/normistral-7b-warm:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf norallm/normistral-7b-warm:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf norallm/normistral-7b-warm:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf norallm/normistral-7b-warm:Q4_K_M
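Besides the web UI, `llama-server` accepts plain HTTP requests. Below is a minimal standard-library sketch of a Python client. It assumes three things: the server listens on its default address `http://localhost:8080`; it exposes the native `/completion` endpoint described in the llama.cpp server README; and that endpoint takes the request fields `prompt`, `n_predict`, and `temperature` and returns the generated text in a `content` field. The server API evolves, so check the README for your build. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default host and port
LLAMA_SERVER_URL = "http://localhost:8080"


def build_completion_request(prompt: str, n_predict: int = 128,
                             temperature: float = 0.5,
                             base_url: str = LLAMA_SERVER_URL) -> urllib.request.Request:
    """Build a POST request for llama-server's native /completion endpoint."""
    payload = {"prompt": prompt, "n_predict": n_predict, "temperature": temperature}
    return urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (the "content" field)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as response:
        return json.loads(response.read())["content"]


# Usage, with llama-server running:
# print(complete("Once upon a time,", n_predict=64))
```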
Use Docker
docker model run hf.co/norallm/normistral-7b-warm:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use norallm/normistral-7b-warm with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "norallm/normistral-7b-warm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "norallm/normistral-7b-warm",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker
docker model run hf.co/norallm/normistral-7b-warm:Q4_K_M
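With the server from `vllm serve` running, the curl request above can also be sent from Python. The sketch below uses only the standard library. It builds the same OpenAI-compatible `/v1/completions` request (same URL, model, and parameters as the curl example) and reads the generated text from `choices[0]["text"]`. The helper names are hypothetical. The `openai` client package pointed at the same base URL is a common alternative.

```python
import json
import urllib.request

# Matches the server address used in the curl example above
VLLM_BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "norallm/normistral-7b-warm"


def build_request(prompt: str, max_tokens: int = 512, temperature: float = 0.5,
                  base_url: str = VLLM_BASE_URL) -> urllib.request.Request:
    """Build the same POST /v1/completions request as the curl example."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_choice_text(response_json: dict) -> str:
    """Extract the generated text from an OpenAI-style completions response."""
    return response_json["choices"][0]["text"]


def complete(prompt: str, **kwargs) -> str:
    """Send the request to the running server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as response:
        return first_choice_text(json.loads(response.read()))


# Usage, with `vllm serve` running:
# print(complete("Once upon a time,", max_tokens=64))
```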
- SGLang
How to use norallm/normistral-7b-warm with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "norallm/normistral-7b-warm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "norallm/normistral-7b-warm",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "norallm/normistral-7b-warm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "norallm/normistral-7b-warm",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Ollama
How to use norallm/normistral-7b-warm with Ollama:
ollama run hf.co/norallm/normistral-7b-warm:Q4_K_M
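`ollama run` opens an interactive session, but the same model can also be queried through Ollama's local REST API. The sketch below makes the following assumptions:

- Ollama runs at its default address `http://localhost:11434`.
- Requests go to its `/api/generate` endpoint with `"stream": false`.
- The reply is a single JSON object whose `response` field holds the generated text.

The model name is the same `hf.co/...` reference used with `ollama run`. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port
OLLAMA_URL = "http://localhost:11434"
MODEL = "hf.co/norallm/normistral-7b-warm:Q4_K_M"


def build_generate_request(prompt: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the text from the "response" field."""
    with urllib.request.urlopen(build_generate_request(prompt)) as response:
        return json.loads(response.read())["response"]


# Usage, after the model has been pulled by `ollama run` above:
# print(generate("Once upon a time,"))
```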
- Unsloth Studio
How to use norallm/normistral-7b-warm with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for norallm/normistral-7b-warm to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for norallm/normistral-7b-warm to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for norallm/normistral-7b-warm to start chatting
- Atomic Chat
- Docker Model Runner
How to use norallm/normistral-7b-warm with Docker Model Runner:
docker model run hf.co/norallm/normistral-7b-warm:Q4_K_M
- Lemonade
How to use norallm/normistral-7b-warm with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull norallm/normistral-7b-warm:Q4_K_M
Run and chat with the model
lemonade run user.normistral-7b-warm-Q4_K_M
List all available models
lemonade list
Is this model capable of more than just translation?
Hi, I'm new to this area, and I noticed that this model is tagged as generative. Does it only handle translation, or can it also generate answers to questions?
I would look at the instruct models for question answering, for instance normistral-7b-warm-instruct. Those are further instruction-trained on top of this model, meaning they’ve been trained on a hand-crafted, higher-quality dataset aimed specifically at question answering.
How well it performs is up to you to evaluate, but it does answer questions.
Thank you for the information. However, I am looking for a Norwegian-trained LLM for question answering. Is normistral-7b-warm-instruct specifically trained on Norwegian? The datasets listed as used to train normistral-7b-warm-instruct seem to be English only, which is why I'm wondering.
Per the model card, the final step in building the fine-tuning corpus was:
“Finally, we translated the resulting dataset into Bokmål and Nynorsk using NorMistral-7b-warm.”
See the rest of the model card for the other steps, but essentially it’s a collected, cleaned, and enhanced English dataset that has been translated into Norwegian Bokmål and Nynorsk. This means the model is trained to answer questions in Norwegian, although you might expect some of its responses to have an English-like structure. Its quality also depends on how well the base normistral-7b-warm translates (hopefully quite well).
Oh, I see now. Thank you for the clarification.
I’d like to add, though: IMHO the scratch models, trained on nothing but Norwegian (except for the code-generation dataset), should in theory produce the best Norwegian responses. That assumes enough training data, though, which isn’t available yet.
Over time, I hope we can gather enough purely Norwegian data to train a model that competes with those trained on English data. Maybe that happens once the National Library of Norway sorts out copyright issues and can expand the NCC (Norwegian Colossal Corpus) considerably? On top of that, we would need instruction-training datasets large enough to do the instruction training in “proper” Norwegian as well. Have a look at the NorwAI instruction-trained models (from NTNU): their instruction-training dataset has some shortcomings (for instance, it currently doesn’t follow a chat template), but it is purely in Norwegian, which is pretty cool.