Instructions to use norallm/normistral-7b-warm with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use norallm/normistral-7b-warm with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="norallm/normistral-7b-warm")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("norallm/normistral-7b-warm")
model = AutoModelForCausalLM.from_pretrained("norallm/normistral-7b-warm")

llama-cpp-python

How to use norallm/normistral-7b-warm with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="norallm/normistral-7b-warm",
	filename="normistral-7b-warm.Q3_K_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Inference
Local Apps Settings

llama.cpp

How to use norallm/normistral-7b-warm with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf norallm/normistral-7b-warm:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf norallm/normistral-7b-warm:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf norallm/normistral-7b-warm:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf norallm/normistral-7b-warm:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf norallm/normistral-7b-warm:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf norallm/normistral-7b-warm:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf norallm/normistral-7b-warm:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf norallm/normistral-7b-warm:Q4_K_M

Use Docker

docker model run hf.co/norallm/normistral-7b-warm:Q4_K_M

LM Studio
Jan

vLLM

How to use norallm/normistral-7b-warm with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "norallm/normistral-7b-warm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "norallm/normistral-7b-warm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/norallm/normistral-7b-warm:Q4_K_M

SGLang

How to use norallm/normistral-7b-warm with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "norallm/normistral-7b-warm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "norallm/normistral-7b-warm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "norallm/normistral-7b-warm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "norallm/normistral-7b-warm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Ollama
How to use norallm/normistral-7b-warm with Ollama:
```
ollama run hf.co/norallm/normistral-7b-warm:Q4_K_M
```

Unsloth Studio

How to use norallm/normistral-7b-warm with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for norallm/normistral-7b-warm to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for norallm/normistral-7b-warm to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for norallm/normistral-7b-warm to start chatting

Atomic Chat new
Docker Model Runner
How to use norallm/normistral-7b-warm with Docker Model Runner:
```
docker model run hf.co/norallm/normistral-7b-warm:Q4_K_M
```

Lemonade

How to use norallm/normistral-7b-warm with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull norallm/normistral-7b-warm:Q4_K_M

Run and chat with the model

lemonade run user.normistral-7b-warm-Q4_K_M

List all available models

lemonade list

Is this model capable of more than just translation?

by Dogfetus - opened Jul 3, 2024

Discussion

Dogfetus

Jul 3, 2024

Hi, I'm new to this area, and I noticed that this model is tagged as generative. Does it only handle translation, or can it also generate answers to questions?

espenhk

Jul 3, 2024

I would look at the instruct-models for question-answering, for instance normistral-7b-warm-instruct. Those are further instruction trained from this model, meaning they’ve been trained on a hand-crafted, higher quality dataset on question-answering specifically.

How that performs will be up to you to evaluate, but it does answer questions.

Dogfetus

Jul 3, 2024

•

edited Jul 3, 2024

Thank you for the information. However, I am looking for a Norwegian-trained LLM for question-answering. Is the normistral-7b-warm-instruct specifically trained on Norwegian? Looking through the stated datasets (used to train normistral-7b-warm-instruct), they seem to be only in English, which is why I'm wondering.

espenhk

Jul 3, 2024

Per the model card, final step of the fine-tuning corpus:
“ Finally, we translated the resulting dataset into Bokmål and Nynorsk using NorMistral-7b-warm.”

See the rest of the model card for the other steps, but essentially it’s a collected, cleaned and enhanced English dataset that’s been translated to Norwegian bokmål and nynorsk. Which means the model is trained to answer questions in Norwegian, although you’d expect that it might have some English-structured looking responses sometimes. And it’s conditioned on how well the regular normistral-warm does translation (hopefully quite well).

Dogfetus

Jul 3, 2024

Oh, I see now. Thank you for the clarification.

Dogfetus changed discussion status to closed Jul 3, 2024

espenhk

Jul 3, 2024

I’d like to add, though: IMHO the scratch-models - trained on nothing but Norwegian (except for the code generation dataset), they should in theory produce the best Norwegian responses. That is given enough data, though, which isn’t the case yet.

With time I’d hope we can get enough purely Norwegian data to get a model that competes with ones trained with English data - maybe when the national library sorts out copyright issues and can expand the NCC by a lot? And on top of that be able to create large enough instruction-training datasets to do the instruction-training on “proper” Norwegian as well. Go have a look at the NorwAI instruction-trained models (from NTNU): their instruction training dataset has some shortcomings (currently doesn’t comply with a chat template, for instance), but it is purely in Norwegian which is pretty cool.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment