Instructions for using LoneStriker/Brezn-7b-GGUF with libraries, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use LoneStriker/Brezn-7b-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="LoneStriker/Brezn-7b-GGUF",
    filename="Brezn-7b-Q3_K_L.gguf",
)
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True
)
print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use LoneStriker/Brezn-7b-GGUF with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf LoneStriker/Brezn-7b-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf LoneStriker/Brezn-7b-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf LoneStriker/Brezn-7b-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf LoneStriker/Brezn-7b-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf LoneStriker/Brezn-7b-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf LoneStriker/Brezn-7b-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf LoneStriker/Brezn-7b-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf LoneStriker/Brezn-7b-GGUF:Q4_K_M
Use Docker
docker model run hf.co/LoneStriker/Brezn-7b-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use LoneStriker/Brezn-7b-GGUF with Ollama:
ollama run hf.co/LoneStriker/Brezn-7b-GGUF:Q4_K_M
- Unsloth Studio
How to use LoneStriker/Brezn-7b-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LoneStriker/Brezn-7b-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LoneStriker/Brezn-7b-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for LoneStriker/Brezn-7b-GGUF to start chatting
- Atomic Chat
- Docker Model Runner
How to use LoneStriker/Brezn-7b-GGUF with Docker Model Runner:
docker model run hf.co/LoneStriker/Brezn-7b-GGUF:Q4_K_M
- Lemonade
How to use LoneStriker/Brezn-7b-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull LoneStriker/Brezn-7b-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Brezn-7b-GGUF-Q4_K_M
List all available models
lemonade list
🥨 Brezn-7B
This is currently our best-performing German-language 7B model under an Apache license, with an average score of 7.49 on mt-bench-de. You can test this model here: mayflowergmbh/Brezn-7B-GGUF-Chat.
Brezn-7B is a DPO-aligned merge of the following models using LazyMergekit:
- FelixChao/WestSeverus-7B-DPO-v2
- mayflowergmbh/Wiedervereinigung-7b-dpo-laser
- cognitivecomputations/openchat-3.5-0106-laser
💻 Usage
To take advantage of the instruction fine-tuning, wrap each prompt in [INST] and [/INST] tokens. The very first instruction should begin with the beginning-of-sentence token (<s>); subsequent instructions should not. The assistant's generation is terminated by the end-of-sentence token (</s>).
For example:
text = (
    "<s>[INST] What is your favourite condiment? [/INST]"
    "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
    "[INST] Do you have mayonnaise recipes? [/INST]"
)
This format is available as a chat template via the apply_chat_template() method:
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained("mayflowergmbh/Brezn-7b")
tokenizer = AutoTokenizer.from_pretrained("mayflowergmbh/Brezn-7b")
messages = [
    {"role": "user", "content": "Was ist dein Lieblingsgewürz?"},
    {"role": "assistant", "content": "Nun, ich mag besonders gerne einen guten Spritzer frischen Zitronensaft. Er fügt genau die richtige Menge an würzigem Geschmack hinzu, egal was ich gerade in der Küche zubereite!"},
    {"role": "user", "content": "Hast du Mayonnaise-Rezepte?"}
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
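Where apply_chat_template() is not available (for example, when prompting a GGUF build directly), the [INST] format described above can be assembled by hand. The following is a minimal sketch, not the tokenizer's actual template: build_prompt is a hypothetical helper introduced only for illustration, and it assumes the layout of the example above (one leading <s>, each user turn wrapped in [INST] ... [/INST], each assistant turn closed by </s> and a space).

```python
def build_prompt(messages):
    """Assemble an [INST]-style prompt from user/assistant messages.

    Hypothetical helper for illustration only; prefer apply_chat_template()
    when the tokenizer is available, since it uses the shipped template.
    """
    parts = ["<s>"]  # beginning-of-sentence marker, emitted once at the start
    for message in messages:
        if message["role"] == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        elif message["role"] == "assistant":
            # Each assistant reply is closed by the end-of-sentence marker.
            parts.append(f"{message['content']}</s> ")
        else:
            raise ValueError(f"unsupported role: {message['role']}")
    return "".join(parts)


messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
prompt = build_prompt(messages)
print(prompt)
```

Some runtimes and tokenizers add the beginning-of-sentence token on their own, so it is worth checking that the final input does not end up with two of them.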
mt-bench-de
{
    "first_turn": 7.6625,
    "second_turn": 7.31875,
    "categories": {
        "writing": 8.75,
        "roleplay": 8.5,
        "reasoning": 6.1,
        "math": 5.05,
        "coding": 5.4,
        "extraction": 7.975,
        "stem": 9,
        "humanities": 9.15
    },
    "average": 7.490625
}
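As a quick consistency check on these numbers, the short sketch below recomputes the average from the values reported above. It assumes, without confirmation from the benchmark tooling, that the average is a plain unweighted mean; under that assumption both the mean of the two turns and the mean of the eight categories agree with the reported 7.490625.

```python
# Scores copied from the mt-bench-de results above.
scores = {
    "first_turn": 7.6625,
    "second_turn": 7.31875,
    "categories": {
        "writing": 8.75,
        "roleplay": 8.5,
        "reasoning": 6.1,
        "math": 5.05,
        "coding": 5.4,
        "extraction": 7.975,
        "stem": 9,
        "humanities": 9.15,
    },
    "average": 7.490625,
}

categories = scores["categories"]

# Unweighted means (an assumption about how the average was computed).
turn_mean = (scores["first_turn"] + scores["second_turn"]) / 2
category_mean = sum(categories.values()) / len(categories)

# Weakest and strongest categories by score.
weakest = min(categories, key=categories.get)
strongest = max(categories, key=categories.get)

print(f"mean of turns:      {turn_mean:.6f}")
print(f"mean of categories: {category_mean:.6f}")
print(f"weakest: {weakest}, strongest: {strongest}")
```

The category breakdown shows the model scoring highest on humanities and lowest on math.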
🧩 Configuration
models:
  - model: mistralai/Mistral-7B-v0.1
    # no parameters necessary for base model
  - model: FelixChao/WestSeverus-7B-DPO-v2
    parameters:
      density: 0.60
      weight: 0.30
  - model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
    parameters:
      density: 0.65
      weight: 0.40
  - model: cognitivecomputations/openchat-3.5-0106-laser
    parameters:
      density: 0.6
      weight: 0.3
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
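To give a feel for what the density and weight parameters control, here is a toy sketch of the drop-and-rescale idea behind the DARE part of dare_ties. It is not mergekit's implementation: it works on short Python lists instead of tensors, omits the TIES sign-election step entirely, and the functions dare_prune and merge as well as the example numbers are hypothetical. The assumption is that density is the fraction of each model's difference from the base that is kept, and weight scales that model's contribution.

```python
import random


def dare_prune(delta, density, rng):
    """Keep each delta entry with probability `density`, rescaling survivors
    by 1/density so the expected value of each entry is unchanged."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]


def merge(base, finetuned_models, rng):
    """Add the weighted, pruned deltas of each fine-tuned model to the base.

    Toy version: the TIES sign-election step is deliberately left out.
    """
    merged = list(base)
    for weights, density, weight in finetuned_models:
        delta = [w - b for w, b in zip(weights, base)]
        pruned = dare_prune(delta, density, rng)
        merged = [m + weight * p for m, p in zip(merged, pruned)]
    return merged


rng = random.Random(0)  # fixed seed for repeatability (illustrative only)
base = [0.0, 1.0, -1.0, 0.5]  # made-up "base model" parameters
models = [
    # (made-up parameters, density, weight) mirroring the config values
    ([0.2, 1.1, -1.0, 0.4], 0.60, 0.30),
    ([0.1, 0.9, -0.8, 0.5], 0.65, 0.40),
    ([0.0, 1.2, -1.1, 0.7], 0.60, 0.30),
]
merged = merge(base, models, rng)
print(merged)
```

With a density of 1.0 nothing is dropped and the sketch reduces to a plain weighted sum of deltas added to the base.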