Instructions to use QuantFactory/Baldur-8B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use QuantFactory/Baldur-8B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="QuantFactory/Baldur-8B-GGUF",
	filename="Baldur-8B.Q2_K.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use QuantFactory/Baldur-8B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Baldur-8B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Baldur-8B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Baldur-8B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Baldur-8B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/Baldur-8B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/Baldur-8B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/Baldur-8B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/Baldur-8B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/QuantFactory/Baldur-8B-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use QuantFactory/Baldur-8B-GGUF with Ollama:
```
ollama run hf.co/QuantFactory/Baldur-8B-GGUF:Q4_K_M
```

Unsloth Studio

How to use QuantFactory/Baldur-8B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Baldur-8B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Baldur-8B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/Baldur-8B-GGUF to start chatting

Atomic Chat new
Docker Model Runner
How to use QuantFactory/Baldur-8B-GGUF with Docker Model Runner:
```
docker model run hf.co/QuantFactory/Baldur-8B-GGUF:Q4_K_M
```

Lemonade

How to use QuantFactory/Baldur-8B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/Baldur-8B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Baldur-8B-GGUF-Q4_K_M

List all available models

lemonade list

Baldur-8B-GGUF / README.md

aashish1904

Upload README.md with huggingface_hub

73a65a2 verified over 1 year ago

preview code

Raw

History Blame Contribute Delete

10.2 kB


	---

	language:
	- en
	license: agpl-3.0
	tags:
	- chat
	base_model:
	- arcee-ai/Llama-3.1-SuperNova-Lite
	datasets:
	- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
	- Nitral-AI/Cybersecurity-ShareGPT
	- Nitral-AI/Medical_Instruct-ShareGPT
	- Nitral-AI/Olympiad_Math-ShareGPT
	- anthracite-org/kalo_opus_misc_240827
	- NewEden/Claude-Instruct-5k
	- lodrick-the-lafted/kalo-opus-instruct-3k-filtered
	- anthracite-org/kalo-opus-instruct-22k-no-refusal
	- Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
	- Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
	- anthracite-org/kalo_misc_part2
	- Nitral-AI/Creative_Writing-ShareGPT
	- NewEden/Gryphe-Sonnet3.5-Charcard-Roleplay-unfiltered
	License: agpl-3.0
	Language:
	- En
	Pipeline_tag: text-generation
	Base_model: arcee-ai/Llama-3.1-SuperNova-Lite
	Tags:
	- Chat
	model-index:
	- name: Baldur-8B
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: HuggingFaceH4/ifeval
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 47.82
	name: strict accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Baldur-8B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: BBH
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 32.54
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Baldur-8B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: hendrycks/competition_math
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 12.61
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Baldur-8B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 6.94
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Baldur-8B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 14.01
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Baldur-8B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 29.49
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Baldur-8B
	name: Open LLM Leaderboard

	---

	[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)


	# QuantFactory/Baldur-8B-GGUF
	This is quantized version of [Delta-Vector/Baldur-8B](https://huggingface.co/Delta-Vector/Baldur-8B) created using llama.cpp

	# Original Model Card


	![](https://huggingface.co/Delta-Vector/Baldur-8B/resolve/main/Baldur.jpg)


	An finetune of the L3.1 instruct distill done by Arcee, The intent of this model is to have differing prose then my other releases, in my testing it has achieved this and avoiding using common -isms frequently and has a differing flavor then my other models.


	# Quants

	GGUF: https://huggingface.co/Delta-Vector/Baldur-8B-GGUF

	EXL2: https://huggingface.co/Delta-Vector/Baldur-8B-EXL2


	## Prompting
	Model has been Instruct tuned with the Llama-Instruct formatting. A typical input would look like this:

	```py
	"""<\|begin_of_text\|><\|start_header_id\|>system<\|end_header_id\|>
	You are an AI built to rid the world of bonds and journeys!<\|eot_id\|><\|start_header_id\|>user<\|end_header_id\|>
	Bro i just wanna know what is 2+2?<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>
	"""
	```
	## System Prompting

	I would highly recommend using Sao10k's Euryale System prompt, But the "Roleplay Simple" system prompt provided within SillyTavern will work aswell.

	```
	Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.

	<Guidelines>
	• Maintain the character persona but allow it to evolve with the story.
	• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.
	• All types of outputs are encouraged; respond accordingly to the narrative.
	• Include dialogues, actions, and thoughts in each response.
	• Utilize all five senses to describe scenarios within {{char}}'s dialogue.
	• Use emotional symbols such as "!" and "~" in appropriate contexts.
	• Incorporate onomatopoeia when suitable.
	• Allow time for {{user}} to respond with their own input, respecting their agency.
	• Act as secondary characters and NPCs as needed, and remove them when appropriate.
	• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.
	</Guidelines>

	<Forbidden>
	• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.
	• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.
	• Repetitive and monotonous outputs.
	• Positivity bias in your replies.
	• Being overly extreme or NSFW when the narrative context is inappropriate.
	</Forbidden>

	Follow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>.

	```


	## Axolotl config

	<details><summary>See axolotl config</summary>

	Axolotl version: `0.4.1`
	```yaml
	base_model: arcee-ai/Llama-3.1-SuperNova-Lite
	model_type: AutoModelForCausalLM
	tokenizer_type: AutoTokenizer

	#trust_remote_code: true

	plugins:
	- axolotl.integrations.liger.LigerPlugin
	liger_rope: true
	liger_rms_norm: true
	liger_swiglu: true
	liger_fused_linear_cross_entropy: true

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	datasets:
	- path: Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
	type: chat_template
	- path: Nitral-AI/Cybersecurity-ShareGPT
	type: chat_template
	- path: Nitral-AI/Medical_Instruct-ShareGPT
	type: chat_template
	- path: Nitral-AI/Olympiad_Math-ShareGPT
	type: chat_template
	- path: anthracite-org/kalo_opus_misc_240827
	type: chat_template
	- path: NewEden/Claude-Instruct-5k
	type: chat_template
	- path: lodrick-the-lafted/kalo-opus-instruct-3k-filtered
	type: chat_template
	- path: anthracite-org/kalo-opus-instruct-22k-no-refusal
	type: chat_template
	- path: Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
	type: chat_template
	- path: Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
	type: chat_template
	- path: anthracite-org/kalo_misc_part2
	type: chat_template
	- path: Nitral-AI/Creative_Writing-ShareGPT
	type: chat_template
	- path: NewEden/Gryphe-Sonnet3.5-Charcard-Roleplay-unfiltered
	type: chat_template

	chat_template: llama3
	shuffle_merged_datasets: true
	default_system_message: "You are an assistant that responds to the user."
	dataset_prepared_path: prepared_dataset_memorycore
	val_set_size: 0.0
	output_dir: ./henbane-8b-r3

	sequence_len: 8192
	sample_packing: true
	eval_sample_packing: false
	pad_to_sequence_len:

	adapter:
	lora_model_dir:
	lora_r:
	lora_alpha:
	lora_dropout:
	lora_target_linear:
	lora_fan_in_fan_out:

	wandb_project: henbane-8b-r3
	wandb_entity:
	wandb_watch:
	wandb_name: henbane-8b-r3
	wandb_log_model:

	gradient_accumulation_steps: 32
	micro_batch_size: 1
	num_epochs: 2
	optimizer: paged_adamw_8bit
	lr_scheduler: cosine
	#learning_rate: 3e-5
	learning_rate: 1e-5

	train_on_inputs: false
	group_by_length: false
	bf16: auto
	fp16:
	tf32: false

	gradient_checkpointing: true
	gradient_checkpointing_kwargs:
	use_reentrant: false
	early_stopping_patience:
	resume_from_checkpoint:
	local_rank:
	logging_steps: 1
	xformers_attention:
	flash_attention: true

	warmup_steps: 5
	evals_per_epoch:
	eval_table_size:
	eval_max_new_tokens:
	saves_per_epoch: 2
	debug:
	deepspeed: /workspace/axolotl/deepspeed_configs/zero2.json
	weight_decay: 0.05
	fsdp:
	fsdp_config:
	special_tokens:
	pad_token: <\|finetune_right_pad_id\|>
	eos_token: <\|eot_id\|>


	```
	</details><br>


	## Credits

	Thank you to [Lucy Knada](https://huggingface.co/lucyknada), [Kalomaze](https://huggingface.co/kalomaze), [Kubernetes Bad](https://huggingface.co/kubernetes-bad) and the rest of [Anthracite](https://huggingface.co/anthracite-org) (But not Alpin.)

	## Training
	The training was done for 2 epochs. I used 2 x [RTX 6000s](https://www.nvidia.com/en-us/design-visualization/rtx-6000/) GPUs graciously provided by [Kubernetes Bad](https://huggingface.co/kubernetes-bad) for the full-parameter fine-tuning of the model.

	[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Delta-Vector__Baldur-8B)

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \|23.90\|
	\|IFEval (0-Shot) \|47.82\|
	\|BBH (3-Shot) \|32.54\|
	\|MATH Lvl 5 (4-Shot)\|12.61\|
	\|GPQA (0-shot) \| 6.94\|
	\|MuSR (0-shot) \|14.01\|
	\|MMLU-PRO (5-shot) \|29.49\|