Instructions to use Sao10K/Fimbulvetr-11B-v2.1-16K with libraries, inference servers, and local apps.

Libraries

How to use Sao10K/Fimbulvetr-11B-v2.1-16K with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Sao10K/Fimbulvetr-11B-v2.1-16K")

# Load model directly (this is a text-only causal LM, so use AutoModelForCausalLM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Sao10K/Fimbulvetr-11B-v2.1-16K")
model = AutoModelForCausalLM.from_pretrained("Sao10K/Fimbulvetr-11B-v2.1-16K")

vLLM

How to use Sao10K/Fimbulvetr-11B-v2.1-16K with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Sao10K/Fimbulvetr-11B-v2.1-16K"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sao10K/Fimbulvetr-11B-v2.1-16K",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
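
The same call can be made from Python. The following is a minimal sketch using only the standard library; it assumes the server started by `vllm serve` above is listening on `localhost:8000`. The helper names `build_completion_request` and `complete` are hypothetical, chosen for this example, and are not part of vLLM.

```python
import json
import urllib.request

# Assumed local endpoint; matches the port used in the curl example.
VLLM_URL = "http://localhost:8000/v1/completions"
MODEL_ID = "Sao10K/Fimbulvetr-11B-v2.1-16K"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, url=VLLM_URL):
    """Build (but do not send) a POST request for the completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the decoded JSON response (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `complete("Once upon a time,")` sends the same payload as the curl command. Building the request separately from sending it makes the payload easy to inspect without a running server.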

Use Docker

docker model run hf.co/Sao10K/Fimbulvetr-11B-v2.1-16K

SGLang

How to use Sao10K/Fimbulvetr-11B-v2.1-16K with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Sao10K/Fimbulvetr-11B-v2.1-16K" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sao10K/Fimbulvetr-11B-v2.1-16K",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Sao10K/Fimbulvetr-11B-v2.1-16K" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sao10K/Fimbulvetr-11B-v2.1-16K",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
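
The servers above answer with an OpenAI-style completions JSON body, in which the generated text sits under `choices[0].text`. Here is a small sketch of pulling that text out in Python. The function name `extract_completion_text` is hypothetical, and the sample body is hand-written for illustration rather than real server output.

```python
import json


def extract_completion_text(response_body):
    """Return the generated text of the first choice in a completions response."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["text"]


# Hand-written sample body for illustration only (not real model output).
sample = '{"choices": [{"index": 0, "text": " there was a wolf.", "finish_reason": "length"}]}'
print(extract_completion_text(sample))
```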

Docker Model Runner
How to use Sao10K/Fimbulvetr-11B-v2.1-16K with Docker Model Runner:
```
docker model run hf.co/Sao10K/Fimbulvetr-11B-v2.1-16K
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Trained with compute from Backyard.ai | Thanks to them and @dynafire for helping me out.

Fimbulvetr-v2, but extended to 16K context with PoSE. A sane context value is ~12K; beyond that, quality degrades.
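
For readers unfamiliar with PoSE (Positional Skip-wise training): the rough idea is to keep training on short windows, but shift the position ids of later chunks by a random skip so the model sees positions spread over the longer target range. The sketch below only illustrates that idea and is not the training code used for this model. The function name, the chunk count, and the lengths are all assumptions made for this example.

```python
import random


def pose_position_ids(train_len, target_len, num_chunks=2, rng=None):
    """Return train_len increasing position ids spread over [0, target_len)."""
    if not 1 <= num_chunks <= train_len <= target_len:
        raise ValueError("need 1 <= num_chunks <= train_len <= target_len")
    rng = rng or random.Random()
    # Split the training window into num_chunks contiguous chunks.
    base, extra = divmod(train_len, num_chunks)
    chunk_lens = [base + (1 if i < extra else 0) for i in range(num_chunks)]
    # Total number of positions that may be skipped over.
    budget = target_len - train_len
    # First chunk keeps offset 0; later chunks get non-decreasing random skips.
    offsets = [0] + sorted(rng.randint(0, budget) for _ in range(num_chunks - 1))
    position_ids = []
    start = 0
    for length, offset in zip(chunk_lens, offsets):
        position_ids.extend(range(start + offset, start + offset + length))
        start += length
    return position_ids


# Tiny illustrative sizes so the result is easy to read.
ids = pose_position_ids(train_len=8, target_len=32, num_chunks=2, rng=random.Random(0))
print(ids)
```

Within each chunk the ids stay consecutive, and the jump between chunks changes from sample to sample, which is how a short window can cover relative distances up to the target length.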

Note:
- I left RoPE theta at 10K for this training run, instead of expanding it as I did with Stheno 3.3. Solar did not play well with an extended theta: grad norm / loss values either went parabolic or plunged down from 10000+. Pretty much unreliable, unlike Stheno 3.3's training run.
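
As background on what the theta value controls: in rotary position embeddings, each pair of dimensions rotates at a frequency of `theta^(-2i/d)`, so a larger theta stretches the slowest wavelengths. The sketch below computes those frequencies. The `head_dim=128` value and the larger comparison theta are assumptions for illustration, not values taken from this model's config or training run.

```python
import math


def rope_inv_freq(head_dim, theta=10000.0):
    """Inverse frequencies theta^(-2i/d), one per rotary dimension pair."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, theta=10000.0):
    """Wavelength, in token positions, of the slowest-rotating pair."""
    return 2 * math.pi / rope_inv_freq(head_dim, theta)[-1]


# head_dim=128 and the second theta are illustrative assumptions.
for theta in (10_000.0, 1_000_000.0):
    print(theta, round(longest_wavelength(128, theta)))
```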

Notes:
- I noticed people having bad issues with quants, be it GGUF or others, at 8-bit or less. Kind of a weird issue? I had little to no issues when testing the unquantized model.
- Slightly different results from base Fimbulvetr-v2, but during my tests they are similar enough. The vibes are still there.
- Formatting issues happen occasionally. In my tests, a reroll / regenerate fixes them.
- I get consistent and reliable answers at ~11K context.
- Still coherent at up to 16K though! It just doesn't work as well.

I recommend keeping prompts within 12K context, but loading the model at 16K for inference. Recall is really accurate up to 10K across multiple different long-context tests. 16K works fine for roleplays, but not for more detailed tasks.
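
One way to follow this recommendation in a chat front end is to load at 16K but trim old messages so the prompt stays under roughly 12K tokens. The sketch below assumes "12K" and "16K" mean 12 × 1024 and 16 × 1024 tokens, and that per-message token counts are already known. The function name and the 512-token reply reserve are hypothetical choices for this example.

```python
LOADED_CONTEXT = 16 * 1024       # assumed token value for "16K"
RECOMMENDED_CONTEXT = 12 * 1024  # assumed token value for "12K"


def trim_history(message_token_counts, reserve_for_reply=512, budget=RECOMMENDED_CONTEXT):
    """Return the index of the oldest message to keep so the prompt fits the budget."""
    allowed = budget - reserve_for_reply
    total = 0
    keep_from = len(message_token_counts)
    # Walk backwards from the newest message, keeping as many as fit.
    for i in range(len(message_token_counts) - 1, -1, -1):
        if total + message_token_counts[i] > allowed:
            break
        total += message_token_counts[i]
        keep_from = i
    return keep_from


history = [4000, 4000, 4000, 4000]  # token counts, oldest first
first_kept = trim_history(history)
print(history[first_kept:])
```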

Red results in Needle in a Haystack testing for this specific model are usually due to weird result artifacts, like the model answering with only part of the key, or adding extra commentary. Basically, it found the result, but the answer is incomplete or has additional stuff attached. Something like ' 3211' or '3211 and' instead of '321142'. Weird. That is why it's coherent and semi-reliable for roleplays at 16K context.
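
To make the distinction concrete, here is a small sketch of a scorer that separates these near misses from real failures. It is not the scoring used in the tests above; the labels, the function name, and the `min_overlap` threshold are assumptions for illustration.

```python
def classify_needle_answer(answer, key, min_overlap=3):
    """Label an answer as 'exact', 'extra', 'partial', or 'miss' against the key."""
    cleaned = answer.strip()
    if cleaned == key:
        return "exact"
    if key in cleaned:
        # Full key is present, but with additional text around it.
        return "extra"
    # Look for a truncated key: the longest prefix of at least min_overlap chars.
    for length in range(len(key) - 1, min_overlap - 1, -1):
        if key[:length] in cleaned:
            return "partial"
    return "miss"


for answer in (" 3211", "3211 and", "321142", "The key is 321142.", "I don't know"):
    print(repr(answer), classify_needle_answer(answer, "321142"))
```

A strict exact-match scorer would mark the first two answers as failures, even though the model clearly located the needle.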

Safetensors
Model size: 11B params
Tensor type: BF16
