Instructions to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with llama-cpp-python:
    # !pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF",
        filename="Qwen2.5-14B-Instruct-F16.gguf",
    )

    llm.create_chat_completion(
        messages=[
            {"role": "user", "content": "What is the capital of France?"}
        ]
    )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with llama.cpp:
Install from brew
    brew install llama.cpp

    # Start a local OpenAI-compatible server with a web UI:
    llama-server -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M

    # Run inference directly in the terminal:
    llama-cli -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
Install from WinGet (Windows)
    winget install llama.cpp

    # Start a local OpenAI-compatible server with a web UI:
    llama-server -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M

    # Run inference directly in the terminal:
    llama-cli -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
Use pre-built binary
    # Download pre-built binary from:
    # https://github.com/ggerganov/llama.cpp/releases

    # Start a local OpenAI-compatible server with a web UI:
    ./llama-server -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M

    # Run inference directly in the terminal:
    ./llama-cli -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
Build from source code
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    cmake -B build
    cmake --build build -j --target llama-server llama-cli

    # Start a local OpenAI-compatible server with a web UI:
    ./build/bin/llama-server -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M

    # Run inference directly in the terminal:
    ./build/bin/llama-cli -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
Use Docker
docker model run hf.co/ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF",
        "messages": [
          { "role": "user", "content": "What is the capital of France?" }
        ]
      }'
Use Docker
docker model run hf.co/ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
- Ollama
How to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with Ollama:
ollama run hf.co/ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
- Unsloth Studio
How to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
    curl -fsSL https://unsloth.ai/install.sh | sh

    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888

    # Then open http://localhost:8888 in your browser
    # Search for ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF to start chatting
Install Unsloth Studio (Windows)
    irm https://unsloth.ai/install.ps1 | iex

    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888

    # Then open http://localhost:8888 in your browser
    # Search for ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
    # No setup required
    # Open https://huggingface.co/spaces/unsloth/studio in your browser
    # Search for ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF to start chatting
- Pi
How to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with Pi:
Start the llama.cpp server
    # Install llama.cpp:
    brew install llama.cpp

    # Start a local OpenAI-compatible server:
    llama-server -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent

    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "llama-cpp": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M" }
          ]
        }
      }
    }
Run Pi
    # Start Pi in your project directory:
    pi
- Hermes Agent
How to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with Hermes Agent:
Start the llama.cpp server
    # Install llama.cpp:
    brew install llama.cpp

    # Start a local OpenAI-compatible server:
    llama-server -hf ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup

    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
Run Hermes
hermes
- Atomic Chat
- Docker Model Runner
How to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with Docker Model Runner:
docker model run hf.co/ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
- Lemonade
How to use ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF with Lemonade:
Pull the model
    # Download Lemonade from https://lemonade-server.ai/
    lemonade pull ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Qwen2.5-14B-Instruct-GGUF-Q4_K_M
List all available models
lemonade list
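Several of the options above (llama-server, vLLM) expose an OpenAI-compatible HTTP API, and the curl call in the vLLM section shows the request shape. The following is a minimal Python sketch of the same call using only the standard library. The helper names `build_chat_request`, `extract_reply`, and `ask` are hypothetical, and the response layout (`choices[0].message.content`) is assumed to follow the usual OpenAI chat-completions format.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from a parsed chat-completion response."""
    # Assumes the standard OpenAI layout: choices[0].message.content
    return response_body["choices"][0]["message"]["content"]


def ask(base_url, model, user_message, timeout=120):
    """Send one user message to a running server and return the reply text."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))


# Example (requires a server that is already running):
# ask("http://localhost:8080/v1",
#     "ThomasBaruzier/Qwen2.5-14B-Instruct-GGUF:Q4_K_M",
#     "What is the capital of France?")
```

The base URL depends on which server is running: the Pi configuration above points at `http://localhost:8080/v1` for llama-server, while the vLLM curl example uses `http://localhost:8000/v1`.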
Update README.md
README.md CHANGED
@@ -25,6 +25,37 @@ All quants were made using the imatrix option and Bartowski's [calibration file]
 
 # Perplexity table (the lower the better)
 
+| Quant   | Size (MB) | PPL     | Size (%) | Accuracy (%) | PPL error rate |
+| ------- | --------- | ------- | -------- | ------------ | -------------- |
+| IQ1_S   | 3441      | 22.0082 | 12.21    | 27.14        | 0.16818        |
+| IQ1_M   | 3693      | 15.079  | 13.11    | 39.62        | 0.1106         |
+| IQ2_XXS | 4114      | 9.6047  | 14.6     | 62.2         | 0.06625        |
+| IQ2_XS  | 4487      | 8.3649  | 15.92    | 71.41        | 0.05574        |
+| IQ2_S   | 4772      | 8.1942  | 16.93    | 72.9         | 0.0548         |
+| IQ2_M   | 5109      | 7.7261  | 18.13    | 77.32        | 0.05177        |
+| Q2_K_S  | 5148      | 8.0641  | 18.27    | 74.08        | 0.0549         |
+| Q2_K    | 5504      | 7.6005  | 19.53    | 78.6         | 0.05146        |
+| IQ3_XXS | 5672      | 6.9285  | 20.13    | 86.22        | 0.04547        |
+| IQ3_XS  | 6088      | 6.721   | 21.6     | 88.88        | 0.04329        |
+| Q3_K_S  | 6352      | 6.8697  | 22.54    | 86.96        | 0.04576        |
+| IQ3_S   | 6383      | 6.6246  | 22.65    | 90.17        | 0.04285        |
+| IQ3_M   | 6597      | 6.6359  | 23.41    | 90.02        | 0.04256        |
+| Q3_K_M  | 7000      | 6.5281  | 24.84    | 91.51        | 0.043          |
+| Q3_K_L  | 7558      | 6.4323  | 26.82    | 92.87        | 0.04211        |
+| IQ4_XS  | 7744      | 6.2005  | 27.48    | 96.34        | 0.04022        |
+| Q4_0    | 8149      | 6.2928  | 28.92    | 94.93        | 0.04095        |
+| IQ4_NL  | 8154      | 6.208   | 28.94    | 96.23        | 0.04032        |
+| Q4_K_S  | 8177      | 6.163   | 29.02    | 96.93        | 0.03976        |
+| Q4_K_M  | 8572      | 6.1311  | 30.42    | 97.43        | 0.03957        |
+| Q4_1    | 8958      | 6.1674  | 31.79    | 96.86        | 0.03981        |
+| Q5_K_S  | 9791      | 6.0411  | 34.75    | 98.88        | 0.03886        |
+| Q5_0    | 9817      | 6.0504  | 34.84    | 98.73        | 0.03895        |
+| Q5_K_M  | 10023     | 6.0389  | 35.57    | 98.92        | 0.03888        |
+| Q5_1    | 10625     | 6.0366  | 37.71    | 98.96        | 0.03885        |
+| Q6_K    | 11564     | 6.0004  | 41.04    | 99.56        | 0.0386         |
+| Q8_0    | 14975     | 5.9821  | 53.14    | 99.86        | 0.03842        |
+| F16     | 28179     | 5.9737  | 100      | 100          | 0.03835        |
+
 <hr>
 
 # Qwen2.5-14B-Instruct
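The README does not say how the two percentage columns are computed. The values in the table are consistent with both being taken relative to the F16 row: Size (%) as quant size over F16 size, and Accuracy (%) as F16 perplexity over quant perplexity. The sketch below reproduces a few rows under that assumption; the function names are hypothetical and the formulas are inferred from the numbers, not taken from the author's script.

```python
# Reference values copied from the F16 row of the table.
F16_SIZE_MB = 28179
F16_PPL = 5.9737


def size_percent(size_mb, reference_mb=F16_SIZE_MB):
    """Quant file size as a percentage of the F16 file size."""
    return round(100 * size_mb / reference_mb, 2)


def accuracy_percent(ppl, reference_ppl=F16_PPL):
    """F16 perplexity divided by quant perplexity, as a percentage.

    Lower perplexity is better, so a quant that matches F16 scores 100.
    """
    return round(100 * reference_ppl / ppl, 2)


# (size in MB, PPL) for a few rows of the table.
rows = {
    "IQ1_S": (3441, 22.0082),
    "Q4_K_M": (8572, 6.1311),
    "Q8_0": (14975, 5.9821),
}

for quant, (size_mb, ppl) in rows.items():
    print(quant, size_percent(size_mb), accuracy_percent(ppl))
```

Read this way, Q4_K_M keeps about 97% of the F16 score at roughly 30% of the file size, which is the trade-off the table is meant to make visible.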