How to use from
llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf GTO83/modelos:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf GTO83/modelos:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf GTO83/modelos:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf GTO83/modelos:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf GTO83/modelos:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf GTO83/modelos:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf GTO83/modelos:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf GTO83/modelos:Q4_K_M
Use Docker
docker model run hf.co/GTO83/modelos:Q4_K_M
Quick Links

Qwen2.5-7B-Instruct-GGUF

Chat

Perplexity table (the lower the better)

Quant Size (MB) PPL Size (%) Accuracy (%) PPL error rate
IQ1_S 417 193.6245 14.13 5.24 1.77149
IQ1_M 443 66.9068 15.01 15.17 0.52878
IQ2_XXS 488 33.3356 16.54 30.45 0.25559
IQ2_XS 525 20.287 17.79 50.04 0.14936
IQ2_S 538 18.2927 18.23 55.49 0.1338
IQ2_M 574 15.4838 19.45 65.56 0.11113
Q2_K_S 611 16.0169 20.7 63.38 0.11623
IQ3_XXS 638 12.3935 21.62 81.91 0.0877
Q2_K 645 14.1657 21.86 71.66 0.10105
IQ3_XS 698 11.7112 23.65 86.68 0.08256
Q3_K_S 726 12.4782 24.6 81.35 0.08842
IQ3_S 728 11.4241 24.67 88.86 0.07977
IQ3_M 741 11.4058 25.11 89 0.07862
Q3_K_M 786 11.3529 26.64 89.42 0.08018
Q3_K_L 840 11.1934 28.46 90.69 0.07913
IQ4_XS 855 10.5302 28.97 96.4 0.07351
IQ4_NL 893 10.5116 30.26 96.57 0.07335
Q4_0 895 10.8217 30.33 93.8 0.07576
Q4_K_S 897 10.5236 30.4 96.46 0.0736
Q4_K_M 941 10.4628 31.89 97.02 0.0731
Q4_1 970 10.51 32.87 96.59 0.07347
Q5_K_S 1048 10.2715 35.51 98.83 0.07148
Q5_0 1051 10.3196 35.62 98.37 0.07212
Q5_K_M 1073 10.2529 36.36 99.01 0.07143
Q5_1 1126 10.2624 38.16 98.92 0.0714
Q6_K 1214 10.203 41.14 99.49 0.07108
Q8_0 1571 10.167 53.24 99.84 0.07068
F16 2951 10.1512 100 100 0.07058

Downloads last month
2
GGUF
Model size
8B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for GTO83/modelos

Base model

Qwen/Qwen2.5-7B
Quantized
(83)
this model