Models (43)

Active filters: W4A16

ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v2

Text Generation • 33B • Updated Dec 18, 2024 • 10 • 16

ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v3

Text Generation • 33B • Updated Dec 20, 2024 • 11 • 14

ModelCloud/Falcon3-10B-Instruct-gptqmodel-4bit-vortex-v1

Text Generation • 10B • Updated Dec 21, 2024 • 8 • 3

ModelCloud/Qwen2.5-0.5B-Instruct-gptqmodel-w4a16

Text Generation • 0.5B • Updated Oct 19, 2025 • 20 • 1

ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v1

Text Generation • 8B • Updated Jan 24, 2025 • 7 • 6

ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v2

Text Generation • 8B • Updated Jan 24, 2025 • 176 • 8

RedHatAI/phi-4-quantized.w4a16

Text Generation • 15B • Updated 10 days ago • 93.3k • 5

RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16

Image-Text-to-Text • 24B • Updated 10 days ago • 2.56k • 10

RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16

Image-Text-to-Text • 109B • Updated 10 days ago • 7.62k • 13

pyrymikko/nomic-embed-code-W4A16-AWQ

7B • Updated Sep 30, 2025 • 6.26k

tcclaviger/Minimax-M2-Thrift-GPTQ-W4A16-AMD

Text Generation • 24B • Updated Dec 1, 2025 • 9 • 1

TevunahAi/granite-34b-code-instruct-8k-Ultra-Hybrid

Text Generation • 11B • Updated Dec 1, 2025 • 5

TevunahAi/Llama-3.1-70B-Instruct-Ultra-Hybrid

Text Generation • 22B • Updated Dec 4, 2025 • 2

Vishva007/Qwen3-4B-Instruct-2507-W4A16-AutoRound

Text Generation • 0.9B • Updated Jan 30 • 2

Vishva007/Qwen3-VL-8B-Instruct-W4A16-AutoRound

Image-Text-to-Text • 2B • Updated Feb 7 • 398

Vishva007/Qwen3-VL-2B-Instruct-W4A16-AutoRound

Image-Text-to-Text • 0.9B • Updated Feb 7 • 3

Vishva007/Qwen3-VL-2B-Instruct-W4A16-AutoRound-GPTQ

Image-Text-to-Text • 2B • Updated Feb 7 • 2

Vishva007/Qwen3-VL-2B-Instruct-W4A16-AutoRound-AWQ

Image-Text-to-Text • 2B • Updated Feb 7 • 33

Vishva007/Qwen3-VL-4B-Instruct-W4A16-AutoRound

Image-Text-to-Text • 1B • Updated Feb 7 • 2

Vishva007/Qwen3-VL-4B-Instruct-W4A16-AutoRound-GPTQ

Image-Text-to-Text • 4B • Updated Feb 7 • 18

Vishva007/Qwen3-VL-4B-Instruct-W4A16-AutoRound-AWQ

Image-Text-to-Text • 4B • Updated Feb 7 • 365 • 1

embedl/Cosmos-Reason2-2B-W4A16

Image-Text-to-Text • 2B • Updated May 19 • 650 • 11

bg-digitalservices/Gemma-4-26B-A4B-it-NVFP4A16

Text Generation • 15B • Updated Apr 5 • 5.28k • 5

bg-digitalservices/Apertus-8B-2509-NVFP4A16

Text Generation • 5B • Updated Apr 6 • 11

bg-digitalservices/Apertus-8B-Instruct-2509-NVFP4A16

Text Generation • 5B • Updated Apr 6 • 12

bg-digitalservices/Apertus-70B-2509-NVFP4A16

Text Generation • 36B • Updated Apr 6 • 31

bg-digitalservices/Apertus-70B-Instruct-2509-NVFP4A16

Text Generation • 36B • Updated Apr 6 • 21 • 1

bg-digitalservices/Gemma-4-E2B-NVFP4A16

Text Generation • 4B • Updated Apr 6 • 27

bg-digitalservices/Gemma-4-E2B-it-NVFP4A16

Text Generation • 4B • Updated Apr 6 • 20.5k

bg-digitalservices/Gemma-4-E4B-it-NVFP4A16

Text Generation • 6B • Updated Apr 6 • 482