How to use from Unsloth Studio
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/GRAM-LLaMA3.2-3B-RewardModel-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/GRAM-LLaMA3.2-3B-RewardModel-GGUF to start chatting
Using Hugging Face Spaces for Unsloth Studio
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for prithivMLmods/GRAM-LLaMA3.2-3B-RewardModel-GGUF to start chatting

GRAM-LLaMA3.2-3B-RewardModel-GGUF

GRAM-LLaMA3.2-3B-RewardModel, released by NiuTrans, is a generative reward model fine-tuned from Llama-3.2-3B-Instruct. It is designed to improve reward generalization for large language models (LLMs) through a two-stage training approach: the model is first pre-trained on large unlabeled datasets and then fine-tuned on supervised, labeled data. Training uses label smoothing and optimizes a regularized ranking loss, bridging generative and discriminative reward modeling. As a result, the model can be applied across a variety of tasks without the extensive task-specific fine-tuning that reward models usually require.
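For intuition, the link between label smoothing and a regularized ranking loss can be illustrated with a generic Bradley-Terry pairwise loss. This is a textbook formulation assumed here for illustration, not necessarily GRAM's exact training objective. Let r(x, y) be the reward for response y to prompt x, y_w the preferred and y_l the rejected response, σ the logistic function, and ε ∈ [0, 1) the smoothing factor:

```latex
\mathcal{L}_\varepsilon(\Delta)
  = -(1-\varepsilon)\,\log\sigma(\Delta) - \varepsilon\,\log\sigma(-\Delta)
  = -\log\sigma(\Delta) + \varepsilon\,\Delta,
\qquad \Delta = r(x, y_w) - r(x, y_l)
```

The second equality uses log σ(-Δ) = log σ(Δ) - Δ. The extra term εΔ penalizes ever-larger margins, so for ε > 0 the loss is minimized at the finite margin Δ* = log((1 - ε)/ε) instead of pushing Δ toward infinity.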

GRAM-LLaMA3.2-3B-RewardModel is evaluated on the JudgeBench benchmark, which covers domains such as Chat, Code, Math, and Safety, and achieves a competitive average score of 69.9 across these categories. This makes it a strong open-source, plug-and-play reward model for aligning LLMs without training a reward model from scratch. The repository includes usage examples for applying the model directly to assess and rank the quality of AI-generated responses impartially.

Model Files

File name                                 Size      Quant type
GRAM-LLaMA3.2-3B-RewardModel.Q2_K.gguf    1.36 GB   Q2_K
GRAM-LLaMA3.2-3B-RewardModel.Q3_K_S.gguf  1.54 GB   Q3_K_S
GRAM-LLaMA3.2-3B-RewardModel.Q3_K_M.gguf  1.69 GB   Q3_K_M
GRAM-LLaMA3.2-3B-RewardModel.Q3_K_L.gguf  1.82 GB   Q3_K_L
GRAM-LLaMA3.2-3B-RewardModel.Q4_K_S.gguf  1.93 GB   Q4_K_S
GRAM-LLaMA3.2-3B-RewardModel.Q4_K_M.gguf  2.02 GB   Q4_K_M
GRAM-LLaMA3.2-3B-RewardModel.Q5_K_S.gguf  2.27 GB   Q5_K_S
GRAM-LLaMA3.2-3B-RewardModel.Q5_K_M.gguf  2.32 GB   Q5_K_M
GRAM-LLaMA3.2-3B-RewardModel.Q6_K.gguf    2.64 GB   Q6_K
GRAM-LLaMA3.2-3B-RewardModel.Q8_0.gguf    3.42 GB   Q8_0
GRAM-LLaMA3.2-3B-RewardModel.BF16.gguf    6.43 GB   BF16
GRAM-LLaMA3.2-3B-RewardModel.F16.gguf     6.43 GB   F16
GRAM-LLaMA3.2-3B-RewardModel.F32.gguf     12.9 GB   F32
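To fetch a single quant instead of the whole repository, a minimal sketch is shown below. It assumes the `huggingface-cli` tool from the `huggingface_hub` Python package is installed (newer releases also expose the same command as `hf download`). The helper names `quant_filename` and `download_quant` are hypothetical; only the repository ID and the file-name pattern come from the table.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository) for fetching one quant file.
# Assumes `huggingface-cli` from the huggingface_hub package is on PATH;
# the download step needs network access.

REPO="prithivMLmods/GRAM-LLaMA3.2-3B-RewardModel-GGUF"

# quant_filename QUANT: print the GGUF file name for a quant type, e.g. Q4_K_M
quant_filename() {
  printf 'GRAM-LLaMA3.2-3B-RewardModel.%s.gguf\n' "$1"
}

# download_quant QUANT [DIR]: download only that file into DIR (default: current dir)
download_quant() {
  huggingface-cli download "$REPO" "$(quant_filename "$1")" --local-dir "${2:-.}"
}
```

For example, `download_quant Q4_K_M ./models` should place `GRAM-LLaMA3.2-3B-RewardModel.Q4_K_M.gguf` in `./models`, ready for a GGUF-capable runtime such as llama.cpp.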

Quants Usage

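File size is only a rough guide to quality. IQ-quants, where available, are often preferable to non-IQ quants of similar size; this repository provides only K-quants, Q8_0, and unquantized (BF16, F16, F32) files.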
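The on-disk sizes in the Model Files table give a rough first filter when choosing a quant. As a minimal sketch (the `pick_quant` helper is hypothetical), the function below prints the largest quant whose file fits a size budget in GB. Treat the budget as a file-size limit only: inference also needs memory for the context (KV cache) and runtime overhead.

```shell
#!/bin/sh
# Hypothetical helper: print the largest quant type whose file size (GB, copied
# from the Model Files table) is within the given budget. Prints nothing if none fits.
# File size is only a lower bound on RAM/VRAM use; leave headroom for the KV cache.

# pick_quant BUDGET_GB
pick_quant() {
  awk -v budget="$1" '
    BEGIN { best = -1 }
    # Each input line: <quant type> <file size in GB>; the first of equal sizes wins
    ($2 + 0) <= (budget + 0) && ($2 + 0) > best { best = $2 + 0; quant = $1 }
    END { if (best >= 0) print quant }
  ' <<EOF
Q2_K 1.36
Q3_K_S 1.54
Q3_K_M 1.69
Q3_K_L 1.82
Q4_K_S 1.93
Q4_K_M 2.02
Q5_K_S 2.27
Q5_K_M 2.32
Q6_K 2.64
Q8_0 3.42
BF16 6.43
F16 6.43
F32 12.9
EOF
}
```

For example, `pick_quant 3` prints `Q6_K`, the largest file at or under 3 GB. Picking purely by size keeps the helper simple; in practice you may prefer a smaller quant to leave room for a longer context.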

[Figure: graph by ikawrakow comparing some lower-quality quant types (lower is better)]

Format: GGUF
Model size: 3B params
Architecture: llama