Libraries

How to use prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Run generation and print the pipeline output
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly generated tokens (the prompt is sliced off)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

SGLang

How to use prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
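
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming one of the servers above is already running and that it returns the usual OpenAI-style chat completion body. The helper names `build_chat_request`, `extract_reply`, and `ask` are illustrative and are not part of vLLM or SGLang.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B"


def build_chat_request(prompt, base_url="http://localhost:30000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion body.

    Assumption: the server answers with {"choices": [{"message": {"content": ...}}]}.
    """
    return response_body["choices"][0]["message"]["content"]


def ask(prompt, base_url="http://localhost:30000"):
    """Send the prompt to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))
```

With the vLLM server from the earlier example, the call would be `ask("What is the capital of France?", base_url="http://localhost:8000")`; the default port matches the SGLang examples.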

Docker Model Runner
How to use prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B
```

Regulus-Qwen3-R1-Llama-Distill-1.7B

Regulus-Qwen3-R1-Llama-Distill-1.7B is a distilled reasoning model fine-tuned from Qwen/Qwen3-1.7B on the Magpie-Align/Magpie-Reasoning-V2-250K-CoT-DeepSeek-R1-Llama-70B dataset. Training uses distilled traces from DeepSeek-R1-Llama-70B to transfer advanced reasoning patterns into a lightweight 1.7B-parameter model. The model is specialized for chain-of-thought reasoning across code, math, and science, and is optimized for efficient, mid-resource deployment.

GGUF: https://huggingface.co/prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B-GGUF

Key Features

Distilled Reasoning from Large-Scale Models: Trained with distilled traces from DeepSeek-R1-Llama-70B, preserving structured chain-of-thought reasoning in a smaller, faster model.
Unified Code + Math + Science Reasoning: Strong performance across computational logic, programming tasks, and scientific problem solving.
Structured Chain-of-Thought Generation: Produces clear, step-by-step explanations for algorithms, equations, and symbolic tasks.
Optimized Lightweight Footprint: Maintains reasoning depth while being deployable on mid-range GPUs, offline clusters, and edge AI systems.
Multi-Format Output Support: Generates responses in LaTeX, Markdown, JSON, and tabular formats for technical and research workflows.

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Regulus-Qwen3-R1-Llama-Distill-1.7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain step by step how to solve a system of linear equations using Gaussian elimination."

messages = [
    {"role": "system", "content": "You are a reasoning assistant skilled in math, code, and scientific logic."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens from each sequence so only the new tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
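
When only the final answer is needed, the reasoning trace can be separated from it after decoding. This is a minimal sketch, assuming the model wraps its chain of thought in `<think>` and `</think>` tags as Qwen3-family models commonly do; the model card does not confirm this, so inspect a raw response first. `split_reasoning` is a hypothetical helper, not part of Transformers.

```python
def split_reasoning(response, open_tag="<think>", close_tag="</think>"):
    """Return (reasoning, answer) from a decoded response string.

    Assumption: reasoning, if present, ends at the first close_tag.
    If no close_tag is found, the whole response is treated as the answer.
    """
    head, sep, tail = response.partition(close_tag)
    if not sep:
        return "", response.strip()
    # Remove the opening tag if the decoder kept it
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()
```

For example, `reasoning, answer = split_reasoning(response)` keeps the step-by-step trace available for display or logging while `answer` holds the text after the closing tag.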

Intended Use

Math and algorithm tutoring with clear reasoning steps
Code reasoning and synthesis for debugging and algorithm design
Scientific problem solving in physics, chemistry, and biology
Structured educational assistant for step-by-step learning
Efficient deployment where distilled reasoning fidelity is required

Limitations

Derived from distilled traces: reasoning may be simplified compared to full-scale teacher models
Not tuned for general-purpose conversation or creative writing
Context length limits multi-document or long-codebase reasoning
Optimized for structured reasoning, not emotional or casual dialogue
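
One common way to work within the context-length limit is to split long inputs into windows that leave room for generation. The sketch below operates on a plain list of token ids (for instance, from `tokenizer(text)["input_ids"]`). `chunk_token_ids` is a hypothetical helper, and `context_limit` is a parameter you must supply yourself; this card does not state the model's limit. Chat-template tokens also consume budget, so leave some margin.

```python
def chunk_token_ids(token_ids, context_limit, max_new_tokens, overlap=0):
    """Split token_ids into windows that leave room for max_new_tokens.

    context_limit and overlap are caller-supplied assumptions, not model facts.
    """
    budget = context_limit - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_limit")
    if not 0 <= overlap < budget:
        raise ValueError("overlap must be in [0, budget)")
    step = budget - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + budget])
        # Stop once a window has reached the end of the input
        if start + budget >= len(token_ids):
            break
    return chunks
```

Each chunk can then be decoded back to text and sent as its own prompt; a small `overlap` repeats the tail of one window at the head of the next so that context is not cut abruptly.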