How to use prithivMLmods/Canum-Qwen3_R1-4B-iCoT with libraries and local apps.

Libraries

How to use prithivMLmods/Canum-Qwen3_R1-4B-iCoT with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Canum-Qwen3_R1-4B-iCoT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Canum-Qwen3_R1-4B-iCoT")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Canum-Qwen3_R1-4B-iCoT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/Canum-Qwen3_R1-4B-iCoT with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Canum-Qwen3_R1-4B-iCoT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Canum-Qwen3_R1-4B-iCoT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
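
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on `http://localhost:8000`. The helper names `build_chat_payload` and `chat` and the `max_tokens` default are illustrative choices, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Canum-Qwen3_R1-4B-iCoT"


def build_chat_payload(prompt, model=MODEL_ID, max_tokens=512):
    """Build the JSON body for a /v1/chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative default, not from the card
    }


def chat(prompt, base_url="http://localhost:8000"):
    """POST the prompt to a running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply text. Passing a different `base_url` (for example port 30000) should work for any other OpenAI-compatible endpoint.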

Use Docker

docker model run hf.co/prithivMLmods/Canum-Qwen3_R1-4B-iCoT

SGLang

How to use prithivMLmods/Canum-Qwen3_R1-4B-iCoT with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Canum-Qwen3_R1-4B-iCoT" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Canum-Qwen3_R1-4B-iCoT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Canum-Qwen3_R1-4B-iCoT" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Canum-Qwen3_R1-4B-iCoT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use prithivMLmods/Canum-Qwen3_R1-4B-iCoT with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Canum-Qwen3_R1-4B-iCoT
```

Canum-Qwen3_R1-4B-iCoT

Canum-Qwen3_R1-4B-iCoT is a fine-tuned variant of Qwen3-4B aligned with internal chain-of-thought (iCoT) reasoning. It was trained on the TAUR-dev/STEPS__r1_4d_eval__mini_all dataset and targets long-form mathematical reasoning, progressive symbolic logic, and multi-stage problem decomposition within a compact 4B-parameter footprint.

GGUF: https://huggingface.co/prithivMLmods/Canum-Qwen3_R1-4B-iCoT-Q4_K_M-GGUF

Key Features

Internal Chain-of-Thought Reasoning (iCoT): Enables deeper logical progression through internally coherent reasoning steps, ideal for complex mathematical derivations and multivariable algebraic thinking.
Dataset (TAUR-dev/STEPS__r1_4d_eval__mini_all): Fine-tuned using structured evaluation sequences to build resilience in multi-step problem solving and improve interpretability in math-focused tasks.
Long Reasoning Paths in STEM Domains: Suited for long-chain logical flows in geometry, number theory, calculus, and symbolic manipulation, including proofs and multi-stage equation solving.
Lightweight Yet Capable (4B): Maintains strong reasoning and instruction-following abilities with lower computational cost compared to larger models, suitable for single-GPU deployments.
Instruction-Following and Step-by-Step Alignment: Follows complex instructions with multi-turn dependencies and provides granular output that aligns with internal steps used in the reasoning process.
Technical Format Adaptability: Outputs answers in clean Markdown, LaTeX, JSON, or table formats for academic, development, and notebook-based use cases.
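
When JSON output is requested, the reply may still carry reasoning text around the object. The helper below is a minimal sketch for pulling out the first JSON object from such a reply. The function name `extract_json` and the `width`/`length` fields in the usage note are hypothetical and are not taken from the model's documented output.

```python
import json


def extract_json(response):
    """Return the first JSON object found in `response`, or None."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            obj, _ = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace: try the next one.
            start = response.find("{", start + 1)
    return None
```

For example, `extract_json('Final answer: {"width": 6, "length": 18}')` yields a dictionary with those two keys, while a reply without any JSON object yields `None`.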

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Canum-Qwen3_R1-4B-iCoT"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Use internal CoT to solve: A rectangle has a length that is 3 times its width. If the perimeter is 48 units, what are the dimensions?"

messages = [
    {"role": "system", "content": "You are a reasoning assistant trained to use internal chain-of-thought (iCoT) for multi-step mathematical problems."},
    {"role": "user", "content": prompt}
]

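# Render the chat messages into a single prompt string using the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize the prompt and move the tensors to the model's device.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]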
print(response)
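
Qwen3-based models often wrap their reasoning in `<think>...</think>` before the final answer. Whether this fine-tune does so is an assumption here and is not stated in the card. Under that assumption, the sketch below separates the reasoning trace from the answer in a decoded `response`. The helper name `split_reasoning` and the sample text are illustrative.

```python
def split_reasoning(response, close_tag="</think>"):
    """Split a decoded response into (reasoning, final_answer).

    Assumes the reasoning trace, if present, ends with `close_tag`.
    """
    reasoning, found, answer = response.rpartition(close_tag)
    if not found:
        # No closing tag: treat the whole response as the final answer.
        return "", response.strip()
    # Drop the opening tag if the model emitted one.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Hand-written sample for the rectangle prompt, not real model output.
sample = (
    "<think>Let the width be w, so the length is 3w. "
    "Perimeter: 2(w + 3w) = 8w = 48, so w = 6.</think>\n"
    "Width = 6 units, length = 18 units."
)
reasoning, answer = split_reasoning(sample)
print(answer)  # Width = 6 units, length = 18 units.
```

If the output contains no closing tag, the function returns an empty reasoning string and the full text as the answer, so it is safe to apply unconditionally.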

Intended Use

Internal chain-of-thought (iCoT) problem solving
Long-form symbolic math and algebraic derivations
Curriculum-based step-by-step math tutoring
Structured multi-turn reasoning in STEM domains
Output generation in technical formats (LaTeX, Markdown)

Limitations

May require well-structured prompts for optimal reasoning output
Smaller context length may limit extremely long multi-part problems
Focused on precision reasoning, not creative or subjective writing
Best used with prompt patterns that guide internal logical steps
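
One way to keep prompts well structured is to build them from a fixed template. The sketch below reuses the system prompt and the "Use internal CoT to solve:" prefix from the quickstart. The closing instruction about step-by-step work and output format is an assumption about what helps, and `build_messages` is a hypothetical helper name.

```python
def build_messages(problem, output_format="Markdown"):
    """Return chat messages that ask for stepwise reasoning and a formatted answer."""
    system = (
        "You are a reasoning assistant trained to use internal "
        "chain-of-thought (iCoT) for multi-step mathematical problems."
    )
    user = (
        f"Use internal CoT to solve: {problem.strip()}\n"
        # The instruction below is an illustrative pattern, not from the card.
        "Work through the problem step by step, then state the final answer "
        f"on its own line, formatted as {output_format}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The returned list can be passed directly to `tokenizer.apply_chat_template` in place of a hand-written `messages` list.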

References

TAUR-dev/STEPS__r1_4d_eval__mini_all – Dataset for structured math reasoning
Internal CoT (iCoT) – Progressive logical strategy for complex problems
AIMO-2 Math Benchmark – OpenMathReasoning
YaRN: Efficient Context Extension of LLMs