AyurParam-2.9b-it-gguf

GGUF quantized release of AyurParam-2.9B-Instruct — India's first bilingual, instruction-tuned large language model specialized for Ayurveda. Packaged for efficient local inference via llama.cpp and Ollama.

Overview

AyurParam-2.9B is a domain-specialized, bilingual large language model built by the BharatGen team at IIT Bombay's Technology Innovation Hub, and presented in the paper AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda (Nauman et al., 2025).

General-purpose LLMs consistently underperform on highly specialized domains requiring deep cultural, linguistic, and subject-matter expertise. Ayurveda — with its centuries of nuanced textual and clinical knowledge encoded in Sanskrit, Hindi, and regional languages — is a prime example of this gap. AyurParam directly addresses this challenge by combining the bilingual strengths of Param-1-2.9B-Instruct with a meticulously curated Ayurvedic knowledge base.

This repository ships the model in GGUF format, making it immediately runnable on consumer hardware (CPU or GPU) using llama.cpp or Ollama.

Key Highlights

Attribute	Detail
Base model	`bharatgenai/Param-1-2.9B-Instruct`
Format	GGUF
Parameters	~2.9 Billion
Quantized variants	Q4_K_M (1.82 GB), Q8_0 (3.05 GB), FP16 (5.73 GB)
Languages	English + Hindi (bilingual)
Training corpus	~4.75M supervised samples
Training hardware	Multi-node NVIDIA H100 cluster
Training duration	~2 days (single H100 node)
Training framework	Hugging Face TRL (SFT)
Benchmark	BhashaBench-Ayur (BBA)
License	Apache 2.0

The Paper at a Glance

Motivation

Mainstream LLMs fail to accurately interpret or apply Ayurvedic knowledge for several interconnected reasons:

Domain gap — Ayurvedic concepts such as dosha imbalances, samprapti (pathogenesis), dhatu (tissues), and panchakarma (purification) require precise reasoning grounded in classical frameworks absent from general pretraining.
Linguistic gap — Ayurvedic literature spans Sanskrit, Devanagari, IAST transliteration, and bilingual clinical Hindi-English discourse. Most LLMs lack competence across this spectrum.
Knowledge gap — Classical compendia such as Charaka Samhita, Sushruta Samhita, Ashtanga Hridaya, and Kashyapa Samhita are underrepresented in standard pretraining corpora.

AyurParam is the first bilingual, instruction-tuned LLM extensively benchmarked for authentic, context-rich performance in Ayurveda.

Model Architecture

AyurParam inherits the transformer architecture of Param-1-2.9B-Instruct with the following configuration:

Hyperparameter	Value
Hidden size	2048
Intermediate (FFN) size	7168
Attention heads	16
Hidden layers	32
Key-value heads	8 (GQA)
Max position embeddings	2048
Activation function	SiLU
Vocabulary	256,000 tokens
Task-specific tokens	6 (`<user>`, `<assistant>`, `<context>`, `<system_prompt>`, `<actual_response>`, `</actual_response>`)
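
The table implies a few derived quantities that matter for local inference. The sketch below computes them under two assumptions that the table does not state: the per-head width equals hidden size divided by attention heads, and the KV cache is stored in 16-bit values. The class name `ParamConfig` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamConfig:
    # Values copied from the hyperparameter table above.
    hidden_size: int = 2048
    num_attention_heads: int = 16
    num_key_value_heads: int = 8
    num_hidden_layers: int = 32
    max_position_embeddings: int = 2048

    @property
    def head_dim(self) -> int:
        # Assumption: per-head width is hidden_size / num_attention_heads.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # GQA: number of query heads that share one key-value head.
        return self.num_attention_heads // self.num_key_value_heads

    def kv_cache_bytes(self, n_tokens: int, bytes_per_value: int = 2) -> int:
        # One K and one V vector per layer, per KV head, per token.
        per_token = 2 * self.num_hidden_layers * self.num_key_value_heads * self.head_dim
        return per_token * n_tokens * bytes_per_value


cfg = ParamConfig()
print(cfg.head_dim)                      # 128
print(cfg.queries_per_kv_head)           # 2
print(cfg.kv_cache_bytes(2048) / 2**20)  # 256.0 (MiB at the full 2048-token context)
```

Under these assumptions, a full-length context adds roughly 256 MiB on top of the model file, which is worth budgeting for when choosing a variant.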

Dataset Construction

The training corpus was assembled through a rigorous multi-stage pipeline designed to ensure authenticity, domain coverage, and bilingual fidelity.

Taxonomy-Guided Curation

Before any data was collected, the team established a curriculum-aligned taxonomy ensuring representation across all major branches of Ayurveda. This prevented over-representation of easily available material (e.g., Panchakarma manuals) and ensured coverage of underrepresented domains including specializations and canonical compendia.

Source Material

Data was sourced from open-access repositories:

Archive.org, eGangotri, and NDLI (National Digital Library of India)
Digitized classical manuscripts in Devanagari, IAST, and English transliteration
Clinical guidelines and objective assessments
Reasoning-driven query-answer pairs

Data Processing Pipeline

The pipeline comprised four core stages:

Corpus collection — Systematic harvesting from digital archives using Devanagari, IAST, and English retrieval lenses.
OCR processing — Extraction of machine-readable text from scanned manuscripts with domain-specific quality filters.
Quality assurance — Expert annotation protocols enforcing factual precision and instructional clarity.
Knowledge-grounded Q&A generation — Structured generation of dialogue-style prompt-completion pairs, covering:
- Context-aware Q&A — Multi-turn consultation scenarios
- Reasoning-intensive prompts — Dosha diagnosis, samprapti analysis, treatment selection
- Objective-style Q&A — Factual recall from classical texts

Final Corpus Scale

The supervised fine-tuning corpus comprised approximately 4.75 million samples in both English and Hindi, using custom bilingual instruction templates to support single-turn and multi-turn Ayurvedic instruction-following.

Training Details

AyurParam was fine-tuned using Supervised Fine-Tuning (SFT) via the Hugging Face TRL framework:

Training framework : Hugging Face TRL (SFT)
Distributed training: torchrun (multi-node)
Hardware           : NVIDIA H100 GPU cluster
Training duration  : ~2 days (single H100 node)
Corpus size        : ~4.75M instruction samples
Template style     : Custom bilingual (English + Hindi)

Custom bilingual instruction templates were developed to better support both single-turn and multi-turn Ayurvedic instruction-following across English and Hindi.

Evaluation: BhashaBench-Ayur (BBA)

AyurParam was benchmarked on BhashaBench-Ayur (BBA), introduced as part of the broader BhashaBench V1 — India's first domain-specific, multi-task, bilingual benchmark for Indic knowledge systems.

BhashaBench V1 contains 74,166 meticulously curated question-answer pairs (52,494 English + 21,672 Hindi), spanning four domains: Agriculture, Legal, Finance, and Ayurveda — covering 90+ subdomains and 500+ topics.
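
As a quick consistency check on the figures above, the per-language counts can be summed and turned into shares. The variable names are illustrative only.

```python
# Counts quoted in the paragraph above.
english_pairs = 52_494
hindi_pairs = 21_672
total_pairs = english_pairs + hindi_pairs

english_share = english_pairs / total_pairs
hindi_share = hindi_pairs / total_pairs

print(total_pairs)             # 74166
print(f"{english_share:.1%}")  # 70.8%
print(f"{hindi_share:.1%}")    # 29.2%
```

The benchmark is therefore roughly 70/30 English to Hindi across all four domains.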

Performance by Question Type

Question Type	AyurParam-2.9B	Notes
MCQ	40.12%	Highest accuracy among all compared models, including much larger ones
Assertion/Reasoning	Competitive	Strong contextual discrimination
Multi-turn Q&A	Competitive	Robust instruction-following across turns

AyurParam surpasses all open-source instruction-tuned models in the 1.5–3B parameter class and demonstrates competitive or superior performance compared to significantly larger models. For reference, GPT-4o achieves only 59.74% overall accuracy in the Ayurveda domain of BhashaBench — illustrating the difficulty of the task even for frontier models.

Why MCQ Performance Matters

Strong MCQ accuracy reflects the model's ability to discriminate between closely related Ayurvedic concepts and therapeutic approaches — a critical skill for educational assessment, practitioner certification preparation, and clinical decision support tooling.
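
For readers unfamiliar with how an MCQ score such as 40.12% is typically obtained, here is a minimal sketch of exact-match scoring on option letters. It is an assumption about the general approach, not the benchmark's official scoring code, and the function name `mcq_accuracy` and the toy data are hypothetical.

```python
from typing import Sequence


def mcq_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions where the predicted option letter matches the key."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


# Toy data for illustration only; not drawn from BhashaBench-Ayur.
predicted = ["A", "c", "B", "D", "a"]
answer_key = ["A", "C", "D", "D", "B"]
print(f"{mcq_accuracy(predicted, answer_key):.2%}")  # 60.00%
```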

Comparison with Prior Work

Model	Parameters	Ayurveda Domain Focus	Bilingual (EN+HI)	Benchmarked on BBA
AyurGPT	Various	Partial	Partial	No
IRGPT	Various	Partial	Partial	No
GPT-4o	Undisclosed	General	Yes	Yes (59.74% overall)
AyurParam-2.9B	2.9B	Full	Yes	Yes (SOTA in class)

AyurParam is the first model to combine: (1) an Ayurveda-specific training corpus at scale, (2) rigorous bilingual instruction tuning, and (3) systematic evaluation on a dedicated Ayurvedic benchmark.

Intended Use Cases

Ayurvedic education — Explanation of classical concepts, text interpretation, self-study Q&A
Research assistance — Literature review, classical text analysis, cross-referencing compendia
Clinical knowledge support — Reference tool for practitioners (NOT a clinical decision system)
Content generation — Bilingual wellness content, educational materials, FAQ generation
Benchmarking — Baseline for future Ayurvedic AI research

Quickstart

Available Quantized Variants

Variant	File	Size	Use Case
Q4_K_M	`ayurparam-q4_k_m.gguf`	1.82 GB	Recommended for CPU inference; best quality/size trade-off
Q8_0	`AyurParam-2.9b-it-q8_0.gguf`	3.05 GB	Higher fidelity; good for GPU or high-RAM CPU setups
FP16	`AyurParam-2.9b-it-fp16.gguf`	5.73 GB	Full FP16 precision; maximum fidelity for GPU inference
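
One way to choose among these files is to pick the largest one that fits in available memory with some headroom. The sketch below does that with the sizes from the table; the helper `pick_variant` and the default 1 GB headroom are assumptions for illustration, not measured requirements.

```python
from typing import Optional

# File names and sizes (GB) copied from the variants table above.
VARIANTS = [
    ("Q4_K_M", "ayurparam-q4_k_m.gguf", 1.82),
    ("Q8_0", "AyurParam-2.9b-it-q8_0.gguf", 3.05),
    ("FP16", "AyurParam-2.9b-it-fp16.gguf", 5.73),
]


def pick_variant(available_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest file that fits, or None if nothing fits.

    headroom_gb is a hypothetical allowance for the KV cache and runtime
    buffers; tune it for your context size and backend.
    """
    fitting = [v for v in VARIANTS if v[2] + headroom_gb <= available_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda v: v[2])[1]


print(pick_variant(4.5))   # AyurParam-2.9b-it-q8_0.gguf
print(pick_variant(16.0))  # AyurParam-2.9b-it-fp16.gguf
print(pick_variant(2.0))   # None
```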

Option 1 — llama.cpp (CPU / GPU)

Build llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Current llama.cpp builds with CMake; binaries are written to build/bin/
cmake -B build
cmake --build build --config Release -j
# Verify build
./build/bin/llama-cli --help

Download the model

git lfs install
git clone https://huggingface.co/Prady029/AyurParam-2.9b-it-gguf
# Available files:
#   ayurparam-q4_k_m.gguf          (1.82 GB, CPU recommended)
#   AyurParam-2.9b-it-q8_0.gguf   (3.05 GB, higher quality)
#   AyurParam-2.9b-it-fp16.gguf   (5.73 GB, GPU / max fidelity)

Run a single prompt

./build/bin/llama-cli \
  -m path/to/ayurparam-q4_k_m.gguf \
  -p "Explain the Ayurvedic concept of the three doshas (Vata, Pitta, and Kapha) and their role in maintaining health." \
  -n 256 \
  --temp 0.7

Interactive chat mode

# -cnv starts conversation mode; it replaces the older --interactive / -ins flags
./build/bin/llama-cli \
  -m path/to/ayurparam-q4_k_m.gguf \
  -cnv \
  --temp 0.7

Start a local OpenAI-compatible server

./build/bin/llama-server \
  --model path/to/ayurparam-q4_k_m.gguf \
  --port 8080 \
  --threads 8 \
  --ctx-size 2048

Query the server

curl -s -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are the dietary recommendations for a Pitta-dominant constitution?"}
    ],
    "max_tokens": 300,
    "temperature": 0.7
  }'
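
The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the server is listening on port 8080 as started above; the helper names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt: str, base_url: str = "http://localhost:8080",
                       max_tokens: int = 300,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires llama-server to be running on port 8080:
# print(ask("What are the dietary recommendations for a Pitta-dominant constitution?"))
```

Splitting request construction from sending keeps the payload easy to inspect without a running server.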

Option 2 — Ollama (Recommended for beginners)

Ollama provides a simple model management interface and an OpenAI-compatible local API.

Install Ollama: follow ollama.com for your OS.

Pull and run the model

# Ollama pulls GGUF files from Hugging Face via the hf.co/ prefix and a quantization tag
ollama run hf.co/Prady029/AyurParam-2.9b-it-gguf:Q8_0

Query via API

curl -s -X POST "http://localhost:11434/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hf.co/Prady029/AyurParam-2.9b-it-gguf:Q8_0",
    "prompt": "List three lifestyle practices from Dinacharya (Ayurvedic daily routine) that support Kapha balance.",
    "max_tokens": 200,
    "temperature": 0.7
  }'

Option 3 — Python with llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="./ayurparam-q4_k_m.gguf",
    n_ctx=2048,
    n_threads=8,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are AyurParam, a knowledgeable Ayurvedic assistant. Provide accurate, culturally grounded responses based on classical Ayurvedic texts."
        },
        {
            "role": "user",
            "content": "Explain samprapti (pathogenesis) in the context of a Vata imbalance."
        }
    ],
    max_tokens=512,
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])

Prompt Format

AyurParam uses a custom bilingual instruction template. For best results, structure prompts as follows:

English

<system_prompt>You are AyurParam, an expert Ayurvedic assistant with deep knowledge of classical texts and clinical Ayurveda.</system_prompt>
<user>What is the Ayurvedic understanding of Agni (digestive fire) and its types?</user>
<assistant>

Hindi

<system_prompt>आप AyurParam हैं, एक विशेषज्ञ आयुर्वेदिक सहायक जो शास्त्रीय ग्रंथों और नैदानिक आयुर्वेद का गहन ज्ञान रखते हैं।</system_prompt>
<user>आयुर्वेद में त्रिदोष सिद्धांत क्या है?</user>
<assistant>
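
To avoid hand-assembling these tags, a small helper can render the layout shown in the two examples above. This is a sketch that mirrors those examples only; the function name `build_prompt` is hypothetical, and multi-turn or `<context>` usage is not covered because this README does not show it.

```python
from typing import Optional


def build_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Render a single-turn prompt in the tag layout shown above."""
    lines = []
    if system_prompt:
        lines.append(f"<system_prompt>{system_prompt}</system_prompt>")
    lines.append(f"<user>{user_message}</user>")
    # The prompt ends with an open <assistant> tag for the model to complete.
    lines.append("<assistant>")
    return "\n".join(lines)


print(build_prompt(
    "आयुर्वेद में त्रिदोष सिद्धांत क्या है?",
    system_prompt="आप AyurParam हैं, एक विशेषज्ञ आयुर्वेदिक सहायक।",
))
```

The resulting string can be passed as a raw prompt, for example through the `-p` flag of the llama.cpp command line.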

Limitations

⚠️ Medical Disclaimer: AyurParam is an informational and educational tool only. Outputs must not be used for clinical diagnosis, treatment decisions, or emergency medical guidance. Always consult a qualified Ayurvedic practitioner or licensed medical professional.

Limitation	Description
Not a medical device	Outputs are informational; not validated for clinical use
No safety guardrails	Lacks explicit mechanisms to prevent generation of harmful medical advice
Hallucinations	Can produce plausible but factually incorrect claims; verify with authoritative sources
No personalization	Does not account for individual patient histories or contraindications
Domain bias	Trained primarily on Ayurvedic and related corpora; may over-generalize
Language coverage	Optimized for English and Hindi; other languages not guaranteed
Data licensing	Training corpus limited to open-access repositories; licensed clinical databases not included
Quantization effects	Q4_K_M and Q8_0 reduce memory/disk usage but may slightly degrade generation quality vs FP16

Practical Tips

Out of memory: Use ayurparam-q4_k_m.gguf (1.82 GB) or reduce context size (--ctx-size 1024)
Balanced quality/memory: Use AyurParam-2.9b-it-q8_0.gguf (3.05 GB) on machines with 6–8 GB RAM
Maximum fidelity: Use AyurParam-2.9b-it-fp16.gguf (5.73 GB) on a GPU with 8+ GB VRAM
Slow CPU inference: Increase thread count (--threads 8 or set OMP_NUM_THREADS=8)
Quality comparison: Compare outputs across Q4_K_M, Q8_0, and FP16 variants on representative Ayurvedic prompts to pick the right trade-off
Expert review: For any deployment, have domain experts review representative outputs before public release
Prompt clarity: More specific prompts (e.g., specifying dosha, text source, or clinical context) yield better results

Repository Structure

AyurParam-2.9b-it-gguf/
├── ayurparam-q4_k_m.gguf             # Q4_K_M quantized model (1.82 GB) — CPU recommended
├── AyurParam-2.9b-it-q8_0.gguf       # Q8_0 quantized model (3.05 GB) — higher fidelity
├── AyurParam-2.9b-it-fp16.gguf       # FP16 model (5.73 GB) — maximum precision, GPU
├── .gitattributes                     # LFS tracking config
└── README.md                          # This file

Citation

If you use AyurParam in your research or application, please cite the original paper and this model:

@misc{nauman2025ayurparamstateoftheartbilinguallanguage,
  title        = {AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda},
  author       = {Mohd Nauman and Sravan Gvm and Vijay Devane and Shyam Pawar and
                  Viraj Thakur and Kundeshwar Pundalik and Piyush Sawarkar and
                  Rohit Saluja and Maunendra Desarkar and Ganesh Ramakrishnan},
  year         = {2025},
  eprint       = {2511.02374},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  url          = {https://arxiv.org/abs/2511.02374}
}

@misc{ayurparam_gguf_2025,
  title  = {AyurParam-2.9b-it-gguf},
  author = {Pradyumna Kumar Sahoo},
  year   = {2025},
  url    = {https://huggingface.co/Prady029/AyurParam-2.9b-it-gguf},
  note   = {Contact: prady029@duck.com}
}

Credits

This GGUF release was prepared and published by:

Name	Email
Pradyumna Kumar Sahoo	prady029@duck.com

For questions about the original AyurParam research, reach the BharatGen team:

Contact	Email
Sravan Kumar	sravan.kumar@tihiitb.org
Kundeshwar Pundalik	kundeshwar.pundalik@tihiitb.org
Mohd Nauman	mohd.nauman@tihiitb.org

Acknowledgements

AyurParam was developed at the Technology Innovation Hub (TIH), IIT Bombay as part of the BharatGen initiative — advancing AI for Indic languages and knowledge systems. The base model Param-1-2.9B-Instruct was developed by bharatgenai. Benchmark data was curated under the BhashaBench V1 framework (bhashavbenchv1).

AyurParam — Bridging five thousand years of Ayurvedic wisdom with modern AI.