AyurParam-2.9b-it-gguf
GGUF quantized release of AyurParam-2.9B-Instruct — India's first bilingual, instruction-tuned large language model specialized for Ayurveda. Packaged for efficient local inference via llama.cpp and Ollama.
Overview
AyurParam-2.9B is a domain-specialized, bilingual large language model built by the BharatGen team at IIT Bombay's Technology Innovation Hub, and presented in the paper AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda (Nauman et al., 2025).
General-purpose LLMs consistently underperform on highly specialized domains requiring deep cultural, linguistic, and subject-matter expertise. Ayurveda — with its centuries of nuanced textual and clinical knowledge encoded in Sanskrit, Hindi, and regional languages — is a prime example of this gap. AyurParam directly addresses this challenge by combining the bilingual strengths of Param-1-2.9B-Instruct with a meticulously curated Ayurvedic knowledge base.
This repository ships the model in GGUF format, making it immediately runnable on consumer hardware (CPU or GPU) using llama.cpp or Ollama.
Key Highlights
| Attribute | Detail |
|---|---|
| Base model | bharatgenai/Param-1-2.9B-Instruct |
| Format | GGUF |
| Parameters | ~2.9 billion |
| GGUF variants | Q4_K_M (1.82 GB), Q8_0 (3.05 GB), FP16 (5.73 GB) |
| Languages | English + Hindi (bilingual) |
| Training corpus | ~4.75M supervised samples |
| Training hardware | Multi-node NVIDIA H100 cluster |
| Training duration | ~2 days (single H100 node) |
| Training framework | Hugging Face TRL (SFT) |
| Benchmark | BhashaBench-Ayur (BBA) |
| License | Apache 2.0 |
The Paper at a Glance
Motivation
Mainstream LLMs fail to accurately interpret or apply Ayurvedic knowledge for several interconnected reasons:
- Domain gap — Ayurvedic concepts such as dosha imbalances, samprapti (pathogenesis), dhatu (tissues), and panchakarma (purification) require precise reasoning grounded in classical frameworks absent from general pretraining.
- Linguistic gap — Ayurvedic literature spans Sanskrit, Devanagari, IAST transliteration, and bilingual clinical Hindi-English discourse. Most LLMs lack competence across this spectrum.
- Knowledge gap — Classical compendia such as Charaka Samhita, Sushruta Samhita, Ashtanga Hridaya, and Kashyapa Samhita are underrepresented in standard pretraining corpora.
AyurParam is the first bilingual, instruction-tuned LLM extensively benchmarked for authentic, context-rich performance in Ayurveda.
Model Architecture
AyurParam inherits the transformer architecture of Param-1-2.9B-Instruct with the following configuration:
| Hyperparameter | Value |
|---|---|
| Hidden size | 2048 |
| Intermediate (FFN) size | 7168 |
| Attention heads | 16 |
| Hidden layers | 32 |
| Key-value heads | 8 (GQA) |
| Max position embeddings | 2048 |
| Activation function | SiLU |
| Vocabulary | 256,000 tokens |
| Task-specific tokens | 6 (`<user>`, `<assistant>`, `<context>`, `<system_prompt>`, `<actual_response>`, `</actual_response>`) |
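As a sanity check on the table, the configuration can be turned into an approximate parameter count. This is a sketch under assumptions the card does not state: a gated SiLU feed-forward block with three projection matrices, no bias terms, untied input and output embeddings, and norm weights ignored. The function name `estimate_params` is illustrative.

```python
# Rough parameter count for the architecture table above.
# Assumptions (not stated in the model card): gated SiLU FFN with three
# projections (gate, up, down), no biases, untied input/output embeddings,
# and normalization weights ignored as negligible.

def estimate_params(
    hidden: int = 2048,
    ffn: int = 7168,
    n_layers: int = 32,
    n_heads: int = 16,
    n_kv_heads: int = 8,
    vocab: int = 256_000,
) -> int:
    head_dim = hidden // n_heads          # 2048 / 16 = 128
    kv_dim = n_kv_heads * head_dim        # GQA: 8 KV heads * 128 = 1024
    # Q and O projections are hidden x hidden; K and V are hidden x kv_dim
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim
    mlp = 3 * hidden * ffn                # gate, up, and down projections
    per_layer = attn + mlp
    embeddings = 2 * vocab * hidden       # input embedding + untied LM head
    return n_layers * per_layer + embeddings


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the estimate is about 2.86B, consistent with the ~2.9B figure above. The 256,000-token vocabulary alone accounts for roughly 0.52B parameters per embedding matrix, a large share for a model of this size.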
Dataset Construction
The training corpus was assembled through a rigorous multi-stage pipeline designed to ensure authenticity, domain coverage, and bilingual fidelity.
Taxonomy-Guided Curation
Before any data was collected, the team established a curriculum-aligned taxonomy ensuring representation across all major branches of Ayurveda. This prevented over-representation of easily available material (e.g., Panchakarma manuals) and ensured coverage of underrepresented domains including specializations and canonical compendia.
Source Material
Data was sourced from open-access repositories:
- Archive.org, eGangotri, and NDLI (National Digital Library of India)
- Digitized classical manuscripts in Devanagari, IAST, and English transliteration
- Clinical guidelines and objective assessments
- Reasoning-driven query-answer pairs
Data Processing Pipeline
The pipeline comprised four core stages:
- Corpus collection: Systematic harvesting from digital archives using Devanagari, IAST, and English retrieval lenses.
- OCR processing: Extraction of machine-readable text from scanned manuscripts with domain-specific quality filters.
- Quality assurance: Expert annotation protocols enforcing factual precision and instructional clarity.
- Knowledge-grounded Q&A generation: Structured generation of dialogue-style prompt-completion pairs, covering:
  - Context-aware Q&A: Multi-turn consultation scenarios
  - Reasoning-intensive prompts: Dosha diagnosis, samprapti analysis, treatment selection
  - Objective-style Q&A: Factual recall from classical texts
Final Corpus Scale
The supervised fine-tuning corpus comprised approximately 4.75 million samples in both English and Hindi, using custom bilingual instruction templates to support single-turn and multi-turn Ayurvedic instruction-following.
Training Details
AyurParam was fine-tuned using Supervised Fine-Tuning (SFT) via the Hugging Face TRL framework:
Training framework : Hugging Face TRL (SFT)
Distributed training: torchrun (multi-node)
Hardware : NVIDIA H100 GPU cluster
Training duration : ~2 days (single H100 node)
Corpus size : ~4.75M instruction samples
Template style : Custom bilingual (English + Hindi)
Custom bilingual instruction templates were developed to better support both single-turn and multi-turn Ayurvedic instruction-following across English and Hindi.
Evaluation: BhashaBench-Ayur (BBA)
AyurParam was benchmarked on BhashaBench-Ayur (BBA), introduced as part of the broader BhashaBench V1 — India's first domain-specific, multi-task, bilingual benchmark for Indic knowledge systems.
BhashaBench V1 contains 74,166 meticulously curated question-answer pairs (52,494 English + 21,672 Hindi), spanning four domains: Agriculture, Legal, Finance, and Ayurveda — covering 90+ subdomains and 500+ topics.
Performance by Question Type
| Question Type | AyurParam-2.9B | Notes |
|---|---|---|
| MCQ | 40.12% | Highest accuracy among the compared open-source instruction-tuned models in the 1.5–3B class |
| Assertion/Reasoning | Competitive | Strong contextual discrimination |
| Multi-turn Q&A | Competitive | Robust instruction-following across turns |
AyurParam surpasses all open-source instruction-tuned models in the 1.5–3B parameter class and demonstrates competitive or superior performance compared to significantly larger models. For reference, GPT-4o achieves only 59.74% overall accuracy in the Ayurveda domain of BhashaBench — illustrating the difficulty of the task even for frontier models.
Why MCQ Performance Matters
Strong MCQ accuracy reflects the model's ability to discriminate between closely related Ayurvedic concepts and therapeutic approaches — a critical skill for educational assessment, practitioner certification preparation, and clinical decision support tooling.
Comparison with Prior Work
| Model | Parameters | Ayurveda Domain Focus | Bilingual (EN+HI) | Benchmarked on BBA |
|---|---|---|---|---|
| AyurGPT | Various | Partial | Partial | No |
| IRGPT | Various | Partial | Partial | No |
| GPT-4o | Undisclosed | General | Yes | Yes (59.74% overall) |
| AyurParam-2.9B | 2.9B | Full | Yes | Yes (SOTA in class) |
AyurParam is the first model to combine: (1) large-scale Ayurveda-specific training data, (2) rigorous bilingual instruction tuning, and (3) systematic evaluation on a dedicated Ayurvedic benchmark.
Intended Use Cases
- Ayurvedic education — Explanation of classical concepts, text interpretation, self-study Q&A
- Research assistance — Literature review, classical text analysis, cross-referencing compendia
- Clinical knowledge support — Reference tool for practitioners (NOT a clinical decision system)
- Content generation — Bilingual wellness content, educational materials, FAQ generation
- Benchmarking — Baseline for future Ayurvedic AI research
Quickstart
Available Quantized Variants
| Variant | File | Size | Use Case |
|---|---|---|---|
| Q4_K_M | ayurparam-q4_k_m.gguf | 1.82 GB | Recommended for CPU inference; best quality/size trade-off |
| Q8_0 | AyurParam-2.9b-it-q8_0.gguf | 3.05 GB | Higher fidelity; good for GPU or high-RAM CPU setups |
| FP16 | AyurParam-2.9b-it-fp16.gguf | 5.73 GB | Full FP16 precision; maximum fidelity for GPU inference |
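The variant sizes can be roughly reproduced from the bits stored per weight. The sketch below assumes llama.cpp's Q8_0 layout (blocks of 32 int8 values plus one fp16 scale, i.e. 8.5 bits per weight) and back-derives the weight count from the FP16 file at 2 bytes per weight; Q4_K_M is omitted because it mixes quantization types across tensors. Names such as `estimate_gguf_gb` are illustrative.

```python
# Back-of-the-envelope GGUF size: n_params * bits_per_weight / 8.
# Assumptions: Q8_0 stores 32 int8 weights + one fp16 scale per block
# (34 bytes / 32 weights = 8.5 bits); FP16 is 16 bits per weight.
# Metadata and tokenizer overhead are ignored.

BITS_PER_WEIGHT = {
    "Q8_0": 34 * 8 / 32,  # 8.5 bits per weight
    "FP16": 16.0,
}

# Weight count back-derived from the 5.73 GB FP16 file (2 bytes per weight)
N_PARAMS = 5.73e9 / 2


def estimate_gguf_gb(n_params: float, variant: str) -> float:
    """Approximate file size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[variant] / 8 / 1e9


for variant in BITS_PER_WEIGHT:
    print(f"{variant}: ~{estimate_gguf_gb(N_PARAMS, variant):.2f} GB")
```

With these assumptions the Q8_0 estimate comes out at about 3.04 GB, close to the 3.05 GB file; the small gap is plausibly metadata, tokenizer data, and tensors kept at higher precision.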
Option 1 — llama.cpp (CPU / GPU)
Build llama.cpp
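git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Build with CMake (binaries land in build/bin; the old ./main is now llama-cli)
cmake -B build
cmake --build build --config Release -j
# Verify build
./build/bin/llama-cli --help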
Download the model
git lfs install
git clone https://huggingface.co/Prady029/AyurParam-2.9b-it-gguf
# Available files:
# ayurparam-q4_k_m.gguf (1.82 GB — CPU recommended)
# AyurParam-2.9b-it-q8_0.gguf (3.05 GB — higher quality)
# AyurParam-2.9b-it-fp16.gguf (5.73 GB — GPU / max fidelity)
Run a single prompt
./build/bin/llama-cli \
-m path/to/ayurparam-q4_k_m.gguf \
-p "Explain the Ayurvedic concept of the three doshas (Vata, Pitta, and Kapha) and their role in maintaining health." \
-n 256 \
--temp 0.7
Interactive chat mode
./build/bin/llama-cli \
-m path/to/ayurparam-q4_k_m.gguf \
-cnv \
--temp 0.7
Start a local OpenAI-compatible server
./build/bin/llama-server \
--model path/to/ayurparam-q4_k_m.gguf \
--port 8080 \
--threads 8 \
--ctx-size 2048
Query the server
curl -s -X POST "http://localhost:8080/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What are the dietary recommendations for a Pitta-dominant constitution?"}
],
"max_tokens": 300,
"temperature": 0.7
}'
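The same request can be sent from Python without extra dependencies. This minimal sketch uses only the standard library and assumes llama-server is running as started above on port 8080; `build_payload` and `ask` are illustrative helper names, not part of any library.

```python
import json
import urllib.request

# Assumed endpoint: llama-server started above, listening on localhost:8080
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(question: str, max_tokens: int = 300, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat request body with a single user turn."""
    return {
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def ask(question: str, url: str = SERVER_URL, timeout: float = 120.0) -> str:
    """POST the request to the server and return the assistant's reply text."""
    body = json.dumps(build_payload(question)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    # OpenAI-compatible responses carry the text in choices[0].message.content
    return data["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What are the dietary recommendations for a Pitta-dominant constitution?"))` prints the reply. Like the curl example, the request omits the `model` field.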
Option 2 — Ollama (Recommended for beginners)
Ollama provides a simple model management interface and an OpenAI-compatible local API.
Install Ollama: follow the instructions at ollama.com for your OS.
Pull and run the model
ollama run hf.co/Prady029/AyurParam-2.9b-it-gguf
Query via API
curl -s -X POST "http://localhost:11434/v1/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "hf.co/Prady029/AyurParam-2.9b-it-gguf",
"prompt": "List three lifestyle practices from Dinacharya (Ayurvedic daily routine) that support Kapha balance.",
"max_tokens": 200,
"temperature": 0.7
}'
Option 3 — Python with llama-cpp-python
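from llama_cpp import Llama  # pip install llama-cpp-python

# Point model_path at the downloaded GGUF file (Q4_K_M shown here)
llm = Llama(
    model_path="./ayurparam-q4_k_m.gguf",
    n_ctx=2048,   # the model's maximum context length
    n_threads=8,  # typically set to the number of physical CPU cores
)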
response = llm.create_chat_completion(
messages=[
{
"role": "system",
"content": "You are AyurParam, a knowledgeable Ayurvedic assistant. Provide accurate, culturally grounded responses based on classical Ayurvedic texts."
},
{
"role": "user",
"content": "Explain samprapti (pathogenesis) in the context of a Vata imbalance."
}
],
max_tokens=512,
temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
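For longer answers it can help to stream tokens as they are generated. llama-cpp-python's `create_chat_completion` accepts `stream=True` and then yields OpenAI-style chunks whose text sits under `choices[0]["delta"]`. The helpers `iter_text` and `stream_answer` are hypothetical names for this sketch, and `llm` is assumed to be the `Llama` instance created above.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield only the text pieces from a stream of chat-completion chunks."""
    for chunk in chunks:
        # The first chunk usually carries only the role and the last one an
        # empty delta, so missing "content" keys are skipped.
        piece = chunk["choices"][0]["delta"].get("content")
        if piece:
            yield piece


def stream_answer(llm, question: str, max_tokens: int = 512) -> str:
    """Print the reply incrementally and return the full text."""
    stream = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}],
        max_tokens=max_tokens,
        temperature=0.7,
        stream=True,
    )
    parts = []
    for piece in iter_text(stream):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)
```

For example, `stream_answer(llm, "Explain samprapti (pathogenesis) in the context of a Vata imbalance.")` prints the reply as it is generated and returns it as one string.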
Prompt Format
AyurParam uses a custom bilingual instruction template. For best results, structure prompts as follows:
English
<system_prompt>You are AyurParam, an expert Ayurvedic assistant with deep knowledge of classical texts and clinical Ayurveda.</system_prompt>
<user>What is the Ayurvedic understanding of Agni (digestive fire) and its types?</user>
<assistant>
Hindi
<system_prompt>आप AyurParam हैं, एक विशेषज्ञ आयुर्वेदिक सहायक जो शास्त्रीय ग्रंथों और नैदानिक आयुर्वेद का गहन ज्ञान रखते हैं।</system_prompt>
<user>आयुर्वेद में त्रिदोष सिद्धांत क्या है?</user>
<assistant>
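When calling the model through a plain text-completion interface (for example by calling a llama-cpp-python `Llama` object directly), the template has to be assembled by hand. The sketch below copies the tags from the examples above; the helper name `build_prompt` and the newline between segments are assumptions. Chat-completion endpoints format messages with whatever chat template the runtime selects, which may not match this one.

```python
# Assemble a raw prompt in the tag-based template shown above.
# Assumption: segments are separated by newlines, as in the examples.

DEFAULT_SYSTEM = (
    "You are AyurParam, an expert Ayurvedic assistant with deep knowledge "
    "of classical texts and clinical Ayurveda."
)


def build_prompt(user_message: str, system_prompt: str = DEFAULT_SYSTEM) -> str:
    """Return a prompt ending in an open <assistant> tag for the model to continue."""
    return (
        f"<system_prompt>{system_prompt}</system_prompt>\n"
        f"<user>{user_message}</user>\n"
        "<assistant>"
    )


print(build_prompt("What is the Ayurvedic understanding of Agni (digestive fire) and its types?"))
```

The Hindi example works the same way: pass the Hindi question as `user_message` and the Hindi system prompt as `system_prompt`.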
Limitations
⚠️ Medical Disclaimer: AyurParam is an informational and educational tool only. Outputs must not be used for clinical diagnosis, treatment decisions, or emergency medical guidance. Always consult a qualified Ayurvedic practitioner or licensed medical professional.
| Limitation | Description |
|---|---|
| Not a medical device | Outputs are informational; not validated for clinical use |
| No safety guardrails | Lacks explicit mechanisms to prevent generation of harmful medical advice |
| Hallucinations | Can produce plausible but factually incorrect claims; verify with authoritative sources |
| No personalization | Does not account for individual patient histories or contraindications |
| Domain bias | Trained primarily on Ayurvedic and related corpora; may over-generalize |
| Language coverage | Optimized for English and Hindi; other languages not guaranteed |
| Data licensing | Training corpus limited to open-access repositories; licensed clinical databases not included |
| Quantization effects | Q4_K_M and Q8_0 reduce memory/disk usage but may slightly degrade generation quality vs FP16 |
Practical Tips
- Out of memory: Use `ayurparam-q4_k_m.gguf` (1.82 GB) or reduce the context size (`--ctx-size 1024`)
- Balanced quality/memory: Use `AyurParam-2.9b-it-q8_0.gguf` (3.05 GB) on machines with 6–8 GB RAM
- Maximum fidelity: Use `AyurParam-2.9b-it-fp16.gguf` (5.73 GB) on a GPU with 8+ GB VRAM
- Slow CPU inference: Increase the thread count (`--threads 8` or set `OMP_NUM_THREADS=8`)
- Quality comparison: Compare outputs across Q4_K_M, Q8_0, and FP16 variants on representative Ayurvedic prompts to pick the right trade-off
- Expert review: For any deployment, have domain experts review representative outputs before public release
- Prompt clarity: More specific prompts (e.g., specifying dosha, text source, or clinical context) yield better results
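Context size affects memory because the key/value cache grows linearly with it. The sketch below estimates that cache from the architecture table (32 layers, 8 KV heads, head dimension 2048 / 16 = 128), assuming keys and values are stored in 16-bit precision (llama.cpp's default cache type) and ignoring model weights and compute buffers. The function name `kv_cache_bytes` is illustrative.

```python
# Estimate KV-cache memory for a given context length.
# Assumptions: f16 cache (2 bytes per value); weights and scratch buffers
# are not included.

def kv_cache_bytes(
    n_ctx: int,
    n_layers: int = 32,
    n_kv_heads: int = 8,
    head_dim: int = 128,
    bytes_per_value: int = 2,
) -> int:
    # Leading factor of 2: one cached tensor for keys and one for values
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value


for ctx in (1024, 2048):
    print(f"--ctx-size {ctx}: {kv_cache_bytes(ctx) / 2**20:.0f} MiB")
```

Under these assumptions, dropping from `--ctx-size 2048` to `1024` saves about 128 MiB, whereas switching from Q8_0 to Q4_K_M saves over 1 GB, so the smaller file is usually the bigger lever when memory is tight.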
Repository Structure
AyurParam-2.9b-it-gguf/
├── ayurparam-q4_k_m.gguf # Q4_K_M quantized model (1.82 GB) — CPU recommended
├── AyurParam-2.9b-it-q8_0.gguf # Q8_0 quantized model (3.05 GB) — higher fidelity
├── AyurParam-2.9b-it-fp16.gguf # FP16 model (5.73 GB) — maximum precision, GPU
├── .gitattributes # LFS tracking config
└── README.md # This file
Citation
If you use AyurParam in your research or application, please cite the original paper and this model:
@misc{nauman2025ayurparamstateoftheartbilinguallanguage,
title = {AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda},
author = {Mohd Nauman and Sravan Gvm and Vijay Devane and Shyam Pawar and
Viraj Thakur and Kundeshwar Pundalik and Piyush Sawarkar and
Rohit Saluja and Maunendra Desarkar and Ganesh Ramakrishnan},
year = {2025},
eprint = {2511.02374},
archivePrefix= {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2511.02374}
}
@misc{ayurparam_gguf_2025,
title = {AyurParam-2.9b-it-gguf},
author = {Pradyumna Kumar Sahoo},
year = {2025},
url = {https://huggingface.co/Prady029/AyurParam-2.9b-it-gguf},
note = {Contact: prady029@duck.com}
}
Credits
This GGUF release was prepared and published by:
| Name | Email |
|---|---|
| Pradyumna Kumar Sahoo | prady029@duck.com |
For questions about the original AyurParam research, reach the BharatGen team:
| Contact | Email |
|---|---|
| Sravan Kumar | sravan.kumar@tihiitb.org |
| Kundeshwar Pundalik | kundeshwar.pundalik@tihiitb.org |
| Mohd Nauman | mohd.nauman@tihiitb.org |
Acknowledgements
AyurParam was developed at the Technology Innovation Hub (TIH), IIT Bombay as part of the BharatGen initiative — advancing AI for Indic languages and knowledge systems. The base model Param-1-2.9B-Instruct was developed by bharatgenai. Benchmark data was curated under the BhashaBench V1 framework (bhashavbenchv1).
AyurParam — Bridging five thousand years of Ayurvedic wisdom with modern AI.