AyurParam-2.9b-it-gguf

GGUF quantized release of AyurParam-2.9B-Instruct — India's first bilingual, instruction-tuned large language model specialized for Ayurveda. Packaged for efficient local inference via llama.cpp and Ollama.


Overview

AyurParam-2.9B is a domain-specialized, bilingual large language model built by the BharatGen team at IIT Bombay's Technology Innovation Hub, and presented in the paper AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda (Nauman et al., 2025).

General-purpose LLMs consistently underperform on highly specialized domains requiring deep cultural, linguistic, and subject-matter expertise. Ayurveda — with its centuries of nuanced textual and clinical knowledge encoded in Sanskrit, Hindi, and regional languages — is a prime example of this gap. AyurParam directly addresses this challenge by combining the bilingual strengths of Param-1-2.9B-Instruct with a meticulously curated Ayurvedic knowledge base.

This repository ships the model in GGUF format, making it immediately runnable on consumer hardware (CPU or GPU) using llama.cpp or Ollama.


Key Highlights

| Attribute | Detail |
|---|---|
| Base model | bharatgenai/Param-1-2.9B-Instruct |
| Format | GGUF |
| Parameters | ~2.9 Billion |
| Quantized variants | Q4_K_M (1.82 GB), Q8_0 (3.05 GB), FP16 (5.73 GB) |
| Languages | English + Hindi (bilingual) |
| Training corpus | ~4.75M supervised samples |
| Training hardware | Multi-node NVIDIA H100 cluster |
| Training duration | ~2 days (single H100 node) |
| Training framework | Hugging Face TRL (SFT) |
| Benchmark | BhashaBench-Ayur (BBA) |
| License | Apache 2.0 |

The Paper at a Glance

Motivation

Mainstream LLMs fail to accurately interpret or apply Ayurvedic knowledge for several interconnected reasons:

  • Domain gap — Ayurvedic concepts such as dosha imbalances, samprapti (pathogenesis), dhatu (tissues), and panchakarma (purification) require precise reasoning grounded in classical frameworks absent from general pretraining.
  • Linguistic gap — Ayurvedic literature spans Sanskrit, Devanagari, IAST transliteration, and bilingual clinical Hindi-English discourse. Most LLMs lack competence across this spectrum.
  • Knowledge gap — Classical compendia such as Charaka Samhita, Sushruta Samhita, Ashtanga Hridaya, and Kashyapa Samhita are underrepresented in standard pretraining corpora.

AyurParam is the first bilingual, instruction-tuned LLM extensively benchmarked for authentic, context-rich performance in Ayurveda.


Model Architecture

AyurParam inherits the transformer architecture of Param-1-2.9B-Instruct with the following configuration:

| Hyperparameter | Value |
|---|---|
| Hidden size | 2048 |
| Intermediate (FFN) size | 7168 |
| Attention heads | 16 |
| Hidden layers | 32 |
| Key-value heads | 8 (GQA) |
| Max position embeddings | 2048 |
| Activation function | SiLU |
| Vocabulary | 256,000 tokens |
| Task-specific tokens | 6 (<user>, <assistant>, <context>, <system_prompt>, <actual_response>, </actual_response>) |
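To make the memory impact of these settings concrete, the sketch below estimates the KV-cache footprint implied by the table. It assumes the per-head dimension equals hidden size divided by attention heads and that the cache is stored in FP16 (2 bytes per value); neither is stated in this card, so treat the numbers as rough estimates rather than measured values.

```python
# Values taken from the hyperparameter table above.
HIDDEN_SIZE = 2048
NUM_ATTENTION_HEADS = 16
NUM_KV_HEADS = 8
NUM_LAYERS = 32
MAX_POSITIONS = 2048

# Assumption: per-head dimension = hidden size / attention heads.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS


def kv_cache_bytes(n_tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size: one key and one value vector per layer and KV head."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * n_tokens


full_context_mib = kv_cache_bytes(MAX_POSITIONS) / 2**20
gqa_saving = NUM_ATTENTION_HEADS / NUM_KV_HEADS
print(f"head_dim={HEAD_DIM}, KV cache at {MAX_POSITIONS} tokens: about {full_context_mib:.0f} MiB")
print(f"GQA shrinks the cache {gqa_saving:.0f}x versus one KV head per attention head")
```

Under these assumptions the cache is small next to the weight files, so the choice of quantized variant dominates memory use at the full 2048-token context.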

Dataset Construction

The training corpus was assembled through a rigorous multi-stage pipeline designed to ensure authenticity, domain coverage, and bilingual fidelity.

Taxonomy-Guided Curation

Before any data was collected, the team established a curriculum-aligned taxonomy ensuring representation across all major branches of Ayurveda. This prevented over-representation of easily available material (e.g., Panchakarma manuals) and ensured coverage of underrepresented domains including specializations and canonical compendia.

Source Material

Data was sourced from open-access repositories:

  • Archive.org, eGangotri, and NDLI (National Digital Library of India)
  • Digitized classical manuscripts in Devanagari, IAST, and English transliteration
  • Clinical guidelines and objective assessments
  • Reasoning-driven query-answer pairs

Data Processing Pipeline

The pipeline comprised four core stages:

  1. Corpus collection — Systematic harvesting from digital archives using Devanagari, IAST, and English retrieval lenses.
  2. OCR processing — Extraction of machine-readable text from scanned manuscripts with domain-specific quality filters.
  3. Quality assurance — Expert annotation protocols enforcing factual precision and instructional clarity.
  4. Knowledge-grounded Q&A generation — Structured generation of dialogue-style prompt-completion pairs, covering:
    • Context-aware Q&A — Multi-turn consultation scenarios
    • Reasoning-intensive prompts — Dosha diagnosis, samprapti analysis, treatment selection
    • Objective-style Q&A — Factual recall from classical texts

Final Corpus Scale

The supervised fine-tuning corpus comprised approximately 4.75 million samples in both English and Hindi, using custom bilingual instruction templates to support single-turn and multi-turn Ayurvedic instruction-following.


Training Details

AyurParam was fine-tuned using Supervised Fine-Tuning (SFT) via the Hugging Face TRL framework:

Training framework : Hugging Face TRL (SFT)
Distributed training: torchrun (multi-node)
Hardware           : NVIDIA H100 GPU cluster
Training duration  : ~2 days (single H100 node)
Corpus size        : ~4.75M instruction samples
Template style     : Custom bilingual (English + Hindi)

Custom bilingual instruction templates were developed to better support both single-turn and multi-turn Ayurvedic instruction-following across English and Hindi.


Evaluation: BhashaBench-Ayur (BBA)

AyurParam was benchmarked on BhashaBench-Ayur (BBA), introduced as part of the broader BhashaBench V1 — India's first domain-specific, multi-task, bilingual benchmark for Indic knowledge systems.

BhashaBench V1 contains 74,166 meticulously curated question-answer pairs (52,494 English + 21,672 Hindi), spanning four domains: Agriculture, Legal, Finance, and Ayurveda — covering 90+ subdomains and 500+ topics.
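As a quick sanity check on the figures above, this short sketch confirms that the per-language counts add up to the reported total and derives the language split. The counts cover all four BhashaBench V1 domains, not the Ayurveda subset alone, which is not broken out here.

```python
# Counts quoted in the paragraph above (all four BhashaBench V1 domains).
ENGLISH_PAIRS = 52_494
HINDI_PAIRS = 21_672
REPORTED_TOTAL = 74_166

total = ENGLISH_PAIRS + HINDI_PAIRS
english_share = ENGLISH_PAIRS / total
hindi_share = HINDI_PAIRS / total

print(f"total={total} (matches reported total: {total == REPORTED_TOTAL})")
print(f"English {english_share:.1%}, Hindi {hindi_share:.1%}")
```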

Performance by Question Type

| Question Type | AyurParam-2.9B | Notes |
|---|---|---|
| MCQ | 40.12% | Highest accuracy among all compared models, including much larger ones |
| Assertion/Reasoning | Competitive | Strong contextual discrimination |
| Multi-turn Q&A | Competitive | Robust instruction-following across turns |

AyurParam surpasses all open-source instruction-tuned models in the 1.5–3B parameter class and demonstrates competitive or superior performance compared to significantly larger models. For reference, GPT-4o achieves only 59.74% overall accuracy in the Ayurveda domain of BhashaBench — illustrating the difficulty of the task even for frontier models.

Why MCQ Performance Matters

Strong MCQ accuracy reflects the model's ability to discriminate between closely related Ayurvedic concepts and therapeutic approaches — a critical skill for educational assessment, practitioner certification preparation, and clinical decision support tooling.


Comparison with Prior Work

| Model | Parameters | Ayurveda Domain Focus | Bilingual (EN+HI) | Benchmarked on BBA |
|---|---|---|---|---|
| AyurGPT | Various | Partial | Partial | No |
| IRGPT | Various | Partial | Partial | No |
| GPT-4o | Undisclosed | General | Yes | Yes (59.74% overall) |
| AyurParam-2.9B | 2.9B | Full | Yes | Yes (SOTA in class) |

AyurParam is the first model to combine: (1) Ayurveda-specific supervised fine-tuning at scale, (2) rigorous bilingual instruction tuning, and (3) systematic evaluation on a dedicated Ayurvedic benchmark.


Intended Use Cases

  • Ayurvedic education — Explanation of classical concepts, text interpretation, self-study Q&A
  • Research assistance — Literature review, classical text analysis, cross-referencing compendia
  • Clinical knowledge support — Reference tool for practitioners (NOT a clinical decision system)
  • Content generation — Bilingual wellness content, educational materials, FAQ generation
  • Benchmarking — Baseline for future Ayurvedic AI research

Quickstart

Available Quantized Variants

| Variant | File | Size | Use Case |
|---|---|---|---|
| Q4_K_M | ayurparam-q4_k_m.gguf | 1.82 GB | Recommended for CPU inference; best quality/size trade-off |
| Q8_0 | AyurParam-2.9b-it-q8_0.gguf | 3.05 GB | Higher fidelity; good for GPU or high-RAM CPU setups |
| FP16 | AyurParam-2.9b-it-fp16.gguf | 5.73 GB | Full FP16 precision; maximum fidelity for GPU inference |
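To relate the file sizes to precision, the sketch below converts each size into an approximate number of bits per weight. It assumes exactly 2.9e9 parameters and 1 GB = 10^9 bytes; both are approximations (the card only says "~2.9 Billion"), and GGUF files also carry metadata, so the results are indicative only.

```python
# Assumption: "~2.9 Billion" parameters is treated as exactly 2.9e9.
PARAMS = 2.9e9

# File sizes in GB, from the variants table above.
VARIANT_SIZES_GB = {"Q4_K_M": 1.82, "Q8_0": 3.05, "FP16": 5.73}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Approximate average bits stored per parameter (assumes 1 GB = 10**9 bytes)."""
    return size_gb * 1e9 * 8 / params


for name, size_gb in VARIANT_SIZES_GB.items():
    print(f"{name}: about {bits_per_weight(size_gb):.1f} bits per weight")
```

The Q4_K_M and Q8_0 estimates come out somewhat above 4 and 8 bits, which is expected because these formats store per-block scaling data alongside the quantized weights.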

Option 1 — llama.cpp (CPU / GPU)

Build llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Current llama.cpp builds with CMake; binaries are written to build/bin/
cmake -B build
cmake --build build --config Release
# Verify build
./build/bin/llama-cli --help

Download the model

git lfs install
git clone https://huggingface.co/Prady029/AyurParam-2.9b-it-gguf
# Available files:
#   ayurparam-q4_k_m.gguf          (1.82 GB, CPU recommended)
#   AyurParam-2.9b-it-q8_0.gguf   (3.05 GB, higher quality)
#   AyurParam-2.9b-it-fp16.gguf   (5.73 GB, GPU / max fidelity)

Run a single prompt

./build/bin/llama-cli \
  -m path/to/ayurparam-q4_k_m.gguf \
  -p "Explain the Ayurvedic concept of the three doshas (Vata, Pitta, and Kapha) and their role in maintaining health." \
  -n 256 \
  --temp 0.7

Interactive chat mode

# -cnv (conversation mode) replaces the -ins flag, which recent llama.cpp releases no longer accept
./build/bin/llama-cli \
  -m path/to/ayurparam-q4_k_m.gguf \
  -cnv \
  --temp 0.7

Start a local OpenAI-compatible server

./build/bin/llama-server \
  --model path/to/ayurparam-q4_k_m.gguf \
  --port 8080 \
  --threads 8 \
  --ctx-size 2048

Query the server

curl -s -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are the dietary recommendations for a Pitta-dominant constitution?"}
    ],
    "max_tokens": 300,
    "temperature": 0.7
  }'
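The same request can be sent from Python using only the standard library. This is a minimal sketch: the function names `build_chat_request` and `ask` are illustrative, and the URL assumes the server was started on port 8080 as shown above. The response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

# Same endpoint as the curl example; assumes the server listens on port 8080.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(question: str, max_tokens: int = 300,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one user message."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(question), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(ask("What are the dietary recommendations for a Pitta-dominant constitution?"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.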

Option 2 — Ollama (Recommended for beginners)

Ollama provides a simple model management interface and an OpenAI-compatible local API.

Install Ollama — follow ollama.ai for your OS.

Pull and run the model

ollama run Prady029/AyurParam-2.9b-it-gguf

Query via API

curl -s -X POST "http://localhost:11434/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Prady029/AyurParam-2.9b-it-gguf",
    "prompt": "List three lifestyle practices from Dinacharya (Ayurvedic daily routine) that support Kapha balance.",
    "max_tokens": 200,
    "temperature": 0.7
  }'

Option 3 — Python with llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="./ayurparam-q4_k_m.gguf",  # file name as listed under Available Quantized Variants
    n_ctx=2048,
    n_threads=8,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are AyurParam, a knowledgeable Ayurvedic assistant. Provide accurate, culturally grounded responses based on classical Ayurvedic texts."
        },
        {
            "role": "user",
            "content": "Explain samprapti (pathogenesis) in the context of a Vata imbalance."
        }
    ],
    max_tokens=512,
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])

Prompt Format

AyurParam uses a custom bilingual instruction template. For best results, structure prompts as follows:

English

<system_prompt>You are AyurParam, an expert Ayurvedic assistant with deep knowledge of classical texts and clinical Ayurveda.</system_prompt>
<user>What is the Ayurvedic understanding of Agni (digestive fire) and its types?</user>
<assistant>

Hindi

<system_prompt>आप AyurParam हैं, एक विशेषज्ञ आयुर्वेदिक सहायक जो शास्त्रीय ग्रंथों और नैदानिक आयुर्वेद का गहन ज्ञान रखते हैं।</system_prompt>
<user>आयुर्वेद में त्रिदोष सिद्धांत क्या है?</user>
<assistant>
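A small helper can assemble prompts in this shape for raw (non-chat) completion calls. This is a sketch under assumptions: the function name `build_prompt` is hypothetical, and the newline separators simply mirror the examples as displayed above; the exact whitespace used during training is not stated in this card.

```python
# Default system prompt copied from the English example above.
DEFAULT_SYSTEM_PROMPT = (
    "You are AyurParam, an expert Ayurvedic assistant with deep knowledge "
    "of classical texts and clinical Ayurveda."
)


def build_prompt(user_message: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap a single-turn question in the tag layout shown in the examples.

    Assumption: tags are separated by newlines, as in the displayed examples.
    The prompt ends with an open <assistant> tag so the model writes the reply.
    """
    return "\n".join([
        f"<system_prompt>{system_prompt}</system_prompt>",
        f"<user>{user_message}</user>",
        "<assistant>",
    ])


print(build_prompt("What is the Ayurvedic understanding of Agni (digestive fire) and its types?"))
```

The helper works unchanged for Hindi: pass the Hindi system prompt and question as the two arguments.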

Limitations

⚠️ Medical Disclaimer: AyurParam is an informational and educational tool only. Outputs must not be used for clinical diagnosis, treatment decisions, or emergency medical guidance. Always consult a qualified Ayurvedic practitioner or licensed medical professional.

| Limitation | Description |
|---|---|
| Not a medical device | Outputs are informational; not validated for clinical use |
| No safety guardrails | Lacks explicit mechanisms to prevent generation of harmful medical advice |
| Hallucinations | Can produce plausible but factually incorrect claims; verify with authoritative sources |
| No personalization | Does not account for individual patient histories or contraindications |
| Domain bias | Trained primarily on Ayurvedic and related corpora; may over-generalize |
| Language coverage | Optimized for English and Hindi; other languages not guaranteed |
| Data licensing | Training corpus limited to open-access repositories; licensed clinical databases not included |
| Quantization effects | Q4_K_M and Q8_0 reduce memory/disk usage but may slightly degrade generation quality vs FP16 |

Practical Tips

  • Out of memory: Use ayurparam-q4_k_m.gguf (1.82 GB) or reduce context size (--ctx-size 1024)
  • Balanced quality/memory: Use AyurParam-2.9b-it-q8_0.gguf (3.05 GB) on machines with 6–8 GB RAM
  • Maximum fidelity: Use AyurParam-2.9b-it-fp16.gguf (5.73 GB) on a GPU with 8+ GB VRAM
  • Slow CPU inference: Increase thread count (--threads 8 or set OMP_NUM_THREADS=8)
  • Quality comparison: Compare outputs across Q4_K_M, Q8_0, and FP16 variants on representative Ayurvedic prompts to pick the right trade-off
  • Expert review: For any deployment, have domain experts review representative outputs before public release
  • Prompt clarity: More specific prompts (e.g., specifying dosha, text source, or clinical context) yield better results
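The memory-related tips above can be turned into a simple selection rule. The sketch below picks the largest variant whose file size plus a fixed headroom fits in the available memory. The function name `pick_variant` and the 1.5 GB headroom (for KV cache, buffers, and the operating system) are illustrative assumptions, not values from the model authors.

```python
# (file name, size in GB) from the variants table, smallest first.
VARIANTS = [
    ("ayurparam-q4_k_m.gguf", 1.82),
    ("AyurParam-2.9b-it-q8_0.gguf", 3.05),
    ("AyurParam-2.9b-it-fp16.gguf", 5.73),
]

# Hypothetical allowance for KV cache, runtime buffers, and the OS.
HEADROOM_GB = 1.5


def pick_variant(available_gb: float, headroom_gb: float = HEADROOM_GB):
    """Return the largest file that fits in memory with headroom, or None."""
    fitting = [name for name, size in VARIANTS if size + headroom_gb <= available_gb]
    return fitting[-1] if fitting else None


for memory_gb in (2, 4, 6, 8):
    print(f"{memory_gb} GB available: {pick_variant(memory_gb)}")
```

With this headroom the rule lines up with the tips: Q8_0 is selected at 6 GB and FP16 at 8 GB. Adjust the headroom for longer contexts or a heavily loaded machine.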

Repository Structure

AyurParam-2.9b-it-gguf/
├── ayurparam-q4_k_m.gguf             # Q4_K_M quantized model (1.82 GB) — CPU recommended
├── AyurParam-2.9b-it-q8_0.gguf       # Q8_0 quantized model (3.05 GB) — higher fidelity
├── AyurParam-2.9b-it-fp16.gguf       # FP16 model (5.73 GB) — maximum precision, GPU
├── .gitattributes                     # LFS tracking config
└── README.md                          # This file

Citation

If you use AyurParam in your research or application, please cite the original paper and this model:

@misc{nauman2025ayurparamstateoftheartbilinguallanguage,
  title        = {AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda},
  author       = {Mohd Nauman and Sravan Gvm and Vijay Devane and Shyam Pawar and
                  Viraj Thakur and Kundeshwar Pundalik and Piyush Sawarkar and
                  Rohit Saluja and Maunendra Desarkar and Ganesh Ramakrishnan},
  year         = {2025},
  eprint       = {2511.02374},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  url          = {https://arxiv.org/abs/2511.02374}
}
@misc{ayurparam_gguf_2025,
  title  = {AyurParam-2.9b-it-gguf},
  author = {Pradyumna Kumar Sahoo},
  year   = {2025},
  url    = {https://huggingface.co/Prady029/AyurParam-2.9b-it-gguf},
  note   = {Contact: prady029@duck.com}
}

Credits

This GGUF release was prepared and published by:

| Name | Email |
|---|---|
| Pradyumna Kumar Sahoo | prady029@duck.com |

For questions about the original AyurParam research, reach the BharatGen team:

| Contact | Email |
|---|---|
| Sravan Kumar | sravan.kumar@tihiitb.org |
| Kundeshwar Pundalik | kundeshwar.pundalik@tihiitb.org |
| Mohd Nauman | mohd.nauman@tihiitb.org |

Acknowledgements

AyurParam was developed at the Technology Innovation Hub (TIH), IIT Bombay as part of the BharatGen initiative — advancing AI for Indic languages and knowledge systems. The base model Param-1-2.9B-Instruct was developed by bharatgenai. Benchmark data was curated under the BhashaBench V1 framework (bhashavbenchv1).


AyurParam — Bridging five thousand years of Ayurvedic wisdom with modern AI.
