Instructions for using sustainaibler/sarvam-30b-resi-ai-t2t-rd1 with Transformers, vLLM, SGLang, and Docker.

Libraries

How to use sustainaibler/sarvam-30b-resi-ai-t2t-rd1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sustainaibler/sarvam-30b-resi-ai-t2t-rd1", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("sustainaibler/sarvam-30b-resi-ai-t2t-rd1", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use sustainaibler/sarvam-30b-resi-ai-t2t-rd1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sustainaibler/sarvam-30b-resi-ai-t2t-rd1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sustainaibler/sarvam-30b-resi-ai-t2t-rd1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
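
The curl call above maps onto a few lines of Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and are not part of vLLM or of this repository.

```python
import json
import urllib.request

MODEL_ID = "sustainaibler/sarvam-30b-resi-ai-t2t-rd1"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build the POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` returns the reply as a string. Passing `base_url="http://localhost:30000"` points the same helper at a server on a different port.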

Use Docker

docker model run hf.co/sustainaibler/sarvam-30b-resi-ai-t2t-rd1

SGLang

How to use sustainaibler/sarvam-30b-resi-ai-t2t-rd1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sustainaibler/sarvam-30b-resi-ai-t2t-rd1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sustainaibler/sarvam-30b-resi-ai-t2t-rd1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
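
A large checkpoint may take some time to load, so a request sent immediately after launching the server can be refused. The following is a minimal sketch of a readiness poll, assuming the server exposes the OpenAI-style `/v1/models` listing on port 30000. `fetch_json` and `wait_until_ready` are illustrative names, not SGLang APIs, and the retry counts are arbitrary defaults.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(base_url="http://localhost:30000", attempts=60, delay=10.0,
                     fetch=fetch_json, sleep=time.sleep):
    """Poll /v1/models until the server answers; return the served model ids."""
    for _ in range(attempts):
        try:
            listing = fetch(f"{base_url}/v1/models")
            return [entry["id"] for entry in listing["data"]]
        except OSError:
            # Covers refused connections, socket timeouts, and urllib's URLError.
            sleep(delay)
    raise TimeoutError(f"{base_url} did not become ready after {attempts} attempts")
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised without a live server. In normal use, `wait_until_ready()` is called with no arguments before sending the first chat request.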

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sustainaibler/sarvam-30b-resi-ai-t2t-rd1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sustainaibler/sarvam-30b-resi-ai-t2t-rd1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Sarvam-30B Round 1 Text-to-Text Research Submission

Submission Context

This repository documents a Round 1 technical-trial submission for the Resilient AI Challenge 2026 Text-to-Text category.

The submitted category model is Sarvam-30B.

Round 1 is treated as a technical validation and comparison stage. The purpose of this repository is to document model packaging, local serving feasibility, repository completeness, inference-control behaviour, limitations, and the research path toward Round 2.

Earlier lighter-model work is included only as supporting methodological evidence. It is not submitted as the category model.

Research Overview and Methodological Summary

1. Research Objective

This repository evaluates whether Sarvam-30B can be packaged, served, controlled, and documented in a reproducible Hugging Face model repository suitable for examiner review in the Resilient AI Challenge 2026 Text-to-Text category.

The research objective is technical rather than promotional. The work focuses on:

- verifying Sarvam-30B model identity and repository structure,
- validating local vLLM serving on controlled hardware,
- identifying prompt-template and endpoint behaviour,
- developing a response-control wrapper for concise final-answer tasks,
- documenting limitations without overstating compression or optimisation claims,
- preserving a clear research direction for Round 2 evaluation and hardening.

Sarvam-30B is the submitted category model and is unmodified. The earlier lighter-model work is retained only as supporting evidence for scaffold design, fixed-budget evaluation, response-control testing, and reproducibility.
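
The wrapper shipped in the repository is not reproduced in this document. As a purely hypothetical illustration of the kind of post-processing a response-control wrapper for concise final-answer tasks might perform, the sketch below trims a raw completion down to a single short answer. The `<think>` tag convention, the "Final answer:" marker, and the function name `extract_final_answer` are all assumptions made for this example. They are not taken from the Sarvam-30B chat template or from the included wrapper.

```python
import re

# Assumption: reasoning, if present, is wrapped in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Assumption: answers may be introduced by a marker such as "Final answer:".
ANSWER_MARKER = re.compile(r"final answer\s*[:\-]\s*", re.IGNORECASE)


def extract_final_answer(raw: str, max_chars: int = 200) -> str:
    """Reduce a raw completion to a short final-answer string."""
    text = THINK_BLOCK.sub("", raw)
    # Keep only what follows the last answer marker, if there is one.
    text = ANSWER_MARKER.split(text)[-1]
    # Use the first non-empty line of the remainder as the answer.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    answer = lines[0] if lines else ""
    return answer[:max_chars]
```

A wrapper of this shape leaves the model weights untouched: it only filters text after generation, which is consistent with the model being submitted unmodified.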

2. Experimental Hardware and Runtime Environment

All local experiments and validation runs were conducted on a controlled workstation/server environment to make runtime behaviour, memory constraints, and serving feasibility observable under consistent conditions:

- GPU: 2 × NVIDIA GeForce RTX 3090
- System RAM: 192 GB
- OS: Ubuntu Server 24.04
- Serving engine: vLLM 0.19.1
- Python environment: project virtual environment
- Model root: Sarvam-30B Hugging Face-style repository
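
The hardware figures above, together with the CPU offload noted under Round 1 Position, can be sanity-checked with back-of-envelope arithmetic. The sketch below is a rough estimate only: it assumes the 32B parameter count shown on the model page and 24 GB of VRAM per RTX 3090, and it ignores KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate; ignores KV cache, activations, and overhead.
PARAM_COUNT = 32e9               # "32B params" as reported on the model page
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}
GPU_COUNT, GB_PER_GPU = 2, 24    # an RTX 3090 ships with 24 GB of VRAM
SYSTEM_RAM_GB = 192


def weight_gb(dtype: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return PARAM_COUNT * BYTES_PER_PARAM[dtype] / 1e9


total_vram_gb = GPU_COUNT * GB_PER_GPU
for dtype in BYTES_PER_PARAM:
    size = weight_gb(dtype)
    spill = max(0.0, size - total_vram_gb)
    print(f"{dtype}: weights ~{size:.0f} GB, VRAM {total_vram_gb} GB, "
          f"~{spill:.0f} GB must live outside GPU memory "
          f"(system RAM: {SYSTEM_RAM_GB} GB)")
```

Under these assumptions the weights alone exceed the 48 GB of combined VRAM at either precision, which is consistent with serving through CPU offload on this machine.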

---

## Base Model

Base model: **Sarvam-30B by Sarvam AI**

Model class:

```text
SarvamMoEForCausalLM
```

Model type:

```text
sarvam_moe
```

This submission uses the Sarvam-30B model path and does not replace it with the lighter model used during earlier pipeline development.

Licence

This repository uses the same licence as the original Sarvam-30B model:

apache-2.0

Round 1 Position

Round 1 status:

TECHNICAL_TRIAL_READY

Current runtime classification:

LOADABLE / VLLM-SERVEABLE / ROUTER-TRACEABLE / EXPERT-MAPPED / GENERATION-CONTROLLABLE WITH WRAPPER

This means:

- Sarvam-30B loads locally.
- vLLM 0.19.1 recognises the Sarvam architecture.
- The model can be served through vLLM.
- The model runs locally on 2 × RTX 3090 with CPU offload.
- Stable final-answer behaviour currently uses the included wrapper.

Inference Environment

The recommended serving engine is:

vLLM 0.19.1

Expected challenge serving command:

vllm serve --config vllm_config.yaml

The repository includes a root-level configuration file:

vllm_config.yaml

Safetensors

Model size: 32B params
Tensor type: F32