# Sarvam-30B Round 1 Text-to-Text Research Submission

## Submission Context
This repository documents a Round 1 technical-trial submission for the Resilient AI Challenge 2026 Text-to-Text category.
The submitted category model is Sarvam-30B.
Round 1 is treated as a technical validation and comparison stage. The purpose of this repository is to document model packaging, local serving feasibility, repository completeness, inference-control behaviour, limitations, and the research path toward Round 2.
Earlier lighter-model work is included only as supporting methodological evidence. It is not submitted as the category model.
## Research Overview and Methodological Summary

### 1. Research Objective
This repository evaluates whether Sarvam-30B can be packaged, served, controlled, and documented in a reproducible Hugging Face model repository suitable for examiner review in the Resilient AI Challenge 2026 Text-to-Text category.
The research objective is technical rather than promotional. The work focuses on:
- verifying Sarvam-30B model identity and repository structure,
- validating local vLLM serving on controlled hardware,
- identifying prompt-template and endpoint behaviour,
- developing a response-control wrapper for concise final-answer tasks,
- documenting limitations without overstating compression or optimisation claims,
- preserving a clear research direction for Round 2 evaluation and hardening.
Sarvam-30B is the submitted category model and is used unmodified. Work on a lighter model is retained only as supporting evidence for scaffold design, fixed-budget evaluation, response-control testing, and reproducibility.
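The first objective, verifying model identity and repository structure, can be spot-checked with a short script. This is a minimal sketch. It assumes the repository root holds a Hugging Face-style `config.json` with `architectures` and `model_type` fields, and it uses the `SarvamMoEForCausalLM` / `sarvam_moe` values listed under Base Model. The helper name `check_repository` and the required-file list are hypothetical and are not part of the submission.

```python
import json
from pathlib import Path

# Expected identity values, as listed in this model card.
EXPECTED_ARCHITECTURE = "SarvamMoEForCausalLM"
EXPECTED_MODEL_TYPE = "sarvam_moe"

# Hypothetical minimum file set for this sketch; extend as needed.
REQUIRED_FILES = ("config.json", "vllm_config.yaml")


def check_repository(root: str) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    base = Path(root)
    problems = []

    # Repository completeness: required files must exist at the root.
    for name in REQUIRED_FILES:
        if not (base / name).is_file():
            problems.append(f"missing file: {name}")

    # Model identity: compare config.json fields with the expected values.
    config_path = base / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        if EXPECTED_ARCHITECTURE not in config.get("architectures", []):
            problems.append(f"architectures does not list {EXPECTED_ARCHITECTURE}")
        if config.get("model_type") != EXPECTED_MODEL_TYPE:
            problems.append(
                f"model_type is {config.get('model_type')!r}, "
                f"expected {EXPECTED_MODEL_TYPE!r}"
            )

    return problems
```

Run `check_repository(".")` from the repository root. It should return an empty list when both files are present and `config.json` declares the expected architecture and model type.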
### 2. Experimental Hardware and Runtime Environment
All local experiments and validation runs were conducted on a controlled workstation/server environment to make runtime behaviour, memory constraints, and serving feasibility observable under consistent conditions:
- GPU: 2 × NVIDIA GeForce RTX 3090
- System RAM: 192 GB
- OS: Ubuntu Server 24.04
- Serving engine: vLLM 0.19.1
- Python environment: project virtual environment
- Model root: Sarvam-30B Hugging Face-style repository
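A rough weight-memory estimate shows why this hardware needs CPU offload. This is a back-of-the-envelope sketch with three assumptions: about 30 × 10^9 parameters (a nominal figure taken from the model name, not an exact count), 2 bytes per parameter for BF16/FP16 weights, and 24 GiB of VRAM per RTX 3090. It ignores KV cache, activations, and runtime overhead, and all of these push the real requirement higher.

```python
# Back-of-the-envelope weight-memory estimate (assumptions stated inline).
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def offload_shortfall_gib(num_params: float, bytes_per_param: float,
                          num_gpus: int, vram_per_gpu_gib: float) -> float:
    """GiB of weights that cannot fit in aggregate VRAM (0.0 if they fit).

    Ignores KV cache, activations, and runtime overhead, so the true
    shortfall is larger than this figure.
    """
    total_vram = num_gpus * vram_per_gpu_gib
    return max(0.0, weight_memory_gib(num_params, bytes_per_param) - total_vram)


# Assumptions: ~30e9 parameters (nominal), 2-byte weights, 2 x 24 GiB GPUs.
weights = weight_memory_gib(30e9, 2)
shortfall = offload_shortfall_gib(30e9, 2, num_gpus=2, vram_per_gpu_gib=24)
print(f"weights ~ {weights:.1f} GiB, exceeding 48 GiB of VRAM by ~ {shortfall:.1f} GiB")
# prints: weights ~ 55.9 GiB, exceeding 48 GiB of VRAM by ~ 7.9 GiB
```

vLLM provides a `--cpu-offload-gb` engine option that places part of the weights in system RAM. This machine has 192 GB of system RAM available for that.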
---
## Base Model
Base model: **Sarvam-30B by Sarvam AI**
Model class:

```text
SarvamMoEForCausalLM
```

Model type:

```text
sarvam_moe
```
This submission uses the Sarvam-30B model path and does not replace it with the lighter model used during earlier pipeline development.
## Licence

This repository uses the same licence as the original Sarvam-30B model:

```text
apache-2.0
```
## Round 1 Position

Round 1 status:

```text
TECHNICAL_TRIAL_READY
```

Current runtime classification:

```text
LOADABLE / VLLM-SERVEABLE / ROUTER-TRACEABLE / EXPERT-MAPPED / GENERATION-CONTROLLABLE WITH WRAPPER
```
This means:
- Sarvam-30B loads locally.
- vLLM 0.19.1 recognises the Sarvam architecture.
- The model can be served through vLLM.
- The model runs locally on 2 × RTX 3090 with CPU offload.
- Stable final-answer behaviour currently relies on the included response-control wrapper.
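The sketch below shows the kind of post-processing a final-answer wrapper can do. It is not the included wrapper. It assumes, hypothetically, that raw completions may contain `<think>...</think>` reasoning blocks (or only a closing `</think>` tag, when a chat template opens the block in the prompt) and an optional `Final answer:` marker. The function name `extract_final_answer`, both delimiters, and the `max_chars` limit are illustrative assumptions.

```python
import re

# Hypothetical delimiters: the real Sarvam-30B output format and the
# included wrapper's rules may differ.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
FINAL_MARKER = re.compile(r"final answer\s*:\s*", re.IGNORECASE)


def extract_final_answer(raw: str, max_chars: int = 500) -> str:
    """Reduce a raw completion to a concise final answer.

    1. Drop complete <think>...</think> reasoning blocks.
    2. If only a closing </think> remains, keep the text after the last one.
    3. If a 'Final answer:' marker is present, keep the text after the last one.
    4. Collapse whitespace and truncate to max_chars characters.
    """
    text = THINK_BLOCK.sub("", raw)
    # Some chat templates open the reasoning block in the prompt, so the
    # completion may contain only the closing tag.
    if "</think>" in text.lower():
        text = re.split(r"</think>", text, flags=re.IGNORECASE)[-1]
    markers = list(FINAL_MARKER.finditer(text))
    if markers:
        text = text[markers[-1].end():]
    text = " ".join(text.split())
    return text[:max_chars].rstrip()
```

For example, `extract_final_answer("some reasoning</think>Final answer: Paris")` returns `"Paris"`. Keeping the wrapper as a pure string function makes it easy to test apart from the serving stack.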
## Inference Environment

The recommended serving engine is:

```text
vLLM 0.19.1
```
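Because the card pins a specific vLLM release, a deployment script may want to confirm the installed version before starting the server. This is a minimal sketch using the standard-library `importlib.metadata`. The helper name `vllm_version_matches` is hypothetical. Ignoring a local build suffix such as `+cu128` is an assumption about how locally built wheels are labelled.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

PINNED_VLLM = "0.19.1"  # version recommended in this card


def vllm_version_matches(expected: str = PINNED_VLLM) -> Tuple[bool, Optional[str]]:
    """Return (matches, installed); installed is None when vLLM is not installed."""
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        return False, None
    # Ignore a local build suffix such as "+cu128" when comparing (assumption).
    base = installed.split("+", 1)[0]
    return base == expected, installed


ok, installed = vllm_version_matches()
if not ok:
    print(f"Expected vLLM {PINNED_VLLM}, found {installed or 'none'}")
```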
Expected challenge serving command:

```bash
vllm serve --config vllm_config.yaml
```

The repository includes the root-level `vllm_config.yaml` file that this command reads.
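Once the server is running, it exposes vLLM's OpenAI-compatible API. This is a minimal standard-library client sketch with three assumptions: the server listens on vLLM's default `localhost:8000` unless `vllm_config.yaml` sets another port, `MODEL_NAME` is a hypothetical placeholder that you replace with the id returned by `GET /v1/models`, and the `max_tokens` and `temperature` defaults are illustrative choices.

```python
import json
import urllib.request

# Assumptions: default vLLM host/port; replace MODEL_NAME with the id
# reported by GET /v1/models on your server.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "REPLACE_WITH_SERVED_MODEL_NAME"  # hypothetical placeholder


def build_chat_request(prompt: str, model: str = MODEL_NAME,
                       max_tokens: int = 256, temperature: float = 0.0) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(prompt: str, base_url: str = BASE_URL, **kwargs) -> str:
    """POST the payload to a running server and return the first choice's text."""
    body = json.dumps(build_chat_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

With the server up, `print(chat("What is the capital of France?"))` sends one request. The sketch defaults to `temperature` 0.0 (greedy decoding) because repeated runs then give comparable outputs. Separating payload construction from the network call also lets you check the request shape without a server.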