You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

Sarvam-30B Round 1 Text-to-Text Research Submission

Submission Context

This repository documents a Round 1 technical-trial submission for the Resilient AI Challenge 2026 Text-to-Text category.

The submitted category model is Sarvam-30B.

Round 1 is treated as a technical validation and comparison stage. The purpose of this repository is to document model packaging, local serving feasibility, repository completeness, inference-control behaviour, limitations, and the research path toward Round 2.

Earlier lighter-model work is included only as supporting methodological evidence. It is not submitted as the category model.


Research Overview and Methodological Summary

1. Research Objective

This repository evaluates whether Sarvam-30B can be packaged, served, controlled, and documented in a reproducible Hugging Face model repository suitable for examiner review in the Resilient AI Challenge 2026 Text-to-Text category.

The research objective is technical rather than promotional. The work focuses on:

  • verifying Sarvam-30B model identity and repository structure,
  • validating local vLLM serving on controlled hardware,
  • identifying prompt-template and endpoint behaviour,
  • developing a response-control wrapper for concise final-answer tasks,
  • documenting limitations without overstating compression or optimisation claims,
  • preserving a clear research direction for Round 2 evaluation and hardening.

Sarvam-30B is the submitted category model and unmodified. A lighter-model work is retained only as supporting evidence for scaffold design, fixed-budget evaluation, response-control testing, and reproducibility.

2. Experimental Hardware and Runtime Environment

All local experiments and validation runs were conducted on a controlled workstation/server environment to make runtime behaviour, memory constraints, and serving feasibility observable under consistent conditions:

GPU: 2 × NVIDIA GeForce RTX 3090
System RAM: 192 GB
OS: Ubuntu Server 24.04
Serving engine: vLLM 0.19.1
Python environment: project virtual environment
Model root: Sarvam-30B Hugging Face-style repository

---

## Base Model

Base model: **Sarvam-30B by Sarvam AI**

Model class:

```text
SarvamMoEForCausalLM

Model type:

sarvam_moe

This submission uses the Sarvam-30B model path and does not replace it with the lighter model used during earlier pipeline development.


Licence

This repository uses the same licence as the original Sarvam-30B model:

apache-2.0

Round 1 Position

Round 1 status:

TECHNICAL_TRIAL_READY

Current runtime classification:

LOADABLE / VLLM-SERVEABLE / ROUTER-TRACEABLE / EXPERT-MAPPED / GENERATION-CONTROLLABLE WITH WRAPPER

This means:

  • Sarvam-30B loads locally.
  • vLLM 0.19.1 recognises the Sarvam architecture.
  • The model can be served through vLLM.
  • The model runs locally on 2 × RTX 3090 with CPU offload.
  • Stable final-answer behaviour currently uses the included wrapper.

Inference Environment

The recommended serving engine is:

vLLM 0.19.1

Expected challenge serving command:

vllm serve --config vllm_config.yaml

The repository includes a root-level:

vllm_config.yaml
Downloads last month
-
Safetensors
Model size
32B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support