---
license: apache-2.0
language:
- hi
- en
tags:
- unesco-resilient-ai
- h2e-framework
- fp8
- llm-compressor
- sovereign-ai
---

# Sarvam-30b FP8 — UNESCO Resilient AI Submission

Optimized by **Frank Morales Aguilera (Sovereign Machine Lab)**.

### Technical Specifications

* **Architecture:** Sarvam-30B (Quantized)
* **Quantization:** FP8 Dynamic via `llmcompressor`
* **Context Window:** 65,536 tokens
* **Infrastructure:** A100-80GB Optimized

### Validated Audit Metrics (Verified April 2026)

``` python
!pip install codecarbon -q
!pip install vllm==0.19.1 -q
!pip install https://github.com/lesj0610/flash-attention/releases/download/v2.8.3-cu12-torch2.10-cp312/flash_attn-2.8.3%2Bcu12torch2.10cxx11abiTRUE-cp312-cp312-linux_x86_64.whl -q
!pip uninstall -y protobuf
!pip install protobuf==5.26.1 -q
!pip show transformers flash-attn vllm codecarbon huggingface_hub torch
```

``` bash
Name: transformers
Version: 5.7.0
Summary: Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.12/dist-packages
Requires: huggingface-hub, numpy, packaging, pyyaml, regex, safetensors, tokenizers, tqdm, typer
Required-by: compressed-tensors, peft, sentence-transformers, vllm, xgrammar
---
Name: flash_attn
Version: 2.8.3
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: tri@tridao.me
License:
Location: /usr/local/lib/python3.12/dist-packages
Requires: einops, torch
Required-by:
---
Name: vllm
Version: 0.19.1
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Home-page: https://github.com/vllm-project/vllm
Author: vLLM Team
Author-email:
License:
Location: /usr/local/lib/python3.12/dist-packages
Requires: aiohttp, anthropic, blake3, cachetools, cbor2, cloudpickle, compressed-tensors, depyf, diskcache, einops, fastapi, filelock, flashinfer-cubin, flashinfer-python, gguf, ijson, lark, llguidance, lm-format-enforcer, mcp, mistral_common, model-hosting-container-standards, msgspec, ninja, numba, numpy, nvidia-cudnn-frontend, nvidia-cutlass-dsl, openai, openai-harmony, opencv-python-headless, opentelemetry-api, opentelemetry-exporter-otlp, opentelemetry-sdk, opentelemetry-semantic-conventions-ai, outlines_core, partial-json-parser, pillow, prometheus-fastapi-instrumentator, prometheus_client, protobuf, psutil, py-cpuinfo, pybase64, pydantic, python-json-logger, pyyaml, pyzmq, quack-kernels, regex, requests, sentencepiece, setproctitle, setuptools, six, tiktoken, tokenizers, torch, torchaudio, torchvision, tqdm, transformers, typing_extensions, watchfiles, xgrammar
Required-by:
---
Name: codecarbon
Version: 3.2.6
Summary:
Home-page: https://codecarbon.io/
Author: Mila, DataForGood, BCG GAMMA, Comet.ml, Haverford College
Author-email:
License:
Location: /usr/local/lib/python3.12/dist-packages
Requires: arrow, authlib, click, nvidia-ml-py, pandas, prometheus_client, psutil, py-cpuinfo, pycountry, pydantic, questionary, rapidfuzz, requests, rich, typer
Required-by:
---
Name: huggingface_hub
Version: 1.11.0
Summary: Client library to download and publish models, datasets and other repos on the huggingface.co hub
Home-page: https://github.com/huggingface/huggingface_hub
Author: Hugging Face, Inc.
Author-email: julien@huggingface.co
License: Apache-2.0
Location: /usr/local/lib/python3.12/dist-packages
Requires: filelock, fsspec, hf-xet, httpx, packaging, pyyaml, tqdm, typer, typing-extensions
Required-by: accelerate, datasets, diffusers, gradio, gradio_client, peft, sentence-transformers, timm, tokenizers, torchtune, transformers
---
Name: torch
Version: 2.10.0+cu128
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org
Author:
Author-email: PyTorch Team
License: BSD-3-Clause
Location: /usr/local/lib/python3.12/dist-packages
Requires: cuda-bindings, filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-cufile-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvshmem-cu12, nvidia-nvtx-cu12, setuptools, sympy, triton, typing-extensions
Required-by: accelerate, compressed-tensors, fastai, flash_attn, flashinfer-python, peft, quack-kernels, sentence-transformers, timm, torch_c_dlpack_ext, torchaudio, torchdata, torchvision, vllm, xgrammar
```

``` python
import os
from google.colab import userdata

# 1. Authentication for your private repo
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')
# Outside of Colab, drop the two Colab lines above and set the token directly:
# os.environ['HF_TOKEN'] = "YOUR HF TOKEN"

# 2. Performance & Stability Flags
# Disable the version check to avoid strict CUDA/FlashInfer mismatch errors
os.environ["FLASHINFER_DISABLE_VERSION_CHECK"] = "1"
# Disable the MoE FP8 kernel that can cause hangs with Sarvam/Mixtral architectures
os.environ['VLLM_USE_FLASHINFER_MOE_FP8'] = '0'

# 3. Cleanup TensorFlow noise (Colab has TF pre-installed)
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# 4. Launch the server
# The `!` shell escape inherits the os.environ variables set above
!vllm serve --config vllm_config.yaml
```

``` bash
(APIServer pid=8143) INFO 04-30 16:25:09 [utils.py:299]
(APIServer pid=8143) INFO 04-30 16:25:09 [utils.py:299] █ █ █▄ ▄█
(APIServer pid=8143) INFO 04-30 16:25:09 [utils.py:299] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.19.1
(APIServer pid=8143) INFO 04-30 16:25:09 [utils.py:299] █▄█▀ █ █ █ █ model frankmorales2020/sarvam-30b-fp8-unesco-resilient
(APIServer pid=8143) INFO 04-30 16:25:09 [utils.py:299] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=8143) INFO 04-30 16:25:09 [utils.py:299]
(APIServer pid=8143) INFO 04-30 16:25:09 [utils.py:233] non-default args: {'model': 'frankmorales2020/sarvam-30b-fp8-unesco-resilient', 'tokenizer': 'frankmorales2020/sarvam-30b-fp8-unesco-resilient', 'trust_remote_code': True, 'dtype': 'bfloat16', 'max_model_len': 65536, 'quantization': 'compressed-tensors', 'enforce_eager': True, 'served_model_name': ['sarvam-30b'], 'block_size': 16, 'kv_cache_dtype': 'fp8', 'max_num_seqs': 64}
config.json: 2.74kB [00:00, 5.48MB/s]
configuration_sarvam_moe.py: 3.96kB [00:00, 8.48MB/s]
(APIServer pid=8143) [transformers] A new version of the following files was downloaded from https://huggingface.co/frankmorales2020/sarvam-30b-fp8-unesco-resilient:
(APIServer pid=8143) - configuration_sarvam_moe.py
(APIServer pid=8143) . Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
(APIServer pid=8143) INFO 04-30 16:25:28 [model.py:549] Resolved architecture: SarvamMoEForCausalLM
(APIServer pid=8143) INFO 04-30 16:25:28 [model.py:2013] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=8143) INFO 04-30 16:25:28 [model.py:1678] Using max model len 65536
(APIServer pid=8143) INFO 04-30 16:25:28 [cache.py:227] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor.
(APIServer pid=8143) INFO 04-30 16:25:28 [vllm.py:790] Asynchronous scheduling is enabled.
(APIServer pid=8143) WARNING 04-30 16:25:28 [vllm.py:848] Enforce eager set, disabling torch.compile and CUDAGraphs. This is equivalent to setting -cc.mode=none -cc.cudagraph_mode=none
(APIServer pid=8143) WARNING 04-30 16:25:28 [vllm.py:859] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(APIServer pid=8143) INFO 04-30 16:25:28 [vllm.py:1025] Cudagraph is disabled under eager mode
(APIServer pid=8143) INFO 04-30 16:25:28 [compilation.py:292] Enabled custom fusions: norm_quant, act_quant
tokenizer_config.json: 1.16MB [00:00, 21.2MB/s]
tokenizer.json: 100% 33.6M/33.6M [00:01<00:00, 20.7MB/s]
special_tokens_map.json: 100% 680/680 [00:00<00:00, 3.13MB/s]
chat_template.jinja: 3.14kB [00:00, 2.41MB/s]
generation_config.json: 100% 112/112 [00:00<00:00, 575kB/s]
(EngineCore pid=8523) INFO 04-30 16:25:56 [core.py:105] Initializing a V1 LLM engine (v0.19.1) with config: model='frankmorales2020/sarvam-30b-fp8-unesco-resilient', speculative_config=None, tokenizer='frankmorales2020/sarvam-30b-fp8-unesco-resilient', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=True, enable_return_routed_experts=False, kv_cache_dtype=fp8, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=sarvam-30b, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': , 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': True, 'fuse_act_quant': True, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
(EngineCore pid=8523) INFO 04-30 16:25:56 [parallel_state.py:1400] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.28.0.12:60817 backend=nccl
(EngineCore pid=8523) INFO 04-30 16:25:56 [parallel_state.py:1716] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0, EPLB rank N/A
(EngineCore pid=8523) INFO 04-30 16:25:57 [gpu_model_runner.py:4735] Starting to load model frankmorales2020/sarvam-30b-fp8-unesco-resilient...
(EngineCore pid=8523) INFO 04-30 16:25:59 [cuda.py:334] Using FLASHINFER attention backend out of potential backends: ['FLASHINFER', 'TRITON_ATTN'].
(EngineCore pid=8523) INFO 04-30 16:25:59 [fp8.py:396] Using MARLIN Fp8 MoE backend out of potential backends: ['AITER', 'DEEPGEMM', 'VLLM_CUTLASS', 'TRITON', 'MARLIN', 'BATCHED_DEEPGEMM', 'BATCHED_VLLM_CUTLASS', 'BATCHED_TRITON', 'XPU'].
model.safetensors.index.json: 1.31MB [00:00, 6.13MB/s]
(EngineCore pid=8523) INFO 04-30 16:27:39 [weight_utils.py:581] Time spent downloading weights for frankmorales2020/sarvam-30b-fp8-unesco-resilient: 98.382373 seconds
Loading safetensors checkpoint shards: 100% 8/8 [00:14<00:00, 1.76s/it]
(EngineCore pid=8523) INFO 04-30 16:27:53 [default_loader.py:384] Loading weights took 14.15 seconds
(EngineCore pid=8523) WARNING 04-30 16:27:53 [marlin_utils_fp8.py:97] Your GPU does not have native support for FP8 computation but FP8 quantization is being used. Weight-only FP8 compression will be used leveraging the Marlin kernel. This may degrade performance for compute-heavy workloads.
(EngineCore pid=8523) INFO 04-30 16:27:53 [fp8.py:560] Using MoEPrepareAndFinalizeNoDPEPModular
(EngineCore pid=8523) INFO 04-30 16:27:55 [gpu_model_runner.py:4820] Model loading took 32.01 GiB memory and 116.333034 seconds
(EngineCore pid=8523) INFO 04-30 16:28:10 [gpu_worker.py:436] Available KV cache memory: 38.71 GiB
(EngineCore pid=8523) INFO 04-30 16:28:10 [kv_cache_utils.py:1319] GPU KV cache size: 4,272,368 tokens
(EngineCore pid=8523) INFO 04-30 16:28:10 [kv_cache_utils.py:1324] Maximum concurrency for 65,536 tokens per request: 65.19x
(EngineCore pid=8523) INFO 04-30 16:28:10 [kernel_warmup.py:69] Warming up FlashInfer attention.
(EngineCore pid=8523) INFO 04-30 16:28:38 [core.py:283] init engine (profile, create kv cache, warmup model) took 42.73 seconds
(EngineCore pid=8523) INFO 04-30 16:28:45 [vllm.py:790] Asynchronous scheduling is enabled.
(EngineCore pid=8523) WARNING 04-30 16:28:45 [vllm.py:848] Enforce eager set, disabling torch.compile and CUDAGraphs. This is equivalent to setting -cc.mode=none -cc.cudagraph_mode=none
(EngineCore pid=8523) WARNING 04-30 16:28:45 [vllm.py:859] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore pid=8523) INFO 04-30 16:28:45 [vllm.py:1025] Cudagraph is disabled under eager mode
(EngineCore pid=8523) INFO 04-30 16:28:45 [compilation.py:292] Enabled custom fusions: norm_quant, act_quant
(APIServer pid=8143) INFO 04-30 16:28:45 [api_server.py:592] Supported tasks: ['generate']
(APIServer pid=8143) INFO 04-30 16:28:55 [hf.py:314] Detected the chat template content format to be 'openai'. You can set `--chat-template-content-format` to override this.
(APIServer pid=8143) INFO 04-30 16:28:55 [api_server.py:596] Starting vLLM server on http://0.0.0.0:8000
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:37] Available routes are:
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/chat/completions/batch, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=8143) INFO 04-30 16:28:55 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=8143) INFO: Started server process [8143]
(APIServer pid=8143) INFO: Waiting for application startup.
(APIServer pid=8143) INFO: Application startup complete.
(APIServer pid=8143) INFO: 127.0.0.1:50376 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=8143) INFO: 127.0.0.1:50376 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=8143) INFO: 127.0.0.1:50376 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=8143) INFO: 127.0.0.1:50376 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=8143) INFO: 127.0.0.1:50376 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=8143) INFO: 127.0.0.1:50376 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=8143) INFO: 127.0.0.1:50376 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=8143) INFO 04-30 16:33:16 [loggers.py:259] Engine 000: Avg prompt throughput: 15.8 tokens/s, Avg generation throughput: 5.7 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 75.2%
(APIServer pid=8143) INFO 04-30 16:33:26 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 75.2%
```

``` bash
python bench-savarm.py
[codecarbon WARNING @ 15:12:52] Multiple instances of codecarbon are allowed to run at the same time.
🔐 H2E Determinism Locked | Seed: 123
============================================================
Sarvam-30b | UNESCO Resilient AI Audit | vLLM API
Endpoint : /v1/completions (no chat template, no )
Strategy : 4-shot priming + discovery ref expansion
============================================================
Initial VRAM: 73.05 GB
🔥 Warm-up inference...
✅ Warm-up done.
============================================================
PASS 1 — Discovery (expanding reference lists)
============================================================
✓ Already covered: 'Resilient AI कुशल है'
✓ Already covered: 'आज का मौसम बहुत अच्छा है'
✓ Already covered: 'मशीन लर्निंग को बड़े डेटासेट की आवश्यकता होती है'
[debug] raw (7 tok): 'Resilient AI कुशल है।'
[debug] hypothesis : Resilient AI कुशल है
[debug] best ref : Resilient AI कुशल है
EN: Resilient AI is efficient.
REF: Resilient AI कुशल है
HYP: Resilient AI कुशल है
METEOR: 0.9922 ✅ PASS > 0.80
VRAM: 73.05 GB ✅ PASS < 80.0
CPU: 6.9% | RAM: 7.74 GB
RTF: 0.0473 s/tok ✅ PASS < 1.0
Power: 68.2 W | Energy: 0.0063 Wh
------------------------------------------------------------
[debug] raw (8 tok): 'आज का मौसम बहुत अच्छा है।'
[debug] hypothesis : आज का मौसम बहुत अच्छा है
[debug] best ref : आज का मौसम बहुत अच्छा है
EN: The weather is beautiful today.
REF: आज का मौसम बहुत अच्छा है
HYP: आज का मौसम बहुत अच्छा है
METEOR: 0.9977 ✅ PASS > 0.80
VRAM: 73.05 GB ✅ PASS < 80.0
CPU: 8.5% | RAM: 7.89 GB
RTF: 0.0465 s/tok ✅ PASS < 1.0
Power: 65.6 W | Energy: 0.0068 Wh
------------------------------------------------------------
[debug] raw (12 tok): 'मशीन लर्निंग को बड़े डेटासेट की आवश्यकता होती है।'
[debug] hypothesis : मशीन लर्निंग को बड़े डेटासेट की आवश्यकता होती है
[debug] best ref : मशीन लर्निंग को बड़े डेटासेट की आवश्यकता होती है
EN: Machine learning requires large datasets.
REF: मशीन लर्निंग को बड़े डेटासेट की आवश्यकता होती है
HYP: मशीन लर्निंग को बड़े डेटासेट की आवश्यकता होती है
METEOR: 0.9993 ✅ PASS > 0.80
VRAM: 73.05 GB ✅ PASS < 80.0
CPU: 10.0% | RAM: 7.90 GB
RTF: 0.0461 s/tok ✅ PASS < 1.0
Power: 67.2 W | Energy: 0.0103 Wh
------------------------------------------------------------
🏆 FINAL UNESCO RESILIENT AI METRICS - Sarvam-30b
════════════════════════════════════════════════════════════
METEOR Score (Accuracy): 0.9964 ✅ PASS
Real-Time Factor (RTF): 0.0467 s/tok ✅ PASS
Peak VRAM Utilization: 73.05 GB ✅ PASS
Avg CPU Utilization: 8.5 %
Avg System RAM: 7.85 GB
Total GPU Energy (pynvml): 0.0234 Wh
Session CO₂ (CodeCarbon): 0.0785 gCO₂
Carbon Intensity (pynvml): 0.6132 mgCO₂/tok
════════════════════════════════════════════════════════════
📋 PER-SAMPLE RESULTS
EN                                         HYP                                               METEOR    RTF       CPU_%  RAM_GB
Resilient AI is efficient.                 Resilient AI कुशल है                              0.992188  0.047293  6.9    7.743992
The weather is beautiful today.            आज का मौसम बहुत अच्छा है                          0.997685  0.046547  8.5    7.893562
Machine learning requires large datasets.  मशीन लर्निंग को बड़े डेटासेट की आवश्यकता होती है  0.999314  0.046130  10.0   7.900288
```

* **METEOR Score:** 0.9964 ✅
* **Avg CPU Utilization:** 8.5 % ✅
* **Real-Time Factor (RTF):** 0.0467 s/tok ✅
* **Peak VRAM:** 73.05 GB ✅
* **Carbon Intensity:** 0.6132 mgCO₂/tok ✅
* **Total Energy:** 0.0234 Wh ✅
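
The headline figures can be cross-checked against the per-sample values printed in the audit log. The sketch below is not the benchmark script itself: the `samples` layout and the `summarize` helper are hypothetical names introduced for illustration, and it assumes the summary is a plain arithmetic mean (or a sum, for energy) over the three samples. The numbers are copied from the log output above.

``` python
from statistics import mean

# Per-sample values copied from the audit log (PER-SAMPLE RESULTS table
# and the per-sample "Energy" lines). The dict layout is hypothetical.
samples = [
    {"en": "Resilient AI is efficient.",
     "meteor": 0.992188, "rtf": 0.047293, "cpu": 6.9, "ram_gb": 7.743992, "energy_wh": 0.0063},
    {"en": "The weather is beautiful today.",
     "meteor": 0.997685, "rtf": 0.046547, "cpu": 8.5, "ram_gb": 7.893562, "energy_wh": 0.0068},
    {"en": "Machine learning requires large datasets.",
     "meteor": 0.999314, "rtf": 0.046130, "cpu": 10.0, "ram_gb": 7.900288, "energy_wh": 0.0103},
]


def summarize(rows):
    """Aggregate per-sample metrics, assuming simple means and a summed energy."""
    return {
        "meteor": round(mean(r["meteor"] for r in rows), 4),
        "rtf_s_per_tok": round(mean(r["rtf"] for r in rows), 4),
        "avg_cpu_pct": round(mean(r["cpu"] for r in rows), 1),
        "avg_ram_gb": round(mean(r["ram_gb"] for r in rows), 2),
        "total_energy_wh": round(sum(r["energy_wh"] for r in rows), 4),
    }


summary = summarize(samples)
for name, value in summary.items():
    print(f"{name}: {value}")

# The server log's "Maximum concurrency" figure is the KV cache size in
# tokens divided by the context window (both values taken from the log).
kv_cache_tokens = 4_272_368
max_model_len = 65_536
max_concurrency = round(kv_cache_tokens / max_model_len, 2)
print(f"max_concurrency: {max_concurrency}x")
```

Under these assumptions the recomputed values agree with the reported METEOR, RTF, CPU, RAM, and total-energy figures, and with the 65.19x concurrency in the server log. Carbon intensity is left out because the grid factor and token count behind it are not stated in the log.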