---
license: apache-2.0
tags:
- simlingo
- autonomous-driving
- vision-language-action
- carla
- bench2drive
- qwen3-vl
- robotics
datasets:
- RenzKa/simlingo
base_model:
- Qwen/Qwen3-VL-2B-Instruct
---

# SimLingo-QwenVL3-2B

SimLingo-QwenVL3-2B is a vision-language-action model for closed-loop autonomous driving in CARLA. This release is based on the SimLingo codebase and uses a Qwen3-VL-2B-Instruct backbone inside the SimLingo driving stack.

This repository provides two checkpoints, named by their zero-based epoch index:
- **Epoch 13** (trained for 14 epochs)
- **Epoch 14** (trained for 15 epochs)

Each checkpoint is released in two formats (file names shown for Epoch 13):
- `checkpoints/epoch=013.ckpt`: PyTorch Lightning checkpoint for training resumption.
- `checkpoints/epoch=013.pt`: exported weights for SimLingo evaluation and inference.
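
As a minimal sketch of the naming scheme, the helper below builds both file paths from an epoch index. `checkpoint_paths` is a hypothetical function, not part of SimLingo, and it assumes that the Epoch 14 files follow the same `epoch=NNN` pattern shown for Epoch 13.

```python
from pathlib import PurePosixPath


def checkpoint_paths(epoch_index: int, root: str = "checkpoints") -> dict:
    """Return the assumed .ckpt and .pt paths for a zero-based epoch index."""
    if epoch_index < 0:
        raise ValueError("epoch_index must be non-negative")
    # Zero-padded to three digits, matching the `epoch=013` file names.
    stem = f"epoch={epoch_index:03d}"
    base = PurePosixPath(root)
    return {
        # Index 13 corresponds to 14 completed training epochs.
        "epochs_trained": epoch_index + 1,
        # PyTorch Lightning checkpoint, used to resume training.
        "lightning_ckpt": str(base / f"{stem}.ckpt"),
        # Exported weights, used for SimLingo evaluation and inference.
        "exported_weights": str(base / f"{stem}.pt"),
    }


paths = checkpoint_paths(13)
print(paths["exported_weights"])  # checkpoints/epoch=013.pt
```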

## Model Overview

- Model family: SimLingo
- Backbone: Qwen3-VL-2B-Instruct
- Modality: front-camera vision + route / control conditioning
- Primary use case: closed-loop autonomous driving research in simulation
- Codebase: https://github.com/RenzKa/simlingo
- Paper: https://arxiv.org/abs/2503.09594

## Evaluation

The checkpoint `epoch=013.ckpt` was evaluated on Bench2Drive.

| Benchmark | Checkpoint | Driving Score (DS) | Success Rate (%) |
| --- | --- | ---: | ---: |
| Bench2Drive | `epoch=013.ckpt` | 63.94 | 30.00 |

Bench2Drive is a CARLA closed-loop driving benchmark with 220 short routes and one safety-critical scenario per route. The corresponding `.pt` file in this repo is exported from the same epoch and is intended for use with the SimLingo evaluation code.
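
To make the two metrics concrete, here is a minimal sketch of how per-route results are commonly aggregated. It assumes the CARLA Leaderboard convention that a route's driving score is its route completion multiplied by an infraction penalty, and that a route counts as a success only when it is fully completed without infractions. The `RouteResult` type, the `aggregate` function, and the sample values are hypothetical and are not taken from the evaluation logs of this checkpoint; the official Bench2Drive tooling remains the reference for reported numbers.

```python
from dataclasses import dataclass


@dataclass
class RouteResult:
    route_completion: float    # percent of the route completed, 0..100
    infraction_penalty: float  # multiplicative penalty in (0, 1]; 1.0 means no infractions


def aggregate(results):
    """Return (driving_score, success_rate_percent) averaged over routes."""
    if not results:
        raise ValueError("at least one route result is required")
    # Per-route driving score: completion scaled by the infraction penalty.
    ds = sum(r.route_completion * r.infraction_penalty for r in results) / len(results)
    # A route succeeds only if fully completed with no infraction penalty applied.
    successes = sum(
        1 for r in results
        if r.route_completion >= 100.0 and r.infraction_penalty >= 1.0
    )
    sr = 100.0 * successes / len(results)
    return ds, sr


# Hypothetical sample data, for illustration only.
sample = [
    RouteResult(100.0, 1.0),
    RouteResult(100.0, 0.5),
    RouteResult(50.0, 1.0),
    RouteResult(80.0, 0.25),
]
driving_score, success_rate = aggregate(sample)
print(driving_score, success_rate)
```

With these sample values the sketch yields a driving score of 55.0 and a success rate of 25.0, which shows why the success rate can sit well below the driving score: partially completed or penalized routes still contribute to DS but not to the success count.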

## Usage

This model is not intended to be loaded with the Hugging Face Transformers library alone. It depends on the SimLingo repository for preprocessing, control prediction, and CARLA agent execution.

Typical closed-loop evaluation uses the exported `.pt` file together with the SimLingo codebase:

```bash
python /path/to/simlingo/Bench2Drive/leaderboard/leaderboard/leaderboard_evaluator.py \
  --agent /path/to/simlingo/team_code/agent_simlingo.py \
  --agent-config /path/to/checkpoints/epoch=013.pt
```
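
When scripting repeated runs, the same invocation can be assembled in Python. The sketch below is a hypothetical helper, not part of SimLingo. It uses only the two flags shown above, and a real Bench2Drive run will likely need further evaluator arguments that are not covered here.

```python
import shlex
import sys
from pathlib import Path


def build_eval_command(simlingo_root, weights_path, python=sys.executable):
    """Assemble the evaluator command as an argument list (does not run it)."""
    root = Path(simlingo_root)
    evaluator = root / "Bench2Drive" / "leaderboard" / "leaderboard" / "leaderboard_evaluator.py"
    agent = root / "team_code" / "agent_simlingo.py"
    weights = Path(weights_path)
    # The exported .pt file is the format intended for evaluation, not the .ckpt.
    if weights.suffix != ".pt":
        raise ValueError(f"expected an exported .pt file, got: {weights.name}")
    return [
        python, str(evaluator),
        "--agent", str(agent),
        "--agent-config", str(weights),
    ]


cmd = build_eval_command("/path/to/simlingo", "/path/to/checkpoints/epoch=013.pt")
print(shlex.join(cmd))
# To launch the run: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps paths with spaces intact when the command is later passed to `subprocess.run`.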

For cluster-based evaluation, refer to `start_eval_simlingo.py` in the SimLingo repository.

## Training Context

Both checkpoints come from the same training run:

- Run name: `2026_04_18_18_01_31_simlingo_qwen3`
- Max epochs: 15
- Batch size: 96
- Learning rate: 3e-5
- Precision: `bf16-mixed`
- Training seed: `9876`

The associated Hydra config uses:

- vision backbone variant: `Qwen3-VL-2B-Instruct`
- language backbone variant: `Qwen3-VL-2B-Instruct`

## Intended Use

This model is intended for:

- research on closed-loop driving in CARLA
- benchmarking on Bench2Drive
- studying language-conditioned driving behavior in simulation

This model is not intended for:

- real-world driving deployment
- safety-critical use outside simulation

## Limitations

- Results are from simulation and do not imply real-world safety.
- Performance depends on the exact SimLingo code, CARLA version, and benchmark setup.
- Closed-loop metrics can vary if evaluation infrastructure, GPU mapping, or simulator stability differ across machines.

## Citation

If you use this checkpoint, please cite SimLingo and Bench2Drive.

```bibtex
@InProceedings{Renz2025cvpr,
  title={SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment},
  author={Renz, Katrin and Chen, Long and Arani, Elahe and Sinavski, Oleg},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

@inproceedings{Jia2024NeurIPS,
  title={Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},
  author={Jia, Xiaosong and Yang, Zhenjie and Li, Qifeng and Zhang, Zhiyuan and Yan, Junchi},
  booktitle={NeurIPS 2024 Datasets and Benchmarks Track},
  year={2024}
}
```