---
license: apache-2.0
tags:
- simlingo
- autonomous-driving
- vision-language-action
- carla
- bench2drive
- qwen3-vl
- robotics
datasets:
- RenzKa/simlingo
base_model:
- Qwen/Qwen3-VL-2B-Instruct
---

# SimLingo-QwenVL3-2B

SimLingo-QwenVL3-2B is a vision-language-action model for closed-loop autonomous driving in CARLA. This release is based on the SimLingo codebase and uses a Qwen3-VL-2B-Instruct backbone inside the SimLingo driving stack.

This repository provides two checkpoints:

- **Epoch 13** (trained for 14 epochs)
- **Epoch 14** (trained for 15 epochs)

Each checkpoint is released in two formats:

- `checkpoints/epoch=013.ckpt`: PyTorch Lightning checkpoint for training resumption.
- `checkpoints/epoch=013.pt`: exported weights for SimLingo evaluation and inference.

## Model Overview

- Model family: SimLingo
- Backbone: Qwen3-VL-2B-Instruct
- Modality: front-camera vision + route / control conditioning
- Primary use case: closed-loop autonomous driving research in simulation
- Codebase: https://github.com/RenzKa/simlingo
- Paper: https://arxiv.org/abs/2503.09594

## Evaluation

The checkpoint `epoch=013.ckpt` was evaluated on Bench2Drive.

| Benchmark | Checkpoint | Driving Score (DS) | Success Rate (%) |
| --- | --- | ---: | ---: |
| Bench2Drive | `epoch=013.ckpt` | 63.94 | 30.00 |

Bench2Drive is a CARLA closed-loop driving benchmark with 220 short routes and one safety-critical scenario per route. The corresponding `.pt` file in this repo is exported from the same epoch and is intended for use with the SimLingo evaluation code.

## Usage

This model is not intended to be loaded with vanilla Transformers alone. It depends on the SimLingo repository for preprocessing, control prediction, and CARLA agent execution.

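
Before running anything, it can help to pick the right file for the task. The sketch below maps a zero-based epoch index to the two released formats and the number of completed epochs, following the naming described above. The helper name `checkpoint_files` is hypothetical and not part of the SimLingo codebase, and the `epoch=014` file names are assumed to follow the same pattern as the documented `epoch=013` files.

```python
from pathlib import PurePosixPath


def checkpoint_files(epoch_index: int, root: str = "checkpoints") -> dict:
    """Map a zero-based epoch index to the released file names.

    Hypothetical helper for illustration; the naming pattern is taken
    from the documented `epoch=013` files and assumed for other epochs.
    """
    if epoch_index < 0:
        raise ValueError("epoch_index must be non-negative")
    stem = f"epoch={epoch_index:03d}"
    base = PurePosixPath(root)
    return {
        # Epoch indices start at 0, so index 13 means 14 completed epochs.
        "epochs_trained": epoch_index + 1,
        # PyTorch Lightning checkpoint, used to resume training.
        "resume_training": str(base / f"{stem}.ckpt"),
        # Exported weights, used for SimLingo evaluation and inference.
        "evaluation": str(base / f"{stem}.pt"),
    }


if __name__ == "__main__":
    files = checkpoint_files(13)
    print(files["evaluation"])  # checkpoints/epoch=013.pt
```

The `evaluation` entry is the file to use for closed-loop runs; the `resume_training` entry is only needed when continuing training with PyTorch Lightning.
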
Typical closed-loop evaluation uses the exported `.pt` file together with the SimLingo codebase:

```bash
python /path/to/simlingo/Bench2Drive/leaderboard/leaderboard/leaderboard_evaluator.py \
  --agent /path/to/simlingo/team_code/agent_simlingo.py \
  --agent-config /path/to/checkpoints/epoch=013.pt
```

For cluster-based evaluation, refer to `start_eval_simlingo.py` in the SimLingo repository.

## Training Context

This checkpoint comes from the training run:

- Run name: `2026_04_18_18_01_31_simlingo_qwen3`
- Max epochs: 15
- Batch size: 96
- Learning rate: 3e-5
- Precision: `bf16-mixed`
- Training seed: `9876`

The associated Hydra config uses:

- vision backbone variant: `Qwen3-VL-2B-Instruct`
- language backbone variant: `Qwen3-VL-2B-Instruct`

## Intended Use

This model is intended for:

- research on closed-loop driving in CARLA
- benchmarking on Bench2Drive
- studying language-conditioned driving behavior in simulation

This model is not intended for:

- real-world driving deployment
- safety-critical use outside simulation

## Limitations

- Results are from simulation and do not imply real-world safety.
- Performance depends on the exact SimLingo code, CARLA version, and benchmark setup.
- Closed-loop metrics can vary if evaluation infrastructure, GPU mapping, or simulator stability differ across machines.

## Citation

If you use this checkpoint, please cite SimLingo and Bench2Drive.

```bibtex
@InProceedings{Renz2025cvpr,
  title={SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment},
  author={Renz, Katrin and Chen, Long and Arani, Elahe and Sinavski, Oleg},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

@inproceedings{Jia2024NeurIPS,
  title={Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},
  author={Xiaosong Jia and Zhenjie Yang and Qifeng Li and Zhiyuan Zhang and Junchi Yan},
  booktitle={NeurIPS 2024 Datasets and Benchmarks Track},
  year={2024}
}
```