--- language: - ru license: apache-2.0 base_model: - t-tech/T-pro-it-2.0 --- # T-pro-it-2.0-AWQ > **Main BF16 model:** [t-tech/T-pro-it-2.0](https://huggingface.co/t-tech/T-pro-it-2.0) **🚨 Users are advised to exercise caution and are responsible for any additional training and oversight required to ensure the model's responses meet acceptable ethical and safety standards. The responsibility for incorporating this model into industrial or commercial solutions lies entirely with those who choose to deploy it.** T‑pro‑it‑2.0‑AWQ is a fine‑grained AWQ‑quantised version of **T‑pro‑it‑2.0** (built on the Qwen‑3 family). AWQ 4-bit (W4A16_ASYM) offers comparable performance with approximately one-quarter the memory footprint and significantly faster inference. ## Description T-pro-it-2.0 is a model built upon the Qwen 3 model family and incorporates both continual pre-training and alignment techniques. ### 📚 Dataset Instruction Pre-Training: 40B tokens of instruction data, with one-third focused on reasoning tasks. Supervised Fine-Tuning (SFT): ~500K high-quality and diverse instructions with balanced complexity. Reasoning tasks make up about 20% of the dataset. Preference Tuning: ~100K carefully selected instructions, filtered by length and type for general tasks and with domain-balanced selection for reasoning tasks. ## 📊 Benchmarks TBD ## Note on AWQ For convenience and performance, we have provided `awq`-quantized model checkpoint for T-pro-it-2.0, whose name ends with `-AWQ`. You can find more details in the `quantization_config` field in `config.json`. However, please keep in mind the following known limitation: **There may be issues when running inference with transformers or sglang. We currently only guarantee stable performance with vllm version 0.9.0 or higher.** ## Switching Between Thinking and Non‑Thinking Modes To enable or disable reasoning mode in HuggingFace, set the `enable_thinking` flag in `tokenizer.apply_chat_template`. For more details, see: - [SGLang Thinking/Non‑Thinking Modes](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) - [vLLM Thinking/Non‑Thinking Modes](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) --- ## Recommended Generation Parameters | Mode | Temperature | presence_penalty | |-----------------------------------|-------------|------------------| | No‑think (general requests) | ≤ 0.3 | 1.0 | | Think mode (standard requests) | ≈ 0.6 | 1.0 | | Complex reasoning requests | ≥ 0.8 | 1.0 | ## 👨‍💻 Examples of usage ### Deployment For deployment, you can use `vllm>=0.9.0` or to create an OpenAI-compatible API endpoint: - vLLM: ```shell vllm serve t-tech/T‑pro‑it‑2.0‑AWQ --enable-reasoning --reasoning-parser qwen3 ``` ## 📖 Citation If you use this model in your research or projects, please cite: ```bibtex @inproceedings{stoianov-etal-2026-pro, title = "{T}-pro 2.0: An Efficient {R}ussian Hybrid-Reasoning Model and Playground", author = "Stoianov, Dmitrii and Taranets, Danil and Tsymboi, Olga and Latypov, Ramil and Dautov, Almaz and Kruglikov, Vladislav and Nikita, Surkov and Abramov, German and Gein, Pavel and Abulkhanov, Dmitry and Gashkov, Mikhail and Zelenkovskiy, Viktor and Batalov, Artem and Medvedev, Aleksandr and Potapov, Anatolii", editor = "Croce, Danilo and Leidner, Jochen and Moosavi, Nafise Sadat", booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 3: System Demonstrations)", month = mar, year = "2026", address = "Rabat, Marocco", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2026.eacl-demo.22/", doi = "10.18653/v1/2026.eacl-demo.22", pages = "297--319", ISBN = "979-8-89176-382-1" } ```