ThinkingViT DeiT 3H -> 6H 800 Epochs ImageNet-1K

This repository contains the EMA weights for the ThinkingViT DeiT 3H -> 6H model, trained for 800 epochs on ImageNet-1K, from the paper "ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference".

Usage

import torch
from timm.models import create_model

# Run from the ThinkingViT repository root, or put the ThinkingViT repository on PYTHONPATH.
model = create_model("hf-hub:NCPS/thinkingvit_deit-3h-6h-800epochs-imagenet1k", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits, stage = model(x, threshold=1.0)
print(logits.shape, stage)

This is a custom timm-based architecture. Use the code from the ThinkingViT repository when loading this model.

Threshold Behavior

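The entropy threshold controls early exit: lower thresholds send more samples on to the 6-head stage, while higher thresholds let more samples exit at the 3-head stage. In the results below, a threshold of 0.0 gives the full-cost operating point (5.850 GMACs), and thresholds of 5.0 and above give the cheapest one (1.250 GMACs).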
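
As a rough illustration of the rule described above, the following sketch computes the entropy of a softmax distribution and compares it with the threshold. It is not the repository's implementation: the function names, the use of natural-log entropy over the first-stage logits, and the strict `<` comparison are all assumptions made for illustration.

```python
import math


def softmax_entropy(logits):
    """Entropy (in nats) of the softmax distribution over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def exits_early(logits, threshold):
    """Hypothetical rule: exit at the 3-head stage when entropy < threshold."""
    return softmax_entropy(logits) < threshold


confident = [10.0, 0.0, 0.0, 0.0]  # peaked distribution, entropy near 0
uncertain = [1.0, 1.0, 1.0, 1.0]   # uniform distribution, entropy = ln(4)

print(exits_early(confident, threshold=1.0))  # True
print(exits_early(uncertain, threshold=1.0))  # False
print(exits_early(confident, threshold=0.0))  # False
```

Under this assumed rule, a threshold of 0.0 never exits early, and any threshold above ln(1000) ≈ 6.91 (the maximum entropy over 1000 classes) always does, which is consistent with the two ends of the results table.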

ImageNet-1K Results

Threshold   Acc@1 (%)   GMACs
0.0         81.850      5.850
0.1         81.848      5.385
0.2         81.846      4.751
0.3         81.832      4.363
0.5         81.758      3.841
0.8         81.386      3.189
1.0         80.636      2.781
1.2         79.764      2.433
1.4         78.846      2.136
1.6         77.688      1.865
2.0         75.500      1.417
5.0         74.514      1.250
10.0        74.514      1.250
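
One way to use these numbers is to pick the most accurate threshold that fits a compute budget. The sketch below does that with the rows copied from the table. The helper names are hypothetical. The second helper also assumes that the reported GMACs are averages over samples that each cost either 1.250 GMACs (early exit) or 5.850 GMACs (both stages), which is inferred from the two ends of the table rather than stated in the source.

```python
# (threshold, Acc@1 %, GMACs) rows copied from the results table.
RESULTS = [
    (0.0, 81.850, 5.850),
    (0.1, 81.848, 5.385),
    (0.2, 81.846, 4.751),
    (0.3, 81.832, 4.363),
    (0.5, 81.758, 3.841),
    (0.8, 81.386, 3.189),
    (1.0, 80.636, 2.781),
    (1.2, 79.764, 2.433),
    (1.4, 78.846, 2.136),
    (1.6, 77.688, 1.865),
    (2.0, 75.500, 1.417),
    (5.0, 74.514, 1.250),
    (10.0, 74.514, 1.250),
]

EARLY_EXIT_GMACS = 1.250  # table value when every sample exits at the 3-head stage
FULL_GMACS = 5.850        # table value when every sample continues to the 6-head stage


def pick_threshold(budget_gmacs):
    """Return the most accurate row whose GMACs fit the budget, or None."""
    feasible = [row for row in RESULTS if row[2] <= budget_gmacs]
    if not feasible:
        return None
    return max(feasible, key=lambda row: row[1])


def fraction_to_second_stage(avg_gmacs):
    """Estimated share of samples that continue, assuming two per-sample costs."""
    return (avg_gmacs - EARLY_EXIT_GMACS) / (FULL_GMACS - EARLY_EXIT_GMACS)


print(pick_threshold(3.0))                        # (1.0, 80.636, 2.781)
print(round(fraction_to_second_stage(2.781), 3))  # 0.333
```

Read this way, a threshold of 1.0 would send roughly a third of the validation samples to the 6-head stage while giving up about 1.2 points of Acc@1 relative to always running both stages.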

Citation

Please cite the ThinkingViT paper if you use this model: https://arxiv.org/abs/2507.10800

Model size: 22.1M parameters, stored as F32 Safetensors.