How to use NCPS/thinkingvit_deit-3h-6h-800epochs-imagenet1k with timm:

```python
import timm

model = timm.create_model("hf_hub:NCPS/thinkingvit_deit-3h-6h-800epochs-imagenet1k", pretrained=True)
```

This repository contains the ImageNet-1K EMA weights for the ThinkingViT DeiT 3H -> 6H model (800 epochs), from the paper "ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference".
`state_dict_ema`, `model.safetensors`

```python
import torch
from timm.models import create_model

# Run from the ThinkingViT repository root, or put this repository on PYTHONPATH.
model = create_model("hf-hub:NCPS/thinkingvit_deit-3h-6h-800epochs-imagenet1k", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # dummy batch: one 224x224 RGB image
with torch.no_grad():
    # threshold is the entropy threshold that controls early exit
    logits, stage = model(x, threshold=1.0)
print(logits.shape, stage)
```
This is a custom timm-based architecture. Use the code from the ThinkingViT repository when loading this model.
The entropy threshold controls early exit. Lower thresholds send more samples to the 6-head stage; higher thresholds exit earlier at the 3-head stage.
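To make the role of the threshold concrete, here is a minimal pure-Python sketch of an entropy-based exit rule. It is an illustration, not the repository's implementation. The helper names `softmax_entropy` and `should_exit_early` are hypothetical, and the sketch assumes that entropy is measured in nats on the softmax of the 3-head logits and that the exit test is a strict "entropy below threshold" comparison.

```python
import math


def softmax_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits), computed stably."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Skip zero-probability terms, which contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_exit_early(logits, threshold):
    """Hypothetical rule: stop at the 3-head stage if the prediction is confident.

    Assumption: 'confident' means entropy strictly below the threshold.
    """
    return softmax_entropy(logits) < threshold


confident = [8.0, 0.0, 0.0, 0.0]  # one dominant class, low entropy
uncertain = [1.0, 1.0, 1.0, 1.0]  # uniform prediction, entropy = ln(4)

for name, logits in [("confident", confident), ("uncertain", uncertain)]:
    print(name, round(softmax_entropy(logits), 4), should_exit_early(logits, 1.0))
```

Under these assumptions, a threshold of 0.0 never exits early, because entropy is never negative. For 1000 classes the entropy cannot exceed ln(1000) ≈ 6.91 nats, so a threshold of 10.0 always exits at the 3-head stage.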
| Threshold | Acc@1 (%) | GMACs |
|---|---|---|
| 0.0 | 81.850 | 5.850 |
| 0.1 | 81.848 | 5.385 |
| 0.2 | 81.846 | 4.751 |
| 0.3 | 81.832 | 4.363 |
| 0.5 | 81.758 | 3.841 |
| 0.8 | 81.386 | 3.189 |
| 1.0 | 80.636 | 2.781 |
| 1.2 | 79.764 | 2.433 |
| 1.4 | 78.846 | 2.136 |
| 1.6 | 77.688 | 1.865 |
| 2.0 | 75.500 | 1.417 |
| 5.0 | 74.514 | 1.250 |
| 10.0 | 74.514 | 1.250 |
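
The table can also be read as a lookup for choosing an operating point. The sketch below picks the most accurate threshold that fits a compute budget. The helper `pick_threshold` is hypothetical, the rows are copied from the table above, and the sketch assumes the GMACs column is an average over the evaluation set, so the cost of an individual image may differ.

```python
# (threshold, Acc@1 in %, GMACs) rows copied from the table above.
OPERATING_POINTS = [
    (0.0, 81.850, 5.850),
    (0.1, 81.848, 5.385),
    (0.2, 81.846, 4.751),
    (0.3, 81.832, 4.363),
    (0.5, 81.758, 3.841),
    (0.8, 81.386, 3.189),
    (1.0, 80.636, 2.781),
    (1.2, 79.764, 2.433),
    (1.4, 78.846, 2.136),
    (1.6, 77.688, 1.865),
    (2.0, 75.500, 1.417),
    (5.0, 74.514, 1.250),
    (10.0, 74.514, 1.250),
]


def pick_threshold(max_gmacs):
    """Return the (threshold, acc1, gmacs) row with the best Acc@1 within budget."""
    feasible = [row for row in OPERATING_POINTS if row[2] <= max_gmacs]
    if not feasible:
        raise ValueError(f"no operating point fits within {max_gmacs} GMACs")
    # max() keeps the first row on ties, i.e. the smallest tied threshold.
    return max(feasible, key=lambda row: row[1])


threshold, acc1, gmacs = pick_threshold(3.0)
print(threshold, acc1, gmacs)
```

With a budget of 3.0 GMACs this selects threshold 1.0, which uses roughly half the compute of threshold 0.0 for a drop of about 1.2 points in Acc@1.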
Please cite the ThinkingViT paper if you use this model: https://arxiv.org/abs/2507.10800