jhu-clsp/ettin-encoder-32m (ModernBERT encoder, 8192-token context) finetuned on ccdv/arxiv-classification for arXiv paper topic classification (11 classes). This checkpoint uses a 512-token context and serves as the ablation baseline.
| metric | value |
|---|---|
| accuracy | 0.774 |
| macro-F1 | 0.7692 |
| eval max_length | 512 |
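
The card does not say how the two scores above were computed. As a rough illustration only, the sketch below assumes the standard definitions: accuracy as the fraction of exact matches, and macro-F1 as the unweighted mean of per-class F1 over every label that appears in the references or the predictions. The function name `accuracy_and_macro_f1` is hypothetical and not part of the project.

```python
from collections import Counter


def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-F1) for two equal-length label sequences.

    Assumption: standard definitions, not necessarily the project's own
    evaluation code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # class t was missed

    # Per-class F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)

    accuracy = sum(tp.values()) / len(y_true)
    macro_f1 = sum(f1_scores) / len(f1_scores)
    return accuracy, macro_f1
```

Because macro-F1 weights all 11 classes equally regardless of how many test examples each has, it can differ from accuracy when some classes are harder or rarer than others.
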
Finetuned on a single RTX 3080 (bf16). See the project for the full training pipeline.
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "vumichien/ettin-encoder-32m-arxiv-classification-512"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # make inference mode explicit (disables dropout)

# Truncate to the 512-token context used for this checkpoint's evaluation.
inputs = tok("your text here", truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():  # no gradients needed for prediction
    pred = model(**inputs).logits.argmax(-1).item()
print(model.config.id2label[pred])  # map class index to its label name
```
Base model: jhu-clsp/ettin-encoder-32m