---
language:
- en
license: mit
base_model: microsoft/deberta-v3-xsmall
pipeline_tag: token-classification
tags:
- dependency-parsing
- pos-tagging
- universal-dependencies
datasets:
- universal_dependencies
metrics:
- accuracy
- las
- uas
model-index:
- name: deberta-v3-xsmall-biaffine-dep-pos-en-ewt-gum
  results:
  - task:
      type: token-classification
      name: Dependency Parsing & POS/Morphology Tagging
    dataset:
      type: universal_dependencies
      name: EWT + GUM
      split: test
    metrics:
    - type: las
      name: LAS
      value: 92.97
    - type: uas
      name: UAS
      value: 94.74
    - type: accuracy
      name: UPOS
      value: 98.04
    - type: ucm
      name: UCM
      value: 70.23
    - type: lcm
      name: LCM
      value: 60.91
---
ModernBiaffineParser — microsoft/deberta-v3-xsmall
Biaffine dependency parser + joint UPOS tagger trained on Universal Dependencies English Web Treebank (EWT) and GUM.
Encoder: microsoft/deberta-v3-xsmall (frozen weights are not included; they are loaded from Hugging Face at runtime)
Custom head: biaffine_head.safetensors — word projection, arc/rel/POS MLPs and biaffine layers
Labels: 53 DEPREL labels · 19 UPOS tags
Score convention: s_arc[dep, head], s_rel[dep, head, rel]
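To make the score convention concrete, here is a minimal sketch of greedy decoding for a single sentence: each dependent takes its highest-scoring head along axis 1, then the best relation for that arc. This is not the Eisner decoder used for the reported metrics, and unlike Eisner it does not guarantee a well-formed projective tree. The function name `greedy_decode` is hypothetical.

```python
import numpy as np

def greedy_decode(s_arc: np.ndarray, s_rel: np.ndarray):
    """Greedy head and relation selection for one sentence.

    s_arc: [W, W]    indexed as s_arc[dep, head]
    s_rel: [W, W, R] indexed as s_rel[dep, head, rel]
    Row 0 is the synthetic ROOT word, so its prediction is dropped.
    """
    n_words = s_arc.shape[0]
    scores = s_arc.astype(np.float64).copy()
    np.fill_diagonal(scores, -np.inf)       # a word cannot be its own head
    heads = scores.argmax(axis=1)           # axis 1 runs over candidate heads
    # Look up the relation scores of the chosen arc for every dependent.
    rels = s_rel[np.arange(n_words), heads].argmax(axis=-1)
    return heads[1:], rels[1:]              # skip the ROOT row
```

For batched outputs, call it per sentence on `s_arc[b]` and `s_rel[b]`, sliced to that sentence's real length plus one for ROOT so that padded words cannot be selected as heads.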
Metrics (EWT + GUM, decode: Eisner (projective MST))
| Split | LAS | UPOS | UCM | LCM |
|---|---|---|---|---|
| dev | 93.03% | 98.09% | 70.75% | 60.96% |
| test | 92.97% | 98.04% | 70.23% | 60.91% |
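For readers unfamiliar with the column names, the sketch below computes the four attachment scores under their usual definitions: UAS and LAS are per-token accuracies for heads and for head plus relation, while UCM and LCM are the share of sentences whose unlabeled or labeled tree is entirely correct. It is an assumption that the numbers above follow exactly these definitions; details such as punctuation handling are not stated in this card. The function name `attachment_metrics` is hypothetical.

```python
from typing import Dict, List, Tuple

Sentence = List[Tuple[int, str]]  # one (head index, relation label) pair per word

def attachment_metrics(gold: List[Sentence], pred: List[Sentence]) -> Dict[str, float]:
    """Return UAS, LAS, UCM and LCM as percentages."""
    tokens = heads_ok = labeled_ok = trees_ok = labeled_trees_ok = 0
    for g_sent, p_sent in zip(gold, pred):
        head_hits = [g[0] == p[0] for g, p in zip(g_sent, p_sent)]
        full_hits = [g == p for g, p in zip(g_sent, p_sent)]
        tokens += len(g_sent)
        heads_ok += sum(head_hits)
        labeled_ok += sum(full_hits)
        trees_ok += all(head_hits)          # every head in the sentence is right
        labeled_trees_ok += all(full_hits)  # every head and relation is right
    n_sents = len(gold)
    return {
        "UAS": 100.0 * heads_ok / tokens,
        "LAS": 100.0 * labeled_ok / tokens,
        "UCM": 100.0 * trees_ok / n_sents,
        "LCM": 100.0 * labeled_trees_ok / n_sents,
    }
```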
ONNX / production use
model.onnx — fp32 model (Recommended for CPU inference).
model.fp16.onnx — fp16 model (for GPU or environments with native fp16 support, ~139 MB).
Inputs: subwords [B, W, 20] int64. Outputs: s_arc [B,W,W], s_rel [B,W,W,R], s_pos [B,W,P].
# download only inference artifacts
hf download ghotriw/deberta-v3-xsmall-biaffine-dep-pos-en-ewt-gum \
model.fp16.onnx model.onnx vocabs.json tokenizer.json \
--local-dir ./model
vocabs.json — DEPREL and UPOS vocabularies (str→int dicts).
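A minimal sketch of running one batch with ONNX Runtime, assuming the graph exposes the input and outputs under the names listed above. To avoid hard-coding those names, the helper reads them from the session; `run_parser` is a hypothetical name, and the `./model` path in the usage comment matches the download command above.

```python
import numpy as np

def run_parser(session, grid: np.ndarray) -> dict:
    """Run one batch and return the outputs keyed by their graph names.

    session: an onnxruntime.InferenceSession (or any object exposing the
             same get_inputs(), get_outputs() and run() methods).
    grid:    integer array of shape [B, W, 20]; cast to int64 before the call.
    """
    if grid.ndim != 3 or grid.shape[-1] != 20:
        raise ValueError(f"expected shape [B, W, 20], got {grid.shape}")
    input_name = session.get_inputs()[0].name            # "subwords" per this card
    output_names = [out.name for out in session.get_outputs()]
    arrays = session.run(output_names, {input_name: grid.astype(np.int64)})
    return dict(zip(output_names, arrays))

# Usage (requires the onnxruntime package and the downloaded files):
#   import onnxruntime as ort
#   session = ort.InferenceSession("./model/model.onnx",
#                                  providers=["CPUExecutionProvider"])
#   outputs = run_parser(session, grid)
#   s_arc, s_rel, s_pos = outputs["s_arc"], outputs["s_rel"], outputs["s_pos"]
```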
Input format
The model expects a word-level subword grid [B, W, fix_len=20] (int64),
where each word is tokenised independently with the encoder's SentencePiece tokeniser
and padded or truncated to 20 subword slots. Position 0 is a synthetic ROOT word
whose only subword is [CLS] (id 1).
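The grid layout described above can be sketched as follows. Two things here are assumptions rather than facts from this card: the padding id (`PAD_ID = 0`) and the exact way each word is passed to the tokeniser (for example, with or without a leading space). The helper `build_grid` is hypothetical and takes the per-word encoder as a callable so that any tokeniser can be plugged in.

```python
from typing import Callable, List, Sequence
import numpy as np

FIX_LEN = 20  # subword slots per word, as stated above
CLS_ID = 1    # [CLS], the only subword of the synthetic ROOT word
PAD_ID = 0    # ASSUMPTION: the padding id is not stated in this card

def build_grid(
    sentences: Sequence[Sequence[str]],
    encode_word: Callable[[str], List[int]],
) -> np.ndarray:
    """Build the [B, W, FIX_LEN] int64 grid, with ROOT at word position 0."""
    max_words = 1 + max((len(words) for words in sentences), default=0)
    grid = np.full((len(sentences), max_words, FIX_LEN), PAD_ID, dtype=np.int64)
    for b, words in enumerate(sentences):
        grid[b, 0, 0] = CLS_ID
        for w, word in enumerate(words, start=1):
            ids = list(encode_word(word))[:FIX_LEN]   # truncate long words
            grid[b, w, : len(ids)] = ids              # remaining slots stay padded
    return grid

# Usage with the Hugging Face `tokenizers` package (an assumption about how
# words should be encoded; check against the training code before relying on it):
#   from tokenizers import Tokenizer
#   tok = Tokenizer.from_file("./model/tokenizer.json")
#   grid = build_grid([["Hello", "world"]],
#                     lambda w: tok.encode(w, add_special_tokens=False).ids)
```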
Vocabularies
config.json contains rel_vocab (str→int) and pos_vocab (str→int).
Index 0 is the <pad> / ROOT slot and should be ignored in evaluation.
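A short sketch of turning score tensors back into label strings with these str→int vocabularies. Masking index 0 before the argmax is one way to honour the note above; whether the reference decoder does the same is an assumption. Both helper names are hypothetical, and the toy vocabulary in the usage comment is illustrative only.

```python
import numpy as np

def invert_vocab(vocab: dict) -> list:
    """Turn a str->int vocabulary into a list where labels[i] is the label for index i."""
    labels = [None] * (max(vocab.values()) + 1)
    for label, idx in vocab.items():
        labels[idx] = label
    return labels

def best_labels(scores: np.ndarray, labels: list) -> list:
    """Argmax over the last axis of a [W, N] score matrix, never choosing index 0."""
    masked = scores.astype(np.float64).copy()
    masked[..., 0] = -np.inf   # index 0 is the <pad> / ROOT slot
    return [labels[i] for i in masked.argmax(axis=-1)]

# Usage with a toy vocabulary (not the model's real one):
#   pos_labels = invert_vocab({"<pad>": 0, "NOUN": 1, "VERB": 2})
#   tags = best_labels(s_pos[0, 1:], pos_labels)   # skip the ROOT word
```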