Model description

This model is a fine-tuned version of coastalcph/danish-legal-longformer-base on the Danish part of the MultiEURLEX dataset.

Training and evaluation data

The Danish part of the MultiEURLEX dataset.

Use of Model

As a text classifier:

from transformers import pipeline

# Initialize the text-classification pipeline.
# use_auth_token=True reuses the token saved by `huggingface-cli login`;
# it is only needed if the model repository requires authentication.
text_cls_pipe = pipeline(task="text-classification",
                         model="coastalcph/danish-legal-longformer-eurlex",
                         use_auth_token=True)

# Encode and classify the document
predictions = text_cls_pipe("KOMMISSIONENS BESLUTNING\naf 6. marts 2006\nom klassificering af visse byggevarers "
                            "ydeevne med hensyn til reaktion ved brand for så vidt angår trægulve samt vægpaneler "
                            "og vægbeklædning i massivt træ\n(meddelt under nummer K(2006) 655")

# Print prediction
print(predictions)
# [{'label': 'building and public works', 'score': 0.9626012444496155}]
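MultiEURLEX is a multi-label task, so a document can belong to several classes, while the call above reports only the top label. The sketch below turns a full list of per-label scores into a set of predicted labels. It assumes the scores are independent per-label probabilities (for example sigmoid outputs) and that all scores were requested from the pipeline (in Transformers 4.18, via `return_all_scores=True`). The `select_labels` helper and the 0.5 threshold are hypothetical choices, not part of this model card, and the example scores are illustrative, not real model output.

```python
from typing import Any, Dict, List


def select_labels(scores: List[Dict[str, Any]], threshold: float = 0.5) -> List[str]:
    """Return labels whose score meets the threshold, highest score first.

    `scores` is assumed to be a list of {"label": str, "score": float} dicts,
    i.e. one document's entry when the pipeline returns all label scores.
    Hypothetical helper; 0.5 is an assumed threshold, not a tuned value.
    """
    kept = [item for item in scores if item["score"] >= threshold]
    kept.sort(key=lambda item: item["score"], reverse=True)
    return [item["label"] for item in kept]


# Illustrative scores only (not produced by the model)
example_scores = [
    {"label": "building and public works", "score": 0.96},
    {"label": "environment", "score": 0.62},
    {"label": "trade", "score": 0.08},
]
print(select_labels(example_scores))
# ['building and public works', 'environment']
```

In practice the threshold would be tuned on the validation set rather than fixed at 0.5.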

As a feature extractor (document embedder):

from transformers import pipeline
import numpy as np

# Initialize the feature-extraction pipeline.
# use_auth_token=True reuses the token saved by `huggingface-cli login`;
# it is only needed if the model repository requires authentication.
feature_extraction_pipe = pipeline(task="feature-extraction",
                                   model="coastalcph/danish-legal-longformer-eurlex",
                                   use_auth_token=True)

# Encode the document; the output is a nested list of shape (batch, tokens, hidden_size)
token_wise_features = feature_extraction_pipe("KOMMISSIONENS BESLUTNING\naf 6. marts 2006\nom klassificering af visse byggevarers "
                                              "ydeevne med hensyn til reaktion ved brand for så vidt angår trægulve samt vægpaneler "
                                              "og vægbeklædning i massivt træ\n(meddelt under nummer K(2006) 655")

# Use the first-token (CLS) representation of the first document as its embedding
document_features = np.array(token_wise_features[0][0])

print(document_features.shape)
# (768,)
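A common use of such document embeddings is comparing documents. A minimal sketch, assuming `document_features` has been computed for two documents as above; the `cosine_similarity` helper is hypothetical, and the 3-dimensional toy vectors stand in for real 768-dimensional embeddings.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return float(np.dot(a, b) / denom)


# Toy vectors standing in for two document embeddings
doc_a = np.array([1.0, 0.0, 1.0])
doc_b = np.array([1.0, 1.0, 0.0])
print(round(cosine_similarity(doc_a, doc_b), 3))
# 0.5
```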

Framework versions

Transformers 4.18.0
Pytorch 1.12.0+cu113
Datasets 2.0.0
Tokenizers 0.12.1

Evaluation results

Micro-F1 on multi_eurlex (validation set, self-reported): 0.757
Macro-F1 on multi_eurlex (validation set, self-reported): 0.529
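Micro-F1 pools true positives, false positives and false negatives over all labels before computing a single F1, so frequent labels dominate it. Macro-F1 computes F1 per label and averages the results, so every label counts equally and rare, poorly predicted labels pull it down, which is one reason macro-F1 (0.529) can sit well below micro-F1 (0.757). The NumPy sketch below illustrates the difference on toy multi-label data (binary indicator matrices of shape documents × labels). It is not the evaluation script behind the reported numbers, and scoring a label with no true or predicted positives as F1 = 0 is only one of several conventions.

```python
from typing import Tuple

import numpy as np


def micro_macro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> Tuple[float, float]:
    """Micro- and macro-averaged F1 for binary indicator matrices (docs x labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Per-label counts of true positives, false positives and false negatives
    tp = np.sum(y_true & y_pred, axis=0)
    fp = np.sum(~y_true & y_pred, axis=0)
    fn = np.sum(y_true & ~y_pred, axis=0)

    # Micro: pool the counts over all labels, then compute one F1 = 2TP / (2TP + FP + FN)
    micro_denom = 2 * tp.sum() + fp.sum() + fn.sum()
    micro = 2 * tp.sum() / micro_denom if micro_denom > 0 else 0.0

    # Macro: F1 per label (0 where the denominator is 0), then an unweighted mean
    per_label_denom = 2 * tp + fp + fn
    per_label_f1 = np.divide(2 * tp, per_label_denom,
                             out=np.zeros(len(tp), dtype=np.float64),
                             where=per_label_denom > 0)
    return float(micro), float(per_label_f1.mean())


# Toy data: label 0 is frequent and always correct, label 2 is rare and missed
y_true_toy = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]])
y_pred_toy = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0]])
micro, macro = micro_macro_f1(y_true_toy, y_pred_toy)
print(f"micro={micro:.3f} macro={macro:.3f}")
# micro=0.909 macro=0.667
```

Missing a single rare label costs micro-F1 little but removes a full third of the macro average here.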