How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="Addedk/kbbert-distilled-cased")
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Addedk/kbbert-distilled-cased")
model = AutoModelForMaskedLM.from_pretrained("Addedk/kbbert-distilled-cased")

KB-BERT distilled base model (cased)

This model is a distilled version of KB-BERT. It was distilled on Swedish data: the 2010-2015 portion of the Swedish Culturomics Gigaword Corpus. The code for the distillation process can be found here. The work was done as part of my Master's Thesis: Task-agnostic knowledge distillation of mBERT to Swedish.

Model description

This is a 6-layer version of KB-BERT, distilled using the LightMBERT distillation method but without freezing the embedding layer.
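If you want to confirm the layer count yourself, the following is a minimal sketch. It assumes the published checkpoint exposes the standard BERT configuration field `num_hidden_layers`; the helper names `load_config`, `encoder_depth`, and `check_depth` are illustrative and not part of the model card.

```python
MODEL_ID = "Addedk/kbbert-distilled-cased"


def load_config(model_id: str = MODEL_ID):
    """Fetch the model configuration from the Hub (requires network access)."""
    # Imported lazily so that encoder_depth() can be used without transformers.
    from transformers import AutoConfig

    return AutoConfig.from_pretrained(model_id)


def encoder_depth(config) -> int:
    """Return the number of Transformer layers recorded in a BERT-style config."""
    return int(config.num_hidden_layers)


def check_depth() -> None:
    """Print the depth of the distilled model; call this manually."""
    depth = encoder_depth(load_config())
    print(f"{MODEL_ID} has {depth} Transformer layers")
```

Calling `check_depth()` downloads only the small configuration file, not the model weights.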

Intended uses & limitations

You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task.
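As an illustration of masked language modeling with the raw model, here is a minimal sketch built on the `fill-mask` pipeline. The Swedish example sentence and the helper names `load_fill_mask`, `top_predictions`, and `demo` are assumptions made for this example; no particular predictions or scores are implied.

```python
from typing import Callable, List, Tuple

MODEL_ID = "Addedk/kbbert-distilled-cased"


def load_fill_mask():
    """Create the fill-mask pipeline (downloads the model on first use)."""
    # Imported lazily so that top_predictions() can be used without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=MODEL_ID)


def top_predictions(fill_mask: Callable, text: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for a text containing exactly one mask token."""
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], float(r["score"])) for r in results]


def demo() -> None:
    """Print the top candidates for an illustrative sentence; call this manually."""
    fill_mask = load_fill_mask()
    # Read the mask token from the tokenizer instead of hard-coding it.
    mask = fill_mask.tokenizer.mask_token
    sentence = f"Stockholm är Sveriges {mask}."
    for token, score in top_predictions(fill_mask, sentence):
        print(f"{token}\t{score:.3f}")
```

Running `demo()` requires network access the first time, since the model weights are fetched from the Hub.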

Training data

The data used for distillation was the 2010-2015 portion of the Swedish Culturomics Gigaword Corpus. The tokenized data had a file size of approximately 7.4 GB.

Evaluation results

On the SUCX 3.0 dataset, it achieved an average F1 score of 0.887, which is competitive with the 0.894 obtained by KB-BERT.
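To put the two scores side by side, the short calculation below computes the absolute gap and the share of KB-BERT's score that the distilled model retains. It uses only the two F1 values reported above; the function name `f1_gap` is illustrative.

```python
from typing import Tuple


def f1_gap(student_f1: float, teacher_f1: float) -> Tuple[float, float]:
    """Return (absolute gap, fraction of the teacher score retained)."""
    absolute_gap = teacher_f1 - student_f1
    retained = student_f1 / teacher_f1
    return absolute_gap, retained


gap, retained = f1_gap(student_f1=0.887, teacher_f1=0.894)
print(f"absolute gap: {gap:.3f}")         # absolute gap: 0.007
print(f"score retained: {retained:.1%}")  # score retained: 99.2%
```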

Additional results and comparisons are presented in my Master's Thesis.

