How to use KBLab/megatron.bert-base.spe-bpe-64k-pretok.25k-steps with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="KBLab/megatron.bert-base.spe-bpe-64k-pretok.25k-steps")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("KBLab/megatron.bert-base.spe-bpe-64k-pretok.25k-steps")
model = AutoModel.from_pretrained("KBLab/megatron.bert-base.spe-bpe-64k-pretok.25k-steps")
```
megatron.bert-base.spe-bpe-64k-pretok.25k-steps
This BERT model was trained using the NeMo library. As the model name indicates, it is the size of a regular bert-base. The model was trained on more than 245 GB of data, consisting mostly of web data and Swedish newspaper text curated by the National Library of Sweden.
The model was trained for 25k steps with a batch size of 8k.
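For a rough sense of scale, the step count and batch size can be multiplied to estimate how many training sequences the model saw. The sketch below assumes that "25k" and "8k" are round decimal figures (25,000 and 8,000) and that one batch is processed per step. The exact values and the sequence length are not stated here, so treat the result as an order-of-magnitude estimate only.

```python
# Assumed round figures: the text above only gives "25k" and "8k".
TRAINING_STEPS = 25_000
BATCH_SIZE = 8_000  # assumption; the true value could be 8,192


def sequences_seen(steps: int, batch_size: int) -> int:
    """Total number of training sequences, assuming one batch per step."""
    return steps * batch_size


total = sequences_seen(TRAINING_STEPS, BATCH_SIZE)
print(f"{total:,} sequences")  # 200,000,000 sequences
```

If the batch size were 8,192 instead, the same calculation would give 204,800,000 sequences, so the estimate is not sensitive to that assumption.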
The model has multiple sibling models trained on the same dataset, using different tokenizers or more or fewer parameters:
- megatron.bert-base.bpe-32k-no_pretok.25k-steps
- megatron.bert-base.bpe-64k-no_pretok.25k-steps
- megatron.bert-base.spe-bpe-32k-no_pretok.25k-steps
- megatron.bert-base.spe-bpe-32k-pretok.25k-steps
- megatron.bert-base.spe-bpe-64k-no_pretok.25k-steps
- megatron.bert-base.spe-bpe-64k-pretok.25k-steps
- megatron.bert-base.unigram-32k-no_pretok.25k-steps
- megatron.bert-base.unigram-32k-pretok.25k-steps
- megatron.bert-base.unigram-64k-no_pretok.25k-steps
- megatron.bert-base.unigram-64k-pretok.25k-steps
- megatron.bert-base.wordpiece-32k-no_pretok.25k-steps
- megatron.bert-base.wordpiece-32k-pretok.25k-steps
- megatron.bert-base.wordpiece-64k-no_pretok.25k-steps
- megatron.bert-base.wordpiece-64k-pretok.25k-steps
- megatron.bert-large.bpe-64k-no_pretok.25k-steps
- megatron.bert-large.spe-bpe-32k-pretok.25k-steps
- megatron.bert-large.unigram-32k-pretok.25k-steps
- megatron.bert-large.unigram-64k-pretok.25k-steps
- megatron.bert-large.wordpiece-32k-pretok.25k-steps
- megatron.bert-large.wordpiece-64k-pretok.25k-steps
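The sibling names above appear to follow a regular pattern: model size, tokenizer, vocabulary size, a `pretok`/`no_pretok` flag, and the number of training steps. The following is a minimal sketch for splitting a name into those parts when comparing variants. The pattern is inferred from the listed names rather than documented by the authors, and `ModelVariant` and `parse_model_name` are hypothetical helpers introduced here for illustration.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the listed model names (an assumption, not an official spec).
_NAME_RE = re.compile(
    r"^(?:KBLab/)?megatron\.bert-(?P<size>base|large)"
    r"\.(?P<tokenizer>[a-z-]+?)-(?P<vocab>\d+)k"
    r"-(?P<pretok>pretok|no_pretok)"
    r"\.(?P<steps>\d+)k-steps$"
)


@dataclass(frozen=True)
class ModelVariant:
    size: str          # "base" or "large"
    tokenizer: str     # e.g. "spe-bpe", "unigram", "wordpiece"
    vocab_size_k: int  # vocabulary size in thousands, e.g. 64
    pretok: bool       # True for "pretok", False for "no_pretok"
    steps_k: int       # training steps in thousands, e.g. 25


def parse_model_name(name: str) -> ModelVariant:
    """Split a sibling model name into its components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognised model name: {name!r}")
    return ModelVariant(
        size=match["size"],
        tokenizer=match["tokenizer"],
        vocab_size_k=int(match["vocab"]),
        pretok=match["pretok"] == "pretok",
        steps_k=int(match["steps"]),
    )


variant = parse_model_name("megatron.bert-base.spe-bpe-64k-pretok.25k-steps")
print(variant)
```

Parsing the names this way makes it straightforward to group the siblings, for example to select all base-size variants with a 64k vocabulary.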
Acknowledgements
The training was performed on the Luxembourg national supercomputer MeluXina. The authors gratefully acknowledge the LuxProvide teams for their expert support.