How to use KBLab/megatron.bert-large.unigram-64k-pretok.25k-steps with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="KBLab/megatron.bert-large.unigram-64k-pretok.25k-steps")

# Load model directly (AutoModel returns the bare encoder, which suits feature extraction)
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("KBLab/megatron.bert-large.unigram-64k-pretok.25k-steps")
model = AutoModel.from_pretrained("KBLab/megatron.bert-large.unigram-64k-pretok.25k-steps")
```

Language: Swedish (`sv`)

# megatron.bert-large.unigram-64k-pretok.25k-steps

This BERT model was trained using the NeMo library.
The model has the size of a regular BERT-large.
It was trained on more than 245GB of data, consisting mostly of web data and Swedish newspaper text curated by the National Library of Sweden.
Training ran for 25k steps with a batch size of 8k.

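As a rough illustration of the training scale stated above, the number of sequences seen is the step count times the batch size. This is a minimal sketch that assumes "25k" and "8k" are round decimal values and that the batch size counts sequences; the card does not give exact figures (the batch size could, for instance, be 8192), nor a sequence length, so no token count is attempted.

```python
# Rough count of training sequences, from the figures quoted in the card.
# Assumptions (not stated in the card): "25k" = 25,000 steps, "8k" = 8,000
# sequences per step.
training_steps = 25_000
batch_size = 8_000

sequences_seen = training_steps * batch_size
print(f"{sequences_seen:,} sequences")
```
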
The model has multiple sibling models trained on the same dataset using different tokenizers or more or fewer parameters:

- [megatron.bert-base.bpe-32k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.bpe-32k-no_pretok.25k-steps)
- [megatron.bert-base.bpe-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.bpe-64k-no_pretok.25k-steps)
- [megatron.bert-base.spe-bpe-32k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.spe-bpe-32k-no_pretok.25k-steps)
- [megatron.bert-base.spe-bpe-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.spe-bpe-32k-pretok.25k-steps)
- [megatron.bert-base.spe-bpe-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.spe-bpe-64k-no_pretok.25k-steps)
- [megatron.bert-base.spe-bpe-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.spe-bpe-64k-pretok.25k-steps)
- [megatron.bert-base.unigram-32k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.unigram-32k-no_pretok.25k-steps)
- [megatron.bert-base.unigram-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.unigram-32k-pretok.25k-steps)
- [megatron.bert-base.unigram-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.unigram-64k-no_pretok.25k-steps)
- [megatron.bert-base.unigram-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.unigram-64k-pretok.25k-steps)
- [megatron.bert-base.wordpiece-32k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.wordpiece-32k-no_pretok.25k-steps)
- [megatron.bert-base.wordpiece-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.wordpiece-32k-pretok.25k-steps)
- [megatron.bert-base.wordpiece-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.wordpiece-64k-no_pretok.25k-steps)
- [megatron.bert-base.wordpiece-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.wordpiece-64k-pretok.25k-steps)
- [megatron.bert-large.bpe-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.bpe-64k-no_pretok.25k-steps)
- [megatron.bert-large.spe-bpe-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.spe-bpe-32k-pretok.25k-steps)
- [megatron.bert-large.unigram-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.unigram-32k-pretok.25k-steps)
- [megatron.bert-large.unigram-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.unigram-64k-pretok.25k-steps)
- [megatron.bert-large.wordpiece-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.wordpiece-32k-pretok.25k-steps)
- [megatron.bert-large.wordpiece-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.wordpiece-64k-pretok.25k-steps)

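The sibling names above appear to follow one pattern: model size, tokenizer type, vocabulary size, a `pretok`/`no_pretok` flag, and the step count. The sketch below splits a name into those parts, which may help when comparing variants programmatically. The helper `parse_model_name`, the `ModelName` type, and the field labels are hypothetical and inferred only from the names in the list; in particular, reading `pretok` as "pre-tokenization was applied" is an assumption, not something the card states.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Fields inferred from the naming pattern (labels are assumptions)."""
    size: str          # "base" or "large"
    tokenizer: str     # e.g. "bpe", "spe-bpe", "unigram", "wordpiece"
    vocab: str         # e.g. "32k", "64k"
    pretokenized: bool # True for "pretok", False for "no_pretok"
    steps: str         # e.g. "25k"


def parse_model_name(name: str) -> ModelName:
    """Split a name such as 'megatron.bert-large.unigram-64k-pretok.25k-steps'."""
    name = name.split("/")[-1]  # drop an optional "KBLab/" prefix
    _framework, arch, tok_part, steps_part = name.split(".")
    size = arch.split("-", 1)[1]
    # Split from the right because the tokenizer itself may contain a hyphen.
    tokenizer, vocab, pretok = tok_part.rsplit("-", 2)
    steps = steps_part.rsplit("-", 1)[0]
    return ModelName(size, tokenizer, vocab, pretok == "pretok", steps)


if __name__ == "__main__":
    print(parse_model_name("KBLab/megatron.bert-large.unigram-64k-pretok.25k-steps"))
    print(parse_model_name("megatron.bert-base.spe-bpe-32k-no_pretok.25k-steps"))
```

Splitting the tokenizer segment from the right keeps hyphenated tokenizer names such as `spe-bpe` intact.
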
## Acknowledgements

The training was performed on the Luxembourg national supercomputer MeluXina.
The authors gratefully acknowledge the LuxProvide teams for their expert support.