Upload README.md with huggingface_hub
README.md CHANGED
@@ -1,3 +1,31 @@
---
language:
- bn
tags:
- bert
- bangla
- mlm
- nsp
---
# BERT base model for Bangla

A pretrained [BERT](https://arxiv.org/abs/1810.04805) model for Bangla, trained on the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks.
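
To make the MLM objective concrete, here is a minimal sketch of the token-corruption recipe described in the BERT paper: about 15% of tokens are selected as prediction targets, and of those 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged. It is an assumption that this model's pretraining used exactly these rates. The `mask_tokens` helper, the toy vocabulary, and the whole-word tokens are hypothetical and for illustration only; a real pipeline works on subword IDs and never masks special tokens such as `[CLS]` and `[SEP]`.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted_tokens, labels) using the BERT paper's 80/10/10 rule.

    labels[i] is the original token where a prediction is required,
    and None where the position is not a training target.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() >= mask_prob:
            # Not selected: keep the token, no prediction target.
            corrupted.append(token)
            labels.append(None)
            continue
        labels.append(token)  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK_TOKEN)         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(token)              # 10%: leave unchanged
    return corrupted, labels


# Toy example (hypothetical vocabulary, whole words instead of subwords).
tokens = ["আমি", "বাংলায়", "পড়ি", "।"]
vocab = ["আমি", "তুমি", "বাংলায়", "পড়ি", "লিখি", "।"]
corrupted, labels = mask_tokens(tokens, vocab, seed=0)
```

Positions where `labels` is `None` do not contribute to the MLM loss; the remaining positions are the ones the model learns to reconstruct.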
## Model Details

This model uses the BERT-Base architecture: 12 layers, a hidden size of 768, 12 attention heads, and 110 million parameters. It was trained on a 39 GB corpus of Bangla text with a vocabulary of 50k tokens, for 1 million steps with a batch size of 440 and a learning rate of 5e-5, on two NVIDIA A40 GPUs.
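
The figures above can be sanity-checked with a little arithmetic. The sketch below counts the weights of a BERT-style encoder from its dimensions. Only the 12 layers, the hidden size of 768, and the roughly 50k vocabulary come from this README; the feed-forward size of 3072, the 512 positions, the 2 segment types, and a vocabulary of exactly 50,000 are assumed standard BERT-Base defaults, so the checkpoint's own `config.json` remains the authority. The number of attention heads does not change the count.

```python
def bert_encoder_params(vocab_size, hidden=768, layers=12, intermediate=3072,
                        max_positions=512, type_vocab=2):
    """Count weights and biases of a BERT encoder (embeddings, layers, pooler)."""
    layer_norm = 2 * hidden  # scale and bias
    # Word, position, and segment embeddings plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections plus one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> intermediate -> hidden) plus one LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


original_bert = bert_encoder_params(vocab_size=30_522)  # 109,482,240
bangla_50k = bert_encoder_params(vocab_size=50_000)     # 124,441,344 (assumed sizes)

# Sequences processed during pretraining: steps * batch size, both from the README.
sequences_seen = 1_000_000 * 440                        # 440,000,000
```

Under these assumptions, the 110 million figure matches BERT-Base with its original vocabulary of about 30k entries, while a 50k vocabulary would put the encoder nearer 124 million parameters, so the quoted number may be the nominal BERT-Base size rather than an exact count.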
## How to use

```python
from transformers import AutoModel, AutoTokenizer

# Repo ID as written in this README; the hosting page lists the model as
# "banglagov/banBERT-Base", so substitute that ID if this one does not resolve.
model = AutoModel.from_pretrained("eblict/BERT-Base")
tokenizer = AutoTokenizer.from_pretrained("eblict/BERT-Base")

text = "আমি বাংলায় পড়ি।"  # "I read in Bangla."

# Tokenize into PyTorch tensors and run a forward pass.
tokenized_text = tokenizer(text, return_tensors="pt")
outputs = model(**tokenized_text)
```
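
The forward pass returns one vector per token in `outputs.last_hidden_state`. One common way to turn these into a single sentence vector is masked mean pooling, sketched below. This is only one option and not something this README prescribes. The `mean_pool` helper is hypothetical, and the random tensors are stand-ins with the shapes a BERT-Base model would produce, so the sketch runs without downloading the checkpoint.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padded positions."""
    # (batch, seq_len) -> (batch, seq_len, 1), matching the hidden-state dtype.
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)         # avoid division by zero
    return summed / counts


# Stand-in tensors: 2 sequences, 5 token positions, hidden size 768.
hidden = torch.randn(2, 5, 768)
attention = torch.tensor([[1, 1, 1, 0, 0],   # last two positions are padding
                          [1, 1, 1, 1, 1]])
sentence_vectors = mean_pool(hidden, attention)  # shape: (2, 768)
```

With the snippet above, the equivalent call would be `mean_pool(outputs.last_hidden_state, tokenized_text["attention_mask"])`, ideally inside a `torch.no_grad()` block when no gradients are needed.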