banglagov committed on
Commit
093423f
verified
1 Parent(s): 1292f64

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +31 -3
README.md CHANGED
@@ -1,3 +1,31 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ language:
+ - bn
+ tags:
+ - bert
+ - bangla
+ - mlm
+ - nsp
+ ---
+
+ # BERT base model for Bangla
+
+ Pretrained [BERT](https://arxiv.org/abs/1810.04805) model for Bangla. The model was trained on Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks.
+
+ ## Model Details
+
+ This model uses the BERT-Base architecture: 12 layers, a hidden size of 768, 12 attention heads, and 110 million parameters. It was trained on a 39 GB corpus of Bangla text with a vocabulary of 50k tokens, for 1 million steps with a batch size of 440 and a learning rate of 5e-5, on two NVIDIA A40 GPUs.
+
+ ## How to use
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ # Load the pretrained encoder and its tokenizer from the Hugging Face Hub.
+ model = AutoModel.from_pretrained("eblict/BERT-Base")
+ tokenizer = AutoTokenizer.from_pretrained("eblict/BERT-Base")
+
+ # Example sentence: "I read in Bangla."
+ text = "আমি বাংলায় পড়ি।"
+
+ # Tokenize to PyTorch tensors and run a forward pass through the encoder.
+ tokenized_text = tokenizer(text, return_tensors="pt")
+ outputs = model(**tokenized_text)
+ ```
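
As a rough check on the parameter figure quoted under Model Details, the sketch below counts the weights of a BERT-Base encoder from its dimensions. The function name `bert_param_count` is hypothetical. The 512 position embeddings, 2 token types, and 3072-wide feed-forward layer are assumptions taken from the original BERT-Base configuration, not from this README, and the MLM and NSP heads are not counted.

```python
def bert_param_count(vocab_size, hidden=768, layers=12, intermediate=3072,
                     max_positions=512, type_vocab=2):
    """Count weights in a BERT encoder plus pooler (no MLM/NSP heads).

    The defaults are assumed BERT-Base values. The number of attention
    heads does not appear because heads only split the hidden dimension.
    """
    layer_norm = 2 * hidden  # scale and bias vectors
    # Word, position, and token-type embedding tables plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections (weights + biases) plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> intermediate -> hidden) plus LayerNorm.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden + layer_norm)
    # Dense layer applied to the first token's final hidden state.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    # 30,522 is the original English BERT vocabulary; 50,000 assumes "50k" is exact.
    for vocab in (30_522, 50_000):
        print(f"vocab={vocab}: {bert_param_count(vocab):,} parameters")
```

Under these assumptions the count is about 109.5 million with the original 30,522-token BERT vocabulary and about 124.4 million with exactly 50,000 tokens, so the 110 million figure may be the nominal BERT-Base size and not a count for this particular checkpoint.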