---
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- generated_from_trainer
datasets:
- wnut_17
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: wnut-distilbert-finetuned
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: wnut_17
      type: wnut_17
      config: wnut_17
      split: test
      args: wnut_17
    metrics:
    - name: Precision
      type: precision
      value: 0.533625730994152
    - name: Recall
      type: recall
      value: 0.3382761816496756
    - name: F1
      type: f1
      value: 0.414066931366988
    - name: Accuracy
      type: accuracy
      value: 0.9443803172160232
language:
- en
library_name: transformers
pipeline_tag: token-classification
---
# wnut-distilbert-finetuned
This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the WNUT 2017 dataset for Named Entity Recognition (NER).
## Model Description
`wnut-distilbert-finetuned` is a token classification model for Named Entity Recognition (NER). It is built on DistilBERT, a distilled version of BERT that is smaller and faster while retaining most of BERT's accuracy.
## Intended Uses & Limitations
### Intended Uses
- **Named Entity Recognition (NER)**: Extract and classify entities in text. WNUT 2017 annotates six entity types: person, location, corporation, product, creative work, and group.
- **Text Analysis**: Enhance applications in information extraction, question answering, and text understanding.
### How to Use
Load the model and tokenizer with the Hugging Face Transformers library, then run inference through a `pipeline`:
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and model from the same checkpoint
model_id = "Ashaduzzaman/wnut-distilbert-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Create a pipeline for NER
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Example inference
text = "Hugging Face Inc. is based in New York City."
entities = ner_pipeline(text)
print(entities)
```
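By default the `ner` pipeline returns one dictionary per predicted token, with `entity`, `score`, `index`, `word`, `start`, and `end` keys. The sketch below shows one way to merge those token-level predictions into entity spans. It assumes the model's labels follow the BIO scheme (for example `B-location`, `I-location`); the `group_entities` helper and the `sample_tokens` list are hypothetical illustrations, not actual output from this model.

```python
def group_entities(tokens, text):
    """Merge token-level BIO predictions into entity spans (illustrative helper)."""
    spans = []
    current = None
    for tok in tokens:
        if tok["entity"] == "O":
            continue
        # Split on the first hyphen only, so "B-creative-work" -> ("B", "creative-work").
        prefix, _, label = tok["entity"].partition("-")
        if current is not None and prefix == "I" and label == current["label"]:
            # Continuation of the open entity: extend its character span.
            current["end"] = tok["end"]
        else:
            if current is not None:
                spans.append(current)
            current = {"label": label, "start": tok["start"], "end": tok["end"]}
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


text = "Hugging Face Inc. is based in New York City."

# Hypothetical token-level predictions, hand-written for illustration only.
sample_tokens = [
    {"entity": "B-corporation", "word": "hugging", "start": 0, "end": 7},
    {"entity": "I-corporation", "word": "face", "start": 8, "end": 12},
    {"entity": "I-corporation", "word": "inc", "start": 13, "end": 16},
    {"entity": "B-location", "word": "new", "start": 30, "end": 33},
    {"entity": "I-location", "word": "york", "start": 34, "end": 38},
    {"entity": "I-location", "word": "city", "start": 39, "end": 43},
]

spans = group_entities(sample_tokens, text)
for span in spans:
    print(span["label"], repr(span["text"]))
```

The pipeline can also do this grouping itself: passing `aggregation_strategy="simple"` to `pipeline(...)` returns merged entities with an `entity_group` key instead of per-token results.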
### Limitations
- **Performance on Other Domains**: Performance may vary when applied to domains or data types different from the WNUT 2017 dataset.
- **Entity Types**: The model is trained on the specific entity types present in the WNUT 2017 dataset and may not perform well on entity types not covered by the training data.
- **Data Sensitivity**: The model may have biases or limitations based on the training data it was exposed to.
## Training and Evaluation Data
### Training Data
- **Dataset**: WNUT 2017, a corpus of noisy user-generated text annotated with emerging and rare named entities.
- **Data Split**: Training and validation splits of the WNUT 2017 dataset were used during the fine-tuning process.
### Evaluation Data
- **Dataset**: WNUT 2017 test set, used to evaluate model performance after fine-tuning.
## Training Procedure
### Training Hyperparameters
- **Learning Rate**: 2e-05
- **Train Batch Size**: 16
- **Eval Batch Size**: 16
- **Seed**: 42
- **Optimizer**: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- **Learning Rate Scheduler**: Linear
- **Number of Epochs**: 3
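To make the linear scheduler concrete, here is a minimal sketch of how the learning rate decays over training. It assumes no warmup steps (none are listed above) and uses the 639 total optimizer steps reported in the training results; `linear_lr` is a hypothetical helper written for illustration, not a library function.

```python
def linear_lr(step, base_lr=2e-5, total_steps=639, warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr (unused here, since warmup_steps=0).
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start and at the end of each of the 3 epochs (213 steps each).
for step in (0, 213, 426, 639):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05, falls to two thirds of that after the first epoch, and reaches zero at the last step.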
### Training Results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log | 1.0 | 213 | 0.2751 | 0.5114 | 0.2289 | 0.3163 | 0.9385 |
| No log | 2.0 | 426 | 0.2627 | 0.5398 | 0.3327 | 0.4117 | 0.9434 |
| 0.1832 | 3.0 | 639 | 0.2704 | 0.5336 | 0.3383 | 0.4141 | 0.9444 |
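The F1 column is the harmonic mean of precision and recall, so it can be recomputed from the other two columns as a sanity check. The short sketch below does this for the final epoch, using the full-precision values from the model card metadata; `f1_from` is a hypothetical helper name.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Final-epoch values as reported in the model card metadata.
precision = 0.533625730994152
recall = 0.3382761816496756

f1 = f1_from(precision, recall)
print(round(f1, 4))
```

The result agrees with the reported F1 of 0.4141. Precision is noticeably higher than recall, which means the model misses many entities, while roughly half of the entities it does predict are correct.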
### Framework Versions
- **Transformers**: 4.42.4
- **PyTorch**: 2.3.1+cu121
- **Datasets**: 2.21.0
- **Tokenizers**: 0.19.1