	---
	license: apache-2.0
	base_model: distilbert/distilbert-base-uncased
	tags:
	- generated_from_trainer
	datasets:
	- wnut_17
	metrics:
	- precision
	- recall
	- f1
	- accuracy
	model-index:
	- name: wnut-distilbert-finetuned
	  results:
	  - task:
	      name: Token Classification
	      type: token-classification
	    dataset:
	      name: wnut_17
	      type: wnut_17
	      config: wnut_17
	      split: test
	      args: wnut_17
	    metrics:
	    - name: Precision
	      type: precision
	      value: 0.533625730994152
	    - name: Recall
	      type: recall
	      value: 0.3382761816496756
	    - name: F1
	      type: f1
	      value: 0.414066931366988
	    - name: Accuracy
	      type: accuracy
	      value: 0.9443803172160232
	language:
	- en
	library_name: transformers
	pipeline_tag: token-classification
	---

	# wnut-distilbert-finetuned

	This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the WNUT 2017 dataset for Named Entity Recognition (NER).

	## Model Description

	The `wnut-distilbert-finetuned` model performs token classification, specifically Named Entity Recognition (NER). It is built on DistilBERT, a distilled version of BERT that is smaller and faster while maintaining competitive performance. On the WNUT 2017 test set it reaches an F1 of 0.414 (precision 0.534, recall 0.338).

	## Intended Uses & Limitations

	### Intended Uses
	- Named Entity Recognition (NER): Extract and classify entities in text. WNUT 2017 annotates six entity types: person, location, corporation, group, product, and creative-work.
	- Text Analysis: Support applications in information extraction, question answering, and text understanding.

	### How to Use
	Load the model with the Hugging Face Transformers library. The example below runs inference on a single sentence:

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification
	from transformers import pipeline

	# Load the tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained("Ashaduzzaman/wnut-distilbert-finetuned")
	model = AutoModelForTokenClassification.from_pretrained("Ashaduzzaman/wnut-distilbert-finetuned")

	# Create a pipeline for NER
	ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

	# Example inference
	text = "Hugging Face Inc. is based in New York City."
	entities = ner_pipeline(text)

	print(entities)
	```
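
The pipeline above returns one prediction per token, tagged with `B-`/`I-` prefixes. The sketch below shows one way to merge those token predictions into whole entity spans. It runs without downloading the model because the `sample` list is hypothetical, hand-written to mimic the pipeline's output format; it is not real model output. The helper name `group_entities` is also an assumption made for illustration.

```python
def group_entities(text, tokens):
    """Merge B-/I- tagged token predictions into entity spans."""
    spans = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        # Extend the current span only for an I- tag with a matching label
        if prefix == "I" and spans and spans[-1]["label"] == label:
            spans[-1]["end"] = tok["end"]
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
    # Recover the surface text of each span from its character offsets
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


text = "Hugging Face Inc. is based in New York City."

# Hypothetical predictions, hand-written for illustration only
sample = [
    {"entity": "B-corporation", "word": "hugging", "start": 0, "end": 7},
    {"entity": "I-corporation", "word": "face", "start": 8, "end": 12},
    {"entity": "B-location", "word": "new", "start": 30, "end": 33},
    {"entity": "I-location", "word": "york", "start": 34, "end": 38},
    {"entity": "I-location", "word": "city", "start": 39, "end": 43},
]

for span in group_entities(text, sample):
    print(span["label"], "->", span["text"])
```

In practice, passing `aggregation_strategy="simple"` to `pipeline(...)` asks Transformers to do this grouping for you; the manual version is mainly useful when custom merging rules are needed.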

	### Limitations
	- Performance on Other Domains: Performance may vary when applied to domains or data types different from the WNUT 2017 dataset.
	- Entity Types: The model is trained on the specific entity types present in the WNUT 2017 dataset and may not perform well on entity types not covered by the training data.
	- Data Sensitivity: The model may have biases or limitations based on the training data it was exposed to.

	## Training and Evaluation Data

	### Training Data
	- Dataset: WNUT 2017 (Emerging and Rare Entity Recognition), a corpus of noisy user-generated text, such as social media posts, annotated with named entities.
	- Data Split: Training and validation splits of the WNUT 2017 dataset were used during the fine-tuning process.

	### Evaluation Data
	- Dataset: WNUT 2017 test set, used to evaluate model performance after fine-tuning.

	## Training Procedure

	### Training Hyperparameters
	- Learning Rate: 2e-05
	- Train Batch Size: 16
	- Eval Batch Size: 16
	- Seed: 42
	- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
	- Learning Rate Scheduler: Linear
	- Number of Epochs: 3
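
To make the linear scheduler concrete, here is a minimal sketch of how the learning rate would decay over the run. It assumes zero warmup steps, which the card does not state, and takes the total of 639 steps from the final row of the training results table. The function name `linear_lr` is hypothetical.

```python
BASE_LR = 2e-5      # learning rate listed in the card
TOTAL_STEPS = 639   # final step in the training results table
WARMUP_STEPS = 0    # assumption: the card does not mention warmup


def linear_lr(step, base_lr=BASE_LR, total=TOTAL_STEPS, warmup=WARMUP_STEPS):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup:
        # Ramp up linearly from 0 to base_lr during warmup
        return base_lr * step / max(1, warmup)
    # Decay linearly from base_lr to 0 over the remaining steps
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


# Learning rate at the start and at the end of each epoch
for step in (0, 213, 426, 639):
    print(f"step {step:3d}: lr = {linear_lr(step):.3e}")
```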

	### Training Results

	| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
	| No log        | 1.0   | 213  | 0.2751          | 0.5114    | 0.2289 | 0.3163 | 0.9385   |
	| No log        | 2.0   | 426  | 0.2627          | 0.5398    | 0.3327 | 0.4117 | 0.9434   |
	| 0.1832        | 3.0   | 639  | 0.2704          | 0.5336    | 0.3383 | 0.4141 | 0.9444   |
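
As a sanity check on the table, F1 is the harmonic mean of precision and recall, so each F1 value can be recomputed from the other two columns. The sketch below does this for the three epochs; because the table values are rounded to four decimals, a recomputed F1 may differ from the reported one in the last digit. The function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall, reported F1), copied from the table above
rows = [
    (1, 0.5114, 0.2289, 0.3163),
    (2, 0.5398, 0.3327, 0.4117),
    (3, 0.5336, 0.3383, 0.4141),
]

for epoch, p, r, reported in rows:
    print(f"epoch {epoch}: recomputed F1 = {f1_score(p, r):.4f}, reported = {reported}")
```

Note that epoch 2 has the lowest validation loss while epoch 3 has the highest F1, so the choice of best checkpoint depends on which metric is used for selection.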

	### Framework Versions
	- Transformers: 4.42.4
	- PyTorch: 2.3.1+cu121
	- Datasets: 2.21.0
	- Tokenizers: 0.19.1