---
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- generated_from_trainer
datasets:
- wnut_17
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: wnut-distilbert-finetuned
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: wnut_17
      type: wnut_17
      config: wnut_17
      split: test
      args: wnut_17
    metrics:
    - name: Precision
      type: precision
      value: 0.533625730994152
    - name: Recall
      type: recall
      value: 0.3382761816496756
    - name: F1
      type: f1
      value: 0.414066931366988
    - name: Accuracy
      type: accuracy
      value: 0.9443803172160232
language:
- en
library_name: adapter-transformers
pipeline_tag: token-classification
---

# wnut-distilbert-finetuned

This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the WNUT 2017 dataset for Named Entity Recognition (NER).

## Model Description

The `wnut-distilbert-finetuned` model is designed for token classification tasks, specifically for Named Entity Recognition (NER). It leverages the DistilBERT architecture, which is a smaller, faster version of BERT with reduced computational requirements, while maintaining competitive performance.

## Intended Uses & Limitations

### Intended Uses

- **Named Entity Recognition (NER)**: Extract and classify entities such as names, locations, organizations, etc., from text.
- **Text Analysis**: Enhance applications in information extraction, question answering, and text understanding.

### How to Use

To use this model, you can load it using the Hugging Face Transformers library.
Below is an example of how to perform inference using the model:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and model from the same checkpoint so that the
# vocabulary and the label mapping match.
model_id = "Ashaduzzaman/wnut-distilbert-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Create a pipeline for NER
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Example inference
text = "Hugging Face Inc. is based in New York City."
entities = ner_pipeline(text)
print(entities)
```

### Limitations

- **Performance on Other Domains**: Performance may vary when applied to domains or data types different from the WNUT 2017 dataset.
- **Entity Types**: The model is trained on the specific entity types present in the WNUT 2017 dataset and may not perform well on entity types not covered by the training data.
- **Data Sensitivity**: The model may have biases or limitations based on the training data it was exposed to.

## Training and Evaluation Data

### Training Data

- **Dataset**: WNUT 2017, a corpus of noisy user-generated text annotated with emerging and rare named entities.
- **Data Split**: Training and validation splits of the WNUT 2017 dataset were used during the fine-tuning process.

### Evaluation Data

- **Dataset**: WNUT 2017 test set, used to evaluate model performance after fine-tuning.

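
To make the entity types mentioned under Limitations more concrete, here is a minimal sketch of how BIO-tagged tokens can be merged into entity spans. The six category names are an assumption taken from the public description of WNUT 2017; the labels this checkpoint actually emits should be read from `model.config.id2label`. The helper `group_bio_tags` and the tagged sentence are hypothetical and for illustration only, not output from the model.

```python
from typing import List, Optional, Tuple

# Entity categories of WNUT 2017 (assumed from the public dataset description;
# check `model.config.id2label` for the labels this checkpoint actually uses).
WNUT17_ENTITY_TYPES = [
    "corporation", "creative-work", "group", "location", "person", "product",
]


def group_bio_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity_text, entity_type) spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag closes the open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical tagged sentence for illustration only (not model output).
tokens = ["Hugging", "Face", "is", "based", "in", "New", "York", "City"]
tags = ["B-corporation", "I-corporation", "O", "O", "O",
        "B-location", "I-location", "I-location"]
print(group_bio_tags(tokens, tags))
```

In practice, passing `aggregation_strategy="simple"` to the Transformers `pipeline` call performs a similar grouping on the model's predictions, so a hand-written helper like this is mainly useful for inspecting raw tag sequences.
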
## Training Procedure

### Training Hyperparameters

- **Learning Rate**: 2e-05
- **Train Batch Size**: 16
- **Eval Batch Size**: 16
- **Seed**: 42
- **Optimizer**: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- **Learning Rate Scheduler**: Linear
- **Number of Epochs**: 3

### Training Results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log        | 1.0   | 213  | 0.2751          | 0.5114    | 0.2289 | 0.3163 | 0.9385   |
| No log        | 2.0   | 426  | 0.2627          | 0.5398    | 0.3327 | 0.4117 | 0.9434   |
| 0.1832        | 3.0   | 639  | 0.2704          | 0.5336    | 0.3383 | 0.4141 | 0.9444   |

### Framework Versions

- **Transformers**: 4.42.4
- **PyTorch**: 2.3.1+cu121
- **Datasets**: 2.21.0
- **Tokenizers**: 0.19.1
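
As a sanity check on the metrics reported under Training Results, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the final-epoch precision and recall given in this card's metadata; the helper name `f1_score` is hypothetical and is not the evaluation code used during training.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Final-epoch values as reported in this card's metadata.
precision = 0.533625730994152
recall = 0.3382761816496756

f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.4141
```

The result agrees with the reported F1, and it shows that the score is held down mainly by recall: the model misses roughly two thirds of the annotated entities even though about half of its predictions are correct.
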