How to use davanstrien/span-marker-bert-base-fewnerd-coarse-super with SpanMarker:

    from span_marker import SpanMarkerModel
    model = SpanMarkerModel.from_pretrained("davanstrien/span-marker-bert-base-fewnerd-coarse-super")

---
language:
- en
license: mit
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
datasets:
- DFKI-SLT/few-nerd
metrics:
- precision
- recall
- f1
widget:
- text: The Hebrew Union College libraries in Cincinnati and Los Angeles, the Library
    of Congress in Washington, D.C ., the Jewish Theological Seminary in New York
    City, and the Harvard University Library (which received donations of Deinard's
    texts from Lucius Nathan Littauer, housed in Widener and Houghton libraries) also
    have large collections of Deinard works.
- text: Abu Abd Allah Muhammad al-Idrisi (1099–1165 or 1166), the Moroccan Muslim
    geographer, cartographer, Egyptologist and traveller who lived in Sicily at the
    court of King Roger II, mentioned this island, naming it جزيرة مليطمة ("jazīrat
    Malīṭma", "the island of Malitma ") on page 583 of his book "Nuzhat al-mushtaq
    fi ihtiraq ghal afaq", otherwise known as The Book of Roger, considered a geographic
    encyclopaedia of the medieval world.
- text: The font is also used in the logo of the American rock band Greta Van Fleet,
    in the logo for Netflix show "Stranger Things ", and in the album art for rapper
    Logic's album "Supermarket ".
- text: Caretaker manager George Goss led them on a run in the FA Cup, defeating Liverpool
    in round 4, to reach the semi-final at Stamford Bridge, where they were defeated
    2–0 by Sheffield United on 28 March 1925.
- text: In 1991, the National Science Foundation (NSF), which manages the U.S . Antarctic
    Program (US AP), honoured his memory by dedicating a state-of-the-art laboratory
    complex in his name, the Albert P. Crary Science and Engineering Center (CSEC)
    located in McMurdo Station.
pipeline_tag: token-classification
base_model: numind/generic-entity_recognition_NER-v1
model-index:
- name: SpanMarker with numind/generic-entity_recognition_NER-v1 on DFKI-SLT/few-nerd
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: Unknown
      type: DFKI-SLT/few-nerd
      split: test
    metrics:
    - type: f1
      value: 0.7665505226480835
      name: F1
    - type: precision
      value: 0.7581967213114754
      name: Precision
    - type: recall
      value: 0.775090458960198
      name: Recall
---

# SpanMarker with numind/generic-entity_recognition_NER-v1 on DFKI-SLT/few-nerd

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model for Named Entity Recognition, trained on the [DFKI-SLT/few-nerd](https://huggingface.co/datasets/DFKI-SLT/few-nerd) dataset. It uses [numind/generic-entity_recognition_NER-v1](https://huggingface.co/numind/generic-entity_recognition_NER-v1) as the underlying encoder.

## Model Details

### Model Description

- **Model Type:** SpanMarker
- **Encoder:** [numind/generic-entity_recognition_NER-v1](https://huggingface.co/numind/generic-entity_recognition_NER-v1)
- **Maximum Sequence Length:** 256 tokens
- **Maximum Entity Length:** 19 words
- **Training Dataset:** [DFKI-SLT/few-nerd](https://huggingface.co/datasets/DFKI-SLT/few-nerd)
- **Language:** en
- **License:** mit

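To give a feel for what the 19-word maximum entity length means in practice, the sketch below enumerates every contiguous word span of at most that length in a sentence. It assumes a span-enumeration view of the task, in which each candidate span is classified separately. `candidate_spans` is a hypothetical helper written for this illustration and is not part of the SpanMarker API.

```python
def candidate_spans(num_words, max_entity_length=19):
    """Return (start, end) word-index pairs, end exclusive, for every
    contiguous span of at most `max_entity_length` words."""
    return [
        (start, start + length)
        for length in range(1, min(num_words, max_entity_length) + 1)
        for start in range(num_words - length + 1)
    ]


# A 5-word sentence with spans capped at 3 words: 5 + 4 + 3 candidates.
print(len(candidate_spans(5, max_entity_length=3)))  # 12

# A 24-word sentence with the 19-word cap listed above.
print(len(candidate_spans(24)))  # 285
```

Any entity longer than the cap can never appear among the candidates, so a cap of 19 words only matters for unusually long mentions.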
### Model Sources

- **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
- **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)

### Model Labels

| Label        | Examples                                                                       |
|:-------------|:-------------------------------------------------------------------------------|
| art          | "Time", "The Seven Year Itch", "Imelda de ' Lambertazzi"                       |
| building     | "Boston Garden", "Sheremetyevo International Airport", "Henry Ford Museum"     |
| event        | "Iranian Constitutional Revolution", "Russian Revolution", "French Revolution" |
| location     | "the Republic of Croatia", "Croatian", "Mediterranean Basin"                   |
| organization | "IAEA", "Texas Chicken", "Church 's Chicken"                                   |
| other        | "BAR", "Amphiphysin", "N-terminal lipid"                                       |
| person       | "Edmund Payne", "Hicks", "Ellaline Terriss"                                    |
| product      | "Phantom", "100EX", "Corvettes - GT1 C6R"                                      |

## Evaluation

### Metrics

| Label        | Precision | Recall | F1     |
|:-------------|:----------|:-------|:-------|
| **all**      | 0.7582    | 0.7751 | 0.7666 |
| art          | 0.7713    | 0.7783 | 0.7748 |
| building     | 0.6034    | 0.7085 | 0.6518 |
| event        | 0.5512    | 0.5207 | 0.5355 |
| location     | 0.8163    | 0.8321 | 0.8242 |
| organization | 0.7083    | 0.6894 | 0.6987 |
| other        | 0.6748    | 0.7253 | 0.6991 |
| person       | 0.8987    | 0.9053 | 0.9020 |
| product      | 0.5685    | 0.6431 | 0.6035 |

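The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the sketch below recomputes it from the reported numbers. `f1_score` is a hypothetical helper defined here. The overall scores use the full-precision values from the model-index metadata, while the per-label inputs are the four-decimal values from the table, so a small rounding gap is expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Full-precision overall scores from the model-index metadata.
overall_f1 = f1_score(0.7581967213114754, 0.775090458960198)

# (precision, recall, reported F1) per label, as rounded in the table.
per_label = {
    "art": (0.7713, 0.7783, 0.7748),
    "building": (0.6034, 0.7085, 0.6518),
    "event": (0.5512, 0.5207, 0.5355),
    "location": (0.8163, 0.8321, 0.8242),
    "organization": (0.7083, 0.6894, 0.6987),
    "other": (0.6748, 0.7253, 0.6991),
    "person": (0.8987, 0.9053, 0.9020),
    "product": (0.5685, 0.6431, 0.6035),
}

# Largest difference between the recomputed and the reported per-label F1.
max_gap = max(abs(f1_score(p, r) - f1) for p, r, f1 in per_label.values())

print(f"overall F1 = {overall_f1:.4f}")
print(f"largest per-label gap = {max_gap:.5f}")
```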
## Uses

### Direct Use for Inference

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("davanstrien/span-marker-bert-base-fewnerd-coarse-super")
# Run inference
entities = model.predict("Caretaker manager George Goss led them on a run in the FA Cup, defeating Liverpool in round 4, to reach the semi-final at Stamford Bridge, where they were defeated 2–0 by Sheffield United on 28 March 1925.")
```
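The sketch below shows one way to post-process the predictions. It assumes that `model.predict` returns a list of dictionaries with at least the keys `"span"`, `"label"` and `"score"`. `group_by_label` is a hypothetical helper, and the `example_entities` list is hand-written placeholder data, not real model output, so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group predicted span texts by label, dropping low-confidence ones."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Placeholder predictions (assumed structure, made-up scores and labels).
example_entities = [
    {"span": "George Goss", "label": "person", "score": 0.99},
    {"span": "Liverpool", "label": "organization", "score": 0.95},
    {"span": "Stamford Bridge", "label": "building", "score": 0.90},
    {"span": "Sheffield United", "label": "organization", "score": 0.97},
    {"span": "round 4", "label": "event", "score": 0.30},
]

for label, spans in group_by_label(example_entities, min_score=0.5).items():
    print(f"{label}: {spans}")
```

With a real model, pass the `entities` list from the inference snippet instead of `example_entities`.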
### Downstream Use

You can fine-tune this model on your own dataset.

<details><summary>Click to expand</summary>

```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("davanstrien/span-marker-bert-base-fewnerd-coarse-super")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003")  # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span-marker-bert-base-fewnerd-coarse-super-finetuned")
```

</details>

## Training Details

### Training Set Metrics

| Training set          | Min | Median  | Max |
|:----------------------|:----|:--------|:----|
| Sentence length       | 1   | 24.4956 | 163 |
| Entities per sentence | 0   | 2.5439  | 35  |

### Training Hyperparameters

- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 128
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP

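Two of these settings interact: the total train batch size is the per-step batch size times the gradient accumulation steps, and the linear scheduler with a 0.1 warmup ratio ramps the learning rate up before decaying it to zero. The sketch below illustrates both under stated assumptions: a single training device, a total of 1145 optimizer steps (inferred from the Training Results table, where 200 steps correspond to about 1.7467 epochs), and warmup steps rounded up. `effective_batch_size` and `linear_lr` are hypothetical helpers, not library functions.

```python
import math

LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 64
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1
TOTAL_STEPS = 1145  # assumption: ~114.5 optimizer steps per epoch x 10 epochs


def effective_batch_size(per_step, accumulation_steps, num_devices=1):
    """Number of examples contributing to one optimizer update."""
    return per_step * accumulation_steps * num_devices


def linear_lr(step, total_steps, peak_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 128
for step in (0, 115, 600, TOTAL_STEPS):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at 5e-05 after 115 steps and reaches zero at the final step.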
### Training Results

| Epoch  | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|:------:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
| 1.7467 | 200  | 0.0120          | 0.7533               | 0.7473            | 0.7503        | 0.9286              |
| 3.4934 | 400  | 0.0110          | 0.7659               | 0.7761            | 0.7710        | 0.9385              |
| 5.2402 | 600  | 0.0114          | 0.7772               | 0.7899            | 0.7835        | 0.9424              |
| 6.9869 | 800  | 0.0120          | 0.7724               | 0.7953            | 0.7837        | 0.9421              |
| 8.7336 | 1000 | 0.0124          | 0.7680               | 0.7942            | 0.7809        | 0.9413              |

### Framework Versions

- Python: 3.10.12
- SpanMarker: 1.5.0
- Transformers: 4.35.2
- PyTorch: 2.1.0+cu118
- Datasets: 2.14.7
- Tokenizers: 0.15.0

## Citation

### BibTeX

```
@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
```