---
language: tr
license: mit
tags:
- ner
- token-classification
- turkish
- crf
- morphology
datasets:
- unimelb-nlp/wikiann
- turkish-nlp-suite/turkish-wikiNER
metrics:
- f1
---

# Extended Turkish NER (Hybrid CRF)

This is a **Named Entity Recognition (NER)** model for Turkish. It takes a hybrid approach that combines **Conditional Random Fields (CRF)** with deep morphological analysis (**Nuve/Zemberek**) and contextual embeddings (**BERTurk**). The best variant reaches an F1 score of 86.66%.

## Features

- **6 Extended Categories:** `PER`, `LOC`, `ORG`, `COMPANY`, `GROUP`, `MOVIE`.
- **Hybrid Features:** Combines linguistic morphology with semantic BERT vectors.
- **Gazetteer Support:** Uses gazetteers with 160K+ entity entries to improve precision.

## Performance

| Metric | Value |
| :--- | :--- |
| **Best F1-Score** | **86.66%** |
| Precision | 87.42% |
| Recall | 85.91% |

## Available Models (6 Variants)

| Model File | Description | F1 Score |
| :--- | :--- | :--- |
| `ner_crf_model.pkl` | **Best Hybrid (Nuve + BERT)** - Main SOTA model | **0.8666** |
| `final_proper_model.pkl` | Full features without embeddings | 0.8557 |
| `crf_gold_best.pkl` | Best Gold-only trained model | 0.8514 |
| `crf_gold_no_emb.pkl` | Gold without BERT embeddings | 0.8496 |
| `crf_gold_gaz_only.pkl` | Gazetteer-only features (baseline) | 0.8463 |
| `final_model.pkl` | Alternative final configuration | 0.8487 |

## Usage

The models are saved as `.pkl` files (pickled sklearn-crfsuite models). Refer to the source code for the feature extraction logic, which uses Nuve and BERTurk.

### Example Inference

```python
import joblib

# Load the best hybrid model (Nuve + BERT)
model = joblib.load("models/ner_crf_model.pkl")

# Use FeatureExtractor from src/features.py to prepare input
```

## Citation

Please cite this work if you use it in your research.

[Academic Paper](https://github.com/WildGenie/nerextended/blob/master/docs/Akademik_Makale.md)
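
## Post-processing Sketch

As a supplement to the Usage section, the snippet below is a minimal sketch of how per-token predictions could be grouped into entity spans. It assumes the model emits BIO-style tags such as `B-PER` and `I-PER`, which this card does not confirm. The helper `bio_to_spans` and the example tokens and tags are hypothetical and are not part of the repository. With sklearn-crfsuite, `model.predict(X)` returns one list of labels per input sentence, so each returned list could be passed in as `tags`.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs.

    Hypothetical helper: assumes tags look like "B-PER", "I-PER", or "O".
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    def flush() -> None:
        # Close the currently open span, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open span.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span,
            # closes the span; the token itself is not kept.
            flush()
    flush()
    return spans


# Illustrative input only; real tags would come from model.predict(...)
example_tokens = ["Ahmet", "Yılmaz", "dün", "İstanbul", "yolundaydı"]
example_tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(bio_to_spans(example_tokens, example_tags))
# → [('Ahmet Yılmaz', 'PER'), ('İstanbul', 'LOC')]
```

Dropping an `I-` tag that has no matching open span is one conservative choice. A more lenient decoder could treat it as the start of a new span instead.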