---
license: mit
tags:
- text-classification
- finance
- financial-news
- bert
- topic-classification
- transformers
- safetensors
- pytorch
- financial
- news
model-index:
- name: finnews-topic-single-classify
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: zeroshot/twitter-financial-news-topic
      type: finance
    metrics:
    - type: accuracy
      name: accuracy
      value: 0.907943
    - type: f1
      name: F1
      value: 0.899527
---

# Financial News Topic Classifier

This model is a fine-tuned BERT-based classifier for financial news topic classification, based on [fuchenru/Trading-Hero-LLM](https://huggingface.co/fuchenru/Trading-Hero-LLM/blob/main/README.md) and supporting 20 distinct financial topics. It is designed for use in financial NLP applications, news analytics, and automated trading systems.

## Model Description

- **Architecture:** BERT (for sequence classification)
- **Framework:** PyTorch, Transformers
- **Topics:** 20 financial news categories (see below)
- **License:** MIT

## Intended Uses & Limitations

- **Intended Use:**
  - Classify financial news headlines or short texts into one of 20 financial topics.
  - Use in financial analytics, news monitoring, and trading agent pipelines.
- **Limitations:**
  - Trained on zeroshot/twitter-financial-news-topic; may not generalize to all financial news sources.
  - Not suitable for non-financial or long-form text.
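
As a rough illustration of the news-monitoring use case, the sketch below groups headlines by predicted topic. It assumes predictions arrive in the list-of-dicts shape that a Transformers `text-classification` pipeline returns (`label` and `score` keys). The `route_by_topic` helper, the `min_score` threshold, and the `"unrouted"` bucket are hypothetical names chosen for this example, and the scores are hand-written stand-ins rather than real model output.

```python
from collections import defaultdict


def route_by_topic(texts, predictions, min_score=0.5):
    """Group texts by predicted topic.

    Items whose score falls below `min_score` (a hypothetical threshold)
    are placed in an "unrouted" bucket instead of a topic bucket.
    """
    routed = defaultdict(list)
    for text, pred in zip(texts, predictions):
        topic = pred["label"] if pred["score"] >= min_score else "unrouted"
        routed[topic].append(text)
    return dict(routed)


# Hand-written stand-ins for classifier output (illustrative values only).
texts = [
    "LIVE: ECB surprises with 50bps hike, ending its negative rate era.",
    "Investing Club: Morgan Stanley's dividend, buyback pay us for our patience",
    "Some ambiguous headline",
]
predictions = [
    {"label": "Fed | Central Banks", "score": 0.98},
    {"label": "Dividend", "score": 0.91},
    {"label": "Markets", "score": 0.32},
]

routed = route_by_topic(texts, predictions)
for topic, items in routed.items():
    print(topic, len(items))
```

A threshold like this is one simple way to keep low-confidence predictions out of downstream trading logic; a suitable value would need to be tuned on held-out data.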

## Topics

| ID | Topic                        |
|----|------------------------------|
| 0  | Analyst Update               |
| 1  | Fed \| Central Banks         |
| 2  | Company \| Product News      |
| 3  | Treasuries \| Corporate Debt |
| 4  | Dividend                     |
| 5  | Earnings                     |
| 6  | Energy \| Oil                |
| 7  | Financials                   |
| 8  | Currencies                   |
| 9  | General News \| Opinion      |
| 10 | Gold \| Metals \| Materials  |
| 11 | IPO                          |
| 12 | Legal \| Regulation          |
| 13 | M&A \| Investments           |
| 14 | Macro                        |
| 15 | Markets                      |
| 16 | Politics                     |
| 17 | Personnel Change             |
| 18 | Stock Commentary             |
| 19 | Stock Movement               |

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("leonas5555/finnews-topic-single-classify")
model = AutoModelForSequenceClassification.from_pretrained("leonas5555/finnews-topic-single-classify")

# Wrap both in a text-classification pipeline
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example text
text = "LIVE: ECB surprises with 50bps hike, ending its negative rate era. President Christine Lagarde is taking questions"
result = nlp(text)
print(result)
# Output: [{'label': 'Fed | Central Banks', 'score': 0.98}]
```

## Example Inputs & Outputs

| Example Text | Predicted Topic |
|--------------|-----------------|
| "Here are Thursday's biggest analyst calls: Apple, Amazon, Tesla, Palantir, DocuSign, Exxon & more" | Analyst Update |
| "LIVE: ECB surprises with 50bps hike, ending its negative rate era." | Fed \| Central Banks |
| "Goldman Sachs traders countered the industry's underwriting slump with revenue gains that raced past analysts' estimates." | Company \| Product News |
| "China Evergrande Group's onshore bond holders rejected a plan by the distressed developer to further extend a bond payment." | Treasuries \| Corporate Debt |
| "Investing Club: Morgan Stanley's dividend, buyback pay us for our patience after quarterly missteps" | Dividend |

## Training Data

- **Dataset:** zeroshot/twitter-financial-news-topic
- **Size:** 21,107 samples
- **Class Distribution:** Unbalanced; class weights used during training.

## Training Procedure

- **Framework:** HuggingFace Transformers (Trainer API)
- **Arguments:**
  - **num_train_epochs:** 10
  - **per_device_train_batch_size:** 32
  - **per_device_eval_batch_size:** 32
  - **gradient_accumulation_steps:** 1
  - **learning_rate:** 2e-5
  - **fp16:** True (Native AMP mixed precision)
  - **warmup_ratio:** 0.1
  - **label_smoothing_factor:** 0.05
  - **max_grad_norm:** 1.0
  - **max_length:** 256
  - **evaluation_strategy:** "steps"
  - **save_strategy:** "steps"
  - **save_total_limit:** 3
  - **load_best_model_at_end:** True
  - **metric_for_best_model:** "f1"
  - **run_name:** "topic_classifier"
  - **seed:** 42
- **Early Stopping:** Patience of 2 evaluation steps (via `EarlyStoppingCallback`)
- **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08)
- **Scheduler:** Linear
- **Metrics:** F1 (for best model selection), plus accuracy, precision, recall

## Evaluation Results

| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall   | F1       |
|------|---------------|-----------------|----------|-----------|----------|----------|
| 530  | 1.965800      | 0.917674        | 0.805684 | 0.743887  | 0.691372 | 0.696721 |
| 1060 | 0.733100      | 0.684078        | 0.876366 | 0.815078  | 0.823771 | 0.817982 |
| 1590 | 0.512200      | 0.638335        | 0.895312 | 0.895471  | 0.893691 | 0.893341 |
| 2120 | 0.418200      | 0.682780        | 0.894826 | 0.880995  | 0.885067 | 0.880227 |
| 2650 | 0.380200      | 0.683890        | 0.902113 | 0.890379  | 0.901867 | 0.894882 |
| 3180 | 0.359500      | 0.696923        | 0.902599 | 0.881292  | 0.902299 | 0.888526 |
| 3710 | 0.348800      | 0.691665        | 0.906000 | 0.891074  | 0.902236 | 0.895001 |
| 4240 | 0.342900      | 0.687194        | 0.906728 | 0.896421  | 0.900574 | 0.896865 |
| 4770 | 0.339900      | 0.705139        | 0.904785 | 0.892559  | 0.903573 | 0.896804 |
| 5300 | 0.337400      | 0.697512        | 0.907943 | 0.897653  | 0.903964 | 0.899527 |

## ONNX Export

An ONNX version of this model {TBD} for use with high-performance inference engines such as Infinity. The model can be exported with Optimum; the final argument is the output directory, and the name used here is only an example:

    optimum-cli export onnx -m leonas5555/finnews-topic-single-classify finnews-topic-onnx/

## License

MIT

## Inspired by:

- [nickmuchi/finbert-tone-finetuned-finance-topic-classification](https://huggingface.co/nickmuchi/finbert-tone-finetuned-finance-topic-classification/blob/main/README.md)

---

**References:**

- [fuchenru/Trading-Hero-LLM](https://huggingface.co/fuchenru/Trading-Hero-LLM/blob/main/README.md)
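
To make the learning-rate settings from the training procedure concrete (linear scheduler, `learning_rate` 2e-5, `warmup_ratio` 0.1), here is a minimal sketch of a linear warmup followed by linear decay. It assumes training ran for 5,300 optimizer steps, the last step shown in the evaluation table, and that the schedule follows the usual "linear with warmup" shape; `linear_warmup_lr` is a hypothetical helper written for this example, not code from the training run.

```python
import math


def linear_warmup_lr(step, base_lr=2e-5, total_steps=5300, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Assumption: total_steps=5300 is taken from the last row of the
    evaluation table; the actual run may have used a different total.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at the evaluation steps listed in the results table.
for step in range(530, 5301, 530):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the rate peaks at step 530, which coincides with the first evaluation point, and decays steadily over the remaining steps.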