Add files using upload-large-folder tool

Files changed (11)

.gitattributes +1 -2
.gitignore +1 -0
README.md +167 -0
config.json +70 -0
logs/events.out.tfevents.1747565125.a8173a591e7c.19.0 +3 -0
model.safetensors +3 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +58 -0
training_args.bin +3 -0
vocab.txt +0 -0

.gitattributes CHANGED

@@ -25,11 +25,10 @@
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

@@ -0,0 +1 @@
+checkpoint-*/

README.md ADDED

@@ -0,0 +1,167 @@

+---
+license: mit
+tags:
+  - text-classification
+  - finance
+  - financial-news
+  - bert
+  - topic-classification
+  - transformers
+  - safetensors
+  - pytorch
+  - financial
+  - news
+model-index:
+  - name: finnews-topic-single-classify
+    results:
+      - task:
+          name: Text Classification
+          type: text-classification
+        dataset:
+          name: zeroshot/twitter-financial-news-topic
+          type: finance
+        metrics:
+          - type: accuracy
+            name: accuracy
+            value: 0.907943
+          - type: f1
+            name: F1
+            value: 0.899527
+---
+# Financial News Topic Classifier
+This model is a BERT-based classifier for financial news topics, fine-tuned from [fuchenru/Trading-Hero-LLM](https://huggingface.co/fuchenru/Trading-Hero-LLM/blob/main/README.md). It assigns each input one of 20 financial topic labels and is designed for use in financial NLP applications, news analytics, and automated trading systems.
+## Model Description
+- **Architecture:** BERT (for sequence classification)
+- **Framework:** PyTorch, Transformers
+- **Topics:** 20 financial news categories (see below)
+- **License:** MIT
+## Intended Uses & Limitations
+- **Intended Use:**
+  - Classify financial news headlines or short texts into one of 20 financial topics.
+  - Use in financial analytics, news monitoring, and trading agent pipelines.
+- **Limitations:**
+  - Trained on zeroshot/twitter-financial-news-topic; may not generalize to all financial news sources.
+  - Not suitable for non-financial or long-form text; training used a `max_length` of 256 tokens.
+## Topics
+| ID | Topic                        |
+|----|------------------------------|
+| 0  | Analyst Update               |
+| 1  | Fed \| Central Banks         |
+| 2  | Company \| Product News      |
+| 3  | Treasuries \| Corporate Debt |
+| 4  | Dividend                     |
+| 5  | Earnings                     |
+| 6  | Energy \| Oil                |
+| 7  | Financials                   |
+| 8  | Currencies                   |
+| 9  | General News \| Opinion      |
+| 10 | Gold \| Metals \| Materials  |
+| 11 | IPO                          |
+| 12 | Legal \| Regulation          |
+| 13 | M&A \| Investments           |
+| 14 | Macro                        |
+| 15 | Markets                      |
+| 16 | Politics                     |
+| 17 | Personnel Change             |
+| 18 | Stock Commentary             |
+| 19 | Stock Movement               |
+## Example Usage
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+tokenizer = AutoTokenizer.from_pretrained("leonas5555/finnews-topic-single-classify")
+model = AutoModelForSequenceClassification.from_pretrained("leonas5555/finnews-topic-single-classify")
+nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
+# Example text
+text = "LIVE: ECB surprises with 50bps hike, ending its negative rate era. President Christine Lagarde is taking questions"
+result = nlp(text)
+print(result)
+# Example output (the exact score may vary): [{'label': 'Fed | Central Banks', 'score': 0.98}]
+```
+## Example Inputs & Outputs
+| Example Text                                                                                                         | Predicted Topic                |
+|----------------------------------------------------------------------------------------------------------------------|-------------------------------|
+| "Here are Thursday's biggest analyst calls: Apple, Amazon, Tesla, Palantir, DocuSign, Exxon & more"                  | Analyst Update                |
+| "LIVE: ECB surprises with 50bps hike, ending its negative rate era."                                                 | Fed \| Central Banks          |
+| "Goldman Sachs traders countered the industry's underwriting slump with revenue gains that raced past analysts' estimates." | Company \| Product News       |
+| "China Evergrande Group's onshore bond holders rejected a plan by the distressed developer to further extend a bond payment." | Treasuries \| Corporate Debt  |
+| "Investing Club: Morgan Stanley's dividend, buyback pay us for our patience after quarterly missteps"                 | Dividend                      |
+## Training Data
+- **Dataset:** zeroshot/twitter-financial-news-topic
+- **Size:** 21,107 samples
+- **Class Distribution:** Imbalanced; class weights were used during training.
+## Training Procedure
+- **Framework:** HuggingFace Transformers (Trainer API)
+- **Arguments:**
+  - **num_train_epochs:** 10
+  - **per_device_train_batch_size:** 32
+  - **per_device_eval_batch_size:** 32
+  - **gradient_accumulation_steps:** 1
+  - **learning_rate:** 2e-5
+  - **fp16:** True (Native AMP mixed precision)
+  - **warmup_ratio:** 0.1
+  - **label_smoothing_factor:** 0.05
+  - **max_grad_norm:** 1.0
+  - **max_length:** 256
+  - **evaluation_strategy:** "steps"
+  - **save_strategy:** "steps"
+  - **save_total_limit:** 3
+  - **load_best_model_at_end:** True
+  - **metric_for_best_model:** "f1"
+  - **run_name:** "topic_classifier"
+  - **seed:** 42
+- **Early Stopping:** Patience of 2 evaluations without improvement (via `EarlyStoppingCallback`)
+- **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08)
+- **Scheduler:** Linear
+- **Metrics:** F1 (for best model selection), plus accuracy, precision, recall
+## Evaluation Results
+| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
+|------|---------------|----------------|----------|-----------|--------|------|
+| 530  | 1.965800      | 0.917674       | 0.805684 | 0.743887  | 0.691372 | 0.696721 |
+| 1060 | 0.733100      | 0.684078       | 0.876366 | 0.815078  | 0.823771 | 0.817982 |
+| 1590 | 0.512200      | 0.638335       | 0.895312 | 0.895471  | 0.893691 | 0.893341 |
+| 2120 | 0.418200      | 0.682780       | 0.894826 | 0.880995  | 0.885067 | 0.880227 |
+| 2650 | 0.380200      | 0.683890       | 0.902113 | 0.890379  | 0.901867 | 0.894882 |
+| 3180 | 0.359500      | 0.696923       | 0.902599 | 0.881292  | 0.902299 | 0.888526 |
+| 3710 | 0.348800      | 0.691665       | 0.906000 | 0.891074  | 0.902236 | 0.895001 |
+| 4240 | 0.342900      | 0.687194       | 0.906728 | 0.896421  | 0.900574 | 0.896865 |
+| 4770 | 0.339900      | 0.705139       | 0.904785 | 0.892559  | 0.903573 | 0.896804 |
+| 5300 | 0.337400      | 0.697512       | 0.907943 | 0.897653  | 0.903964 | 0.899527 |
+## ONNX Export
+An ONNX version of this model is available in the [onnx/](./onnx/) directory for use with high-performance inference engines such as Infinity.
+## License
+MIT
+## Inspired by
+- [nickmuchi/finbert-tone-finetuned-finance-topic-classification](https://huggingface.co/nickmuchi/finbert-tone-finetuned-finance-topic-classification/blob/main/README.md)
+---
+**References:**
+- [fuchenru/Trading-Hero-LLM](https://huggingface.co/fuchenru/Trading-Hero-LLM/blob/main/README.md)
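
As a reading aid for the README's Evaluation Results table and its `load_best_model_at_end` / `metric_for_best_model: "f1"` / early-stopping settings, the following is a minimal sketch of how best-checkpoint selection and a patience counter would play out on the reported F1 values. The `simulate` function is a hypothetical helper, not part of Transformers, and it simplifies `EarlyStoppingCallback` by ignoring any improvement threshold.

```python
# F1 per evaluation step, copied from the README's Evaluation Results table.
f1_by_step = {
    530: 0.696721, 1060: 0.817982, 1590: 0.893341, 2120: 0.880227,
    2650: 0.894882, 3180: 0.888526, 3710: 0.895001, 4240: 0.896865,
    4770: 0.896804, 5300: 0.899527,
}


def simulate(f1_by_step, patience=2):
    """Return (best_step, best_f1, stop_step).

    stop_step is the step at which training would halt, or None if the
    patience limit is never reached. Simplified: any strict improvement
    over the best F1 so far resets the counter.
    """
    best_step, best_f1, bad_evals = None, float("-inf"), 0
    for step, f1 in sorted(f1_by_step.items()):
        if f1 > best_f1:
            best_step, best_f1, bad_evals = step, f1, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_step, best_f1, step
    return best_step, best_f1, None


best_step, best_f1, stop_step = simulate(f1_by_step, patience=2)
print(best_step, best_f1, stop_step)
```

Under these assumptions the best F1 falls on the last evaluation (step 5300), which matches the F1 value in the `model-index` metadata, and F1 never fails to improve twice in a row, so a patience of 2 would not cut the run short.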

config.json ADDED

@@ -0,0 +1,70 @@

+{
+  "_name_or_path": "leonas5555/finnews-topic-single-classify",
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "Analyst Update",
+    "1": "Fed | Central Banks",
+    "2": "Company | Product News",
+    "3": "Treasuries | Corporate Debt",
+    "4": "Dividend",
+    "5": "Earnings",
+    "6": "Energy | Oil",
+    "7": "Financials",
+    "8": "Currencies",
+    "9": "General News | Opinion",
+    "10": "Gold | Metals | Materials",
+    "11": "IPO",
+    "12": "Legal | Regulation",
+    "13": "M&A | Investments",
+    "14": "Macro",
+    "15": "Markets",
+    "16": "Politics",
+    "17": "Personnel Change",
+    "18": "Stock Commentary",
+    "19": "Stock Movement"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "Analyst Update": 0,
+    "Fed | Central Banks": 1,
+    "Company | Product News": 2,
+    "Treasuries | Corporate Debt": 3,
+    "Dividend": 4,
+    "Earnings": 5,
+    "Energy | Oil": 6,
+    "Financials": 7,
+    "Currencies": 8,
+    "General News | Opinion": 9,
+    "Gold | Metals | Materials": 10,
+    "IPO": 11,
+    "Legal | Regulation": 12,
+    "M&A | Investments": 13,
+    "Macro": 14,
+    "Markets": 15,
+    "Politics": 16,
+    "Personnel Change": 17,
+    "Stock Commentary": 18,
+    "Stock Movement": 19
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.51.3",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30873
+}
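
The values in this config are enough to estimate the model's parameter count. The sketch below assumes the standard `BertForSequenceClassification` layout (embeddings, 12 encoder layers, pooler, linear classifier) and takes the number of labels, 20, from `id2label`; the function name `bert_classifier_params` is hypothetical.

```python
# Values copied from config.json; num_labels is the size of id2label.
cfg = {
    "vocab_size": 30873,
    "hidden_size": 768,
    "intermediate_size": 3072,
    "num_hidden_layers": 12,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "num_labels": 20,
}


def bert_classifier_params(c):
    """Count weights and biases, assuming the standard BERT layout."""
    h, i = c["hidden_size"], c["intermediate_size"]
    layer_norm = 2 * h  # scale + bias
    embeddings = (
        c["vocab_size"] + c["max_position_embeddings"] + c["type_vocab_size"]
    ) * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm  # Q, K, V, output projection
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    encoder = c["num_hidden_layers"] * (attention + feed_forward)
    pooler = h * h + h
    classifier = h * c["num_labels"] + c["num_labels"]
    return embeddings + encoder + pooler + classifier


n_params = bert_classifier_params(cfg)
fp32_bytes = n_params * 4  # "torch_dtype": "float32"
print(n_params, fp32_bytes)
```

This gives roughly 109.8 million parameters, or about 439.07 MB in float32, slightly under the 439,092,288 bytes recorded for `model.safetensors` in this commit. The small remainder is presumably the safetensors header, though that is an assumption rather than something this diff shows.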

logs/events.out.tfevents.1747565125.a8173a591e7c.19.0 ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:81d1c20bee5ab2c0d926e253084e36a4d781e0d3aaf6fbf2592c75843805e8c4
+size 12711

model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cce36979f172a9dc688001bf796cba2b4dd992219aa74240a0072e8ce44fdd55
+size 439092288
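
The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, a SHA-256 object ID, and the byte size. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
# Pointer text copied from the model.safetensors diff (without the "+" markers).
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:cce36979f172a9dc688001bf796cba2b4dd992219aa74240a0072e8ce44fdd55
size 439092288
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and decode the oid and size fields."""
    fields = dict(
        line.split(" ", 1) for line in text.splitlines() if line.strip()
    )
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

A downloaded file could then be checked against `pointer["digest"]` with `hashlib.sha256` and against `pointer["size"]` with `os.path.getsize`.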

special_tokens_map.json ADDED

@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

@@ -0,0 +1,58 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 256,
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
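
To illustrate what the special-token IDs and `model_max_length` in this config imply for a single input, here is a rough sketch of how an encoded sequence would be framed, truncated, and padded. It assumes single-sequence BERT-style encoding with truncation and padding to the maximum length; `build_inputs` and the sample token IDs are hypothetical, and real encoding should go through the tokenizer itself.

```python
# IDs taken from added_tokens_decoder; MAX_LEN from model_max_length.
PAD_ID, CLS_ID, SEP_ID = 0, 3, 4
MAX_LEN = 256


def build_inputs(token_ids, max_len=MAX_LEN):
    """Frame word-piece IDs as [CLS] ... [SEP], truncate, then pad."""
    body = token_ids[: max_len - 2]  # reserve two slots for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    padding = max_len - len(input_ids)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


# Hypothetical word-piece IDs for a short headline and an overlong text.
short_ids, short_mask = build_inputs([1200, 1450, 2207])
long_ids, long_mask = build_inputs(list(range(100, 500)))
print(len(short_ids), sum(short_mask), len(long_ids), sum(long_mask))
```

Note that these special-token IDs (3 for `[CLS]`, 4 for `[SEP]`) differ from those of the original BERT vocabulary, so the tokenizer shipped in this commit should be used rather than a generic BERT one.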

training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b3ca33398265d2410bde3efc80097ab837d17ee96f818294084e1f81ededae0e
+size 5304

vocab.txt ADDED

The diff for this file is too large to render. See raw diff