leonas5555 committed on
Commit 6e10617 · verified · 1 parent: 7a2be60

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -25,11 +25,10 @@
  *.safetensors filter=lfs diff=lfs merge=lfs -text
  saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
  *.tflite filter=lfs diff=lfs merge=lfs -text
  *.tgz filter=lfs diff=lfs merge=lfs -text
  *.wasm filter=lfs diff=lfs merge=lfs -text
  *.xz filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
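This hunk deletes the `*.tar` rule and keeps `*.tar.*`. Compressed tarballs therefore still go to Git LFS, but a bare `.tar` file no longer matches any of the rules shown here. The Python sketch below checks which file names the visible rules catch before and after the change.

It approximates Git's matching with `fnmatch.fnmatchcase` on the base name. That approximation only holds for slash-free patterns, so `saved_model/**/*` is left out. The names `dataset.tar` and `dataset.tar.gz` are hypothetical.

```python
from fnmatch import fnmatchcase

# Slash-free LFS rules visible in the hunk above (lines 1-24 of the file are not shown).
BEFORE = ["*.safetensors", "*.tar.*", "*.tar", "*.tflite", "*.tgz",
          "*.wasm", "*.xz", "*.zip", "*.zst", "*tfevents*"]
# This commit removes only the "*.tar" rule.
AFTER = [p for p in BEFORE if p != "*.tar"]


def tracked_by_lfs(path: str, patterns: list) -> bool:
    """Approximate Git's rule for slash-free patterns: match the base name only."""
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, p) for p in patterns)


files = [
    "model.safetensors",
    "logs/events.out.tfevents.1747565125.a8173a591e7c.19.0",
    "dataset.tar",      # hypothetical file name
    "dataset.tar.gz",   # hypothetical file name
]
for f in files:
    print(f, "before:", tracked_by_lfs(f, BEFORE), "after:", tracked_by_lfs(f, AFTER))
```

Other rules in the full `.gitattributes`, which are not visible in this hunk, could still match a file that these rules miss.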
.gitignore ADDED
@@ -0,0 +1 @@
+ checkpoint-*/
README.md ADDED
@@ -0,0 +1,167 @@
+ ---
+ license: mit
+ tags:
+ - text-classification
+ - finance
+ - financial-news
+ - bert
+ - topic-classification
+ - transformers
+ - safetensors
+ - pytorch
+ - financial
+ - news
+ model-index:
+ - name: finnews-topic-single-classify
+   results:
+   - task:
+       name: Text Classification
+       type: text-classification
+     dataset:
+       name: zeroshot/twitter-financial-news-topic
+       type: finance
+     metrics:
+     - type: accuracy
+       name: accuracy
+       value: 0.907943
+     - type: f1
+       name: F1
+       value: 0.899527
+ ---
+
+ # Financial News Topic Classifier
+
+ This model is a BERT-based classifier for financial news topics, fine-tuned from [fuchenru/Trading-Hero-LLM](https://huggingface.co/fuchenru/Trading-Hero-LLM/blob/main/README.md) to distinguish 20 topics. It is designed for financial NLP applications, news analytics, and automated trading systems.
+
+ ## Model Description
+
+ - **Architecture:** BERT for sequence classification (`BertForSequenceClassification`)
+ - **Framework:** PyTorch, Transformers
+ - **Topics:** 20 financial news categories (see below)
+ - **License:** MIT
+
+ ## Intended Uses & Limitations
+
+ - **Intended use:**
+   - Classify financial news headlines or short texts into one of 20 financial topics.
+   - Financial analytics, news monitoring, and trading-agent pipelines.
+ - **Limitations:**
+   - Trained only on zeroshot/twitter-financial-news-topic (financial tweets); may not generalize to other financial news sources.
+   - Not suitable for non-financial or long-form text.
+
+ ## Topics
+
+ | ID | Topic |
+ |----|------------------------------|
+ | 0 | Analyst Update |
+ | 1 | Fed \| Central Banks |
+ | 2 | Company \| Product News |
+ | 3 | Treasuries \| Corporate Debt |
+ | 4 | Dividend |
+ | 5 | Earnings |
+ | 6 | Energy \| Oil |
+ | 7 | Financials |
+ | 8 | Currencies |
+ | 9 | General News \| Opinion |
+ | 10 | Gold \| Metals \| Materials |
+ | 11 | IPO |
+ | 12 | Legal \| Regulation |
+ | 13 | M&A \| Investments |
+ | 14 | Macro |
+ | 15 | Markets |
+ | 16 | Politics |
+ | 17 | Personnel Change |
+ | 18 | Stock Commentary |
+ | 19 | Stock Movement |
+
+ ## Example Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+
+ tokenizer = AutoTokenizer.from_pretrained("leonas5555/finnews-topic-single-classify")
+ model = AutoModelForSequenceClassification.from_pretrained("leonas5555/finnews-topic-single-classify")
+
+ nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
+
+ # Example text
+ text = "LIVE: ECB surprises with 50bps hike, ending its negative rate era. President Christine Lagarde is taking questions"
+
+ result = nlp(text)
+ print(result)
+ # Example output (score rounded): [{'label': 'Fed | Central Banks', 'score': 0.98}]
+ ```
+
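The pipeline returns a single top label. For a model configured with `problem_type: single_label_classification`, that label is obtained by applying a softmax to the 20 logits and mapping the arg-max index through `id2label` from `config.json`.

The pure-Python sketch below mimics that post-processing step. The logits are made-up values for illustration, not real model outputs.

```python
import math

# id2label from config.json, as a list indexed by class id.
ID2LABEL = [
    "Analyst Update", "Fed | Central Banks", "Company | Product News",
    "Treasuries | Corporate Debt", "Dividend", "Earnings", "Energy | Oil",
    "Financials", "Currencies", "General News | Opinion",
    "Gold | Metals | Materials", "IPO", "Legal | Regulation",
    "M&A | Investments", "Macro", "Markets", "Politics",
    "Personnel Change", "Stock Commentary", "Stock Movement",
]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return a pipeline-style result with the arg-max label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": ID2LABEL[best], "score": probs[best]}]


# Hypothetical logits: class 1 ("Fed | Central Banks") gets the largest value.
fake_logits = [0.0] * 20
fake_logits[1] = 5.0
print(top_label(fake_logits))
```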
+ ## Example Inputs & Outputs
+
+ | Example Text | Predicted Topic |
+ |----------------------------------------------------------------------------------------------------------------------|-------------------------------|
+ | "Here are Thursday's biggest analyst calls: Apple, Amazon, Tesla, Palantir, DocuSign, Exxon & more" | Analyst Update |
+ | "LIVE: ECB surprises with 50bps hike, ending its negative rate era." | Fed \| Central Banks |
+ | "Goldman Sachs traders countered the industry's underwriting slump with revenue gains that raced past analysts' estimates." | Company \| Product News |
+ | "China Evergrande Group's onshore bond holders rejected a plan by the distressed developer to further extend a bond payment." | Treasuries \| Corporate Debt |
+ | "Investing Club: Morgan Stanley's dividend, buyback pay us for our patience after quarterly missteps" | Dividend |
+
+ ## Training Data
+
+ - **Dataset:** zeroshot/twitter-financial-news-topic
+ - **Size:** 21,107 samples
+ - **Class distribution:** Imbalanced; class weights were used during training.
+
+ ## Training Procedure
+
+ - **Framework:** Hugging Face Transformers (Trainer API)
+ - **Arguments:**
+   - **num_train_epochs:** 10
+   - **per_device_train_batch_size:** 32
+   - **per_device_eval_batch_size:** 32
+   - **gradient_accumulation_steps:** 1
+   - **learning_rate:** 2e-5
+   - **fp16:** True (native AMP mixed precision)
+   - **warmup_ratio:** 0.1
+   - **label_smoothing_factor:** 0.05
+   - **max_grad_norm:** 1.0
+   - **max_length:** 256 (tokenizer truncation length)
+   - **evaluation_strategy:** "steps"
+   - **save_strategy:** "steps"
+   - **save_total_limit:** 3
+   - **load_best_model_at_end:** True
+   - **metric_for_best_model:** "f1"
+   - **run_name:** "topic_classifier"
+   - **seed:** 42
+
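With `warmup_ratio` 0.1 and a linear scheduler, the learning rate rises from 0 to 2e-5 over the first 10% of optimizer steps and then decays linearly to 0. The sketch below reproduces that shape in plain Python, following the formula of `get_linear_schedule_with_warmup` in Transformers, with the warmup length taken as ceil(total × ratio).

The total of 5,300 steps is an assumption read off the last row of the evaluation table, not a logged value.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


TOTAL_STEPS = 5300  # assumption: ~10 epochs x 530 steps, per the evaluation table
for s in (0, 265, 530, 2915, 5300):
    print(s, linear_warmup_lr(s, TOTAL_STEPS))
```

Under these assumptions, the rate reaches its 2e-5 peak at step 530 and is back to half the peak at step 2915.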
+ - **Early Stopping:** Patience of 2 evaluations (via `EarlyStoppingCallback`)
+ - **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08)
+ - **Scheduler:** Linear
+ - **Metrics:** F1 (for best-model selection), plus accuracy, precision, and recall
+
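The card does not state how precision, recall and F1 are averaged over the 20 topics. They cannot be support-weighted averages. For single-label data, weighted recall always equals accuracy, yet the final evaluation reports recall 0.903964 against accuracy 0.907943. Macro averaging is therefore a likely candidate, though this is an inference.

A pure-Python sketch of macro-averaged metrics under that assumption follows. It averages over the labels that occur in either list, as scikit-learn's `precision_recall_fscore_support(average="macro")` does by default, and treats zero denominators as 0.0.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    # Macro F1 is the mean of per-class F1, not the F1 of mean precision and recall.
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Hypothetical toy predictions over three classes.
print(macro_prf([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```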
+ ## Evaluation Results
+
+ | Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
+ |------|---------------|-----------------|----------|-----------|--------|------|
+ | 530 | 1.965800 | 0.917674 | 0.805684 | 0.743887 | 0.691372 | 0.696721 |
+ | 1060 | 0.733100 | 0.684078 | 0.876366 | 0.815078 | 0.823771 | 0.817982 |
+ | 1590 | 0.512200 | 0.638335 | 0.895312 | 0.895471 | 0.893691 | 0.893341 |
+ | 2120 | 0.418200 | 0.682780 | 0.894826 | 0.880995 | 0.885067 | 0.880227 |
+ | 2650 | 0.380200 | 0.683890 | 0.902113 | 0.890379 | 0.901867 | 0.894882 |
+ | 3180 | 0.359500 | 0.696923 | 0.902599 | 0.881292 | 0.902299 | 0.888526 |
+ | 3710 | 0.348800 | 0.691665 | 0.906000 | 0.891074 | 0.902236 | 0.895001 |
+ | 4240 | 0.342900 | 0.687194 | 0.906728 | 0.896421 | 0.900574 | 0.896865 |
+ | 4770 | 0.339900 | 0.705139 | 0.904785 | 0.892559 | 0.903573 | 0.896804 |
+ | 5300 | 0.337400 | 0.697512 | 0.907943 | 0.897653 | 0.903964 | 0.899527 |
+
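The table can be read against the early-stopping and best-model settings above:

- F1 fails to improve only at steps 2120, 3180 and 4770, and never twice in a row, so a patience of 2 is never exhausted and training runs the full 10 epochs.
- The step-5300 row has the highest F1, and its accuracy and F1 are the values reported in the model-index.

The sketch below replays that logic on the F1 column. It approximates `EarlyStoppingCallback` with its default threshold of 0.0, where only a strict increase counts as an improvement for a greater-is-better metric.

```python
# (step, eval F1) pairs copied from the Evaluation Results table above.
F1_BY_STEP = [
    (530, 0.696721), (1060, 0.817982), (1590, 0.893341), (2120, 0.880227),
    (2650, 0.894882), (3180, 0.888526), (3710, 0.895001), (4240, 0.896865),
    (4770, 0.896804), (5300, 0.899527),
]


def replay_early_stopping(history, patience=2, threshold=0.0):
    """Replay patience-based early stopping on a greater-is-better metric.

    Returns (best_step, best_value, stop_step); stop_step is None if
    training would never be stopped early.
    """
    best_step, best_value, bad_evals = None, None, 0
    for step, value in history:
        if best_value is None or value - best_value > threshold:
            # Improvement: remember this checkpoint and reset the counter.
            best_step, best_value, bad_evals = step, value, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_step, best_value, step
    return best_step, best_value, None


print(replay_early_stopping(F1_BY_STEP))  # → (5300, 0.899527, None)
```

With `patience=1`, the same history would have stopped at step 2120 and kept the step-1590 checkpoint.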
+ ## ONNX Export
+
+ An ONNX version of this model is available in the [onnx/](./onnx/) directory for use with high-performance inference engines such as Infinity.
+
+ ## License
+
+ MIT
+
+ ## Inspired by
+ - [nickmuchi/finbert-tone-finetuned-finance-topic-classification](https://huggingface.co/nickmuchi/finbert-tone-finetuned-finance-topic-classification/blob/main/README.md)
+
+ ---
+ **References:**
+
+ - [fuchenru/Trading-Hero-LLM](https://huggingface.co/fuchenru/Trading-Hero-LLM/blob/main/README.md)
config.json ADDED
@@ -0,0 +1,70 @@
+ {
+   "_name_or_path": "leonas5555/finnews-topic-single-classify",
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "Analyst Update",
+     "1": "Fed | Central Banks",
+     "2": "Company | Product News",
+     "3": "Treasuries | Corporate Debt",
+     "4": "Dividend",
+     "5": "Earnings",
+     "6": "Energy | Oil",
+     "7": "Financials",
+     "8": "Currencies",
+     "9": "General News | Opinion",
+     "10": "Gold | Metals | Materials",
+     "11": "IPO",
+     "12": "Legal | Regulation",
+     "13": "M&A | Investments",
+     "14": "Macro",
+     "15": "Markets",
+     "16": "Politics",
+     "17": "Personnel Change",
+     "18": "Stock Commentary",
+     "19": "Stock Movement"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "Analyst Update": 0,
+     "Fed | Central Banks": 1,
+     "Company | Product News": 2,
+     "Treasuries | Corporate Debt": 3,
+     "Dividend": 4,
+     "Earnings": 5,
+     "Energy | Oil": 6,
+     "Financials": 7,
+     "Currencies": 8,
+     "General News | Opinion": 9,
+     "Gold | Metals | Materials": 10,
+     "IPO": 11,
+     "Legal | Regulation": 12,
+     "M&A | Investments": 13,
+     "Macro": 14,
+     "Markets": 15,
+     "Politics": 16,
+     "Personnel Change": 17,
+     "Stock Commentary": 18,
+     "Stock Movement": 19
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "problem_type": "single_label_classification",
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30873
+ }
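The architecture fields in this config are enough to estimate the model's size. The sketch below assumes the standard `BertForSequenceClassification` layout in Transformers:

- token, position and token-type embeddings plus a LayerNorm;
- 12 encoder layers, each with Q/K/V/output projections, a feed-forward block and two LayerNorms;
- a pooler and a linear classifier.

It counts weights and biases only.

```python
def bert_seq_cls_params(vocab=30873, hidden=768, layers=12, intermediate=3072,
                        max_pos=512, type_vocab=2, num_labels=20):
    """Count trainable parameters of a standard BertForSequenceClassification."""
    layer_norm = 2 * hidden  # LayerNorm gamma and beta
    # Word, position and token-type embedding tables, plus their LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    # Q, K, V and output projections (weight + bias each), plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Up-projection, down-projection, plus LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    encoder = layers * (attention + feed_forward)
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + encoder + pooler + classifier


n = bert_seq_cls_params()
print(n, n * 4)  # → 109767188 439068752 (parameters, float32 bytes)
```

That comes to roughly 109.8M parameters, or 439,068,752 bytes at `float32`. The `model.safetensors` pointer in this commit reports 439,092,288 bytes. The 23,536-byte gap is consistent with the safetensors header, assuming no extra tensors were saved.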
logs/events.out.tfevents.1747565125.a8173a591e7c.19.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:81d1c20bee5ab2c0d926e253084e36a4d781e0d3aaf6fbf2592c75843805e8c4
+ size 12711
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cce36979f172a9dc688001bf796cba2b4dd992219aa74240a0072e8ce44fdd55
+ size 439092288
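Files stored through Git LFS appear in this diff only as three-line pointer files (`version`, `oid`, `size`). These include `model.safetensors`, `training_args.bin` and the TensorBoard event log.

The sketch below parses such a pointer and checks a local file's size and SHA-256 against it, for example after downloading the weights. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, not part of any library.

```python
import hashlib

# Pointer text copied from model.safetensors above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:cce36979f172a9dc688001bf796cba2b4dd992219aa74240a0072e8ce44fdd55
size 439092288
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer):
    """Hash a local file in 1 MiB chunks and compare size and SHA-256."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["oid"]


ptr = parse_lfs_pointer(POINTER)
# Dividing by 4 bytes gives an upper bound on the number of float32 values stored.
print(ptr["algo"], ptr["size"], ptr["size"] // 4)
```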
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "5": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 256,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b3ca33398265d2410bde3efc80097ab837d17ee96f818294084e1f81ededae0e
+ size 5304
vocab.txt ADDED
The diff for this file is too large to render.