Tags: Joblib, lightgbm, fraud-detection, anti-money-laundering, graph-neural-network, tabular, tier-1-scorer, fraudsentinel, explainable-ai
Commit 4fc397a by naazimsnh02 · verified · 1 Parent(s): 7194df9

Update Readme.md

  1. README.md +147 -58
README.md CHANGED
@@ -17,23 +17,19 @@ datasets:

  # FraudSentinel - Tier-1 Real-Time Scorers (LightGBM + GNN)

- Fast statistical scorers for the **Tier-1** of the FraudSentinel two-tier fraud-detection
- architecture. They score **100% of the transaction stream in single-digit milliseconds** and
- **route only flagged/borderline cases** to the Tier-2 fine-tuned LLM
- ([`naazimsnh02/fraud-financial-crime-qwen3-sft-v2`](https://huggingface.co/datasets/naazimsnh02/fraud-financial-crime-qwen3-sft-v2))
- for explanation, typology, recommended action, and SAR drafting.

- > Two-tier pattern follows published financial-crime systems: a fast model triages every
- > transaction; the LLM explains the flagged minority (arXiv:2507.14785, 2210.14360, 2312.13896).

  ## 📊 Models Overview

  | Artifact | Task | Dataset | Metrics | Status |
  |---|---|---|---|---|
- | `cc_lgbm_model.txt` + `cc_lgbm_preproc.joblib` | Card-Not-Present (CNP) Fraud | Sparkov (1.3M tx) | **PR-AUC 0.967 · ROC-AUC 0.999** | ✅ Shipped |
  | `aml_gnn.pt` | Money Laundering (Graph) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.584 · PR-AUC 0.0036** | ✅ Complete |
- | `aml_lgbm_model.txt` + `aml_lgbm_preproc.joblib` | AML Pre-Filter (Tabular) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.82 · PR-AUC 0.023** | ✅ Complete |

  ---

@@ -41,76 +37,169 @@ for explanation, typology, recommended action, and SAR drafting.

  **Evaluated on Sparkov's held-out test split at natural fraud rate (0.39%):**

- - **Metrics:** PR-AUC **0.967**, ROC-AUC **0.999**
- - **Routing Performance:** At recall ≈ 0.90, precision 0.964 → catches **90% of fraud** while flagging only **<0.36%** of traffic to Tier-2 LLM
- - **Latency:** <5 ms per transaction on CPU
- - **Top Features:**
-   - Amount vs. category 95th percentile anomaly
-   - Merchant category
-   - Log transaction amount
-   - Time-of-day patterns
-   - 24h/1h velocity per card
-   - Time since last transaction
- - **Engineered Signals:** Geo distance (home ↔ merchant), cardholder age, category amount anomaly

  ---

- ## 🕸️ AML GNN Scorer

- **Graph Neural Network (GINE edge-classifier with edge features)**

- Trained on IBM AML transaction multigraph with **temporal 70/10/20 train/val/test split** and class-weighted loss.

- ### Current Metrics
- - **ROC-AUC:** 0.584 (on temporal test split)
- - **PR-AUC:** 0.0036
- - **Best F1:** 0.0159 (@ threshold 1.0)
- - **Architecture:** 3× GINEConv layers, hidden dim 96, reverse edges (bidirectional message passing)
- - **Training:** 80 epochs, best-checkpoint selection, recall-calibrated routing thresholds

  ### Why Graph Structure Matters
- - **Beats tabular baseline:** Tabular LightGBM (ROC-AUC 0.82 → **0.584** GNN) confirms graph structure carries laundering patterns invisible to per-transaction models
- - **Multi-hop detection:** Catches fan-out, gather-scatter, cycle-based laundering patterns
- - **Why modest precision:** This is a baseline GNN, not the full IBM Multi-GNN. Use as high-recall graph triage that routes suspicious sub-graphs to Tier-2 LLM + human review

  ### Routing Thresholds (Test Split)
  | Recall Target | Threshold | Precision | Flagged % |
  |---|---|---|---|
  | 70% | 0.527 | 0.0019 | 69.4% |
  | 80% | 0.377 | 0.0019 | 76.2% |
  | 90% | 0.003 | 0.0018 | 89.3% |

- ---
- ## 🧮 AML Pre-Filter (Tabular Baseline)
-
- Single-transaction LightGBM scorer for deployment on resource-constrained systems or as Tier-1 pre-filter.
-
- - **Metrics:** ROC-AUC **0.82**, PR-AUC **0.023**
- - **Features:** Amount, currency mismatch, payment format, time-of-day, velocity
- - **Latency:** <1 ms per transaction on CPU
- - **Purpose:** High-recall triage or lightweight deployment before GNN

  ---

- ## Inference

- ```python
- from infer import CardScorer, AMLScorer
- cs = CardScorer() # loads model + preprocessor + routing threshold
- out = cs.score(transaction_dict) # -> {"risk": 0.97, "route_to_llm": True, "tier": "card"}
- # if out["route_to_llm"]: send the case to the Tier-2 LLM for explanation + SAR draft
- ```
-
- Card transactions are scored in well under 10 ms on CPU. The GNN scores the AML graph in a single
- batched forward pass on GPU (or CPU for small graphs); `route_to_llm` uses the recall-calibrated
- threshold stored in the metrics JSON; tune it to your false-positive budget.

- ## Limitations
- Prototype/research use. Source data is synthetic/semi-synthetic. The card scorer's strong metrics reflect
- the Sparkov generator's structure; validate on your own data before deployment. The AML GNN is a
- high-recall graph triage, not a compliant detector; pair it with human-in-the-loop review and (optionally)
- the full Multi-GNN recipe for production-grade precision.

- ## License
  Apache-2.0. Source datasets retain their own licenses.
 
  # FraudSentinel - Tier-1 Real-Time Scorers (LightGBM + GNN)

+ Fast statistical scorers for **Tier-1** of the FraudSentinel two-tier fraud-detection architecture. They score **100% of the transaction stream in single-digit milliseconds** and **route only flagged cases** to the Tier-2 fine-tuned LLM ([`naazimsnh02/fraudsentinel-qwen3-14b-lora`](https://huggingface.co/naazimsnh02/fraudsentinel-qwen3-14b-lora)) for explanation, typology classification, recommended action, and SAR drafting.

+ > Two-tier pattern follows published financial-crime systems: a fast model triages every transaction; the LLM explains the flagged minority (arXiv:2507.14785, 2210.14360, 2312.13896).

+ ---

  ## 📊 Models Overview

  | Artifact | Task | Dataset | Metrics | Status |
  |---|---|---|---|---|
+ | `cc_lgbm_model.txt` + `cc_lgbm_preproc.joblib` | Card-Not-Present (CNP) Fraud | Sparkov (1.3M tx) | **PR-AUC 0.967 · ROC-AUC 0.999** | ✅ Complete |
+ | `aml_lgbm_model.txt` + `aml_lgbm_preproc.joblib` | AML Pre-Filter (Tabular) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.822 · PR-AUC 0.023** | ✅ Complete |
  | `aml_gnn.pt` | Money Laundering (Graph) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.584 · PR-AUC 0.0036** | ✅ Complete |

  ---

 
  **Evaluated on Sparkov's held-out test split at natural fraud rate (0.39%):**

+ | Metric | Value |
+ |---|---|
+ | **PR-AUC** | 0.967 |
+ | **ROC-AUC** | 0.999 |
+ | **Train rows** | 1,296,675 |
+ | **Test rows** | 555,719 |
+ | **Test fraud rate** | 0.387% |
+ | **Best iteration** | 810 |
+ | **Scale pos weight** | 171.8× |
+
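The `Scale pos weight` value is consistent with the common heuristic of setting LightGBM's `scale_pos_weight` to the ratio of negative to positive training rows. The README does not say that this heuristic was used, so the sketch below is an assumption: if it was, the implied training fraud rate can be backed out as shown. Both function names are hypothetical helpers, not part of the repository.

```python
def scale_pos_weight_from_counts(n_neg: int, n_pos: int) -> float:
    """Common heuristic (assumed here): weight positives by n_neg / n_pos."""
    return n_neg / n_pos


def implied_positive_rate(scale_pos_weight: float) -> float:
    """Positive share implied by scale_pos_weight = n_neg / n_pos."""
    return 1.0 / (1.0 + scale_pos_weight)


# Card scorer: 171.8x would correspond to roughly 0.58% fraud in the training rows.
print(f"{implied_positive_rate(171.8):.3%}")
```

Under the same assumption, the AML pre-filter's 1,201× would correspond to a training laundering rate of about 0.08%.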
+ ### Routing Performance (Test Split)
+
+ | Recall Target | Threshold | Precision | Flagged % |
+ |---|---|---|---|
+ | 80% | 0.997 | 0.995 | 0.31% |
+ | 85% | 0.987 | 0.982 | 0.33% |
+ | **90% (default)** | **0.940** | **0.964** | **0.36%** |
+ | 95% | 0.212 | 0.829 | 0.44% |
+
+ At the default routing threshold (recall ≈ 0.90), the scorer flags **<0.4%** of all card traffic to the Tier-2 LLM while catching 90% of fraud.
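The `Flagged %` column follows from the other columns: the flagged share of traffic equals recall times prevalence divided by precision. A minimal sketch of that identity, using the test fraud rate of 0.387% reported above (`flagged_fraction` is a hypothetical helper name):

```python
def flagged_fraction(recall: float, precision: float, prevalence: float) -> float:
    """Share of all traffic flagged: (TP + FP) / N = recall * prevalence / precision."""
    return recall * prevalence / precision


TEST_FRAUD_RATE = 0.00387  # 0.387% from the metrics table

# (recall target, precision) pairs copied from the routing table
for recall, precision in [(0.80, 0.995), (0.85, 0.982), (0.90, 0.964), (0.95, 0.829)]:
    share = flagged_fraction(recall, precision, TEST_FRAUD_RATE)
    print(f"recall={recall:.2f} flagged={share:.2%}")
```

The printed shares round to the table's 0.31%, 0.33%, 0.36%, and 0.44%, which is a quick consistency check when re-deriving the routing table on other data.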
+
+ ### Top Features (by gain)
+
+ 1. `amt_to_p95`: transaction amount relative to per-category 95th percentile
+ 2. `category`: merchant category code
+ 3. `log_amt`: log-transformed transaction amount
+ 4. `is_night`: off-hours indicator (10 PM–4 AM)
+ 5. `amt_24h`: rolling 24-hour spend per card
+ 6. `mins_since_last`: time since previous transaction on same card
+ 7. `state`: cardholder state
+ 8. `age`: cardholder age
+
+ ### Engineered Signals
+ - Per-category amount anomaly (amount vs. 95th percentile)
+ - 1-hour and 24-hour velocity (transaction count + spend)
+ - Geo distance (home ↔ merchant, haversine)
+ - Time-of-day features (hour, day-of-week, is-night)
+ - Cardholder age from date of birth
+ - Category historical fraud rate
+
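Two of the signals above can be sketched in a few lines. This is an illustration under stated assumptions, not the code from `cc_lgbm.py`: the function names are hypothetical, the Earth radius constant is a standard mean value, and the 4 AM end of the night window is assumed to be exclusive because the README does not specify it.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def is_night(hour: int) -> int:
    """1 for hours in the 10 PM to 4 AM window (end exclusive, assumed), else 0."""
    return int(hour >= 22 or hour < 4)
```

For example, `haversine_km(home_lat, home_lon, merch_lat, merch_lon)` would give the home-to-merchant distance for one transaction, where the four argument names stand in for whatever coordinate columns the dataset provides.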
+ ### Inference
+ ```python
+ import lightgbm as lgb
+ import joblib
+
+ preproc = joblib.load("cc_lgbm_preproc.joblib")
+ model = lgb.Booster(model_file="cc_lgbm_model.txt")
+
+ # featurize using the same pipeline as cc_lgbm.py
+ # route_to_llm = score >= 0.9403985330442168 (recall-0.90 threshold)
+ ```

  ---

+ ## 🧮 AML Pre-Filter (Tabular)
+
+ Single-transaction LightGBM scorer. Deployed as a high-recall pre-filter or as a standalone lightweight scorer for resource-constrained environments.
+
+ | Metric | Value |
+ |---|---|
+ | **ROC-AUC** | 0.822 |
+ | **PR-AUC** | 0.023 |
+ | **Train rows** | 4,062,676 |
+ | **Test rows** | 1,015,669 |
+ | **Test laundering rate** | 0.177% |
+ | **Best iteration** | 671 |
+ | **Scale pos weight** | 1,201× |
+
+ ### Routing Performance (Test Split)
+
+ | Recall Target | Threshold | Precision | Flagged % |
+ |---|---|---|---|
+ | 50% | 6.0e-23 | 0.014 | 6.2% |
+ | 60% | 1.4e-40 | 0.011 | 9.3% |
+ | 70% | 1.7e-66 | 0.008 | 15.3% |
+ | **80% (default)** | **1.8e-116** | **0.005** | **31.4%** |
+
+ The extreme threshold values reflect the model's calibration at very low fraud prevalence; the operating point is chosen by recall target, not threshold magnitude.
+
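Choosing the operating point by recall target can be sketched as follows. This is a minimal pure-Python illustration, not the repository's calibration code; `threshold_for_recall` and `routing_stats` are hypothetical names, and the toy scores are made up to show that the procedure works the same way for vanishingly small thresholds.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Largest threshold t such that flagging score >= t reaches the target recall."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must be caught, at least one
    k = max(1, math.ceil(target_recall * len(positives)))
    return positives[k - 1]


def routing_stats(scores, labels, threshold):
    """Recall, precision, and flagged share when routing score >= threshold."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    tp = sum(flagged)
    return {
        "recall": tp / sum(labels),
        "precision": tp / len(flagged),
        "flagged_frac": len(flagged) / len(scores),
    }


# Toy data: only the ranking of scores matters, not their magnitude
scores = [0.9, 1e-20, 1e-120, 0.5, 1e-150, 1e-200]
labels = [1, 1, 1, 0, 0, 0]
t = threshold_for_recall(scores, labels, target_recall=1.0)
print(t, routing_stats(scores, labels, t))
```

Because the threshold is read off the sorted positive scores, a value such as `1e-120` is as valid an operating point as `0.94`; what changes between rows of the table is the precision and flagged share that come with it.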
+ ### Top Features (by gain)
+
+ 1. `rcv_in_deg`: receiver account in-degree (fan-in)
+ 2. `snd_in_deg`: sender account in-degree
+ 3. `snd_out_deg`: sender account out-degree (fan-out)
+ 4. `snd_in_cnt`: sender inbound transaction count
+ 5. `self_loop`: self-transfer indicator
+ 6. `hour`: transaction hour
+ 7. `rcv_in_cnt`: receiver inbound transaction count
+ 8. `snd_out_cnt`: sender outbound transaction count
+
+ ### Engineered Signals
+ - Sender/receiver in-degree and out-degree (graph connectivity, fit on train only)
+ - Self-loop detection
+ - Currency mismatch flag
+ - Round-number amount indicator
+ - Same-bank transfer flag
+ - Gather-scatter indicator (accounts with high in-degree **and** high out-degree)
+ - Log-transformed amounts (paid and received)
+ - Time features (hour, day of week)
+
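The "fit on train only" degree features can be sketched as lookup tables built from training edges and then applied to any transaction. This is an illustration under assumptions, not `featurize()` from `aml_lgbm.py`: degree is taken to mean the number of distinct counterparties (the separate `*_cnt` features suggest this, but the README does not define it), and `hub_min` is a hypothetical cutoff for the gather-scatter flag.

```python
from collections import defaultdict


def fit_degrees(train_edges):
    """Build in/out-degree tables from (sender, receiver) pairs in the training split only."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for sender, receiver in train_edges:
        out_nbrs[sender].add(receiver)
        in_nbrs[receiver].add(sender)
    in_deg = {acct: len(nbrs) for acct, nbrs in in_nbrs.items()}
    out_deg = {acct: len(nbrs) for acct, nbrs in out_nbrs.items()}
    return in_deg, out_deg


def featurize_edge(sender, receiver, in_deg, out_deg, hub_min=2):
    """Per-transaction graph features; accounts unseen in training get degree 0."""
    snd_in, snd_out = in_deg.get(sender, 0), out_deg.get(sender, 0)
    return {
        "snd_in_deg": snd_in,
        "snd_out_deg": snd_out,
        "rcv_in_deg": in_deg.get(receiver, 0),
        "self_loop": int(sender == receiver),
        # hub_min is an assumed cutoff for "high" in- and out-degree
        "gather_scatter": int(snd_in >= hub_min and snd_out >= hub_min),
    }


train = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "A"), ("A", "B")]
in_deg, out_deg = fit_degrees(train)
print(featurize_edge("A", "D", in_deg, out_deg))
```

Fitting the tables on the training partition keeps test-period connectivity out of the features; in a streaming deployment the same dictionaries would have to be kept current as rolling state.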
+ ### Inference
+ ```python
+ import lightgbm as lgb
+ import joblib

+ preproc = joblib.load("aml_lgbm_preproc.joblib")
+ model = lgb.Booster(model_file="aml_lgbm_model.txt")

+ # apply featurize() from aml_lgbm.py with preproc graph dictionaries
+ # route_to_llm = score >= routing_threshold from aml_lgbm_metrics.json
+ ```
+
+ ---
+
+ ## 🕸️ AML GNN Scorer

+ Edge-classification Graph Neural Network that scores **transactions as edges** in the inter-bank transfer multigraph. Designed to capture multi-hop laundering patterns that single-transaction models cannot see.
+
+ | Metric | Value |
+ |---|---|
+ | **ROC-AUC** | 0.584 |
+ | **PR-AUC** | 0.0036 |
+ | **Best F1** | 0.0159 @ threshold 1.0 |
+ | **Architecture** | 3× GINEConv, hidden dim 96, bidirectional message passing |
+ | **Training** | 80 epochs, temporal 70/10/20 split, class-weighted loss (~1,245:1) |
+ | **Edge features** | log(amount_paid), log(amount_received), hour/23, dow/6, currency_mismatch, self_loop, payment_format (one-hot) |
+ | **Node features** | log(in-degree), log(out-degree) computed on train edges only |

  ### Why Graph Structure Matters
+
+ Single-transaction tabular models are bounded by per-transaction features. Money laundering is a **multi-hop graph pattern**: fan-out, gather-scatter, and cycle-based structuring are easiest to see when the full transaction network is modeled. On this benchmark the baseline GNN (ROC-AUC 0.584) still trails the tabular pre-filter (ROC-AUC 0.822), so it is positioned as a high-recall graph triage layer that routes suspicious subgraphs to the Tier-2 LLM for investigator-facing explanation.

  ### Routing Thresholds (Test Split)
+
  | Recall Target | Threshold | Precision | Flagged % |
  |---|---|---|---|
  | 70% | 0.527 | 0.0019 | 69.4% |
  | 80% | 0.377 | 0.0019 | 76.2% |
  | 90% | 0.003 | 0.0018 | 89.3% |

+ ### Inference
+ ```python
+ import torch
+ import torch.nn.functional as F  # needed for the softmax call sketched below
+ from torch_geometric.nn import GINEConv
+
+ # Load weights
+ state = torch.load("aml_gnn.pt", map_location="cpu")
+ # Rebuild EdgeGNN as defined in train_gnn_aml.py
+ # model.load_state_dict(state)
+ # probs = F.softmax(model(x, mp_edge_index, mp_edge_attr, edge_index, edge_attr), dim=1)[:, 1]
+ ```

  ---

+ ## ⚖️ Limitations

+ - Source data is synthetic/semi-synthetic. The card scorer's strong metrics reflect the Sparkov generator's structure; validate on your own data before any production deployment.
+ - The AML GNN is a high-recall graph triage tool, not a production-validated detector. Pair it with human-in-the-loop review and the Tier-2 LLM for investigator-grade precision.
+ - The AML tabular pre-filter uses graph degree features fit on the training partition. In a streaming deployment, these must be maintained as rolling aggregate state.
+ - No model in this repository should be used for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.

+ ---

+ ## 📄 License

  Apache-2.0. Source datasets retain their own licenses.