Update Readme.md
README.md
CHANGED
|
@@ -17,23 +17,19 @@ datasets:
|
|
| 17 |
|
| 18 |
# FraudSentinel: Tier-1 Real-Time Scorers (LightGBM + GNN)
|
| 19 |
|
| 20 |
-
Fast statistical scorers for the **Tier-1** of the FraudSentinel two-tier fraud-detection
|
| 21 |
-
architecture. They score **100% of the transaction stream in single-digit milliseconds** and
|
| 22 |
-
**route only flagged/borderline cases** to the Tier-2 fine-tuned LLM
|
| 23 |
-
([`naazimsnh02/fraud-financial-crime-qwen3-sft-v2`](https://huggingface.co/datasets/naazimsnh02/fraud-financial-crime-qwen3-sft-v2))
|
| 24 |
-
for explanation, typology, recommended action, and SAR drafting.
|
| 25 |
|
| 26 |
-
> Two-tier pattern follows published financial-crime systems: a fast model triages every
|
| 27 |
-
> transaction; the LLM explains the flagged minority (arXiv:2507.14785, 2210.14360, 2312.13896).
|
| 28 |
|
|
|
|
| 29 |
|
| 30 |
## Models Overview
|
| 31 |
|
| 32 |
| Artifact | Task | Dataset | Metrics | Status |
|
| 33 |
|---|---|---|---|---|
|
| 34 |
-
| `cc_lgbm_model.txt` + `cc_lgbm_preproc.joblib` | Card-Not-Present (CNP) Fraud | Sparkov (1.3M tx) | **PR-AUC 0.967 · ROC-AUC 0.999** | ✅ Complete |
|
|
|
|
| 35 |
| `aml_gnn.pt` | Money Laundering (Graph) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.584 · PR-AUC 0.0036** | ✅ Complete |
|
| 36 |
-
| `aml_lgbm_model.txt` + `aml_lgbm_preproc.joblib` | AML Pre-Filter (Tabular) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.82 · PR-AUC 0.023** | ✅ Complete |
|
| 37 |
|
| 38 |
---
|
| 39 |
|
|
@@ -41,76 +37,169 @@ for explanation, typology, recommended action, and SAR drafting.
|
|
| 41 |
|
| 42 |
**Evaluated on Sparkov's held-out test split at natural fraud rate (0.39%):**
|
| 43 |
|
| 44 |
-
|
| 45 |
-
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
|
| 56 |
---
|
| 57 |
|
| 58 |
-
##
|
| 59 |
|
| 60 |
-
|
|
|
|
| 61 |
|
| 62 |
-
|
| 63 |
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
|
| 71 |
### Why Graph Structure Matters
|
| 72 |
-
|
| 73 |
-
- **
|
| 74 |
-
- **Why modest precision:** This is a baseline GNN, not the full IBM Multi-GNN. Use as high-recall graph triage that routes suspicious sub-graphs to Tier-2 LLM + human review
|
| 75 |
|
| 76 |
### Routing Thresholds (Test Split)
|
|
|
|
| 77 |
| Recall Target | Threshold | Precision | Flagged % |
|
| 78 |
|---|---|---|---|
|
| 79 |
| 70% | 0.527 | 0.0019 | 69.4% |
|
| 80 |
| 80% | 0.377 | 0.0019 | 76.2% |
|
| 81 |
| 90% | 0.003 | 0.0018 | 89.3% |
|
| 82 |
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
|
|
|
|
|
|
| 92 |
|
| 93 |
---
|
| 94 |
|
| 95 |
-
##
|
| 96 |
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
# if out["route_to_llm"]: send the case to the Tier-2 LLM for explanation + SAR draft
|
| 102 |
-
```
|
| 103 |
-
|
| 104 |
-
Card transactions are scored in well under 10 ms on CPU. The GNN scores the AML graph in a single
|
| 105 |
-
batched forward pass on GPU (or CPU for small graphs); `route_to_llm` uses the recall-calibrated
|
| 106 |
-
threshold stored in the metrics JSON; tune it to your false-positive budget.
|
| 107 |
|
|
|
|
| 108 |
|
| 109 |
-
##
|
| 110 |
-
Prototype/research use. Source data is synthetic/semi-synthetic. The card scorer's strong metrics reflect
|
| 111 |
-
the Sparkov generator's structure; validate on your own data before deployment. The AML GNN is a
|
| 112 |
-
high-recall graph triage, not a compliant detector; pair it with human-in-the-loop review and (optionally)
|
| 113 |
-
the full Multi-GNN recipe for production-grade precision.
|
| 114 |
|
| 115 |
-
## License
|
| 116 |
Apache-2.0. Source datasets retain their own licenses.
|
|
|
|
| 17 |
|
| 18 |
# FraudSentinel: Tier-1 Real-Time Scorers (LightGBM + GNN)

Fast statistical scorers for **Tier-1** of the FraudSentinel two-tier fraud-detection architecture. They score **100% of the transaction stream in single-digit milliseconds** and **route only flagged cases** to the Tier-2 fine-tuned LLM ([`naazimsnh02/fraudsentinel-qwen3-14b-lora`](https://huggingface.co/naazimsnh02/fraudsentinel-qwen3-14b-lora)) for explanation, typology classification, recommended action, and SAR drafting.

> Two-tier pattern follows published financial-crime systems: a fast model triages every transaction; the LLM explains the flagged minority (arXiv:2507.14785, 2210.14360, 2312.13896).

---

## Models Overview

| Artifact | Task | Dataset | Metrics | Status |
|---|---|---|---|---|
| `cc_lgbm_model.txt` + `cc_lgbm_preproc.joblib` | Card-Not-Present (CNP) Fraud | Sparkov (1.3M tx) | **PR-AUC 0.967 · ROC-AUC 0.999** | ✅ Complete |
| `aml_lgbm_model.txt` + `aml_lgbm_preproc.joblib` | AML Pre-Filter (Tabular) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.822 · PR-AUC 0.023** | ✅ Complete |
| `aml_gnn.pt` | Money Laundering (Graph) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.584 · PR-AUC 0.0036** | ✅ Complete |

---
|
| 35 |
|
|
|
|
| 37 |
|
| 38 |
**Evaluated on Sparkov's held-out test split at natural fraud rate (0.39%):**

| Metric | Value |
|---|---|
| **PR-AUC** | 0.967 |
| **ROC-AUC** | 0.999 |
| **Train rows** | 1,296,675 |
| **Test rows** | 555,719 |
| **Test fraud rate** | 0.387% |
| **Best iteration** | 810 |
| **Scale pos weight** | 171.8× |

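The table does not say how the scale pos weight was derived. A common convention for LightGBM's `scale_pos_weight` parameter is the negative-to-positive ratio of the training labels. The sketch below assumes that convention; the helper name and the toy `y_train` array are illustrative, not taken from `cc_lgbm.py`.

```python
import numpy as np


def scale_pos_weight(labels):
    """Negative-to-positive ratio of a binary label array (assumed convention)."""
    labels = np.asarray(labels)
    n_pos = int((labels == 1).sum())
    n_neg = int((labels == 0).sum())
    if n_pos == 0:
        raise ValueError("no positive labels; ratio is undefined")
    return n_neg / n_pos


# Toy labels for illustration only: 1 fraud in 10 transactions.
y_train = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

# Parameter dict in the shape LightGBM accepts for binary classification.
params = {
    "objective": "binary",
    "scale_pos_weight": scale_pos_weight(y_train),
}
```

A large ratio makes the booster weight each fraud row heavily, which tends to shift raw scores upward, so routing thresholds are better chosen from a recall target than from the score magnitude.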
### Routing Performance (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 80% | 0.997 | 0.995 | 0.31% |
| 85% | 0.987 | 0.982 | 0.33% |
| **90% (default)** | **0.940** | **0.964** | **0.36%** |
| 95% | 0.212 | 0.829 | 0.44% |

At the default routing threshold (recall ≈ 0.90), the scorer routes **<0.4%** of all card traffic to the Tier-2 LLM while catching about 90% of fraud.

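One way a recall-calibrated threshold like the ones in the routing table could be picked from held-out scores is sketched below. This is an assumption about the procedure, not the code in `cc_lgbm.py`; `threshold_for_recall` and its arguments are hypothetical names.

```python
import numpy as np


def threshold_for_recall(scores, labels, recall_target):
    """Return (threshold, precision, flagged_fraction) for the highest
    threshold whose recall meets or exceeds recall_target.

    Assumes at least one positive label and 0 < recall_target <= 1.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    order = np.argsort(-scores, kind="stable")  # highest scores first
    sorted_scores = scores[order]
    sorted_labels = labels[order]

    # Recall achieved when flagging the top-k scored rows, for every k.
    recall = np.cumsum(sorted_labels) / sorted_labels.sum()

    # First position where recall reaches the target (recall is non-decreasing).
    k = int(np.searchsorted(recall, recall_target, side="left"))
    threshold = sorted_scores[k]

    flagged = scores >= threshold  # ties at the threshold are flagged too
    precision = labels[flagged].mean()
    return float(threshold), float(precision), float(flagged.mean())


# Toy scores and labels for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
thr, prec, frac = threshold_for_recall(scores, labels, recall_target=0.75)
route_to_llm = [s >= thr for s in scores]
```

The returned flagged fraction corresponds to the "Flagged %" column: it is the share of traffic that would be sent to Tier-2 at that operating point.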
### Top Features (by gain)

1. `amt_to_p95`: transaction amount relative to per-category 95th percentile
2. `category`: merchant category code
3. `log_amt`: log-transformed transaction amount
4. `is_night`: off-hours indicator (10 PM–4 AM)
5. `amt_24h`: rolling 24-hour spend per card
6. `mins_since_last`: time since previous transaction on same card
7. `state`: cardholder state
8. `age`: cardholder age

### Engineered Signals
- Per-category amount anomaly (amount vs. 95th percentile)
- 1-hour and 24-hour velocity (transaction count + spend)
- Geo distance (home to merchant, haversine)
- Time-of-day features (hour, day-of-week, is-night)
- Cardholder age from date of birth
- Category historical fraud rate

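Two of the signals above can be sketched in plain Python: the haversine home-to-merchant distance and the time-of-day flags. The function names and the kilometre unit are assumptions; only the 10 PM–4 AM night window comes from the feature list in this README.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def time_features(ts):
    """Hour, day-of-week (Monday = 0), and a night flag for 22:00 to 03:59."""
    hour = ts.hour
    return {
        "hour": hour,
        "dow": ts.weekday(),
        "is_night": int(hour >= 22 or hour < 4),
    }


# Illustrative call with made-up coordinates and timestamp.
distance = haversine_km(40.7128, -74.0060, 34.0522, -118.2437)
feats = time_features(datetime(2024, 1, 1, 23, 15))
```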
### Inference
```python
import joblib
import lightgbm as lgb

# Load the fitted preprocessing state and the trained booster
preproc = joblib.load("cc_lgbm_preproc.joblib")
model = lgb.Booster(model_file="cc_lgbm_model.txt")

# Featurize using the same pipeline as cc_lgbm.py, then:
# route_to_llm = score >= 0.9403985330442168  (recall-0.90 threshold)
```

---

## 🧮 AML Pre-Filter (Tabular)

Single-transaction LightGBM scorer. It can be deployed as a high-recall pre-filter or as a standalone lightweight scorer for resource-constrained environments.

| Metric | Value |
|---|---|
| **ROC-AUC** | 0.822 |
| **PR-AUC** | 0.023 |
| **Train rows** | 4,062,676 |
| **Test rows** | 1,015,669 |
| **Test laundering rate** | 0.177% |
| **Best iteration** | 671 |
| **Scale pos weight** | 1,201× |

### Routing Performance (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 50% | 6.0e-23 | 0.014 | 6.2% |
| 60% | 1.4e-40 | 0.011 | 9.3% |
| 70% | 1.7e-66 | 0.008 | 15.3% |
| **80% (default)** | **1.8e-116** | **0.005** | **31.4%** |

The extremely small threshold values reflect the model's calibration at very low laundering prevalence; the operating point is chosen by recall target, not threshold magnitude.

### Top Features (by gain)

1. `rcv_in_deg`: receiver account in-degree (fan-in)
2. `snd_in_deg`: sender account in-degree
3. `snd_out_deg`: sender account out-degree (fan-out)
4. `snd_in_cnt`: sender inbound transaction count
5. `self_loop`: self-transfer indicator
6. `hour`: transaction hour
7. `rcv_in_cnt`: receiver inbound transaction count
8. `snd_out_cnt`: sender outbound transaction count

### Engineered Signals
- Sender/receiver in-degree and out-degree (graph connectivity, fit on train only)
- Self-loop detection
- Currency mismatch flag
- Round-number amount indicator
- Same-bank transfer flag
- Gather-scatter indicator (accounts with high in-degree **and** high out-degree)
- Log-transformed amounts (paid and received)
- Time features (hour, day of week)

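A minimal sketch of how several of these signals could be computed, assuming degree means the number of distinct counterparties seen in the training partition. The function names, the dictionary keys other than `snd_in_deg`, `snd_out_deg`, `rcv_in_deg`, and `self_loop`, the multiple-of-100 rule for round amounts, and the `hub_threshold` cut-off are all hypothetical; the actual `featurize()` in `aml_lgbm.py` may differ.

```python
import math
from collections import defaultdict


def fit_degree_tables(train_edges):
    """Count distinct counterparties per account using training edges only."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for sender, receiver in train_edges:
        out_nbrs[sender].add(receiver)
        in_nbrs[receiver].add(sender)
    out_deg = {acct: len(nbrs) for acct, nbrs in out_nbrs.items()}
    in_deg = {acct: len(nbrs) for acct, nbrs in in_nbrs.items()}
    return out_deg, in_deg


def featurize_tx(tx, out_deg, in_deg, hub_threshold=2):
    """Build per-transaction features; accounts unseen in training get degree 0."""
    s, r = tx["sender"], tx["receiver"]
    snd_in, snd_out = in_deg.get(s, 0), out_deg.get(s, 0)
    return {
        "snd_in_deg": snd_in,
        "snd_out_deg": snd_out,
        "rcv_in_deg": in_deg.get(r, 0),
        "self_loop": int(s == r),
        "currency_mismatch": int(tx["pay_ccy"] != tx["recv_ccy"]),
        "round_amount": int(tx["amount_paid"] % 100 == 0),
        "same_bank": int(tx["from_bank"] == tx["to_bank"]),
        "gather_scatter": int(snd_in >= hub_threshold and snd_out >= hub_threshold),
        "log_amt_paid": math.log1p(tx["amount_paid"]),
    }


# Toy training edges and one transaction, for illustration only.
train_edges = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "A"), ("A", "B")]
out_deg, in_deg = fit_degree_tables(train_edges)
tx = {"sender": "A", "receiver": "B", "amount_paid": 500.0,
      "pay_ccy": "USD", "recv_ccy": "EUR", "from_bank": 1, "to_bank": 1}
features = featurize_tx(tx, out_deg, in_deg)
```

Fitting the degree tables on training edges only keeps test-period connectivity out of the features, which is what "fit on train only" guards against.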
### Inference
```python
import joblib
import lightgbm as lgb

# Load the fitted preprocessing state (graph dictionaries) and the trained booster
preproc = joblib.load("aml_lgbm_preproc.joblib")
model = lgb.Booster(model_file="aml_lgbm_model.txt")

# Apply featurize() from aml_lgbm.py with the preproc graph dictionaries, then:
# route_to_llm = score >= routing_threshold from aml_lgbm_metrics.json
```

---

## 🕸️ AML GNN Scorer

Edge-classification Graph Neural Network that scores **transactions as edges** in the inter-bank transfer multigraph. Captures multi-hop laundering patterns that are invisible to single-transaction models.

| Metric | Value |
|---|---|
| **ROC-AUC** | 0.584 |
| **PR-AUC** | 0.0036 |
| **Best F1** | 0.0159 @ threshold 1.0 |
| **Architecture** | 3× GINEConv, hidden dim 96, bidirectional message passing |
| **Training** | 80 epochs, temporal 70/10/20 split, class-weighted loss (~1,245:1) |
| **Edge features** | log(amount_paid), log(amount_received), hour/23, dow/6, currency_mismatch, self_loop, payment_format (one-hot) |
| **Node features** | log(in-degree), log(out-degree) computed on train edges only |

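The edge-feature row can be read as one fixed-length vector per transaction. The sketch below shows one plausible encoding; the use of `log1p` (to tolerate zero amounts), the `PAYMENT_FORMATS` vocabulary, and the function signature are assumptions, not the definitions in `train_gnn_aml.py`.

```python
import numpy as np

# Hypothetical payment-format vocabulary; the real list comes from the dataset.
PAYMENT_FORMATS = ["ACH", "Cheque", "Credit Card", "Wire"]


def edge_features(amount_paid, amount_received, hour, dow,
                  pay_ccy, recv_ccy, sender, receiver, payment_format):
    """Encode one transaction as a float32 edge-attribute vector."""
    one_hot = [float(payment_format == fmt) for fmt in PAYMENT_FORMATS]
    return np.array(
        [
            np.log1p(amount_paid),       # log-scaled amount paid
            np.log1p(amount_received),   # log-scaled amount received
            hour / 23.0,                 # hour of day scaled to [0, 1]
            dow / 6.0,                   # day of week scaled to [0, 1]
            float(pay_ccy != recv_ccy),  # currency mismatch flag
            float(sender == receiver),   # self-loop flag
            *one_hot,                    # payment format, one-hot
        ],
        dtype=np.float32,
    )


# Illustrative transaction with made-up values.
vec = edge_features(100.0, 100.0, 23, 6, "USD", "USD", "acct_1", "acct_1", "Wire")
```

Stacking these vectors row-wise gives an edge-attribute matrix with one row per transaction, the kind of input a GINEConv layer takes as `edge_attr`.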
### Why Graph Structure Matters

Single-transaction tabular models are bounded by per-transaction features. Money laundering is a **multi-hop graph pattern**: fan-out, gather-scatter, and cycle-based structuring are only visible when the full transaction network is modeled. The GNN acts as a high-recall graph triage layer that routes suspicious subgraphs to the Tier-2 LLM for investigator-facing explanation.

### Routing Thresholds (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 70% | 0.527 | 0.0019 | 69.4% |
| 80% | 0.377 | 0.0019 | 76.2% |
| 90% | 0.003 | 0.0018 | 89.3% |

### Inference
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GINEConv  # layer type used by EdgeGNN

# Load weights
state = torch.load("aml_gnn.pt", map_location="cpu")

# Rebuild EdgeGNN as defined in train_gnn_aml.py, then:
# model.load_state_dict(state)
# model.eval()
# probs = F.softmax(model(x, mp_edge_index, mp_edge_attr, edge_index, edge_attr), dim=1)[:, 1]
```

---

## Limitations

- Source data is synthetic/semi-synthetic. The card scorer's strong metrics reflect the Sparkov generator's structure; validate on your own data before any production deployment.
- The AML GNN is a high-recall graph triage tool, not a production-validated detector. Pair it with human-in-the-loop review and the Tier-2 LLM for investigator-grade precision.
- The AML tabular pre-filter uses graph degree features fit on the training partition. In a streaming deployment, these must be maintained as rolling aggregate state.
- No model in this repository should be used for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.

---

## License

Apache-2.0. Source datasets retain their own licenses.