FraudSentinel: Tier-1 Real-Time Scorers (LightGBM + GNN)
Fast statistical scorers for Tier 1 of the FraudSentinel two-tier fraud-detection architecture. They score 100% of the transaction stream in single-digit milliseconds and route only flagged cases to the Tier-2 fine-tuned LLM (naazimsnh02/fraudsentinel-qwen3-14b-lora) for explanation, typology classification, recommended action, and SAR drafting.
The two-tier pattern follows published financial-crime systems: a fast model triages every transaction, and the LLM explains the flagged minority (arXiv:2507.14785, arXiv:2210.14360, arXiv:2312.13896).
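The routing contract between the two tiers fits in a few lines. This is an illustrative sketch only: `triage`, `score_fn`, and `explain_fn` are hypothetical names, `score_fn` would wrap one of the Tier-1 models below, and `explain_fn` stands in for a call to the Tier-2 LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class TriageResult:
    """Outcome of Tier-1 triage over a batch of transactions."""
    scores: List[float]
    escalated: List[int] = field(default_factory=list)           # indices routed to Tier 2
    explanations: Dict[int, str] = field(default_factory=dict)   # Tier-2 output per escalated index


def triage(
    transactions: Sequence[dict],
    score_fn: Callable[[dict], float],         # hypothetical wrapper around a Tier-1 scorer
    threshold: float,                          # routing threshold for the chosen recall target
    explain_fn: Callable[[dict, float], str],  # hypothetical stand-in for the Tier-2 LLM call
) -> TriageResult:
    """Score every transaction cheaply; call the expensive Tier-2 model only for flagged ones."""
    result = TriageResult(scores=[])
    for i, tx in enumerate(transactions):
        score = score_fn(tx)
        result.scores.append(score)
        if score >= threshold:
            result.escalated.append(i)
            result.explanations[i] = explain_fn(tx, score)
    return result


# Toy usage: the scorer and "LLM" below are placeholders, not the real models.
txs = [{"amt": 12.0}, {"amt": 4800.0}, {"amt": 35.0}]
res = triage(
    txs,
    score_fn=lambda tx: 0.99 if tx["amt"] > 1000 else 0.01,
    threshold=0.9403985330442168,
    explain_fn=lambda tx, s: f"flagged amt={tx['amt']} score={s:.2f}",
)
print(res.escalated)  # [1]
```

Because Tier 2 runs only inside the `if`, its cost scales with the flagged share rather than with total traffic.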
Models Overview
| Artifact | Task | Dataset | Metrics | Status |
|---|---|---|---|---|
| `cc_lgbm_model.txt` + `cc_lgbm_preproc.joblib` | Card-Not-Present (CNP) Fraud | Sparkov (1.3M tx) | PR-AUC 0.967 · ROC-AUC 0.999 | Complete |
| `aml_lgbm_model.txt` + `aml_lgbm_preproc.joblib` | AML Pre-Filter (Tabular) | IBM AML HI-Small (5M tx) | ROC-AUC 0.822 · PR-AUC 0.023 | Complete |
| `aml_gnn.pt` | Money Laundering (Graph) | IBM AML HI-Small (5M tx) | ROC-AUC 0.584 · PR-AUC 0.0036 | Complete |
Card Fraud Scorer
Evaluated on Sparkov's held-out test split at natural fraud rate (0.39%):
| Metric | Value |
|---|---|
| PR-AUC | 0.967 |
| ROC-AUC | 0.999 |
| Train rows | 1,296,675 |
| Test rows | 555,719 |
| Test fraud rate | 0.387% |
| Best iteration | 810 |
| Scale pos weight | 171.8× |
Routing Performance (Test Split)
| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 80% | 0.997 | 0.995 | 0.31% |
| 85% | 0.987 | 0.982 | 0.33% |
| 90% (default) | 0.940 | 0.964 | 0.36% |
| 95% | 0.212 | 0.829 | 0.44% |
At the default routing threshold (recall ≈ 0.90), the scorer routes fewer than 0.4% of all card transactions to the Tier-2 LLM while catching 90% of fraud on the test split.
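The flagged share follows directly from recall, prevalence, and precision: with P fraud cases among N transactions, the scorer catches recall × P of them and therefore flags recall × P / precision transactions in total, i.e. a share of recall × prevalence / precision. Checking the default row against the 0.387% test fraud rate:

```python
def flagged_fraction(recall: float, prevalence: float, precision: float) -> float:
    """Share of all traffic routed to Tier 2.

    True positives = recall * positives, and flagged = true positives / precision,
    so flagged / total = recall * prevalence / precision.
    """
    return recall * prevalence / precision


# Card scorer, default operating point (recall 0.90, precision 0.964, prevalence 0.387%)
frac = flagged_fraction(recall=0.90, prevalence=0.00387, precision=0.964)
print(f"{frac:.2%}")  # 0.36%
```

The same identity explains why a low-prevalence problem with low precision, like AML, flags a much larger share of traffic at the same recall.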
Top Features (by gain)
- `amt_to_p95`: transaction amount relative to the per-category 95th percentile
- `category`: merchant category
- `log_amt`: log-transformed transaction amount
- `is_night`: off-hours indicator (10 PM to 4 AM)
- `amt_24h`: rolling 24-hour spend per card
- `mins_since_last`: time since the previous transaction on the same card
- `state`: cardholder state
- `age`: cardholder age
Engineered Signals
- Per-category amount anomaly (amount vs. 95th percentile)
- 1-hour and 24-hour velocity (transaction count + spend)
- Geo distance (home → merchant, haversine)
- Time-of-day features (hour, day-of-week, is-night)
- Cardholder age from date of birth
- Category historical fraud rate
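A minimal pandas sketch of a subset of these signals (amount vs. per-category p95, log amount, night flag, haversine distance, time since last transaction, 1-hour and 24-hour velocity). It is not the pipeline in cc_lgbm.py: the column names (`ts`, `card`, `category`, `amt`, `lat`, `lon`, `merch_lat`, `merch_lon`), the `cat_p95` lookup, and the choice to include the current transaction in each velocity window are all assumptions.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works element-wise on arrays/Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def add_card_signals(df: pd.DataFrame, cat_p95: pd.Series) -> pd.DataFrame:
    """Assumed columns: ts (datetime64), card, category, amt, lat, lon, merch_lat, merch_lon.

    cat_p95 maps category -> 95th-percentile amount and should be fit on the training split only.
    """
    df = df.sort_values(["card", "ts"]).copy()
    df["log_amt"] = np.log1p(df["amt"])
    df["amt_to_p95"] = df["amt"] / df["category"].map(cat_p95)
    hour = df["ts"].dt.hour
    df["is_night"] = ((hour >= 22) | (hour < 4)).astype(int)  # 10 PM up to (not incl.) 4 AM
    df["dist_km"] = haversine_km(df["lat"], df["lon"], df["merch_lat"], df["merch_lon"])
    df["mins_since_last"] = df.groupby("card")["ts"].diff().dt.total_seconds() / 60
    # Time-based rolling windows per card; rows are sorted by (card, ts), which matches
    # the (card, ts) order of the groupby-rolling result, so positional assignment is safe.
    for win in ("1h", "24h"):
        rolled = df.set_index("ts").groupby("card")["amt"].rolling(win)
        df[f"cnt_{win}"] = rolled.count().to_numpy()  # transactions in window
        df[f"amt_{win}"] = rolled.sum().to_numpy()    # spend in window
    return df
```

Fitting `cat_p95` on training data only (for example `train.groupby("category")["amt"].quantile(0.95)`) keeps test-time statistics from leaking into the features.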
Inference
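```python
import lightgbm as lgb
import joblib

preproc = joblib.load("cc_lgbm_preproc.joblib")  # preprocessing state fitted at training time
model = lgb.Booster(model_file="cc_lgbm_model.txt")

# X = raw transactions featurized with the same pipeline as cc_lgbm.py (using preproc)
# scores = model.predict(X)                    # fraud probability per transaction
# route_to_llm = scores >= 0.9403985330442168  # recall-0.90 threshold
```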
AML Pre-Filter (Tabular)
Single-transaction LightGBM scorer, intended either as a high-recall pre-filter or as a standalone lightweight scorer for resource-constrained environments.
| Metric | Value |
|---|---|
| ROC-AUC | 0.822 |
| PR-AUC | 0.023 |
| Train rows | 4,062,676 |
| Test rows | 1,015,669 |
| Test laundering rate | 0.177% |
| Best iteration | 671 |
| Scale pos weight | 1,201× |
Routing Performance (Test Split)
| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 50% | 6.0e-23 | 0.014 | 6.2% |
| 60% | 1.4e-40 | 0.011 | 9.3% |
| 70% | 1.7e-66 | 0.008 | 15.3% |
| 80% (default) | 1.8e-116 | 0.005 | 31.4% |
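The extreme threshold values reflect poorly calibrated probabilities at very low laundering prevalence: at the default operating point, roughly two-thirds of test transactions score below 1.8e-116. Choose the operating point by recall target, not by threshold magnitude, and store thresholds at full floating-point precision.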
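Selecting the threshold by recall target can be sketched as follows: sort the scores of the known positives on a labelled validation set and take the k-th highest, where k is the number of positives that must be caught. This is a minimal illustration with a hypothetical helper name, not the code that produced `aml_lgbm_metrics.json`.

```python
import numpy as np


def threshold_for_recall(scores: np.ndarray, labels: np.ndarray, target_recall: float) -> float:
    """Largest threshold t such that routing `scores >= t` reaches at least target_recall.

    Works the same for thresholds like 0.94 or 1.8e-116, since only the ordering matters.
    """
    pos_scores = np.sort(scores[labels == 1])[::-1]                   # positives, highest first
    k = int(np.ceil(target_recall * len(pos_scores) - 1e-9))          # positives to catch (float-safe)
    k = max(k, 1)
    return float(pos_scores[k - 1])                                   # k-th highest positive score


scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
labels = np.array([1, 0, 1, 0, 1, 1])
t = threshold_for_recall(scores, labels, 0.75)  # 0.5: catches 3 of the 4 positives
```

If probabilities this small are awkward to log or compare, LightGBM's `Booster.predict(X, raw_score=True)` returns the raw margin (log-odds for a binary objective), and the same selection can be run in that space.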
Top Features (by gain)
- `rcv_in_deg`: receiver account in-degree (fan-in)
- `snd_in_deg`: sender account in-degree
- `snd_out_deg`: sender account out-degree (fan-out)
- `snd_in_cnt`: sender inbound transaction count
- `self_loop`: self-transfer indicator
- `hour`: transaction hour
- `rcv_in_cnt`: receiver inbound transaction count
- `snd_out_cnt`: sender outbound transaction count
Engineered Signals
- Sender/receiver in-degree and out-degree (graph connectivity, fit on train only)
- Self-loop detection
- Currency mismatch flag
- Round-number amount indicator
- Same-bank transfer flag
- Gather-scatter indicator (accounts with high in-degree and high out-degree)
- Log-transformed amounts (paid and received)
- Time features (hour, day of week)
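A pandas sketch of these signals, with the graph lookups fit on the training partition and then applied to later data. It is not the `featurize()` in aml_lgbm.py. Assumptions: the column names (`src`, `dst`, `src_bank`, `dst_bank`, `pay_ccy`, `recv_ccy`, `amount_paid`, `amount_received`, `ts`), `*_deg` meaning distinct counterparties and `*_cnt` meaning transaction counts, the multiple-of-100 rule for round amounts, and the hypothetical `hub_deg` cutoff for gather-scatter.

```python
import numpy as np
import pandas as pd


def fit_degree_maps(train: pd.DataFrame) -> dict:
    """Account-level connectivity lookups, fit on the TRAIN partition only."""
    return {
        "out_deg": train.groupby("src")["dst"].nunique(),  # distinct receivers per sender
        "in_deg": train.groupby("dst")["src"].nunique(),   # distinct senders per receiver
        "out_cnt": train.groupby("src").size(),
        "in_cnt": train.groupby("dst").size(),
    }


def featurize_aml(df: pd.DataFrame, maps: dict, hub_deg: int = 10) -> pd.DataFrame:
    out = pd.DataFrame(index=df.index)
    # Accounts unseen in training get 0 (no history in the training graph)
    out["snd_in_deg"] = df["src"].map(maps["in_deg"]).fillna(0)
    out["snd_out_deg"] = df["src"].map(maps["out_deg"]).fillna(0)
    out["rcv_in_deg"] = df["dst"].map(maps["in_deg"]).fillna(0)
    out["snd_in_cnt"] = df["src"].map(maps["in_cnt"]).fillna(0)
    out["snd_out_cnt"] = df["src"].map(maps["out_cnt"]).fillna(0)
    out["rcv_in_cnt"] = df["dst"].map(maps["in_cnt"]).fillna(0)
    out["self_loop"] = (df["src"] == df["dst"]).astype(int)
    out["ccy_mismatch"] = (df["pay_ccy"] != df["recv_ccy"]).astype(int)
    out["round_amt"] = (df["amount_paid"] % 100 == 0).astype(int)
    out["same_bank"] = (df["src_bank"] == df["dst_bank"]).astype(int)
    # Gather-scatter: sender both collects from and pays out to many counterparties
    out["gather_scatter"] = ((out["snd_in_deg"] >= hub_deg) & (out["snd_out_deg"] >= hub_deg)).astype(int)
    out["log_paid"] = np.log1p(df["amount_paid"])
    out["log_recv"] = np.log1p(df["amount_received"])
    out["hour"] = df["ts"].dt.hour
    out["dow"] = df["ts"].dt.dayofweek
    return out
```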
Inference
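```python
import lightgbm as lgb
import joblib

preproc = joblib.load("aml_lgbm_preproc.joblib")  # includes the train-fit graph dictionaries
model = lgb.Booster(model_file="aml_lgbm_model.txt")

# X = transactions passed through featurize() from aml_lgbm.py, using the preproc graph dictionaries
# scores = model.predict(X)
# route_to_llm = scores >= routing_threshold  # routing_threshold from aml_lgbm_metrics.json
```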
AML GNN Scorer
Edge-classification graph neural network that scores each transaction as an edge in the inter-bank transfer multigraph. It is designed to capture multi-hop laundering patterns that single-transaction models cannot see.
| Metric | Value |
|---|---|
| ROC-AUC | 0.584 |
| PR-AUC | 0.0036 |
| Best F1 | 0.0159 @ threshold 1.0 |
| Architecture | 3× GINEConv, hidden dim 96, bidirectional message passing |
| Training | 80 epochs, temporal 70/10/20 split, class-weighted loss (~1,245:1) |
| Edge features | log(amount_paid), log(amount_received), hour/23, dow/6, currency_mismatch, self_loop, payment_format (one-hot) |
| Node features | log(in-degree), log(out-degree) computed on train edges only |
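The data preparation in this table (temporal 70/10/20 split, class weighting, train-only log-degree node features, bidirectional message-passing edges) can be sketched in NumPy. This is not taken from train_gnn_aml.py. Assumptions: edges are stored as a 2 × E `edge_index` array (the PyTorch Geometric layout), degrees use log1p so zero-degree nodes stay finite, and reversed edges carry an extra 0/1 direction flag so the model can still tell fan-in from fan-out.

```python
import numpy as np


def temporal_split(ts, frac=(0.7, 0.1, 0.2)):
    """Edge indices split by time: earliest 70% train, next 10% val, latest 20% test."""
    order = np.argsort(ts, kind="stable")
    n_tr = int(frac[0] * len(ts))
    n_va = int(frac[1] * len(ts))
    return order[:n_tr], order[n_tr:n_tr + n_va], order[n_tr + n_va:]


def pos_class_weight(labels):
    """Negative:positive ratio on training labels, used to up-weight laundering edges in the loss."""
    n_pos = int(labels.sum())
    return (len(labels) - n_pos) / n_pos


def node_degree_features(edge_index, train_idx, num_nodes):
    """Per-node [log1p(in-degree), log1p(out-degree)], counted on TRAIN edges only."""
    src, dst = edge_index[:, train_idx]
    in_deg = np.bincount(dst, minlength=num_nodes)
    out_deg = np.bincount(src, minlength=num_nodes)
    return np.log1p(np.stack([in_deg, out_deg], axis=1)).astype(np.float32)


def bidirectional(edge_index, edge_attr):
    """Message-passing edges: originals plus reversed copies, with a direction flag appended."""
    num_edges = edge_index.shape[1]
    mp_index = np.concatenate([edge_index, edge_index[::-1]], axis=1)       # [src; dst] + [dst; src]
    flag = np.concatenate([np.zeros(num_edges), np.ones(num_edges)])[:, None]  # 0 = forward, 1 = reverse
    mp_attr = np.concatenate([np.concatenate([edge_attr, edge_attr], axis=0), flag], axis=1)
    return mp_index, mp_attr
```

Splitting by time rather than at random keeps future transactions out of training, which is the deployment condition the scorer actually faces.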
Why Graph Structure Matters
Single-transaction tabular models are limited to per-transaction features. Money laundering is a multi-hop graph pattern: fan-out, gather-scatter, and cycle-based structuring are visible only when the full transaction network is modeled. The GNN acts as a high-recall graph triage layer that routes suspicious subgraphs to the Tier-2 LLM for investigator-facing explanation.
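A small illustration of why per-transaction features miss cycles: in a three-account loop A → B → C → A, each transfer looks ordinary on its own, and only a walk over the graph reveals that it closes a cycle. The plain-Python sketch below (the hypothetical `cycle_closing_edges`) makes that explicit with a bounded breadth-first search; it illustrates the pattern, not how the GNN detects it.

```python
from collections import defaultdict


def cycle_closing_edges(edges, max_len=3):
    """Indices of transfers (src, dst) that close a directed cycle of length <= max_len,
    i.e. dst can reach src again within max_len - 1 hops."""
    adj = defaultdict(set)
    for s, d in edges:
        adj[s].add(d)
    flagged = []
    for i, (s, d) in enumerate(edges):
        frontier, seen = {d}, {d}
        for _ in range(max_len - 1):
            if s in frontier:
                break
            # expand one hop, skipping nodes already visited
            frontier = {n for f in frontier for n in adj[f]} - seen
            seen |= frontier
        if s in frontier:
            flagged.append(i)
    return flagged


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
print(cycle_closing_edges(edges))  # [0, 1, 2]
```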
Routing Thresholds (Test Split)
| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 70% | 0.527 | 0.0019 | 69.4% |
| 80% | 0.377 | 0.0019 | 76.2% |
| 90% | 0.003 | 0.0018 | 89.3% |
Inference
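```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GINEConv  # used when rebuilding EdgeGNN

# Load trained weights (a state_dict)
state = torch.load("aml_gnn.pt", map_location="cpu")

# Rebuild EdgeGNN exactly as defined in train_gnn_aml.py, then:
# model.load_state_dict(state)
# model.eval()
# with torch.no_grad():
#     probs = F.softmax(model(x, mp_edge_index, mp_edge_attr, edge_index, edge_attr), dim=1)[:, 1]
```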
Limitations
- Source data is synthetic/semi-synthetic. The card scorer's strong metrics reflect the Sparkov generator's structure; validate on your own data before any production deployment.
- The AML GNN is a high-recall graph triage tool, not a production-validated detector. Pair it with human-in-the-loop review and the Tier-2 LLM for investigator-grade precision.
- The AML tabular pre-filter uses graph degree features fit on the training partition. In a streaming deployment, these must be maintained as rolling aggregate state.
- No model in this repository should be used for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
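For the streaming point above, one way to keep degree features current is a per-account sliding window of counterparties. This is a sketch under assumptions: degree means distinct counterparties within the window, timestamps arrive in non-decreasing order, and `RollingDegreeState` is a hypothetical name. A sliding-window degree is a different definition from the static train-partition counts, so a model scored on it should be trained on features computed the same way.

```python
from collections import Counter, defaultdict, deque


class RollingDegreeState:
    """Per-account in/out degree (distinct counterparties) over a sliding time window."""

    def __init__(self, window: float):
        self.window = window
        self._events = {"in": defaultdict(deque), "out": defaultdict(deque)}   # account -> deque[(ts, peer)]
        self._peers = {"in": defaultdict(Counter), "out": defaultdict(Counter)}  # account -> peer counts

    def _expire(self, side: str, account, now: float) -> None:
        """Drop events that fell out of the window (ts <= now - window)."""
        events, peers = self._events[side][account], self._peers[side][account]
        while events and events[0][0] <= now - self.window:
            _, peer = events.popleft()
            peers[peer] -= 1
            if peers[peer] == 0:
                del peers[peer]

    def degree(self, side: str, account, now: float) -> int:
        """Distinct counterparties for `account` on `side` ("in" or "out") at time `now`."""
        if account not in self._events[side]:
            return 0
        self._expire(side, account, now)
        return len(self._peers[side][account])

    def update(self, src, dst, ts: float) -> None:
        """Record a transfer src -> dst at time ts."""
        for side, account, peer in (("out", src, dst), ("in", dst, src)):
            self._expire(side, account, ts)
            self._events[side][account].append((ts, peer))
            self._peers[side][account][peer] += 1


state = RollingDegreeState(window=86_400)  # 24-hour window, timestamps in seconds
state.update("A", "B", 0)
state.update("C", "B", 10)
state.update("C", "B", 20)
print(state.degree("in", "B", 30))  # 2 (A and C)
```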
License
Apache-2.0. Source datasets retain their own licenses.