FraudSentinel — Tier-1 Real-Time Scorers (LightGBM + GNN)

Fast statistical scorers for Tier 1 of the FraudSentinel two-tier fraud-detection architecture. They score 100% of the transaction stream in single-digit milliseconds and route only flagged cases to the Tier-2 fine-tuned LLM (naazimsnh02/fraudsentinel-qwen3-14b-lora). The LLM then handles explanation, typology classification, recommended action, and SAR drafting.

The two-tier design follows published financial-crime systems: a fast model triages every transaction, and an LLM explains the flagged minority (arXiv:2507.14785, arXiv:2210.14360, arXiv:2312.13896).

📊 Models Overview

Artifact	Task	Dataset	Metrics	Status
`cc_lgbm_model.txt` + `cc_lgbm_preproc.joblib`	Card-Not-Present (CNP) Fraud	Sparkov (1.85M tx)	PR-AUC 0.967 · ROC-AUC 0.999	✅ Complete
`aml_lgbm_model.txt` + `aml_lgbm_preproc.joblib`	AML Pre-Filter (Tabular)	IBM AML HI-Small (5M tx)	ROC-AUC 0.822 · PR-AUC 0.023	✅ Complete
`aml_gnn.pt`	Money Laundering (Graph)	IBM AML HI-Small (5M tx)	ROC-AUC 0.584 · PR-AUC 0.0036	✅ Complete

🎯 Card Fraud Scorer

Evaluated on Sparkov's held-out test split at its natural fraud rate (0.39%):

Metric	Value
PR-AUC	0.967
ROC-AUC	0.999
Train rows	1,296,675
Test rows	555,719
Test fraud rate	0.387%
Best iteration	810
Scale pos weight	171.8×

Routing Performance (Test Split)

Recall Target	Threshold	Precision	Flagged %
80%	0.997	0.995	0.31%
85%	0.987	0.982	0.33%
90% (default)	0.940	0.964	0.36%
95%	0.212	0.829	0.44%

At the default routing threshold (recall ≈ 0.90), the scorer routes 0.36% of all card traffic to the Tier-2 LLM at 0.964 precision while catching 90% of fraud.
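The flagged fraction in the routing table follows from recall, prevalence, and precision: flagged ≈ recall × fraud rate / precision (for the default row, 0.90 × 0.387% / 0.964 ≈ 0.36%). The sketch below shows one way a recall-targeted operating point could be picked from held-out labels and scores. `operating_point` is a hypothetical helper, not code taken from cc_lgbm.py.

```python
import numpy as np


def operating_point(y_true, scores, target_recall):
    """Pick the highest score threshold whose recall meets target_recall.

    Returns (threshold, precision, flagged_fraction). Assumes y_true
    contains at least one positive label.
    """
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)

    # Rank transactions from most to least suspicious.
    order = np.argsort(-s, kind="stable")
    recall_at_k = np.cumsum(y[order]) / y.sum()

    # First rank position whose cumulative recall reaches the target.
    k = int(np.argmax(recall_at_k >= target_recall))
    threshold = s[order][k]

    # Recompute on the full set so tied scores at the threshold are counted.
    flagged = s >= threshold
    precision = y[flagged].mean()
    return threshold, precision, flagged.mean()
```

For example, `operating_point(y_test, scores, 0.90)` would return the recall-0.90 threshold together with its precision and the share of traffic it flags.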

Top Features (by gain)

amt_to_p95 — transaction amount relative to per-category 95th percentile
category — merchant category code
log_amt — log-transformed transaction amount
is_night — off-hours indicator (10 PM–4 AM)
amt_24h — rolling 24-hour spend per card
mins_since_last — time since previous transaction on same card
state — cardholder state
age — cardholder age

Engineered Signals

Per-category amount anomaly (amount vs. 95th percentile)
1-hour and 24-hour velocity (transaction count + spend)
Geo distance (home ↔ merchant, haversine)
Time-of-day features (hour, day-of-week, is-night)
Cardholder age from date of birth
Category historical fraud rate
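The sketch below computes three of these signals: haversine distance, the per-category p95 ratio, and windowed velocity. It assumes Sparkov-style column names (`cc_num`, `unix_time`, `amt`, `category`). All functions are hypothetical illustrations, not the code in cc_lgbm.py. Two further choices are assumptions: velocity windows include the current transaction, and the `cnt_*` column names are invented.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def fit_category_p95(train: pd.DataFrame) -> pd.Series:
    """Per-category 95th-percentile amount, fit on the training split only."""
    return train.groupby("category")["amt"].quantile(0.95)


def amt_to_p95(df: pd.DataFrame, p95: pd.Series) -> pd.Series:
    """Transaction amount relative to its category's training-set p95."""
    return df["amt"] / df["category"].map(p95)


def window_velocity(ts_s, amt, window_s):
    """Count and spend in the window (t - window_s, t] for one card's
    transactions, which must already be sorted by timestamp (seconds)."""
    ts_s = np.asarray(ts_s, dtype=float)
    amt = np.asarray(amt, dtype=float)
    left = np.searchsorted(ts_s, ts_s - window_s, side="right")
    idx = np.arange(len(ts_s))
    csum = np.concatenate([[0.0], np.cumsum(amt)])
    return idx - left + 1, csum[idx + 1] - csum[left]


def add_velocity(df: pd.DataFrame, window_s: int, suffix: str) -> pd.DataFrame:
    """Attach cnt_<suffix> and amt_<suffix> per card (current transaction included)."""
    df = df.sort_values(["cc_num", "unix_time"]).reset_index(drop=True)
    cnt = np.zeros(len(df))
    spend = np.zeros(len(df))
    ts, amt = df["unix_time"].to_numpy(), df["amt"].to_numpy()
    for pos in df.groupby("cc_num").indices.values():
        cnt[pos], spend[pos] = window_velocity(ts[pos], amt[pos], window_s)
    df[f"cnt_{suffix}"], df[f"amt_{suffix}"] = cnt, spend
    return df
```

With `add_velocity(df, 86_400, "24h")`, the spend column comes out as `amt_24h`, which matches the feature name listed above. The prefix-sum plus `searchsorted` approach keeps each card's pass linear after sorting.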

Inference

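```python
import lightgbm as lgb
import joblib

preproc = joblib.load("cc_lgbm_preproc.joblib")  # preprocessing state saved by cc_lgbm.py
model   = lgb.Booster(model_file="cc_lgbm_model.txt")

ROUTE_THRESHOLD = 0.9403985330442168  # recall-0.90 operating point (test split)

def route(X):
    """X: features built with the same pipeline as cc_lgbm.py (using `preproc`)."""
    scores = model.predict(X)  # fraud probabilities
    return scores, scores >= ROUTE_THRESHOLD  # True -> send to the Tier-2 LLM
```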

🧮 AML Pre-Filter (Tabular)

Single-transaction LightGBM scorer, intended for deployment either as a high-recall pre-filter or as a standalone lightweight scorer in resource-constrained environments.

Metric	Value
ROC-AUC	0.822
PR-AUC	0.023
Train rows	4,062,676
Test rows	1,015,669
Test laundering rate	0.177%
Best iteration	671
Scale pos weight	1,201×

Routing Performance (Test Split)

Recall Target	Threshold	Precision	Flagged %
50%	6.0e-23	0.014	6.2%
60%	1.4e-40	0.011	9.3%
70%	1.7e-66	0.008	15.3%
80% (default)	1.8e-116	0.005	31.4%

The extremely small threshold values indicate that the model's output probabilities are poorly calibrated at this very low laundering prevalence. The operating point is therefore chosen by recall target, not by threshold magnitude.
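Under LightGBM's binary objective, the predicted probability is a sigmoid of the raw score (the log-odds). A threshold such as 1.8e-116 therefore corresponds to a raw score of about -266.5. The sketch below converts thresholds between the two scales. It assumes the model was trained with the binary objective and the default `sigmoid=1.0`. Comparing `Booster.predict(X, raw_score=True)` against the converted threshold gives the same routing decisions, up to floating-point rounding, and keeps the numbers readable.

```python
import numpy as np


def prob_to_raw(p):
    """Log-odds of p; log(p) - log1p(-p) stays accurate for very small p."""
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log1p(-p)


def raw_to_prob(z):
    """Inverse of prob_to_raw (the logistic sigmoid)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


# The 80%-recall threshold from the table above, on the log-odds scale.
raw_threshold = prob_to_raw(1.8e-116)
# route_to_llm = model.predict(X, raw_score=True) >= raw_threshold
```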

Top Features (by gain)

rcv_in_deg — receiver account in-degree (fan-in)
snd_in_deg — sender account in-degree
snd_out_deg — sender account out-degree (fan-out)
snd_in_cnt — sender inbound transaction count
self_loop — self-transfer indicator
hour — transaction hour
rcv_in_cnt — receiver inbound transaction count
snd_out_cnt — sender outbound transaction count

Engineered Signals

Sender/receiver in-degree and out-degree (graph connectivity, fit on train only)
Self-loop detection
Currency mismatch flag
Round-number amount indicator
Same-bank transfer flag
Gather-scatter indicator (accounts with high in-degree and high out-degree)
Log-transformed amounts (paid and received)
Time features (hour, day of week)
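The sketch below computes several of these signals with pandas. It assumes a simplified edge table with hypothetical columns `src`, `dst`, `src_bank`, `dst_bank`, `amount_paid`, `pay_ccy`, `rcv_ccy`, and `ts`; the IBM AML files use different headers. Degree maps are fit on the training partition only, and accounts unseen in training default to zero. The round-number rule (a multiple of 100) and the gather-scatter cutoff (`hub_k=5`) are illustrative choices, not the values used in aml_lgbm.py.

```python
import numpy as np
import pandas as pd


def fit_degree_maps(train: pd.DataFrame) -> dict:
    """Per-account degree (distinct counterparties) and transaction counts, training edges only."""
    return {
        "out_deg": train.groupby("src")["dst"].nunique(),
        "in_deg": train.groupby("dst")["src"].nunique(),
        "out_cnt": train.groupby("src").size(),
        "in_cnt": train.groupby("dst").size(),
    }


def featurize_sketch(df: pd.DataFrame, maps: dict, hub_k: int = 5) -> pd.DataFrame:
    """Hypothetical featurizer mirroring the signal list, not aml_lgbm.py itself."""
    def look(col, key):
        # Unseen accounts get 0, since degrees are frozen at training time.
        return df[col].map(maps[key]).fillna(0).astype(int)

    f = pd.DataFrame(index=df.index)
    f["snd_in_deg"] = look("src", "in_deg")
    f["snd_out_deg"] = look("src", "out_deg")
    f["rcv_in_deg"] = look("dst", "in_deg")
    f["snd_in_cnt"] = look("src", "in_cnt")
    f["snd_out_cnt"] = look("src", "out_cnt")
    f["rcv_in_cnt"] = look("dst", "in_cnt")
    f["self_loop"] = (df["src"] == df["dst"]).astype(int)
    f["ccy_mismatch"] = (df["pay_ccy"] != df["rcv_ccy"]).astype(int)
    f["same_bank"] = (df["src_bank"] == df["dst_bank"]).astype(int)
    f["round_amt"] = (np.mod(df["amount_paid"], 100) == 0).astype(int)
    f["gather_scatter"] = ((f["snd_in_deg"] >= hub_k) & (f["snd_out_deg"] >= hub_k)).astype(int)
    f["log_paid"] = np.log1p(df["amount_paid"])
    f["hour"] = df["ts"].dt.hour
    f["dow"] = df["ts"].dt.dayofweek
    return f
```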

Inference

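```python
import json
import lightgbm as lgb
import joblib

preproc = joblib.load("aml_lgbm_preproc.joblib")  # graph degree dictionaries fit on the training split
model   = lgb.Booster(model_file="aml_lgbm_model.txt")

with open("aml_lgbm_metrics.json") as fh:
    ROUTE_THRESHOLD = json.load(fh)["routing_threshold"]  # key name assumed from the description

def route(X):
    """X: output of featurize() from aml_lgbm.py, applied with the `preproc` graph dictionaries."""
    scores = model.predict(X)
    return scores, scores >= ROUTE_THRESHOLD  # True -> send to the Tier-2 LLM
```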

🕸️ AML GNN Scorer

Edge-classification graph neural network that scores each transaction as an edge in the inter-bank transfer multigraph. It is designed to capture multi-hop laundering patterns that are invisible to single-transaction models.

Metric	Value
ROC-AUC	0.584
PR-AUC	0.0036
Best F1	0.0159 @ threshold 1.0
Architecture	3× GINEConv, hidden dim 96, bidirectional message passing
Training	80 epochs, temporal 70/10/20 split, class-weighted loss (~1,245:1)
Edge features	log(amount_paid), log(amount_received), hour/23, dow/6, currency_mismatch, self_loop, payment_format (one-hot)
Node features	log(in-degree), log(out-degree) computed on train edges only
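The architecture can be illustrated with a simplified re-implementation in plain PyTorch. Each layer follows the GINE update rule that `GINEConv` implements. This is not the EdgeGNN from train_gnn_aml.py, and `aml_gnn.pt` weights will not load into it. Several details are assumptions: bidirectional message passing is done by appending reversed edges with a direction flag, and the edge head concatenates both endpoint embeddings with the edge's own features. The hidden size (96), layer count (3), and class weight (~1,245) come from the table above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GINELayer(nn.Module):
    """x_i' = MLP((1 + eps) * x_i + sum_j ReLU(x_j + W e_ji)), the GINE update rule."""

    def __init__(self, dim: int, edge_dim: int, eps: float = 0.0):
        super().__init__()
        self.edge_proj = nn.Linear(edge_dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.eps = eps

    def forward(self, x, edge_index, edge_attr):
        src, dst = edge_index
        msg = F.relu(x[src] + self.edge_proj(edge_attr))
        agg = torch.zeros_like(x).index_add_(0, dst, msg)  # sum messages per target node
        return self.mlp((1 + self.eps) * x + agg)


def make_bidirectional(edge_index, edge_attr):
    """Append reversed edges plus a 0/1 direction flag so messages flow both ways."""
    n = edge_index.size(1)
    flag = torch.cat([torch.zeros(n, 1), torch.ones(n, 1)])
    mp_index = torch.cat([edge_index, edge_index.flip(0)], dim=1)
    mp_attr = torch.cat([torch.cat([edge_attr, edge_attr]), flag], dim=1)
    return mp_index, mp_attr


class EdgeGNNSketch(nn.Module):
    def __init__(self, node_dim: int, edge_dim: int, hidden: int = 96, layers: int = 3):
        super().__init__()
        self.node_in = nn.Linear(node_dim, hidden)
        self.convs = nn.ModuleList([GINELayer(hidden, edge_dim + 1) for _ in range(layers)])
        self.head = nn.Sequential(
            nn.Linear(2 * hidden + edge_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, x, mp_edge_index, mp_edge_attr, edge_index, edge_attr):
        h = self.node_in(x)
        for conv in self.convs:
            h = F.relu(conv(h, mp_edge_index, mp_edge_attr))
        src, dst = edge_index
        # Classify each target edge from its endpoint embeddings and its own features.
        return self.head(torch.cat([h[src], h[dst], edge_attr], dim=1))
```

A class-weighted loss in the spirit of the training setup would be `F.cross_entropy(logits, y, weight=torch.tensor([1.0, 1245.0]))`.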

Why Graph Structure Matters

Single-transaction tabular models see each transfer in isolation, apart from whatever aggregates (such as degree counts) are engineered into their features. Money laundering is a multi-hop graph pattern: fan-out, gather-scatter, and cycle-based structuring become visible only when the full transaction network is modeled. The GNN serves as a high-recall graph triage layer that routes suspicious subgraphs to the Tier-2 LLM for investigator-facing explanation.
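A scatter-gather pattern is a concrete example. One source fans out to several intermediaries, which all forward to the same destination. No single edge shows this, but it is easy to find over two hops. The rough pandas sketch below assumes a hypothetical edge table with `src` and `dst` columns. It deliberately ignores timestamps and amounts, which a real detector would need to respect.

```python
import pandas as pd


def scatter_gather_candidates(edges: pd.DataFrame, min_mid: int = 3) -> pd.DataFrame:
    """(src, dst) pairs joined by at least `min_mid` distinct two-hop paths src -> mid -> dst."""
    pairs = edges[["src", "dst"]].drop_duplicates()
    hop1 = pairs.rename(columns={"dst": "mid"})
    hop2 = pairs.rename(columns={"src": "mid"})
    paths = hop1.merge(hop2, on="mid")
    # Drop paths that exist only because of a self-loop at either endpoint.
    paths = paths[(paths["mid"] != paths["src"]) & (paths["mid"] != paths["dst"])]
    counts = paths.groupby(["src", "dst"])["mid"].nunique().reset_index(name="n_mid")
    return counts[counts["n_mid"] >= min_mid].reset_index(drop=True)
```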

Routing Thresholds (Test Split)

Recall Target	Threshold	Precision	Flagged %
70%	0.527	0.0019	69.4%
80%	0.377	0.0019	76.2%
90%	0.003	0.0018	89.3%

Inference

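```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GINEConv  # EdgeGNN in train_gnn_aml.py is built from GINEConv layers

# Load the saved weights on CPU
state = torch.load("aml_gnn.pt", map_location="cpu")

# Rebuild EdgeGNN exactly as defined in train_gnn_aml.py (same dims and layers), then:
# model.load_state_dict(state)
# model.eval()
# with torch.no_grad():
#     probs = F.softmax(model(x, mp_edge_index, mp_edge_attr, edge_index, edge_attr), dim=1)[:, 1]
```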

⚖️ Limitations

Source data is synthetic/semi-synthetic. The card scorer's strong metrics reflect the Sparkov generator's structure; validate on your own data before any production deployment.
The AML GNN is a high-recall graph triage tool, not a production-validated detector. Pair it with human-in-the-loop review and the Tier-2 LLM for investigator-grade precision.
The AML tabular pre-filter uses graph degree features fit on the training partition. In a streaming deployment, these must be maintained as rolling aggregate state.
No model in this repository should be used for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
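The rolling aggregate state for the degree features can be sketched with a small in-memory helper. The sketch assumes transfers arrive in timestamp order. It also assumes a sliding time window (hypothetical `window_s`) is an acceptable stand-in for the training-partition counts; in practice the window length would have to keep live feature distributions close to those seen in training. `RollingDegreeState` is a hypothetical helper. Reading features before the incoming transfer is added is a design choice here, not something the repository specifies.

```python
from collections import Counter, deque


class RollingDegreeState:
    """Sliding-window in/out degree and count features per account (hypothetical helper)."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.events = deque()  # (ts, src, dst) in arrival order
        self.out_nbrs = {}     # account -> Counter of receivers
        self.in_nbrs = {}      # account -> Counter of senders

    @staticmethod
    def _drop(table, key, other):
        c = table[key]
        c[other] -= 1
        if c[other] == 0:
            del c[other]
        if not c:
            del table[key]

    def _expire(self, now):
        # Keep only transfers inside the window (now - window_s, now].
        while self.events and self.events[0][0] <= now - self.window_s:
            _, s, d = self.events.popleft()
            self._drop(self.out_nbrs, s, d)
            self._drop(self.in_nbrs, d, s)

    def features(self, ts, src, dst):
        """Degree features for an incoming transfer, read before it is added."""
        self._expire(ts)
        out_s = self.out_nbrs.get(src, Counter())
        in_s = self.in_nbrs.get(src, Counter())
        in_d = self.in_nbrs.get(dst, Counter())
        return {
            "snd_out_deg": len(out_s), "snd_in_deg": len(in_s), "rcv_in_deg": len(in_d),
            "snd_out_cnt": sum(out_s.values()), "snd_in_cnt": sum(in_s.values()),
            "rcv_in_cnt": sum(in_d.values()),
        }

    def update(self, ts, src, dst):
        self.events.append((ts, src, dst))
        self.out_nbrs.setdefault(src, Counter())[dst] += 1
        self.in_nbrs.setdefault(dst, Counter())[src] += 1
```

A typical loop calls `features()` to score each transfer and then `update()` once it has been processed. Memory stays bounded by the number of transfers inside the window.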

📄 License

Apache-2.0. Source datasets retain their own licenses.

Related paper: Exploring the In-Context Learning Capabilities of LLMs for Money Laundering Detection in Financial Graphs (arXiv:2507.14785, published Oct 29, 2025).