FraudSentinel: Tier-1 Real-Time Scorers (LightGBM + GNN)

Fast statistical scorers for Tier 1 of the FraudSentinel two-tier fraud-detection architecture. They score 100% of the transaction stream in single-digit milliseconds and route only flagged cases to the Tier-2 fine-tuned LLM (naazimsnh02/fraudsentinel-qwen3-14b-lora) for explanation, typology classification, recommended action, and SAR drafting.

The two-tier pattern follows published financial-crime systems: a fast model triages every transaction, and the LLM explains the flagged minority (arXiv:2507.14785, 2210.14360, 2312.13896).
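The triage step can be sketched as a simple threshold split. This is a minimal illustration, not the repository's routing code: `route_transactions` is a hypothetical helper, and the default threshold is the recall-0.90 value quoted in the card scorer's inference notes.

```python
from typing import Iterable, List, Tuple

# Recall-0.90 routing threshold quoted in the card scorer's inference notes.
CARD_ROUTING_THRESHOLD = 0.9403985330442168


def route_transactions(
    scores: Iterable[float],
    threshold: float = CARD_ROUTING_THRESHOLD,
) -> Tuple[List[int], List[int]]:
    """Split transaction indices into (flagged_for_tier2, cleared_at_tier1)."""
    flagged, cleared = [], []
    for idx, score in enumerate(scores):
        # Scores at or above the threshold go to the Tier-2 LLM.
        (flagged if score >= threshold else cleared).append(idx)
    return flagged, cleared


# Hypothetical Tier-1 scores for four transactions.
flagged, cleared = route_transactions([0.02, 0.97, 0.10, 0.9999])
print("to Tier-2:", flagged, "cleared:", cleared)
```

Only the flagged indices would incur LLM cost; everything else is cleared at Tier 1.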


📊 Models Overview

| Artifact | Task | Dataset | Metrics | Status |
| --- | --- | --- | --- | --- |
| cc_lgbm_model.txt + cc_lgbm_preproc.joblib | Card-Not-Present (CNP) Fraud | Sparkov (1.3M tx) | PR-AUC 0.967 · ROC-AUC 0.999 | ✅ Complete |
| aml_lgbm_model.txt + aml_lgbm_preproc.joblib | AML Pre-Filter (Tabular) | IBM AML HI-Small (5M tx) | ROC-AUC 0.822 · PR-AUC 0.023 | ✅ Complete |
| aml_gnn.pt | Money Laundering (Graph) | IBM AML HI-Small (5M tx) | ROC-AUC 0.584 · PR-AUC 0.0036 | ✅ Complete |

🎯 Card Fraud Scorer

Evaluated on Sparkov's held-out test split at natural fraud rate (0.39%):

| Metric | Value |
| --- | --- |
| PR-AUC | 0.967 |
| ROC-AUC | 0.999 |
| Train rows | 1,296,675 |
| Test rows | 555,719 |
| Test fraud rate | 0.387% |
| Best iteration | 810 |
| Scale pos weight | 171.8× |

Routing Performance (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
| --- | --- | --- | --- |
| 80% | 0.997 | 0.995 | 0.31% |
| 85% | 0.987 | 0.982 | 0.33% |
| 90% (default) | 0.940 | 0.964 | 0.36% |
| 95% | 0.212 | 0.829 | 0.44% |

At the default routing threshold (recall ≈ 0.90), the scorer routes fewer than 0.4% of all card transactions to the Tier-2 LLM while catching about 90% of fraud.
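The flagged share follows directly from recall, prevalence, and precision: flagged = recall × prevalence / precision. The short check below applies that identity to the rounded values in the routing table and the 0.387% test fraud rate; `flagged_fraction` is a hypothetical helper name, and the results agree with the table only to rounding.

```python
def flagged_fraction(recall: float, prevalence: float, precision: float) -> float:
    """Share of all traffic flagged: true positives (recall * prevalence) / precision."""
    return recall * prevalence / precision


PREVALENCE = 0.00387  # test fraud rate from the metrics table

# (recall target, precision) pairs copied from the routing table.
ROUTING_ROWS = [(0.80, 0.995), (0.85, 0.982), (0.90, 0.964), (0.95, 0.829)]

for recall, precision in ROUTING_ROWS:
    share = flagged_fraction(recall, PREVALENCE, precision)
    print(f"recall {recall:.2f}: about {share:.2%} of traffic flagged")
```

The same identity is a quick sanity check when re-deriving thresholds on new data: if the flagged share and precision do not reconcile with the observed fraud rate, the evaluation split has probably changed.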

Top Features (by gain)

  1. amt_to_p95: transaction amount relative to per-category 95th percentile
  2. category: merchant category code
  3. log_amt: log-transformed transaction amount
  4. is_night: off-hours indicator (10 PM–4 AM)
  5. amt_24h: rolling 24-hour spend per card
  6. mins_since_last: time since previous transaction on same card
  7. state: cardholder state
  8. age: cardholder age

Engineered Signals

  • Per-category amount anomaly (amount vs. 95th percentile)
  • 1-hour and 24-hour velocity (transaction count + spend)
  • Geo distance (home ↔ merchant, haversine)
  • Time-of-day features (hour, day-of-week, is-night)
  • Cardholder age from date of birth
  • Category historical fraud rate
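Three of the signals listed above can be sketched with the standard library alone. This is an illustrative sketch, not the code in cc_lgbm.py: the function and class names are hypothetical, the exact boundaries of the night window are assumed from the "10 PM–4 AM" description, and whether the velocity features include the current transaction is also an assumption (here they do not).

```python
import math
from collections import deque
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_night(ts: datetime) -> int:
    """1 from 22:00 through 03:59, else 0 (window boundaries are an assumption)."""
    return int(ts.hour >= 22 or ts.hour < 4)


class CardVelocity:
    """Rolling 24-hour transaction count and spend for a single card."""

    def __init__(self, window=timedelta(hours=24)):
        self.window = window
        self.events = deque()  # (timestamp, amount), oldest first
        self.total = 0.0

    def update(self, ts, amount):
        """Return (count, spend) over the window before adding this transaction."""
        # Drop events that have fallen out of the window.
        while self.events and ts - self.events[0][0] > self.window:
            _, old_amount = self.events.popleft()
            self.total -= old_amount
        count, spend = len(self.events), self.total
        self.events.append((ts, amount))
        self.total += amount
        return count, spend
```

One `CardVelocity` instance per card number would be needed in practice; a second instance with a one-hour window gives the 1-hour velocity.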

Inference

import joblib
import lightgbm as lgb

# Load the fitted preprocessing artifacts and the trained booster.
preproc = joblib.load("cc_lgbm_preproc.joblib")
model = lgb.Booster(model_file="cc_lgbm_model.txt")

# Featurize transactions with the same pipeline as cc_lgbm.py, then:
# score = model.predict(features)
# route_to_llm = score >= 0.9403985330442168  # recall-0.90 threshold

🧮 AML Pre-Filter (Tabular)

Single-transaction LightGBM scorer. It can be deployed as a high-recall pre-filter or as a standalone lightweight scorer for resource-constrained environments.

| Metric | Value |
| --- | --- |
| ROC-AUC | 0.822 |
| PR-AUC | 0.023 |
| Train rows | 4,062,676 |
| Test rows | 1,015,669 |
| Test laundering rate | 0.177% |
| Best iteration | 671 |
| Scale pos weight | 1,201× |

Routing Performance (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
| --- | --- | --- | --- |
| 50% | 6.0e-23 | 0.014 | 6.2% |
| 60% | 1.4e-40 | 0.011 | 9.3% |
| 70% | 1.7e-66 | 0.008 | 15.3% |
| 80% (default) | 1.8e-116 | 0.005 | 31.4% |

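The thresholds are extremely small because the model pushes most scores very close to zero at this low laundering prevalence, so the scores should be read as a ranking rather than as calibrated probabilities. The operating point is chosen by recall target, not by threshold magnitude.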
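Choosing an operating point by recall target can be sketched as follows. This is a minimal NumPy illustration on made-up labels and scores, not the selection code in aml_lgbm.py; `threshold_for_recall` and `operating_point` are hypothetical helper names.

```python
import numpy as np


def threshold_for_recall(y_true, scores, target_recall):
    """Largest threshold t such that flagging scores >= t reaches the target recall."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos_scores = np.sort(scores[y_true])[::-1]  # positives, highest score first
    # Number of positives that must be captured (small epsilon guards float error).
    needed = max(int(np.ceil(target_recall * len(pos_scores) - 1e-9)), 1)
    return pos_scores[needed - 1]


def operating_point(y_true, scores, threshold):
    """Recall, precision, and flagged share at a given threshold."""
    y_true = np.asarray(y_true).astype(bool)
    flagged = np.asarray(scores, dtype=float) >= threshold
    tp = np.sum(flagged & y_true)
    return {
        "recall": tp / y_true.sum(),
        "precision": tp / flagged.sum(),
        "flagged_share": flagged.mean(),
    }


# Made-up labels and very small scores, to show that magnitude does not matter.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [1e-3, 1e-20, 1e-50, 1e-80, 1e-120, 1e-10, 1e-60, 1e-100, 1e-130, 1e-140]

thr = threshold_for_recall(y_true, scores, target_recall=0.80)
op = operating_point(y_true, scores, thr)
print(thr, op)
```

Because only the ordering of scores is used, the procedure behaves the same whether the threshold lands near 0.9 or near 1e-116.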

Top Features (by gain)

  1. rcv_in_deg: receiver account in-degree (fan-in)
  2. snd_in_deg: sender account in-degree
  3. snd_out_deg: sender account out-degree (fan-out)
  4. snd_in_cnt: sender inbound transaction count
  5. self_loop: self-transfer indicator
  6. hour: transaction hour
  7. rcv_in_cnt: receiver inbound transaction count
  8. snd_out_cnt: sender outbound transaction count

Engineered Signals

  • Sender/receiver in-degree and out-degree (graph connectivity, fit on train only)
  • Self-loop detection
  • Currency mismatch flag
  • Round-number amount indicator
  • Same-bank transfer flag
  • Gather-scatter indicator (accounts with high in-degree and high out-degree)
  • Log-transformed amounts (paid and received)
  • Time features (hour, day of week)
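The "fit on train only" graph features above can be sketched as dictionaries built once from training edges and then looked up per transaction. This is a hedged illustration, not the featurize() in aml_lgbm.py: the helper names, the reading of degree as the number of distinct counterparties, and the `hub_degree` cutoff for the gather-scatter flag are all assumptions.

```python
from collections import defaultdict


def fit_degree_state(train_edges):
    """Build degree and count dictionaries from (sender, receiver) training pairs."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    out_cnt, in_cnt = defaultdict(int), defaultdict(int)
    for snd, rcv in train_edges:
        out_nbrs[snd].add(rcv)  # distinct receivers -> out-degree
        in_nbrs[rcv].add(snd)   # distinct senders   -> in-degree
        out_cnt[snd] += 1       # raw transaction counts
        in_cnt[rcv] += 1
    return {
        "out_deg": {a: len(s) for a, s in out_nbrs.items()},
        "in_deg": {a: len(s) for a, s in in_nbrs.items()},
        "out_cnt": dict(out_cnt),
        "in_cnt": dict(in_cnt),
    }


def featurize_edge(snd, rcv, state, hub_degree=3):
    """Look up train-fitted graph features for one transaction; unseen accounts get 0."""
    snd_in = state["in_deg"].get(snd, 0)
    snd_out = state["out_deg"].get(snd, 0)
    return {
        "snd_in_deg": snd_in,
        "snd_out_deg": snd_out,
        "rcv_in_deg": state["in_deg"].get(rcv, 0),
        "snd_in_cnt": state["in_cnt"].get(snd, 0),
        "snd_out_cnt": state["out_cnt"].get(snd, 0),
        "rcv_in_cnt": state["in_cnt"].get(rcv, 0),
        "self_loop": int(snd == rcv),
        # Gather-scatter: the sender both collects from and fans out to many accounts.
        "gather_scatter": int(snd_in >= hub_degree and snd_out >= hub_degree),
    }


# Tiny made-up training graph.
train_edges = [("A", "B"), ("A", "B"), ("A", "C"), ("C", "A"), ("D", "D")]
state = fit_degree_state(train_edges)
print(featurize_edge("A", "B", state))
```

Fitting the dictionaries on training edges only keeps test-period connectivity out of the features, which is the leakage concern the "fit on train only" note addresses.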

Inference

import joblib
import lightgbm as lgb

# Load the fitted preprocessing artifacts and the trained booster.
preproc = joblib.load("aml_lgbm_preproc.joblib")
model = lgb.Booster(model_file="aml_lgbm_model.txt")

# Apply featurize() from aml_lgbm.py with the graph dictionaries in preproc, then:
# score = model.predict(features)
# route_to_llm = score >= routing_threshold  # value stored in aml_lgbm_metrics.json

๐Ÿ•ธ๏ธ AML GNN Scorer

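Edge-classification Graph Neural Network that scores transactions as edges in the inter-bank transfer multigraph. It is designed to capture multi-hop laundering patterns that single-transaction models cannot see, although its current test metrics (ROC-AUC 0.584, PR-AUC 0.0036) are below those of the tabular pre-filter (ROC-AUC 0.822, PR-AUC 0.023).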

| Metric | Value |
| --- | --- |
| ROC-AUC | 0.584 |
| PR-AUC | 0.0036 |
| Best F1 | 0.0159 @ threshold 1.0 |
| Architecture | 3× GINEConv, hidden dim 96, bidirectional message passing |
| Training | 80 epochs, temporal 70/10/20 split, class-weighted loss (~1,245:1) |
| Edge features | log(amount_paid), log(amount_received), hour/23, dow/6, currency_mismatch, self_loop, payment_format (one-hot) |
| Node features | log(in-degree), log(out-degree) computed on train edges only |
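The edge and node feature layout listed above can be sketched in plain Python. This is an illustration under stated assumptions rather than the encoding in train_gnn_aml.py: the function names are hypothetical, `log1p` is swapped in for `log` so that zero amounts and zero degrees stay finite, and the payment-format vocabulary is an assumed list that should be replaced with the one derived from the training data.

```python
import math

# Assumed payment-format vocabulary; the real list comes from the training data.
PAYMENT_FORMATS = ["ACH", "Bitcoin", "Cash", "Cheque", "Credit Card", "Reinvestment", "Wire"]


def edge_features(amount_paid, amount_received, hour, dow,
                  currency_paid, currency_received, snd, rcv, payment_format):
    """Feature vector for one transaction edge, in the order listed in the table."""
    one_hot = [1.0 if payment_format == fmt else 0.0 for fmt in PAYMENT_FORMATS]
    return [
        math.log1p(amount_paid),      # log1p is an assumption (see lead-in)
        math.log1p(amount_received),
        hour / 23.0,                  # hour of day scaled to [0, 1]
        dow / 6.0,                    # day of week scaled to [0, 1]
        float(currency_paid != currency_received),  # currency_mismatch
        float(snd == rcv),                          # self_loop
        *one_hot,
    ]


def node_features(in_degree, out_degree):
    """Log-scaled degrees, computed from training edges only."""
    return [math.log1p(in_degree), math.log1p(out_degree)]


# Made-up example edge.
feats = edge_features(100.0, 92.5, 23, 0, "USD", "EUR", "acct_1", "acct_2", "Wire")
print(len(feats), feats[:6])
```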

Why Graph Structure Matters

Single-transaction tabular models are bounded by per-transaction features. Money laundering is a multi-hop graph pattern: fan-out, gather-scatter, and cycle-based structuring are only visible when the full transaction network is modeled. The GNN is intended as a high-recall graph triage layer that routes suspicious subgraphs to the Tier-2 LLM for investigator-facing explanation.

Routing Thresholds (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
| --- | --- | --- | --- |
| 70% | 0.527 | 0.0019 | 69.4% |
| 80% | 0.377 | 0.0019 | 76.2% |
| 90% | 0.003 | 0.0018 | 89.3% |

Inference

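import torch
import torch.nn.functional as F
from torch_geometric.nn import GINEConv  # used by the EdgeGNN definition

# Load the saved weights (a state dict) onto the CPU.
state = torch.load("aml_gnn.pt", map_location="cpu")

# Rebuild EdgeGNN as defined in train_gnn_aml.py, then:
# model.load_state_dict(state)
# model.eval()
# with torch.no_grad():
#     logits = model(x, mp_edge_index, mp_edge_attr, edge_index, edge_attr)
#     probs = F.softmax(logits, dim=1)[:, 1]  # laundering probability per edge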

โš–๏ธ Limitations

  • Source data is synthetic/semi-synthetic. The card scorer's strong metrics reflect the Sparkov generator's structure; validate on your own data before any production deployment.
  • The AML GNN is a high-recall graph triage tool, not a production-validated detector. Pair it with human-in-the-loop review and the Tier-2 LLM for investigator-grade precision.
  • The AML tabular pre-filter uses graph degree features fit on the training partition. In a streaming deployment, these must be maintained as rolling aggregate state.
  • No model in this repository should be used for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
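The rolling aggregate state mentioned for the AML tabular pre-filter could be maintained along these lines. This is only a sketch of one possible approach: `RollingDegreeState` is a hypothetical class, the window length is an arbitrary choice, and it tracks windowed transaction counts rather than distinct-counterparty degrees, which would need per-account neighbour multisets.

```python
from collections import Counter, deque


class RollingDegreeState:
    """Sliding-window inbound/outbound transaction counts per account."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, sender, receiver), oldest first
        self.out_cnt = Counter()
        self.in_cnt = Counter()

    def _expire(self, now):
        # Remove events older than the window and decrement their counts.
        while self.events and now - self.events[0][0] > self.window:
            _, snd, rcv = self.events.popleft()
            self.out_cnt[snd] -= 1
            self.in_cnt[rcv] -= 1

    def observe(self, ts, snd, rcv):
        """Return features from prior traffic, then record this transaction."""
        self._expire(ts)
        feats = {
            "snd_out_cnt": self.out_cnt[snd],
            "snd_in_cnt": self.in_cnt[snd],
            "rcv_in_cnt": self.in_cnt[rcv],
        }
        self.events.append((ts, snd, rcv))
        self.out_cnt[snd] += 1
        self.in_cnt[rcv] += 1
        return feats


# Timestamps are plain seconds here; transactions must arrive in time order.
state_stream = RollingDegreeState(window_seconds=100)
print(state_stream.observe(0, "A", "B"))
print(state_stream.observe(10, "A", "C"))
```

Reading the features before recording the transaction keeps each score dependent only on earlier traffic, mirroring the train-only fitting used offline.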

📄 License

Apache-2.0. Source datasets retain their own licenses.
