---
license: apache-2.0
tags:
- fraud-detection
- anti-money-laundering
- lightgbm
- graph-neural-network
- tabular
- tier-1-scorer
- fraudsentinel
- explainable-ai
library_name: lightgbm
datasets:
- pointe77/credit-card-transaction
- eexzzm/IBM-Transactions-for-Anti-Money-Laundering-HI-Small-Trans
---

# FraudSentinel: Tier-1 Real-Time Scorers (LightGBM + GNN)

Fast statistical scorers for **Tier-1** of the FraudSentinel two-tier fraud-detection architecture. They score **100% of the transaction stream in single-digit milliseconds** and **route only flagged cases** to the Tier-2 fine-tuned LLM ([`naazimsnh02/fraudsentinel-qwen3-14b-lora`](https://huggingface.co/naazimsnh02/fraudsentinel-qwen3-14b-lora)) for explanation, typology classification, recommended action, and SAR drafting.

> The two-tier pattern follows published financial-crime systems: a fast model triages every transaction; the LLM explains the flagged minority (arXiv:2507.14785, 2210.14360, 2312.13896).

---

## 📊 Models Overview

| Artifact | Task | Dataset | Metrics | Status |
|---|---|---|---|---|
| `cc_lgbm_model.txt` + `cc_lgbm_preproc.joblib` | Card-Not-Present (CNP) Fraud | Sparkov (1.3M tx) | **PR-AUC 0.967 · ROC-AUC 0.999** | ✅ Complete |
| `aml_lgbm_model.txt` + `aml_lgbm_preproc.joblib` | AML Pre-Filter (Tabular) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.822 · PR-AUC 0.023** | ✅ Complete |
| `aml_gnn.pt` | Money Laundering (Graph) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.584 · PR-AUC 0.0036** | ✅ Complete |

---

## 🎯 Card Fraud Scorer

**Evaluated on Sparkov's held-out test split at the natural fraud rate (0.39%):**

| Metric | Value |
|---|---|
| **PR-AUC** | 0.967 |
| **ROC-AUC** | 0.999 |
| **Train rows** | 1,296,675 |
| **Test rows** | 555,719 |
| **Test fraud rate** | 0.387% |
| **Best iteration** | 810 |
| **Scale pos weight** | 171.8× |

### Routing Performance (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 80% | 0.997 | 0.995 | 0.31% |
| 85% | 0.987 | 0.982 | 0.33% |
| **90% (default)** | **0.940** | **0.964** | **0.36%** |
| 95% | 0.212 | 0.829 | 0.44% |

At the default routing threshold (recall ≈ 0.90), the scorer routes **<0.4%** of all card traffic to the Tier-2 LLM while catching about 90% of fraud.

### Top Features (by gain)

1. `amt_to_p95`: transaction amount relative to the per-category 95th percentile
2. `category`: merchant category
3. `log_amt`: log-transformed transaction amount
4. `is_night`: off-hours indicator (10 PM–4 AM)
5. `amt_24h`: rolling 24-hour spend per card
6. `mins_since_last`: minutes since the previous transaction on the same card
7. `state`: cardholder state
8. `age`: cardholder age

### Engineered Signals

- Per-category amount anomaly (amount vs. 95th percentile)
- 1-hour and 24-hour velocity (transaction count + spend)
- Geo distance (home ↔ merchant, haversine)
- Time-of-day features (hour, day-of-week, is-night)
- Cardholder age from date of birth
- Category historical fraud rate

### Inference

```python
import lightgbm as lgb
import joblib

# Load the fitted preprocessing state and the trained booster.
preproc = joblib.load("cc_lgbm_preproc.joblib")
model = lgb.Booster(model_file="cc_lgbm_model.txt")

# Featurize using the same pipeline as cc_lgbm.py, then score with model.predict(...).
# route_to_llm = score >= 0.9403985330442168  (recall-0.90 threshold)
```

---

## 🧮 AML Pre-Filter (Tabular)

Single-transaction LightGBM scorer. It can be deployed as a high-recall pre-filter or as a standalone lightweight scorer for resource-constrained environments.

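Routing for both the card scorer and this pre-filter is reported per recall target on the test split rather than per fixed score cutoff. The sketch below shows one way such a cutoff could be derived from held-out labels and scores. `threshold_for_recall` and `routing_stats` are hypothetical helpers written for illustration, the toy labels and scores are made up, and nothing here is taken from `cc_lgbm.py` or `aml_lgbm.py`.

```python
import math


def threshold_for_recall(labels, scores, target_recall):
    """Return the highest score cutoff whose recall is at least target_recall.

    Hypothetical helper (not part of this repository).
    labels: list of 0/1 ground-truth flags; scores: list of model scores.
    """
    positives = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive label")
    # Number of positives that must score at or above the cutoff.
    # The small epsilon guards against float noise such as 0.9 * 10 = 9.000000000000002.
    needed = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    return positives[needed - 1]


def routing_stats(labels, scores, threshold):
    """Recall, precision, and flagged share when routing score >= threshold."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    hits = sum(flagged)
    return {
        "recall": hits / sum(labels),
        "precision": hits / len(flagged),
        "flagged_share": len(flagged) / len(labels),
    }


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

thr = threshold_for_recall(labels, scores, 0.75)
stats = routing_stats(labels, scores, thr)
print(thr, stats)
```

Choosing the cutoff this way treats the score purely as a ranking, which is why the same procedure applies whether the resulting threshold is close to 1 or vanishingly small.
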

| Metric | Value |
|---|---|
| **ROC-AUC** | 0.822 |
| **PR-AUC** | 0.023 |
| **Train rows** | 4,062,676 |
| **Test rows** | 1,015,669 |
| **Test laundering rate** | 0.177% |
| **Best iteration** | 671 |
| **Scale pos weight** | 1,201× |

### Routing Performance (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 50% | 6.0e-23 | 0.014 | 6.2% |
| 60% | 1.4e-40 | 0.011 | 9.3% |
| 70% | 1.7e-66 | 0.008 | 15.3% |
| **80% (default)** | **1.8e-116** | **0.005** | **31.4%** |

The extreme threshold values reflect the model's score calibration at very low laundering prevalence. Read the scores as a ranking rather than as probabilities; the operating point is chosen by recall target, not by threshold magnitude.

### Top Features (by gain)

1. `rcv_in_deg`: receiver account in-degree (fan-in)
2. `snd_in_deg`: sender account in-degree
3. `snd_out_deg`: sender account out-degree (fan-out)
4. `snd_in_cnt`: sender inbound transaction count
5. `self_loop`: self-transfer indicator
6. `hour`: transaction hour
7. `rcv_in_cnt`: receiver inbound transaction count
8. `snd_out_cnt`: sender outbound transaction count

### Engineered Signals

- Sender/receiver in-degree and out-degree (graph connectivity, fit on train only)
- Self-loop detection
- Currency mismatch flag
- Round-number amount indicator
- Same-bank transfer flag
- Gather-scatter indicator (accounts with high in-degree **and** high out-degree)
- Log-transformed amounts (paid and received)
- Time features (hour, day of week)

### Inference

```python
import lightgbm as lgb
import joblib

# Load the fitted preprocessing state and the trained booster.
preproc = joblib.load("aml_lgbm_preproc.joblib")
model = lgb.Booster(model_file="aml_lgbm_model.txt")

# Apply featurize() from aml_lgbm.py with the graph dictionaries stored in preproc.
# route_to_llm = score >= routing_threshold from aml_lgbm_metrics.json
```

---

## 🕸️ AML GNN Scorer

Edge-classification Graph Neural Network that scores **transactions as edges** in the inter-bank transfer multigraph.
It is designed to capture multi-hop laundering patterns that single-transaction models cannot see.

| Metric | Value |
|---|---|
| **ROC-AUC** | 0.584 |
| **PR-AUC** | 0.0036 |
| **Best F1** | 0.0159 @ threshold 1.0 |
| **Architecture** | 3× GINEConv, hidden dim 96, bidirectional message passing |
| **Training** | 80 epochs, temporal 70/10/20 split, class-weighted loss (~1,245:1) |
| **Edge features** | log(amount_paid), log(amount_received), hour/23, dow/6, currency_mismatch, self_loop, payment_format (one-hot) |
| **Node features** | log(in-degree), log(out-degree) computed on train edges only |

### Why Graph Structure Matters

Single-transaction tabular models are bounded by per-transaction features. Money laundering is a **multi-hop graph pattern**: fan-out, gather-scatter, and cycle-based structuring are only visible when the full transaction network is modeled. The GNN is intended as a high-recall graph triage layer that routes suspicious subgraphs to the Tier-2 LLM for investigator-facing explanation.

### Routing Thresholds (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 70% | 0.527 | 0.0019 | 69.4% |
| 80% | 0.377 | 0.0019 | 76.2% |
| 90% | 0.003 | 0.0018 | 89.3% |

### Inference

```python
import torch
import torch.nn.functional as F  # needed for the softmax call below
from torch_geometric.nn import GINEConv

# Load weights
state = torch.load("aml_gnn.pt", map_location="cpu")

# Rebuild EdgeGNN as defined in train_gnn_aml.py
# model.load_state_dict(state)
# probs = F.softmax(model(x, mp_edge_index, mp_edge_attr, edge_index, edge_attr), dim=1)[:, 1]
```

---

## ⚖️ Limitations

- Source data is synthetic/semi-synthetic. The card scorer's strong metrics reflect the Sparkov generator's structure; validate on your own data before any production deployment.
- The AML GNN is a high-recall graph triage tool, not a production-validated detector. Pair it with human-in-the-loop review and the Tier-2 LLM for investigator-grade precision.
- The AML tabular pre-filter uses graph degree features fit on the training partition. In a streaming deployment, these must be maintained as rolling aggregate state.
- No model in this repository should be used for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.

---

## 📄 License

Apache-2.0. Source datasets retain their own licenses.