---
license: apache-2.0
tags:
- fraud-detection
- anti-money-laundering
- lightgbm
- graph-neural-network
- tabular
- tier-1-scorer
- fraudsentinel
- explainable-ai
library_name: lightgbm
datasets:
- pointe77/credit-card-transaction
- eexzzm/IBM-Transactions-for-Anti-Money-Laundering-HI-Small-Trans
---

# FraudSentinel — Tier-1 Real-Time Scorers (LightGBM + GNN)

Fast statistical scorers for **Tier 1** of the FraudSentinel two-tier fraud-detection architecture. They score **100% of the transaction stream in single-digit milliseconds** and **route only flagged cases** to the Tier-2 fine-tuned LLM ([`naazimsnh02/fraudsentinel-qwen3-14b-lora`](https://huggingface.co/naazimsnh02/fraudsentinel-qwen3-14b-lora)) for explanation, typology classification, recommended action, and SAR drafting.

> The two-tier pattern follows published financial-crime systems: a fast model triages every transaction, and the LLM explains only the flagged minority (arXiv:2507.14785, 2210.14360, 2312.13896).

---

## 📊 Models Overview

| Artifact | Task | Dataset | Metrics | Status |
|---|---|---|---|---|
| `cc_lgbm_model.txt` + `cc_lgbm_preproc.joblib` | Card-Not-Present (CNP) Fraud | Sparkov (1.3M tx) | **PR-AUC 0.967 · ROC-AUC 0.999** | ✅ Complete |
| `aml_lgbm_model.txt` + `aml_lgbm_preproc.joblib` | AML Pre-Filter (Tabular) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.822 · PR-AUC 0.023** | ✅ Complete |
| `aml_gnn.pt` | Money Laundering (Graph) | IBM AML HI-Small (5M tx) | **ROC-AUC 0.584 · PR-AUC 0.0036** | ✅ Complete |

---

## 🎯 Card Fraud Scorer

**Evaluated on Sparkov's held-out test split at natural fraud rate (0.39%):**

| Metric | Value |
|---|---|
| **PR-AUC** | 0.967 |
| **ROC-AUC** | 0.999 |
| **Train rows** | 1,296,675 |
| **Test rows** | 555,719 |
| **Test fraud rate** | 0.387% |
| **Best iteration** | 810 |
| **Scale pos weight** | 171.8× |
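
The scale-pos-weight row corresponds to LightGBM's `scale_pos_weight` parameter. A common convention is to set it to the ratio of negative to positive training examples; whether `cc_lgbm.py` uses exactly this formula is an assumption. The sketch below uses toy labels, and `scale_pos_weight` here is a hypothetical helper, not a function from this repository.

```python
def scale_pos_weight(labels):
    """Return n_negative / n_positive for a sequence of 0/1 labels."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive examples; the weight is undefined")
    return n_neg / n_pos


# Toy labels: 2 frauds among 1,000 transactions (illustrative, not Sparkov data).
toy_labels = [1, 1] + [0] * 998
weight = scale_pos_weight(toy_labels)

# The ratio can be inverted to recover the positive rate it implies.
implied_rate = 1.0 / (1.0 + weight)
```

Under this convention, a larger weight simply signals a rarer positive class in the training partition.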

### Routing Performance (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 80% | 0.997 | 0.995 | 0.31% |
| 85% | 0.987 | 0.982 | 0.33% |
| **90% (default)** | **0.940** | **0.964** | **0.36%** |
| 95% | 0.212 | 0.829 | 0.44% |

At the default routing threshold (0.940, recall ≈ 0.90), the scorer routes **<0.4%** of all card traffic to the Tier-2 LLM while catching about 90% of fraud.
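
Each row of the routing table pairs a recall target with a threshold, a precision, and a flagged share. The following is a minimal sketch of how such an operating point could be derived from held-out labels and scores. It is not taken from `cc_lgbm.py`; `routing_point` and its arguments are hypothetical names.

```python
import math


def routing_point(y_true, scores, recall_target):
    """Pick the highest threshold whose recall meets the target.

    Returns the threshold plus the precision, recall, and flagged share
    obtained by flagging every score >= threshold.
    """
    pos_scores = sorted((s for y, s in zip(y_true, scores) if y == 1), reverse=True)
    if not pos_scores:
        raise ValueError("need at least one positive example")
    # Number of positives that must be caught to reach the target recall.
    k = min(len(pos_scores), max(1, math.ceil(recall_target * len(pos_scores))))
    threshold = pos_scores[k - 1]

    flagged = [s >= threshold for s in scores]
    n_flagged = sum(flagged)
    true_pos = sum(1 for y, f in zip(y_true, flagged) if f and y == 1)
    return {
        "threshold": threshold,
        "precision": true_pos / n_flagged,
        "recall": true_pos / len(pos_scores),
        "flagged_share": n_flagged / len(scores),
    }


# Toy data: 4 frauds among 10 transactions (illustrative only).
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
s = [0.9, 0.8, 0.3, 0.1, 0.85, 0.2, 0.05, 0.04, 0.03, 0.02]
point = routing_point(y, s, recall_target=0.5)
```

Because the threshold is read off the ranked positive scores, the chosen operating point depends only on score ordering, not on whether the scores are calibrated.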

### Top Features (by gain)

1. `amt_to_p95` — transaction amount relative to per-category 95th percentile
2. `category` — merchant category code
3. `log_amt` — log-transformed transaction amount
4. `is_night` — off-hours indicator (10 PM–4 AM)
5. `amt_24h` — rolling 24-hour spend per card
6. `mins_since_last` — time since previous transaction on same card
7. `state` — cardholder state
8. `age` — cardholder age
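
Two of the features above, `amt_24h` and `mins_since_last`, are per-card rolling quantities. The sketch below shows one way to compute them in a single pass over time-ordered transactions. It assumes the 24-hour spend excludes the current transaction and that input is already sorted by timestamp; both are assumptions, and `add_velocity_features` is a hypothetical name rather than the function used in `cc_lgbm.py`.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def add_velocity_features(transactions, window=timedelta(hours=24)):
    """transactions: iterable of (card_id, timestamp, amount), sorted by timestamp."""
    history = defaultdict(deque)  # card_id -> (timestamp, amount) pairs inside the window
    spend = defaultdict(float)    # card_id -> sum of amounts currently in the window
    last_seen = {}                # card_id -> timestamp of the previous transaction
    rows = []
    for card_id, ts, amount in transactions:
        q = history[card_id]
        # Evict transactions that have fallen out of the rolling window.
        while q and ts - q[0][0] > window:
            _, old_amount = q.popleft()
            spend[card_id] -= old_amount
        prev = last_seen.get(card_id)
        rows.append({
            "card_id": card_id,
            "amt_24h": spend[card_id],  # prior spend only (assumption)
            "mins_since_last": None if prev is None else (ts - prev).total_seconds() / 60.0,
        })
        # Record the current transaction after its features are emitted.
        q.append((ts, amount))
        spend[card_id] += amount
        last_seen[card_id] = ts
    return rows


t0 = datetime(2020, 1, 1, 12, 0)
toy = [
    ("card_a", t0, 10.0),
    ("card_a", t0 + timedelta(hours=1), 20.0),
    ("card_a", t0 + timedelta(hours=30), 5.0),
]
velocity_rows = add_velocity_features(toy)
```

Emitting features before recording the current transaction keeps each row free of information from its own amount.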

### Engineered Signals
- Per-category amount anomaly (amount vs. 95th percentile)
- 1-hour and 24-hour velocity (transaction count + spend)
- Geo distance (home ↔ merchant, haversine)
- Time-of-day features (hour, day-of-week, is-night)
- Cardholder age from date of birth
- Category historical fraud rate
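
Several of the signals listed above are simple closed-form transforms. The sketch below illustrates three of them: haversine distance, the night flag, and cardholder age. It assumes a mean Earth radius of 6,371 km and treats the 10 PM to 4 AM window as hours 22 through 3 inclusive; the function names are hypothetical and the exact definitions in `cc_lgbm.py` may differ.

```python
import math
from datetime import date, datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_night(ts):
    """1 if the transaction hour is 22, 23, 0, 1, 2, or 3; otherwise 0."""
    return int(ts.hour >= 22 or ts.hour < 4)


def age_years(dob, on):
    """Whole years between date of birth and the transaction date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


home_to_merchant = haversine_km(0.0, 0.0, 0.0, 1.0)  # one degree along the equator
night_flag = is_night(datetime(2020, 1, 1, 23, 15))
age = age_years(date(1990, 6, 15), date(2020, 6, 14))
```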

### Inference
```python
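import joblib
import lightgbm as lgb

# Load the fitted preprocessing state and the trained booster.
preproc = joblib.load("cc_lgbm_preproc.joblib")
model = lgb.Booster(model_file="cc_lgbm_model.txt")

# Recall-0.90 routing threshold (0.940 in the routing table).
ROUTING_THRESHOLD = 0.9403985330442168

# Build the feature matrix X with the same pipeline as cc_lgbm.py, then:
# scores = model.predict(X)                 # one fraud score per transaction
# route_to_llm = scores >= ROUTING_THRESHOLD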
```

---

## 🧮 AML Pre-Filter (Tabular)

Single-transaction LightGBM scorer. It can be deployed as a high-recall pre-filter or as a standalone lightweight scorer for resource-constrained environments.

| Metric | Value |
|---|---|
| **ROC-AUC** | 0.822 |
| **PR-AUC** | 0.023 |
| **Train rows** | 4,062,676 |
| **Test rows** | 1,015,669 |
| **Test laundering rate** | 0.177% |
| **Best iteration** | 671 |
| **Scale pos weight** | 1,201× |

### Routing Performance (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 50% | 6.0e-23 | 0.014 | 6.2% |
| 60% | 1.4e-40 | 0.011 | 9.3% |
| 70% | 1.7e-66 | 0.008 | 15.3% |
| **80% (default)** | **1.8e-116** | **0.005** | **31.4%** |

The extreme threshold values show that the raw scores are not calibrated probabilities at this very low laundering prevalence; choose the operating point by recall target, not by threshold magnitude.
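
Thresholds as small as 1.8e-116 are still representable in double precision, but they are awkward to read and log. One option is to compare in log-odds space instead. The sketch below converts between probabilities and logits; it assumes the model's probability is a plain sigmoid of the raw margin (LightGBM's default for binary objectives), and the helper names are hypothetical. LightGBM's `Booster.predict` accepts `raw_score=True` to return margins directly.

```python
import math


def prob_to_logit(p):
    """Log-odds of a probability in (0, 1), stable for very small p."""
    return math.log(p) - math.log1p(-p)


def logit_to_prob(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# The default 80%-recall threshold from the routing table, expressed as a logit.
default_logit_threshold = prob_to_logit(1.8e-116)

# Comparing raw margins against the logit threshold gives the same routing
# decision as comparing probabilities, because the sigmoid is monotonic.
example_margin = -250.0
route_to_llm = example_margin >= default_logit_threshold
```

In this form the default threshold is a margin of roughly -266.5, which is easier to inspect than a probability with 116 leading zeros.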

### Top Features (by gain)

1. `rcv_in_deg` — receiver account in-degree (fan-in)
2. `snd_in_deg` — sender account in-degree
3. `snd_out_deg` — sender account out-degree (fan-out)
4. `snd_in_cnt` — sender inbound transaction count
5. `self_loop` — self-transfer indicator
6. `hour` — transaction hour
7. `rcv_in_cnt` — receiver inbound transaction count
8. `snd_out_cnt` — sender outbound transaction count

### Engineered Signals
- Sender/receiver in-degree and out-degree (graph connectivity, fit on train only)
- Self-loop detection
- Currency mismatch flag
- Round-number amount indicator
- Same-bank transfer flag
- Gather-scatter indicator (accounts with high in-degree **and** high out-degree)
- Log-transformed amounts (paid and received)
- Time features (hour, day of week)
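
Several of the signals listed above can be sketched as follows. This is a minimal illustration, not the `featurize()` function from `aml_lgbm.py`. It assumes that degree means the number of distinct counterparties seen in the training partition, that "round" means a multiple of 100, and that the gather-scatter flag is evaluated on the sender; the field names and the `hub_min` cutoff are hypothetical.

```python
from collections import defaultdict


def fit_degree_maps(train_edges):
    """Count distinct counterparties per account from (sender, receiver) training pairs."""
    senders_of = defaultdict(set)    # receiver -> distinct senders (in-degree)
    receivers_of = defaultdict(set)  # sender -> distinct receivers (out-degree)
    for sender, receiver in train_edges:
        senders_of[receiver].add(sender)
        receivers_of[sender].add(receiver)
    in_deg = {acct: len(s) for acct, s in senders_of.items()}
    out_deg = {acct: len(r) for acct, r in receivers_of.items()}
    return in_deg, out_deg


def sketch_features(tx, in_deg, out_deg, hub_min=2, round_unit=100):
    """Build a feature dict for one transaction; unseen accounts get degree 0."""
    snd, rcv = tx["sender"], tx["receiver"]
    snd_in, snd_out = in_deg.get(snd, 0), out_deg.get(snd, 0)
    return {
        "snd_in_deg": snd_in,
        "snd_out_deg": snd_out,
        "rcv_in_deg": in_deg.get(rcv, 0),
        "self_loop": int(snd == rcv),
        "currency_mismatch": int(tx["payment_currency"] != tx["receiving_currency"]),
        "round_amount": int(tx["amount_paid"] % round_unit == 0),
        "same_bank": int(tx["sender_bank"] == tx["receiver_bank"]),
        "gather_scatter": int(snd_in >= hub_min and snd_out >= hub_min),
    }


in_deg, out_deg = fit_degree_maps(
    [("a", "b"), ("a", "c"), ("d", "a"), ("e", "a"), ("a", "b")]
)
toy_tx = {
    "sender": "a", "receiver": "b",
    "sender_bank": 1, "receiver_bank": 1,
    "amount_paid": 500.0,
    "payment_currency": "USD", "receiving_currency": "EUR",
}
toy_features = sketch_features(toy_tx, in_deg, out_deg)
```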

### Inference
```python
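import joblib
import lightgbm as lgb

# Load the fitted preprocessing state (graph dictionaries) and the trained booster.
preproc = joblib.load("aml_lgbm_preproc.joblib")
model = lgb.Booster(model_file="aml_lgbm_model.txt")

# Apply featurize() from aml_lgbm.py with the preproc graph dictionaries to get X, then:
# scores = model.predict(X)                  # one laundering score per transaction
# route_to_llm = scores >= routing_threshold  # value stored in aml_lgbm_metrics.json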
```

---

## 🕸️ AML GNN Scorer

Edge-classification Graph Neural Network that scores **transactions as edges** in the inter-bank transfer multigraph. It is designed to capture multi-hop laundering patterns that single-transaction models cannot see.

| Metric | Value |
|---|---|
| **ROC-AUC** | 0.584 |
| **PR-AUC** | 0.0036 |
| **Best F1** | 0.0159 @ threshold 1.0 |
| **Architecture** | 3× GINEConv, hidden dim 96, bidirectional message passing |
| **Training** | 80 epochs, temporal 70/10/20 split, class-weighted loss (~1,245:1) |
| **Edge features** | log(amount_paid), log(amount_received), hour/23, dow/6, currency_mismatch, self_loop, payment_format (one-hot) |
| **Node features** | log(in-degree), log(out-degree) computed on train edges only |
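
The temporal 70/10/20 split in the table means edges are partitioned by time rather than at random, so validation and test edges always occur after the training edges. A minimal sketch of such a split is shown below; `temporal_split` is a hypothetical helper, and the exact tie-breaking and rounding used in `train_gnn_aml.py` are not known.

```python
def temporal_split(timestamps, fractions=(0.7, 0.1, 0.2)):
    """Return (train, val, test) index lists, ordered by timestamp.

    The earliest edges go to train, the next slice to validation,
    and the most recent edges to test.
    """
    order = sorted(range(len(timestamps)), key=timestamps.__getitem__)
    n = len(order)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # remainder, so no edge is dropped
    return train, val, test


# Toy timestamps (arbitrary integer ticks, illustrative only).
toy_ts = [5, 3, 9, 1, 7, 2, 8, 4, 6, 0]
train_idx, val_idx, test_idx = temporal_split(toy_ts)
```

Computing node degrees from the training indices only, as the table specifies, then avoids leaking future connectivity into the features.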

### Why Graph Structure Matters

Single-transaction tabular models are limited to per-transaction features. Money laundering is a **multi-hop graph pattern**: fan-out, gather-scatter, and cycle-based structuring are visible only when the full transaction network is modeled. The GNN is intended as a high-recall graph triage layer that routes suspicious subgraphs to the Tier-2 LLM for investigator-facing explanation.

### Routing Thresholds (Test Split)

| Recall Target | Threshold | Precision | Flagged % |
|---|---|---|---|
| 70% | 0.527 | 0.0019 | 69.4% |
| 80% | 0.377 | 0.0019 | 76.2% |
| 90% | 0.003 | 0.0018 | 89.3% |

### Inference
```python
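import torch
import torch.nn.functional as F  # needed for the softmax below
from torch_geometric.nn import GINEConv  # layer type used by EdgeGNN

# Load the saved weights onto the CPU.
state = torch.load("aml_gnn.pt", map_location="cpu")

# Rebuild EdgeGNN as defined in train_gnn_aml.py, then:
# model.load_state_dict(state)
# model.eval()
# with torch.no_grad():
#     logits = model(x, mp_edge_index, mp_edge_attr, edge_index, edge_attr)
#     probs = F.softmax(logits, dim=1)[:, 1]  # laundering probability per edge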
```

---

## ⚖️ Limitations

- Source data is synthetic/semi-synthetic. The card scorer's strong metrics reflect the Sparkov generator's structure; validate on your own data before any production deployment.
- The AML GNN is a high-recall graph triage tool, not a production-validated detector. Pair it with human-in-the-loop review and the Tier-2 LLM for investigator-grade precision.
- The AML tabular pre-filter uses graph degree features fit on the training partition. In a streaming deployment, these must be maintained as rolling aggregate state.
- No model in this repository should be used for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
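
The point above about maintaining degree features as state in a streaming deployment can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: degree means distinct counterparties, features are read before the current transaction is recorded, and no time-window eviction is applied. `StreamingDegreeState` is a hypothetical class, not part of this repository.

```python
from collections import defaultdict


class StreamingDegreeState:
    """Keeps per-account counterparty sets so degree features stay current."""

    def __init__(self):
        self._senders_of = defaultdict(set)    # account -> distinct inbound counterparties
        self._receivers_of = defaultdict(set)  # account -> distinct outbound counterparties

    def lookup(self, sender, receiver):
        """Read degree features without mutating state."""
        return {
            "snd_in_deg": len(self._senders_of.get(sender, ())),
            "snd_out_deg": len(self._receivers_of.get(sender, ())),
            "rcv_in_deg": len(self._senders_of.get(receiver, ())),
        }

    def update(self, sender, receiver):
        """Record one observed transfer."""
        self._senders_of[receiver].add(sender)
        self._receivers_of[sender].add(receiver)

    def score_then_update(self, sender, receiver):
        """Return features as of just before this transfer, then record it."""
        feats = self.lookup(sender, receiver)
        self.update(sender, receiver)
        return feats


state = StreamingDegreeState()
first = state.score_then_update("a", "b")
second = state.score_then_update("c", "b")
third = state.score_then_update("b", "a")
```

A production version would likely need bounded memory, for example by expiring counterparties older than a chosen window, and a shared store rather than process-local dictionaries.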

---

## 📄 License

Apache-2.0. Source datasets retain their own licenses.