FraudSentinel — Qwen3-14B LoRA Adapter (Tier-2 Intelligence Layer)

Fine-tuned LoRA adapter for Qwen3-14B, trained for enterprise fraud detection and financial crime investigation. Part of the FraudSentinel two-tier platform.

For a self-contained deployment without LoRA adapter management, see the merged model: naazimsnh02/fraudsentinel-qwen3-14b-merged.

Capabilities

The model is trained to act as an enterprise fraud and AML investigation assistant across six task types, plus an optional Deep Analysis mode:

Structured JSON risk scoring — calibrated risk score (0.0–1.0), risk level (LOW / MEDIUM / HIGH / CRITICAL), typology, key signals, feature importance, recommended action, and SAR rationale
Explainable alerts — evidence-grounded investigator-facing natural language explanations tied to actual transaction features
Typology classification — primary and secondary fraud/laundering pattern identification (card-not-present, account takeover, fan-out, gather-scatter, structuring, etc.)
6-level recommended action — AUTO_APPROVE → APPROVE_WITH_MONITORING → STEP_UP_AUTH → TEMPORARY_HOLD → AUTO_BLOCK → SAR_REVIEW
SAR drafting — FinCEN-aligned Suspicious Activity Report narrative generation for human review and filing
Multi-turn HITL dialogue — investigator follow-ups ("Why this risk level?", "What else should I check?", "Customer confirmed legit — what next?")
Deep Analysis mode — optional Chain-of-Thought reasoning via Qwen3's thinking tokens for complex multi-account cases
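
The six recommended actions read as an ordered escalation ladder. Below is a minimal sketch of how a downstream service might represent that ordering, assuming the arrows in the list denote increasing severity. The `RecommendedAction` enum and the `needs_human` helper are illustrative names, not part of the released code, and the cut-off used in `needs_human` is a hypothetical policy.

```python
from enum import IntEnum


class RecommendedAction(IntEnum):
    """Six-level action ladder, ordered from least to most severe (assumed ordering)."""
    AUTO_APPROVE = 0
    APPROVE_WITH_MONITORING = 1
    STEP_UP_AUTH = 2
    TEMPORARY_HOLD = 3
    AUTO_BLOCK = 4
    SAR_REVIEW = 5


def needs_human(action: RecommendedAction) -> bool:
    """Hypothetical policy: TEMPORARY_HOLD and above are routed to an investigator."""
    return action >= RecommendedAction.TEMPORARY_HOLD


# Parse the string emitted in the model's "recommended_action" field.
action = RecommendedAction["AUTO_BLOCK"]
print(action.name, needs_human(action))
```

Using an `IntEnum` keeps the labels comparable, so a caller can express rules such as "never act automatically above a given level" without string matching.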

Training Details

Property	Value
Base model	`unsloth/Qwen3-14B` (Apache-2.0)
Method	Supervised Fine-Tuning (SFT) + LoRA
LoRA rank	16
LoRA alpha	32
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (all-linear)
LoRA dropout	0 (Unsloth-optimized)
Trainable parameters	64,225,280 (0.433% of 14.83B total)
Dataset	`naazimsnh02/fraud-financial-crime-qwen3-sft-v2`
Training examples	11,016 (train split)
Epochs	2
Total steps	1,378
Batch size (per device)	2
Gradient accumulation	8 (effective batch size 16)
Learning rate	1e-4
LR scheduler	Cosine
Warmup ratio	0.05
Optimizer	AdamW 8-bit
Precision	bfloat16 (no quantization)
Weight decay	0.001
Max sequence length	4,096
Packing	Disabled (padding-free mode enabled)
Hardware	AMD MI300X (192 GB VRAM)
Framework	Unsloth 2026.6.1, TRL 0.22.2, PEFT 0.19.1, Transformers 4.56.2
ROCm / PyTorch	ROCm 7.0, PyTorch 2.10.0+rocm7.0
Train loss (final)	0.2467
Training time	4,230 s (70.5 min)
Peak VRAM	39.8 GB (20.8% of 192 GB)
LoRA VRAM overhead	12.0 GB (6.3% of max)
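
The trainable-parameter count and the step count in the table can be cross-checked with a little arithmetic. The sketch below assumes the layer dimensions of the public Qwen3-14B configuration (hidden size 5120, MLP size 17408, 40 layers, 40 query heads and 8 key/value heads of dimension 128), which this card does not state, and assumes a single device with the final partial batch counted as a step.

```python
import math

# Assumed Qwen3-14B dimensions (from the public model config, not stated in this card).
HIDDEN = 5120
INTERMEDIATE = 17408
NUM_LAYERS = 40
Q_OUT = 40 * 128   # query heads * head_dim
KV_OUT = 8 * 128   # key/value heads * head_dim
RANK = 16          # LoRA rank from the table

# (in_features, out_features) of each LoRA target module in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapted module adds A (rank x in) and B (out x rank): rank * (in + out) weights.
per_layer = sum(RANK * (fan_in + fan_out) for fan_in, fan_out in TARGETS.values())
trainable = per_layer * NUM_LAYERS

# Optimizer steps: ceil(examples / effective batch) per epoch, times the epoch count.
effective_batch = 2 * 8
total_steps = math.ceil(11_016 / effective_batch) * 2

print(f"trainable parameters: {trainable:,}")
print(f"share of 14.83B:      {trainable / 14.83e9:.3%}")
print(f"total steps:          {total_steps}")
```

Under these assumptions the script reproduces the 64,225,280 trainable parameters, the 0.433% share, and the 1,378 total steps reported above.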

Usage

Load with Unsloth (recommended)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "naazimsnh02/fraudsentinel-qwen3-14b-lora",
    max_seq_length = 4096,
    dtype = torch.bfloat16,
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)

Load with PEFT + Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "naazimsnh02/fraudsentinel-qwen3-14b-lora")
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/fraudsentinel-qwen3-14b-lora")

Inference Example

messages = [
    {"role": "system", "content": "You are FraudSentinel, an expert fraud detection and AML investigation assistant."},
    {"role": "user", "content": (
        "Analyze this card transaction and return a structured JSON risk assessment.\n\n"
        "Transaction: amount=$828.62, category=misc_net, hour=2, "
        # is_fraud is the dataset's ground-truth label; live traffic would not carry it
        "amount_vs_category_p95=2.16x, tx_24h=4, geo_km=1847, is_fraud=True"
    )},
]

# Thinking mode OFF: fast mode, the default for Tier-2 triage
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Low temperature keeps the sampled JSON close to deterministic
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
    )
# Decode only the newly generated tokens, skipping the echoed prompt
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))
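
The decoded reply is plain text, and a sampled generation may wrap the JSON in extra prose or a code fence. A minimal sketch of pulling the assessment out as a Python dict follows, assuming the reply contains one top-level JSON object. `extract_json` is an illustrative helper, not part of the released code, and `sample` stands in for real model output.

```python
import json


def extract_json(text: str) -> dict:
    """Return the first JSON object found in `text`, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


# Stand-in for a decoded model reply (hypothetical text).
sample = 'Here is the assessment:\n{"risk_score": 0.84, "risk_level": "HIGH"}\nDone.'
assessment = extract_json(sample)
print(assessment["risk_level"])
```

Raising on a missing object, rather than returning an empty dict, makes a malformed reply visible to the caller so it can be retried or escalated.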

Deep Analysis mode (Chain-of-Thought for complex cases):

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # activates Qwen3 thinking tokens
)
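
The snippet above only builds the prompt; generation then proceeds as in the fast-mode example. Reasoning tokens count toward `max_new_tokens`, so the 512-token budget used there may be too small for thinking mode. Below is a minimal sketch of separating the reasoning from the final answer, assuming the decoded text keeps Qwen3's `<think>` and `</think>` markers. `split_thinking` is an illustrative helper and `sample` is a made-up reply.

```python
def split_thinking(decoded: str) -> tuple[str, str]:
    """Split decoded output into (reasoning, answer) around the closing think tag."""
    marker = "</think>"
    if marker not in decoded:
        # No reasoning block present: treat the whole reply as the answer.
        return "", decoded.strip()
    reasoning, answer = decoded.split(marker, 1)
    return reasoning.replace("<think>", "", 1).strip(), answer.strip()


# Stand-in for a decoded thinking-mode reply (hypothetical text).
sample = '<think>\nGeo jump of 1847 km within 24h.\n</think>\n\n{"risk_level": "HIGH"}'
reasoning, answer = split_thinking(sample)
print(reasoning)
print(answer)
```

Keeping the reasoning separate lets an investigator UI show it on demand while the structured answer goes to downstream parsing.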

Output Schema (Structured Task)

{
  "risk_score": 0.84,
  "risk_level": "HIGH",
  "conclusion": "FRAUDULENT",
  "primary_typology": "card-not-present account takeover / stolen-card online cash-out",
  "secondary_typology": "account_takeover",
  "key_signals": [
    "amount_exceeds_category_p95",
    "high_risk_merchant_category",
    "unusual_hour_activity"
  ],
  "explanation": "Transaction amount $828.62 exceeds the 95th-percentile for misc_net purchases...",
  "feature_importance": {
    "amount_exceeds_category_p95": 0.46,
    "high_risk_merchant_category": 0.28,
    "unusual_hour_activity": 0.26
  },
  "recommended_action": "AUTO_BLOCK",
  "sar_required": false,
  "sar_rationale": null
}
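
Because the schema is produced by a language model, a consumer will likely want to check it before acting on it. Below is a minimal validation sketch using only the value ranges and label sets listed in this card. Two of the checks (that `feature_importance` keys match `key_signals`, and that the importances sum to 1.0) are assumptions inferred from the example above, not documented guarantees. `validate_assessment` is an illustrative name.

```python
import math

RISK_LEVELS = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}
ACTIONS = {
    "AUTO_APPROVE", "APPROVE_WITH_MONITORING", "STEP_UP_AUTH",
    "TEMPORARY_HOLD", "AUTO_BLOCK", "SAR_REVIEW",
}


def validate_assessment(a: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    score = a.get("risk_score")
    if not isinstance(score, (int, float)) or isinstance(score, bool) or not 0.0 <= score <= 1.0:
        problems.append("risk_score must be a number in [0.0, 1.0]")
    if a.get("risk_level") not in RISK_LEVELS:
        problems.append("risk_level is not one of the four documented levels")
    if a.get("recommended_action") not in ACTIONS:
        problems.append("recommended_action is not one of the six documented actions")
    importance = a.get("feature_importance") or {}
    # Assumption inferred from the example: importances are keyed by the key signals.
    if set(importance) != set(a.get("key_signals") or []):
        problems.append("feature_importance keys differ from key_signals")
    # Assumption inferred from the example: importances sum to 1.0.
    if importance and not math.isclose(sum(importance.values()), 1.0, abs_tol=0.01):
        problems.append("feature_importance does not sum to 1.0")
    if a.get("sar_required") and not a.get("sar_rationale"):
        problems.append("sar_required is true but sar_rationale is empty")
    return problems


# Abbreviated copy of the example output above.
example = {
    "risk_score": 0.84,
    "risk_level": "HIGH",
    "key_signals": ["amount_exceeds_category_p95", "high_risk_merchant_category", "unusual_hour_activity"],
    "feature_importance": {
        "amount_exceeds_category_p95": 0.46,
        "high_risk_merchant_category": 0.28,
        "unusual_hour_activity": 0.26,
    },
    "recommended_action": "AUTO_BLOCK",
    "sar_required": False,
    "sar_rationale": None,
}
print(validate_assessment(example))
```

Returning a list of problems instead of raising on the first one gives a reviewer the full picture of what is wrong with a given reply.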

Limitations

Intended for prototype and research use. The source data is synthetic or semi-synthetic. Do not use for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
AI-generated SAR drafts require human review and editing before filing.
The model was trained with thinking mode OFF (enable_thinking=False). Enabling thinking mode at inference activates Qwen3's CoT capabilities but adds latency (3–5 s per response).
Feature importance values are deterministic heuristics from the training data generation pipeline, not SHAP or model-derived importances.