FraudSentinel — Qwen3-14B Merged (Tier-2 Intelligence Layer)

Full 16-bit merged model — Qwen3-14B base with the FraudSentinel LoRA adapter weights merged in. Suitable for direct deployment without PEFT or adapter management overhead. Part of the FraudSentinel two-tier platform.

If you prefer the compact LoRA adapter for hot-swapping or multi-tenant deployments, see naazimsnh02/fraudsentinel-qwen3-14b-lora.

Capabilities

This model is trained to act as an enterprise fraud detection and AML investigation assistant. It handles seven task types:

Structured JSON risk scoring — calibrated score (0.0–1.0), risk level, typology, key signals, feature importance, recommended action, SAR rationale
Explainable alerts — evidence-grounded investigator-facing explanations tied to actual transaction features
Typology classification — card-not-present fraud, account takeover, fan-out, gather-scatter, structuring, smurfing, and more
6-level recommended action — AUTO_APPROVE → APPROVE_WITH_MONITORING → STEP_UP_AUTH → TEMPORARY_HOLD → AUTO_BLOCK → SAR_REVIEW
SAR drafting — FinCEN-aligned Suspicious Activity Report narrative generation for human review
Multi-turn HITL dialogue — investigator follow-up conversations with the model
Deep Analysis mode — Chain-of-Thought reasoning via Qwen3's thinking tokens
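
The six recommended actions form an escalation ladder, so downstream code can treat them as ordered values. A minimal sketch, assuming the left-to-right order listed above is the severity order. The `RecommendedAction` enum and `needs_human_review` helper are hypothetical names, and the cutoff at TEMPORARY_HOLD is an illustrative policy choice, not something the model defines:

```python
from enum import IntEnum


class RecommendedAction(IntEnum):
    """Six-level action ladder, lowest to highest severity (assumed order)."""
    AUTO_APPROVE = 0
    APPROVE_WITH_MONITORING = 1
    STEP_UP_AUTH = 2
    TEMPORARY_HOLD = 3
    AUTO_BLOCK = 4
    SAR_REVIEW = 5


def needs_human_review(
    action_name: str,
    cutoff: RecommendedAction = RecommendedAction.TEMPORARY_HOLD,
) -> bool:
    """Return True when the model's action is at or above the cutoff.

    Raises KeyError if the label is not one of the six known actions,
    which is a useful guard against malformed model output.
    """
    return RecommendedAction[action_name] >= cutoff
```

Comparing enum members instead of raw strings keeps routing logic independent of label spelling and makes an unknown label fail loudly.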

Training Details

Property	Value
Base model	`unsloth/Qwen3-14B` (Apache-2.0)
Fine-tuning method	SFT + LoRA, merged to full bf16 weights
LoRA rank / alpha	16 / 32
Target modules	All linear layers (q, k, v, o projections + MLP gate/up/down)
Trainable parameters (pre-merge)	64,225,280 (0.433% of 14.83B)
Dataset	`naazimsnh02/fraud-financial-crime-qwen3-sft-v2` (11,016 train examples)
Epochs	2
Total steps	1,378
Effective batch size	16 (2 per device × 8 gradient accumulation)
Learning rate	1e-4 (cosine decay, 5% warmup)
Optimizer	AdamW 8-bit
Precision	bfloat16
Max sequence length	4,096
Hardware	AMD MI300X, 192 GB VRAM, ROCm 7.0
Framework	Unsloth 2026.6.1, TRL 0.22.2
Train loss (final)	0.2467
Training time	70.5 min
Peak VRAM	39.8 GB (20.8% of 192 GB)
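
The trainable-parameter and step counts in the table can be reproduced with a few lines of arithmetic. The sketch below assumes the layer dimensions from the published Qwen3-14B configuration (hidden size 5120, 40 layers, 40 query heads and 8 key/value heads of dimension 128, MLP width 17408); these values are not stated in this card. It also assumes the trainer counts the final partial accumulation batch of each epoch as a step:

```python
import math

# Assumed Qwen3-14B architecture values (taken from the public model config).
HIDDEN = 5120        # hidden_size
LAYERS = 40          # num_hidden_layers
Q_OUT = 40 * 128     # num_attention_heads * head_dim
KV_OUT = 8 * 128     # num_key_value_heads * head_dim
MLP = 17408          # intermediate_size
RANK = 16            # LoRA rank from the table

# (in_features, out_features) of each adapted linear layer in one decoder block.
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

# LoRA adds A (rank x in) and B (out x rank) per layer: rank * (in + out) weights.
per_block = sum(RANK * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
lora_params = per_block * LAYERS
trainable_pct = round(100 * lora_params / 14.83e9, 3)

effective_batch = 2 * 8                                 # per-device batch x accumulation
steps_per_epoch = math.ceil(11_016 / effective_batch)   # partial last batch counts as a step
total_steps = steps_per_epoch * 2                       # 2 epochs

print(lora_params, trainable_pct, total_steps)
```

Under these assumptions the computed values agree with the table: 64,225,280 trainable parameters, 0.433%, and 1,378 steps.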

Usage

Transformers (no PEFT required)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "naazimsnh02/fraudsentinel-qwen3-14b-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/fraudsentinel-qwen3-14b-merged")

Unsloth (2× faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "naazimsnh02/fraudsentinel-qwen3-14b-merged",
    max_seq_length = 4096,
    dtype = torch.bfloat16,
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)

vLLM (production serving)

vllm serve naazimsnh02/fraudsentinel-qwen3-14b-merged \
  --dtype bfloat16 \
  --max-model-len 4096
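
Once the server is running, it can be queried through vLLM's OpenAI-compatible chat endpoint. The following is a minimal standard-library sketch, not a reference client. It assumes the server listens on `http://localhost:8000` and that the installed vLLM version accepts the `chat_template_kwargs` request field for toggling Qwen3 thinking; check both against your deployment. `build_payload` and `score` are hypothetical helper names:

```python
import json
import urllib.request

MODEL_ID = "naazimsnh02/fraudsentinel-qwen3-14b-merged"
SYSTEM_PROMPT = (
    "You are FraudSentinel, an expert fraud detection and AML investigation assistant."
)


def build_payload(user_text: str, thinking: bool = False) -> dict:
    """Build a chat-completions request body for the served model."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0.1,
        "max_tokens": 512,
        # Assumption: forwarded by vLLM to the Qwen3 chat template.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }


def score(user_text: str, base_url: str = "http://localhost:8000",
          thinking: bool = False) -> str:
    """POST the request and return the assistant message text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(user_text, thinking)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Keeping payload construction separate from the network call makes the request shape easy to inspect and unit-test without a running server.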

Inference Example

messages = [
    {"role": "system", "content": "You are FraudSentinel, an expert fraud detection and AML investigation assistant."},
    {"role": "user", "content": (
        "Analyze this AML transaction and return a structured JSON risk assessment.\n\n"
        "Transfer: amount_paid=95000 USD, amount_received=94850 EUR, payment_format=ACH, "
        "sender_out_degree=47, sender_in_degree=3, receiver_in_degree=52, "
        "ccy_mismatch=True, is_round=False"
    )},
]

# Fast mode — thinking OFF (default for Tier-2 triage)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
    )
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Deep Analysis mode (Chain-of-Thought for complex or high-stakes cases):

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # activates Qwen3 thinking tokens, adds ~3–5 s latency
)
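
With thinking enabled, the generated text typically contains a reasoning segment that ends with a `</think>` marker, followed by the final answer. The helper below is a minimal sketch for separating the two after decoding. `split_thinking` is a hypothetical name, and the code assumes the `<think>` and `</think>` markers survive decoding as plain text, which should be verified against your tokenizer settings. The 512-token budget used in fast mode may also be too small to hold both the reasoning and the answer:

```python
def split_thinking(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    If no end tag is present (for example, thinking was disabled),
    the reasoning part is empty and the whole text is the answer.
    """
    head, sep, tail = decoded.rpartition(end_tag)
    if not sep:
        return "", decoded.strip()
    # Drop the opening marker if the model emitted one.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

Only the answer part should be passed to a JSON parser; the reasoning part can be logged for investigator review.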

Output Schema (Structured Task)

{
  "risk_score": 0.91,
  "risk_level": "CRITICAL",
  "conclusion": "SUSPICIOUS",
  "primary_typology": "layering / fan-in gather-scatter",
  "secondary_typology": "rapid_passthrough",
  "key_signals": [
    "high_receiver_in_degree",
    "cross_currency_conversion",
    "ach_channel_over_representation"
  ],
  "explanation": "Sender account shows unusually low inbound activity (in-degree 3) relative to high outbound fan-out (47 unique counterparties). Receiver account aggregates from 52 sources — consistent with layering...",
  "feature_importance": {
    "high_receiver_in_degree": 0.41,
    "cross_currency_conversion": 0.33,
    "ach_channel_over_representation": 0.26
  },
  "recommended_action": "SAR_REVIEW",
  "sar_required": true,
  "sar_rationale": "Transaction exhibits layering indicators — high-degree aggregation, cross-currency conversion, and ACH over-representation consistent with structuring."
}
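
Because the output is generated text, it is worth validating before it reaches downstream systems. The sketch below extracts the JSON object from a response and checks a few invariants taken from this card: `risk_score` in the range 0.0 to 1.0 and `recommended_action` drawn from the six documented actions. `parse_assessment` is a hypothetical helper, and treating every field of the example except `secondary_typology` as required is an assumption, not a documented guarantee:

```python
import json

# Assumption: all fields shown in the example except secondary_typology are required.
REQUIRED_KEYS = {
    "risk_score", "risk_level", "conclusion", "primary_typology",
    "key_signals", "explanation", "feature_importance",
    "recommended_action", "sar_required", "sar_rationale",
}
ACTIONS = {
    "AUTO_APPROVE", "APPROVE_WITH_MONITORING", "STEP_UP_AUTH",
    "TEMPORARY_HOLD", "AUTO_BLOCK", "SAR_REVIEW",
}


def parse_assessment(raw: str) -> dict:
    """Extract and validate the JSON risk assessment from model output."""
    # Tolerate stray text before or after the JSON object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    score = data["risk_score"]
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0.0 <= score <= 1.0:
        raise ValueError(f"risk_score out of range: {score!r}")

    if data["recommended_action"] not in ACTIONS:
        raise ValueError(f"unknown action: {data['recommended_action']!r}")
    return data
```

`json.loads` raises `json.JSONDecodeError`, a subclass of `ValueError`, so callers can catch a single exception type for all malformed outputs.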

System Prompt

The model was trained with the following system prompt pattern:

You are FraudSentinel, an expert fraud detection and AML investigation assistant.

Consistent use of this system prompt at inference produces the most coherent structured outputs and action recommendations.

Limitations

Prototype/research use. Source data is synthetic/semi-synthetic. Do not use for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
AI-generated SAR drafts require human review and edit before filing with FinCEN.
The model was trained with thinking mode OFF. Enable it at inference for Deep Analysis; expect 3–5 s additional latency per response.
Feature importance values reflect deterministic heuristics from the training data pipeline, not SHAP or gradient-based model explanations.
The model has 14.8B parameters; the bfloat16 weights alone occupy about 28 GiB (roughly 30 GB). A GPU with at least 40 GB VRAM is required for full-precision inference; 4-bit quantization can reduce the footprint to ~10 GB with some quality tradeoff.
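
The memory figures above follow from parameter count times bytes per parameter. A rough sketch that counts weights only; activations, the KV cache, and quantization overhead are not included, which is why practical requirements are higher than these numbers. `weight_memory` is a hypothetical helper, and the 14.83B parameter count is the one reported under Training Details:

```python
def weight_memory(n_params: float, bits_per_param: float) -> tuple[float, float]:
    """Return (GB, GiB) needed to hold the weights alone."""
    n_bytes = n_params * bits_per_param / 8
    return n_bytes / 1e9, n_bytes / 2**30


bf16_gb, bf16_gib = weight_memory(14.83e9, 16)
q4_gb, q4_gib = weight_memory(14.83e9, 4)

print(f"bf16 : {bf16_gb:.1f} GB ({bf16_gib:.1f} GiB)")
print(f"4-bit: {q4_gb:.1f} GB ({q4_gib:.1f} GiB)")
```

The bf16 result is close to 28 when expressed in GiB, and the gap between the raw 4-bit weight size and the ~10 GB estimate leaves room for the overheads listed above.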