FraudSentinel — Qwen3-14B Merged (Tier-2 Intelligence Layer)

Full 16-bit merged model — Qwen3-14B base with the FraudSentinel LoRA adapter weights merged in. Suitable for direct deployment without PEFT or adapter management overhead. Part of the FraudSentinel two-tier platform.

If you prefer the compact LoRA adapter for hot-swapping or multi-tenant deployments, see naazimsnh02/fraudsentinel-qwen3-14b-lora.


Capabilities

This model is trained to act as an enterprise fraud detection and AML investigation assistant. It handles seven task types:

  • Structured JSON risk scoring — calibrated score (0.0–1.0), risk level, typology, key signals, feature importance, recommended action, SAR rationale
  • Explainable alerts — evidence-grounded investigator-facing explanations tied to actual transaction features
  • Typology classification — card-not-present fraud, account takeover, fan-out, gather-scatter, structuring, smurfing, and more
  • 6-level recommended action: AUTO_APPROVE → APPROVE_WITH_MONITORING → STEP_UP_AUTH → TEMPORARY_HOLD → AUTO_BLOCK → SAR_REVIEW
  • SAR drafting — FinCEN-aligned Suspicious Activity Report narrative generation for human review
  • Multi-turn HITL dialogue — investigator follow-up conversations with the model
  • Deep Analysis mode — Chain-of-Thought reasoning via Qwen3's thinking tokens
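
Downstream code usually needs the six action levels as an ordered type rather than as free-form strings. The following is a minimal sketch, assuming the escalation order is exactly the order listed above; `RecommendedAction`, `needs_human_review`, and the default threshold are hypothetical names and policy choices, not part of the model or its training pipeline.

```python
from enum import IntEnum


class RecommendedAction(IntEnum):
    """The six action levels, numbered in the order the model card lists them."""
    AUTO_APPROVE = 0
    APPROVE_WITH_MONITORING = 1
    STEP_UP_AUTH = 2
    TEMPORARY_HOLD = 3
    AUTO_BLOCK = 4
    SAR_REVIEW = 5


def needs_human_review(
    action: str,
    threshold: RecommendedAction = RecommendedAction.TEMPORARY_HOLD,
) -> bool:
    """Return True when the action string is at or above `threshold`.

    The threshold is an illustrative policy choice (an assumption).
    Unknown action strings raise KeyError so they are never silently approved.
    """
    return RecommendedAction[action] >= threshold
```

Looking up the action by name means a misspelled or hallucinated action fails loudly instead of being treated as low risk.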

Training Details

| Property | Value |
| --- | --- |
| Base model | unsloth/Qwen3-14B (Apache-2.0) |
| Fine-tuning method | SFT + LoRA, merged to full bf16 weights |
| LoRA rank / alpha | 16 / 32 |
| Target modules | All linear layers (q, k, v, o projections + MLP gate/up/down) |
| Trainable parameters (pre-merge) | 64,225,280 (0.433% of 14.83B) |
| Dataset | naazimsnh02/fraud-financial-crime-qwen3-sft-v2 (11,016 train examples) |
| Epochs | 2 |
| Total steps | 1,378 |
| Effective batch size | 16 (2 per device × 8 gradient accumulation) |
| Learning rate | 1e-4 (cosine decay, 5% warmup) |
| Optimizer | AdamW 8-bit |
| Precision | bfloat16 |
| Max sequence length | 4,096 |
| Hardware | AMD MI300X, 192 GB VRAM, ROCm 7.0 |
| Framework | Unsloth 2026.6.1, TRL 0.22.2 |
| Train loss (final) | 0.2467 |
| Training time | 70.5 min |
| Peak VRAM | 39.8 GB (20.8% of 192 GB) |
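
Several of the figures above can be cross-checked with simple arithmetic. The sketch below reproduces the effective batch size, the total step count, and the trainable-parameter count. It assumes a single training device, that the last partial batch of each epoch counts as a step, and Qwen3-14B layer dimensions as recalled from the base model's published config; none of those three assumptions is stated in the table itself.

```python
import math

# Figures taken from the training details above.
train_examples = 11_016
per_device_batch = 2
grad_accum = 8
epochs = 2
rank = 16

# Assumption: one device, so effective batch = per-device batch * accumulation.
effective_batch = per_device_batch * grad_accum
# Assumption: the final partial batch of each epoch still counts as one step.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Assumption: Qwen3-14B dimensions (hidden size, KV projection width,
# MLP intermediate size, number of transformer blocks).
hidden, kv_dim, mlp, layers = 5120, 1024, 17408, 40

# (in_features, out_features) of each adapted linear layer in one block.
shapes = {
    "q": (hidden, hidden), "k": (hidden, kv_dim), "v": (hidden, kv_dim),
    "o": (hidden, hidden), "gate": (hidden, mlp), "up": (hidden, mlp),
    "down": (mlp, hidden),
}
# A rank-r adapter adds r * (in + out) parameters per linear layer
# (matrix A is r x in, matrix B is out x r).
lora_params = layers * sum(rank * (i + o) for i, o in shapes.values())

print(effective_batch, total_steps, lora_params)
```

Under these assumptions the three printed values match the table: 16, 1,378, and 64,225,280.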

Usage

Transformers (no PEFT required)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "naazimsnh02/fraudsentinel-qwen3-14b-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/fraudsentinel-qwen3-14b-merged")

Unsloth (2× faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "naazimsnh02/fraudsentinel-qwen3-14b-merged",
    max_seq_length = 4096,
    dtype = torch.bfloat16,
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)

vLLM (production serving)

vllm serve naazimsnh02/fraudsentinel-qwen3-14b-merged \
  --dtype bfloat16 \
  --max-model-len 4096
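
Once the server is running, it can be queried over vLLM's OpenAI-compatible chat endpoint. The sketch below uses only the standard library. The base URL `http://localhost:8000/v1` assumes vLLM's default host and port, and the `chat_template_kwargs` field assumes the served vLLM version forwards that field to the Qwen3 chat template; `build_chat_payload` and `post_chat` are hypothetical helper names.

```python
import json
import urllib.request

MODEL_ID = "naazimsnh02/fraudsentinel-qwen3-14b-merged"
SYSTEM_PROMPT = (
    "You are FraudSentinel, an expert fraud detection and AML "
    "investigation assistant."
)


def build_chat_payload(user_content: str, enable_thinking: bool = False) -> dict:
    """Build a chat-completions request body for the served model."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
        ],
        "temperature": 0.1,
        "max_tokens": 512,
        # Assumption: vLLM passes this through to the chat template.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload and return the assistant message text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `post_chat(build_chat_payload("Analyze this AML transaction ..."))` would return the model's reply as a string.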

Inference Example

messages = [
    {"role": "system", "content": "You are FraudSentinel, an expert fraud detection and AML investigation assistant."},
    {"role": "user", "content": (
        "Analyze this AML transaction and return a structured JSON risk assessment.\n\n"
        "Transfer: amount_paid=95000 USD, amount_received=94850 EUR, payment_format=ACH, "
        "sender_out_degree=47, sender_in_degree=3, receiver_in_degree=52, "
        "ccy_mismatch=True, is_round=False, is_laundering=True"
    )},
]

# Fast mode — thinking OFF (default for Tier-2 triage)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
    )
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Deep Analysis mode (Chain-of-Thought for complex or high-stakes cases):

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # activates Qwen3 thinking tokens, adds ~3–5 s latency
)
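
With thinking enabled, the decoded output typically contains the reasoning before the final answer. The helper below is a minimal sketch for separating the two, assuming the reasoning is wrapped in `<think>` and `</think>` tags that survive decoding; `split_thinking` is a hypothetical name, not part of Transformers or Unsloth.

```python
def split_thinking(decoded: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded text into (reasoning, answer).

    Assumes the reasoning ends at the first `close_tag`. If the tag is
    absent (for example in fast mode), the reasoning is returned empty.
    """
    head, sep, tail = decoded.partition(close_tag)
    if not sep:
        return "", decoded.strip()
    # Drop the opening tag if the model emitted one.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

Because the reasoning consumes part of the generation budget, a `max_new_tokens` value larger than the 512 used in fast mode may be needed so that the final answer is not truncated.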

Output Schema (Structured Task)

{
  "risk_score": 0.91,
  "risk_level": "CRITICAL",
  "conclusion": "SUSPICIOUS",
  "primary_typology": "layering / fan-in gather-scatter",
  "secondary_typology": "rapid_passthrough",
  "key_signals": [
    "high_receiver_in_degree",
    "cross_currency_conversion",
    "ach_channel_over_representation"
  ],
  "explanation": "Sender account shows unusually low inbound activity (in-degree 3) relative to high outbound fan-out (47 unique counterparties). Receiver account aggregates from 52 sources — consistent with layering...",
  "feature_importance": {
    "high_receiver_in_degree": 0.41,
    "cross_currency_conversion": 0.33,
    "ach_channel_over_representation": 0.26
  },
  "recommended_action": "SAR_REVIEW",
  "sar_required": true,
  "sar_rationale": "Transaction exhibits layering indicators — high-degree aggregation, cross-currency conversion, and ACH over-representation consistent with structuring."
}
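
Generated JSON is not guaranteed to be well-formed, so it is worth validating before it reaches downstream systems. The following is a minimal sketch of such a check. The set of required keys is an assumption chosen for illustration from the example above, and `validate_assessment` is a hypothetical helper name.

```python
import json

# Assumption: these keys are treated as mandatory for illustration only.
REQUIRED_KEYS = {
    "risk_score", "risk_level", "primary_typology",
    "key_signals", "recommended_action", "sar_required",
}
ACTIONS = {
    "AUTO_APPROVE", "APPROVE_WITH_MONITORING", "STEP_UP_AUTH",
    "TEMPORARY_HOLD", "AUTO_BLOCK", "SAR_REVIEW",
}


def validate_assessment(raw: str) -> dict:
    """Parse model output and check basic invariants; raise ValueError otherwise."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    score = data["risk_score"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("risk_score must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("risk_score must be within [0.0, 1.0]")
    if data["recommended_action"] not in ACTIONS:
        raise ValueError(f"unknown action: {data['recommended_action']!r}")
    return data
```

Rejecting malformed output and routing it to a human is generally safer than attempting to repair it automatically.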

System Prompt

The model was trained with the following system prompt pattern:

You are FraudSentinel, an expert fraud detection and AML investigation assistant.

Consistent use of this system prompt at inference produces the most coherent structured outputs and action recommendations.


Limitations

  • Prototype/research use. Source data is synthetic/semi-synthetic. Do not use for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
  • AI-generated SAR drafts require human review and edit before filing with FinCEN.
  • The model was trained with thinking mode OFF. Enable it at inference for Deep Analysis; expect 3–5 s additional latency per response.
  • Feature importance values reflect deterministic heuristics from the training data pipeline, not SHAP or gradient-based model explanations.
  • The model is 14B parameters at bfloat16 (~28 GB). A GPU with at least 40 GB VRAM is required for full-precision inference; 4-bit quantization can reduce this to ~10 GB with some quality tradeoff.
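
The memory figures in the last bullet can be approximated from the parameter count. The sketch below estimates weight storage only, using the 14.83B total from the training details; it ignores the KV cache, activations, and quantization metadata, which is why practical requirements are higher than these numbers. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 14.83e9  # total parameter count reported in the training details

bf16_gb = weight_memory_gb(N_PARAMS, 16)  # 2 bytes per parameter
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 0.5 bytes per parameter

print(f"bf16 weights:  ~{bf16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```

The bf16 estimate comes to roughly 29.7 GB (about 27.6 GiB), in line with the ~28 GB quoted above, and the 4-bit estimate of roughly 7.4 GB leaves headroom for runtime overhead within the ~10 GB figure.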

License

Apache-2.0 (Qwen3 base model and fine-tuning adapter).
