How to use naazimsnh02/fraudsentinel-qwen3-14b-merged with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naazimsnh02/fraudsentinel-qwen3-14b-merged to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naazimsnh02/fraudsentinel-qwen3-14b-merged to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for naazimsnh02/fraudsentinel-qwen3-14b-merged to start chatting
Load model with FastModel
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/fraudsentinel-qwen3-14b-merged",
    max_seq_length=2048,
)
FraudSentinel — Qwen3-14B Merged (Tier-2 Intelligence Layer)
Full 16-bit merged model — Qwen3-14B base with the FraudSentinel LoRA adapter weights merged in. Suitable for direct deployment without PEFT or adapter management overhead. Part of the FraudSentinel two-tier platform.
If you prefer the compact LoRA adapter for hot-swapping or multi-tenant deployments, see naazimsnh02/fraudsentinel-qwen3-14b-lora.
Capabilities
This model is trained to act as an enterprise fraud detection and AML investigation assistant. It handles six task types:
- Structured JSON risk scoring — calibrated score (0.0–1.0), risk level, typology, key signals, feature importance, recommended action, SAR rationale
- Explainable alerts — evidence-grounded investigator-facing explanations tied to actual transaction features
- Typology classification — card-not-present fraud, account takeover, fan-out, gather-scatter, structuring, smurfing, and more
- 6-level recommended action — AUTO_APPROVE → APPROVE_WITH_MONITORING → STEP_UP_AUTH → TEMPORARY_HOLD → AUTO_BLOCK → SAR_REVIEW
- SAR drafting — FinCEN-aligned Suspicious Activity Report narrative generation for human review
- Multi-turn HITL dialogue — investigator follow-up conversations with the model
- Deep Analysis mode — Chain-of-Thought reasoning via Qwen3's thinking tokens
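The recommended-action labels read as an escalation ladder from least to most severe. The following minimal Python sketch assumes that ordering and assumes the model emits the labels exactly as spelled above. `RecommendedAction`, `parse_action`, and `most_severe` are hypothetical names used for illustration and are not part of any FraudSentinel release:

```python
from enum import IntEnum


class RecommendedAction(IntEnum):
    """Hypothetical encoding of the 6-level action ladder, least to most severe."""
    AUTO_APPROVE = 0
    APPROVE_WITH_MONITORING = 1
    STEP_UP_AUTH = 2
    TEMPORARY_HOLD = 3
    AUTO_BLOCK = 4
    SAR_REVIEW = 5


def parse_action(label: str) -> RecommendedAction:
    """Map a label emitted by the model (e.g. "SAR_REVIEW") to the enum; raises KeyError if unknown."""
    return RecommendedAction[label.strip().upper()]


def most_severe(*labels: str) -> RecommendedAction:
    """Return the most severe of several recommendations, e.g. collected across HITL turns."""
    return max(parse_action(label) for label in labels)


if __name__ == "__main__":
    print(most_severe("STEP_UP_AUTH", "SAR_REVIEW", "AUTO_APPROVE").name)  # SAR_REVIEW
```

An `IntEnum` makes routing rules cheap and readable, for example `action >= RecommendedAction.TEMPORARY_HOLD` to decide which alerts stop a payment.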
Training Details
| Property | Value |
|---|---|
| Base model | unsloth/Qwen3-14B (Apache-2.0) |
| Fine-tuning method | SFT + LoRA, merged to full bf16 weights |
| LoRA rank / alpha | 16 / 32 |
| Target modules | All linear layers (q, k, v, o projections + MLP gate/up/down) |
| Trainable parameters (pre-merge) | 64,225,280 (0.433% of 14.83B) |
| Dataset | naazimsnh02/fraud-financial-crime-qwen3-sft-v2 (11,016 train examples) |
| Epochs | 2 |
| Total steps | 1,378 |
| Effective batch size | 16 (2 per device × 8 gradient accumulation) |
| Learning rate | 1e-4 (cosine decay, 5% warmup) |
| Optimizer | AdamW 8-bit |
| Precision | bfloat16 |
| Max sequence length | 4,096 |
| Hardware | AMD MI300X, 192 GB VRAM, ROCm 7.0 |
| Framework | Unsloth 2026.6.1, TRL 0.22.2 |
| Train loss (final) | 0.2467 |
| Training time | 70.5 min |
| Peak VRAM | 39.8 GB (20.8% of 192 GB) |
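Two rows of the table can be cross-checked against the others. The sketch below recomputes the step count and the pre-merge LoRA parameter count. It assumes Qwen3-14B's architecture values: 40 layers, hidden size 5120, MLP intermediate size 17408, and 40 query heads plus 8 key/value heads, each of dimension 128. These values come from the base model's configuration, not from this card. Each LoRA pair on a linear layer of shape in×out adds r·(in + out) parameters. The step count also assumes the trainer counts a partial final accumulation window as a step.

```python
import math

# Values from the Training Details table.
TRAIN_EXAMPLES = 11_016
EFFECTIVE_BATCH = 2 * 8  # per-device batch x gradient accumulation steps
EPOCHS = 2
RANK = 16

# Assumed Qwen3-14B architecture values (base model config, not stated on this card).
LAYERS = 40
HIDDEN = 5120
INTERMEDIATE = 17408
Q_DIM = 40 * 128   # query heads x head_dim
KV_DIM = 8 * 128   # key/value heads x head_dim (grouped-query attention)


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in the A (rank x in) and B (out x rank) matrices for one linear layer."""
    return rank * (in_features + out_features)


def total_steps() -> int:
    # Rounding up per epoch (partial last batch counted) reproduces the table's 1,378.
    return math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH) * EPOCHS


def trainable_lora_params() -> int:
    per_layer = (
        lora_params(HIDDEN, Q_DIM, RANK)            # q_proj
        + lora_params(HIDDEN, KV_DIM, RANK)         # k_proj
        + lora_params(HIDDEN, KV_DIM, RANK)         # v_proj
        + lora_params(Q_DIM, HIDDEN, RANK)          # o_proj
        + lora_params(HIDDEN, INTERMEDIATE, RANK)   # gate_proj
        + lora_params(HIDDEN, INTERMEDIATE, RANK)   # up_proj
        + lora_params(INTERMEDIATE, HIDDEN, RANK)   # down_proj
    )
    return per_layer * LAYERS


if __name__ == "__main__":
    print(total_steps())                   # 1378
    print(f"{trainable_lora_params():,}")  # 64,225,280
```

Both results match the table. This suggests the reported count covers exactly the seven projection types listed under Target modules, at rank 16 in every layer.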
Usage
Transformers (no PEFT required)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
    "naazimsnh02/fraudsentinel-qwen3-14b-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/fraudsentinel-qwen3-14b-merged")
Unsloth (2× faster inference)
from unsloth import FastLanguageModel
import torch
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "naazimsnh02/fraudsentinel-qwen3-14b-merged",
    max_seq_length = 4096,
    dtype = torch.bfloat16,
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)
vLLM (production serving)
vllm serve naazimsnh02/fraudsentinel-qwen3-14b-merged \
--dtype bfloat16 \
--max-model-len 4096
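Once `vllm serve` is running, it exposes an OpenAI-compatible HTTP API. The sketch below uses only the Python standard library and rests on three assumptions. First, vLLM is listening on its default address (`http://localhost:8000/v1`). Second, the served model name defaults to the repo id. Third, vLLM's `chat_template_kwargs` request field is used to toggle Qwen3's `enable_thinking` flag. `build_chat_request` and `post_chat` are hypothetical helpers. The sampling settings mirror the Transformers inference example.

```python
import json
import urllib.request

# Assumptions: vLLM's default OpenAI-compatible address, and the served model
# name defaulting to the repo id passed to `vllm serve`.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "naazimsnh02/fraudsentinel-qwen3-14b-merged"
SYSTEM_PROMPT = "You are FraudSentinel, an expert fraud detection and AML investigation assistant."


def build_chat_request(user_content: str, enable_thinking: bool = False) -> dict:
    """Build a /v1/chat/completions request body (hypothetical helper)."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
        ],
        "max_tokens": 512,
        "temperature": 0.1,
        # vLLM passes these kwargs to the chat template (Qwen3's thinking switch).
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def post_chat(body: dict, base_url: str = BASE_URL) -> str:
    """POST the body to a running server and return the assistant message text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


if __name__ == "__main__":
    body = build_chat_request("Analyze this AML transaction and return a structured JSON risk assessment.")
    print(json.dumps(body, indent=2))
    # With the server running: print(post_chat(body))
```

Keeping request construction separate from the network call lets you unit-test prompts without a GPU or a live server.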
Inference Example
messages = [
    {"role": "system", "content": "You are FraudSentinel, an expert fraud detection and AML investigation assistant."},
    {"role": "user", "content": (
        "Analyze this AML transaction and return a structured JSON risk assessment.\n\n"
        "Transfer: amount_paid=95000 USD, amount_received=94850 EUR, payment_format=ACH, "
        "sender_out_degree=47, sender_in_degree=3, receiver_in_degree=52, "
        "ccy_mismatch=True, is_round=False, is_laundering=True"
    )},
]
# Fast mode — thinking OFF (default for Tier-2 triage)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
    )
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Deep Analysis mode (Chain-of-Thought for complex or high-stakes cases):
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # activates Qwen3 thinking tokens, adds ~3–5 s latency
)
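With thinking enabled, Qwen3 writes its reasoning between `<think>` and `</think>` before the final answer. These tags can survive `skip_special_tokens=True` decoding. The following minimal sketch separates the reasoning from the answer and extracts the JSON object, assuming the answer contains a single top-level JSON object. Qwen3's own examples split on the `</think>` token id rather than on the string, so treat this version as a simplification. `split_thinking` and `extract_json` are hypothetical names.

```python
import json


def split_thinking(decoded: str) -> tuple[str, str]:
    """Split decoded text into (reasoning, answer) at the last </think> tag.

    If no tag is present (thinking disabled), reasoning is returned empty.
    """
    head, sep, tail = decoded.rpartition("</think>")
    if not sep:
        return "", decoded.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


def extract_json(answer: str) -> dict:
    """Parse the outermost {...} span, tolerating surrounding prose or code fences."""
    start, end = answer.find("{"), answer.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(answer[start : end + 1])
```

In the generation example, you would call `reasoning, answer = split_thinking(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))` and then `extract_json(answer)`. The same functions also work in fast mode, where the reasoning part simply comes back empty.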
Output Schema (Structured Task)
{
  "risk_score": 0.91,
  "risk_level": "CRITICAL",
  "conclusion": "SUSPICIOUS",
  "primary_typology": "layering / fan-in gather-scatter",
  "secondary_typology": "rapid_passthrough",
  "key_signals": [
    "high_receiver_in_degree",
    "cross_currency_conversion",
    "ach_channel_over_representation"
  ],
  "explanation": "Sender account shows unusually low inbound activity (in-degree 3) relative to high outbound fan-out (47 unique counterparties). Receiver account aggregates from 52 sources — consistent with layering...",
  "feature_importance": {
    "high_receiver_in_degree": 0.41,
    "cross_currency_conversion": 0.33,
    "ach_channel_over_representation": 0.26
  },
  "recommended_action": "SAR_REVIEW",
  "sar_required": true,
  "sar_rationale": "Transaction exhibits layering indicators — high-degree aggregation, cross-currency conversion, and ACH over-representation consistent with structuring."
}
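Downstream systems usually need to reject malformed outputs before they reach an investigator queue. The following minimal validation sketch treats the field names in the example above as the contract, since the card does not publish a formal JSON Schema. It enforces only what the card states: a score between 0.0 and 1.0, and one of the six recommended actions. Which fields count as required is an assumption. `RiskAssessment` and `parse_assessment` are hypothetical.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Optional

# The six actions listed under Capabilities.
ALLOWED_ACTIONS = {
    "AUTO_APPROVE",
    "APPROVE_WITH_MONITORING",
    "STEP_UP_AUTH",
    "TEMPORARY_HOLD",
    "AUTO_BLOCK",
    "SAR_REVIEW",
}

# Fields treated as mandatory in this sketch (an assumption, not a published contract).
REQUIRED = ("risk_score", "risk_level", "conclusion", "primary_typology",
            "recommended_action", "sar_required")


@dataclass
class RiskAssessment:
    risk_score: float
    risk_level: str
    conclusion: str
    primary_typology: str
    recommended_action: str
    sar_required: bool
    secondary_typology: Optional[str] = None
    key_signals: list = field(default_factory=list)
    explanation: str = ""
    feature_importance: dict = field(default_factory=dict)
    sar_rationale: Optional[str] = None


def parse_assessment(raw: str) -> RiskAssessment:
    """Parse model JSON, rejecting missing fields and out-of-range values."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    known = {f.name for f in fields(RiskAssessment)}
    # Unexpected extra keys are dropped rather than rejected.
    result = RiskAssessment(**{k: v for k, v in data.items() if k in known})
    if not 0.0 <= float(result.risk_score) <= 1.0:
        raise ValueError(f"risk_score out of range: {result.risk_score}")
    if result.recommended_action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown recommended_action: {result.recommended_action}")
    if not isinstance(result.sar_required, bool):
        raise ValueError("sar_required must be a JSON boolean")
    return result
```

On a `ValueError`, a pipeline might retry generation or route the case to manual review. Either way, a malformed action never silently becomes an approval.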
System Prompt
The model was trained with the following system prompt pattern:
You are FraudSentinel, an expert fraud detection and AML investigation assistant.
Consistent use of this system prompt at inference produces the most coherent structured outputs and action recommendations.
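Since the system prompt should stay fixed, it can help to build every request through a single function. This sketch assumes that the `key=value` layout from the Inference Example's user turn is a reasonable template; the card shows only that one example, not the full set of training prompts. `build_messages` is a hypothetical helper:

```python
# System prompt quoted in this section of the card.
SYSTEM_PROMPT = (
    "You are FraudSentinel, an expert fraud detection and AML investigation assistant."
)

# Instruction line copied from the Inference Example's user turn.
TASK_LINE = "Analyze this AML transaction and return a structured JSON risk assessment."


def build_messages(features: dict) -> list:
    """Return chat messages with the fixed system prompt and a key=value feature summary.

    Hypothetical helper: it reproduces the layout of the card's single example,
    not a documented training template.
    """
    pairs = ", ".join(f"{name}={value}" for name, value in features.items())
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{TASK_LINE}\n\nTransfer: {pairs}"},
    ]


if __name__ == "__main__":
    example = {
        "amount_paid": "95000 USD",
        "amount_received": "94850 EUR",
        "payment_format": "ACH",
        "sender_out_degree": 47,
        "sender_in_degree": 3,
        "receiver_in_degree": 52,
        "ccy_mismatch": True,
        "is_round": False,
    }
    for message in build_messages(example):
        print(message["role"], "->", message["content"])
```

Python's dict preserves insertion order, so features appear in the prompt in the order you supply them.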
Limitations
- Prototype/research use. Source data is synthetic/semi-synthetic. Do not use for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
- AI-generated SAR drafts require human review and edit before filing with FinCEN.
- The model was trained with thinking mode OFF. Enable it at inference for Deep Analysis; expect 3–5 s additional latency per response.
- Feature importance values reflect deterministic heuristics from the training data pipeline, not SHAP or gradient-based model explanations.
- The model has 14.83B parameters; at bfloat16 the weights alone take ~28 GB. Unquantized bf16 inference needs a GPU with at least 40 GB of VRAM; 4-bit quantization can reduce this to ~10 GB at some cost in quality.
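The ~28 GB figure follows from storing each bf16 weight in two bytes. The sketch below is a weights-only estimate that takes the 14.83B parameter count from the Training Details table. It ignores the KV cache, activations, and quantization metadata, which is why the practical guidance above (40 GB, or ~10 GB at 4-bit) sits above these numbers.

```python
# Total parameter count from the Training Details table.
PARAMS = 14.83e9

# Storage cost per weight; the 4-bit figure ignores quantization scales and zero-points.
BYTES_PER_PARAM = {"bf16": 2.0, "4-bit": 0.5}


def weight_memory(params: float, dtype: str) -> tuple[float, float]:
    """Return (GB, GiB) for the weights alone; KV cache and activations are not included."""
    n_bytes = params * BYTES_PER_PARAM[dtype]
    return n_bytes / 1e9, n_bytes / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        gb, gib = weight_memory(PARAMS, dtype)
        print(f"{dtype:>5}: {gb:.1f} GB = {gib:.1f} GiB")
```

This prints about 29.7 GB (27.6 GiB) for bf16 and about 7.4 GB (6.9 GiB) for 4-bit. The card's "~28 GB" therefore matches the binary-unit (GiB) figure.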
License
Apache-2.0 (Qwen3 base model and fine-tuning adapter).