How to use naazimsnh02/fraudsentinel-qwen3-14b-lora with Unsloth Studio:
With the shell installer:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naazimsnh02/fraudsentinel-qwen3-14b-lora to start chatting

With the PowerShell installer:
irm https://unsloth.ai/install.ps1 | iex
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naazimsnh02/fraudsentinel-qwen3-14b-lora to start chatting

Without installing anything:
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for naazimsnh02/fraudsentinel-qwen3-14b-lora to start chatting
pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/fraudsentinel-qwen3-14b-lora",
    max_seq_length=2048,
)

Fine-tuned LoRA adapter for Qwen3-14B, trained for enterprise fraud detection and financial crime investigation. Part of the FraudSentinel two-tier platform.
For a self-contained deployment without LoRA adapter management, see the merged model: naazimsnh02/fraudsentinel-qwen3-14b-merged.
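For readers who want to see what "merged" means in practice, here is a minimal sketch of folding a LoRA adapter into its base weights with PEFT's `merge_and_unload()`. The helper name `merge_adapter` and its arguments are hypothetical, and this is not necessarily how the published merged checkpoint was produced.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Hypothetical helper: fold LoRA weights into the base model and save the result."""
    # Imported lazily so that defining this function does not require the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model in bfloat16, matching the training precision in this card.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # Attach the adapter, then merge its low-rank updates into the base weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()
    # Save a standalone checkpoint that no longer needs PEFT at load time.
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(adapter_id).save_pretrained(out_dir)
```

A call such as `merge_adapter("Qwen/Qwen3-14B", "naazimsnh02/fraudsentinel-qwen3-14b-lora", "./merged")` would download both repositories and needs enough memory to hold the full 14B model in bfloat16.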
The model is trained to act as an enterprise fraud and AML investigation assistant across six task types:
AUTO_APPROVE → APPROVE_WITH_MONITORING → STEP_UP_AUTH → TEMPORARY_HOLD → AUTO_BLOCK → SAR_REVIEW

| Property | Value |
|---|---|
| Base model | unsloth/Qwen3-14B (Apache-2.0) |
| Method | Supervised Fine-Tuning (SFT) + LoRA |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (all-linear) |
| LoRA dropout | 0 (Unsloth-optimized) |
| Trainable parameters | 64,225,280 (0.433% of 14.83B total) |
| Dataset | naazimsnh02/fraud-financial-crime-qwen3-sft-v2 |
| Training examples | 11,016 (train split) |
| Epochs | 2 |
| Total steps | 1,378 |
| Batch size (per device) | 2 |
| Gradient accumulation | 8 (effective batch size 16) |
| Learning rate | 1e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Optimizer | AdamW 8-bit |
| Precision | bfloat16 (no quantization) |
| Weight decay | 0.001 |
| Max sequence length | 4,096 |
| Packing | Disabled (padding-free mode enabled) |
| Hardware | AMD MI300X (192 GB VRAM) |
| Framework | Unsloth 2026.6.1, TRL 0.22.2, PEFT 0.19.1, Transformers 4.56.2 |
| ROCm / PyTorch | ROCm 7.0, PyTorch 2.10.0+rocm7.0 |
| Train loss (final) | 0.2467 |
| Training time | 4,230 s (70.5 min) |
| Peak VRAM | 39.8 GB (20.8% of 192 GB) |
| LoRA VRAM overhead | 12.0 GB (6.3% of max) |
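Two figures in the table can be reproduced with simple arithmetic. The sketch below assumes the publicly documented Qwen3-14B dimensions (hidden size 5120, 40 layers, MLP size 17408, 40 query heads and 8 key/value heads of dimension 128); these are not stated in this card and are an assumption here. A LoRA adapter of rank r on a linear layer with `in` inputs and `out` outputs adds r × (in + out) parameters.

```python
import math

# Assumed Qwen3-14B dimensions (not stated in this card).
HIDDEN = 5120
LAYERS = 40
INTERMEDIATE = 17408
Q_DIM = 40 * 128   # query heads x head dimension
KV_DIM = 8 * 128   # key/value heads x head dimension
RANK = 16          # LoRA rank from the table

# (in_features, out_features) of each targeted projection in one layer.
projections = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapter holds an A matrix (rank x in) and a B matrix (out x rank).
per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in projections.values())
trainable_params = per_layer * LAYERS

# Optimizer steps: per-device batch 2 x gradient accumulation 8, over 2 epochs.
effective_batch = 2 * 8
steps_per_epoch = math.ceil(11_016 / effective_batch)
total_steps = steps_per_epoch * 2

print(trainable_params)  # 64225280
print(total_steps)       # 1378
```

Both results match the "Trainable parameters" and "Total steps" rows, which suggests the assumed dimensions are the ones the adapter was trained against.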
# Option 1: load the adapter with Unsloth
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "naazimsnh02/fraudsentinel-qwen3-14b-lora",
    max_seq_length = 4096,
    dtype = torch.bfloat16,
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)  # switch the model to inference mode
# Option 2: load the base model with Transformers and attach the adapter with PEFT
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "naazimsnh02/fraudsentinel-qwen3-14b-lora")
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/fraudsentinel-qwen3-14b-lora")
messages = [
    {"role": "system", "content": "You are FraudSentinel, an expert fraud detection and AML investigation assistant."},
    {"role": "user", "content": (
        "Analyze this card transaction and return a structured JSON risk assessment.\n\n"
        "Transaction: amount=$828.62, category=misc_net, hour=2, "
        "amount_vs_category_p95=2.16x, tx_24h=4, geo_km=1847, is_fraud=True"
    )},
]
# Thinking mode OFF (fast mode, the default for Tier-2 triage)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Deep Analysis mode (Chain-of-Thought for complex cases):
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # activates Qwen3 thinking tokens
)

{
  "risk_score": 0.84,
  "risk_level": "HIGH",
  "conclusion": "FRAUDULENT",
  "primary_typology": "card-not-present account takeover / stolen-card online cash-out",
  "secondary_typology": "account_takeover",
  "key_signals": [
    "amount_exceeds_category_p95",
    "high_risk_merchant_category",
    "unusual_hour_activity"
  ],
  "explanation": "Transaction amount $828.62 exceeds the 95th-percentile for misc_net purchases...",
  "feature_importance": {
    "amount_exceeds_category_p95": 0.46,
    "high_risk_merchant_category": 0.28,
    "unusual_hour_activity": 0.26
  },
  "recommended_action": "AUTO_BLOCK",
  "sar_required": false,
  "sar_rationale": null
}
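
Downstream code will usually want to parse this response rather than print it. The following is a minimal sketch of one way to do that, using only the standard library. The helper name `parse_assessment` is hypothetical, the `[0, 1]` range for `risk_score` is an assumption based on the sample above, and the sketch assumes that in thinking mode the `<think>...</think>` tags survive decoding.

```python
import json

# Recommended actions in the order this card lists them.
ACTIONS = [
    "AUTO_APPROVE",
    "APPROVE_WITH_MONITORING",
    "STEP_UP_AUTH",
    "TEMPORARY_HOLD",
    "AUTO_BLOCK",
    "SAR_REVIEW",
]


def parse_assessment(raw: str) -> dict:
    """Hypothetical helper: drop any <think> block, then parse and check the JSON."""
    # In thinking mode the reasoning precedes the answer; keep only what follows it.
    if "</think>" in raw:
        raw = raw.split("</think>", 1)[1]
    # Keep only the outermost JSON object in case of stray text around it.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start : end + 1])

    # Assumed range, inferred from the sample output rather than documented.
    if not 0.0 <= data["risk_score"] <= 1.0:
        raise ValueError("risk_score outside [0, 1]")
    if data["recommended_action"] not in ACTIONS:
        raise ValueError(f"unknown action: {data['recommended_action']}")
    return data


sample = (
    "<think>late-night purchase, far from home</think>\n"
    '{"risk_score": 0.84, "recommended_action": "AUTO_BLOCK", "sar_required": false}'
)
result = parse_assessment(sample)
print(result["recommended_action"], ACTIONS.index(result["recommended_action"]))  # AUTO_BLOCK 4
```

Rejecting unknown actions up front keeps a malformed generation from silently reaching whatever system acts on `recommended_action`.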
enable_thinking=False). Enabling thinking mode at inference activates Qwen3's CoT capabilities but adds latency (3–5 s per response).

License: Apache-2.0 (base model and adapter).