How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="wincode/kerala-crime-detective-gemma")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("wincode/kerala-crime-detective-gemma")
model = AutoModelForCausalLM.from_pretrained("wincode/kerala-crime-detective-gemma")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

🔍 Kerala Crime Detective — Malayalam + English + Manglish AI

Solve crimes the Kerala way — Comedy, Manglish, and serious detective work, all in one model!

A fine-tuned Gemma 3 1B that understands Kerala crime reports, FIR details, and cyber fraud cases — and responds in Malayalam, English, or Manglish with comedy and serious investigation steps.


🎯 What This Model Does

| Mode | Description |
|---|---|
| 🎭 Malayalam Comedy | Solves crimes with Manglish humor, Kerala cultural references, and local jokes |
| 🔍 Serious English | Professional CID-style investigation: evidence, suspects, legal sections |
| 🌐 Cyber Crime Expert | Specializes in UPI fraud, SIM swap, sextortion, fake jobs, and investment scams |
| 🎭+🔍 Mixed Style | Comedy and serious advice combined; the most popular mode |

🚀 Try the Live Demo

👉 Open in HuggingFace Spaces


📦 Model Details

| Property | Value |
|---|---|
| Base Model | google/gemma-3-1b-it |
| Fine-tuning Method | Supervised Fine-Tuning (SFT) with TRL |
| Training Framework | HuggingFace TRL + Transformers |
| Hardware | Kaggle T4 GPU |
| Languages | Malayalam, English, Manglish |
| Parameters | ~1 Billion |
| Precision | bfloat16 |
| License | Apache 2.0 |
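
The precision row deserves a quick check on your own machine: GPUs older than NVIDIA's Ampere generation (the T4 is one) have no native bfloat16 arithmetic. The sketch below picks a dtype name from two capability flags. The helpers `choose_dtype_name` and `detect_dtype_name` are hypothetical, and the float32 fallback is a conservative assumption rather than something this card prescribes. Note that `torch.cuda.is_bf16_supported()` may report emulated support on some PyTorch versions, so treat its answer as a hint.

```python
def choose_dtype_name(cuda_available: bool, bf16_supported: bool) -> str:
    """Return the name of a torch dtype suitable for the torch_dtype argument."""
    if cuda_available and bf16_supported:
        return "bfloat16"
    # Assumption: fall back to full precision when bfloat16 is unavailable.
    return "float32"


def detect_dtype_name() -> str:
    """Ask PyTorch for the two flags; returns float32 if torch is not installed."""
    try:
        import torch
    except ImportError:
        return "float32"
    cuda = torch.cuda.is_available()
    bf16 = cuda and torch.cuda.is_bf16_supported()
    return choose_dtype_name(cuda, bf16)


print(detect_dtype_name())
```

With PyTorch installed, the result can be passed to the loader as `torch_dtype=getattr(torch, detect_dtype_name())`.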

💻 Quick Start

Installation

pip install transformers accelerate torch

Basic Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "wincode/kerala-crime-detective-gemma"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="eager"
)
model.eval()

def solve_crime(crime_report: str, mode: str = "comedy") -> str:
    system_prompts = {
        "comedy": (
            "You are Kerala Nadodikattu Detective, a comedy crime solver. "
            "Solve crimes using Malayalam, Manglish and English mix. "
            "Use humor, local references, Kerala culture."
        ),
        "serious": (
            "You are a senior Kerala Police CID officer. "
            "Analyze crime reports professionally with evidence analysis, "
            "suspect profiling, investigation steps and legal sections."
        ),
        "cyber": (
            "You are Kerala Cyber Cell's top investigator. "
            "Specialize in UPI fraud, SIM swap, sextortion, fake jobs. "
            "Provide immediate victim steps, recovery options, legal recourse."
        ),
    }

    messages = [
        {"role": "system", "content": system_prompts.get(mode, system_prompts["comedy"])},
        {"role": "user",   "content": crime_report},
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

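    # The chat template already emits the BOS token, so do not add special tokens again.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)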

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=400,
            do_sample=True,
            temperature=0.8,
            top_p=0.9,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.pad_token_id,
        )

    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# ── Example 1: Malayalam Comedy Mode ─────────────────────────
report1 = """
Crime Report:
Location: Thrissur Pooram grounds
Time: 11 PM
Crime: 2kg gold ornaments missing from elephant caparison
Evidence: Footprints, torn dhoti piece
FIR: THR/2024/445
"""
print(solve_crime(report1, mode="comedy"))


# ── Example 2: Cyber Crime Mode ───────────────────────────────
report2 = """
Cyber Crime FIR:
Victim: Anitha, Homemaker, Palakkad
Crime: Received WhatsApp message saying I won Rs 50 lakh lottery.
They asked Rs 15,000 processing fee. I paid. Number now switched off.
Amount lost: Rs 15,000
"""
print(solve_crime(report2, mode="cyber"))


# ── Example 3: Serious Investigation Mode ─────────────────────
report3 = """
Hit and Run:
Location: NH-66 Kannur Bypass
Time: 11:30 PM
Victim: Biker, critical, ICU
Evidence: White paint transfer on victim bike
Witness: Trucker saw partial plate KL-13
"""
print(solve_crime(report3, mode="serious"))
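
The function above defines prompts for three modes, while the mode table also lists a Mixed Style. A minimal sketch of how a fourth prompt could be wired in follows. The `"mixed"` prompt text and the helper `build_messages` are hypothetical: the card does not say which system prompt the mixed examples were trained with, so results with this prompt are untested.

```python
SYSTEM_PROMPTS = {
    "comedy": (
        "You are Kerala Nadodikattu Detective, a comedy crime solver. "
        "Solve crimes using Malayalam, Manglish and English mix. "
        "Use humor, local references, Kerala culture."
    ),
    "serious": (
        "You are a senior Kerala Police CID officer. "
        "Analyze crime reports professionally with evidence analysis, "
        "suspect profiling, investigation steps and legal sections."
    ),
    "cyber": (
        "You are Kerala Cyber Cell's top investigator. "
        "Specialize in UPI fraud, SIM swap, sextortion, fake jobs. "
        "Provide immediate victim steps, recovery options, legal recourse."
    ),
}

# Hypothetical fourth prompt (an assumption, not taken from the training data):
# the comedy persona plus an explicit request for serious steps.
SYSTEM_PROMPTS["mixed"] = (
    SYSTEM_PROMPTS["comedy"]
    + " After the jokes, list serious investigation steps and relevant legal sections."
)


def build_messages(crime_report: str, mode: str = "comedy") -> list[dict]:
    """Build the chat messages for one report; unknown modes fall back to comedy."""
    report = crime_report.strip()
    if not report:
        raise ValueError("crime_report is empty")
    system = SYSTEM_PROMPTS.get(mode, SYSTEM_PROMPTS["comedy"])
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": report},
    ]


messages = build_messages("Crime: bicycle theft near the temple pond", mode="mixed")
print(messages[0]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` exactly as `solve_crime` does with its own `messages`.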

🗂️ Training Dataset

Trained on the Kerala Crime Comedy Dataset — a custom dataset covering:

| Category | Examples |
|---|---|
| 🎭 Malayalam Comedy Crime | Gold theft, car theft, chain snatching, bicycle theft |
| 💻 Cyber Crimes | Investment fraud, SIM swap, sextortion, OLX scam, fake jobs |
| 🔍 Serious Crime Solving | Murder investigation, drug bust, hit and run |
| 🏠 Property Crime | House breaking, land disputes |
| 💙 Support Cases | Domestic violence, senior citizen fraud, missing persons |
| 🌊 Environmental Crime | Illegal sand mining |
| 💰 Financial Fraud | Microfinance harassment, lottery fraud |
| 😄 Comedy Light Cases | Christmas star theft, goat eating crops, road rage |

Dataset stats:

  • Total examples: 21+ (growing)
  • Languages: Malayalam, English, Manglish
  • Format: Chat (system + user + assistant)
  • Average response length: 300-500 tokens
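
To make the chat format concrete, here is a sketch of what one training record might look like when written as a JSON line. The `messages` key follows the conversational layout that TRL's SFTTrainer accepts; the actual field names of the Kerala Crime Comedy Dataset are not stated in this card, so that is an assumption. The helper `make_record` is hypothetical, and the sample text is borrowed from the example output shown elsewhere in this card.

```python
import json


def make_record(system: str, user: str, assistant: str) -> dict:
    """Wrap one system/user/assistant exchange in a chat-style record."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }


record = make_record(
    system="You are Kerala Nadodikattu Detective, a comedy crime solver.",
    user=(
        "Crime: Gold chain snatching at Calicut beach. "
        "Victim: teacher. Suspect: young male on motorcycle."
    ),
    assistant="Chechi! Chain poyi aano? Bike-il vannu, chain eduthu, bike-il poyi!",
)

# ensure_ascii=False keeps Malayalam script readable in the output file.
line = json.dumps(record, ensure_ascii=False)
print(line)
```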

📊 Example Outputs

🎭 Malayalam Comedy Mode

Input:

Crime: Gold chain snatching at Calicut beach. Victim: teacher. Suspect: young male on motorcycle.

Output:

⚡ Chain Snatching Investigation Report!

Chechi! Chain poyi aano?
Bike-il vannu, chain eduthu, bike-il poyi!
Classic Calicut evening crime!

🔍 Evidence Analysis:
- Skid marks: Direction analysis possible
- Time: Evening = witnesses available
- Victim is teacher: Respected community member

Serious Steps:
✅ Alert gold shops in 20km radius IMMEDIATELY
✅ Deploy two-wheeler patrol on beach road
✅ CCTV from SM Street main junction
✅ Recovery chance: 60% if acted within 6 hours!

Chechi, we will get justice! 💪

🌐 Cyber Crime Expert Mode

Input:

UPI fraud: Someone sent QR code saying it will send me money. I scanned and entered PIN. Rs 35,000 deducted.

Output:

💸 UPI QR Scam - Classic Trap!

CALL 1930 RIGHT NOW (Cyber Crime Helpline - Free, 24/7)
Every minute counts for account freeze!

How the scam worked:
In UPI there are TWO types of QR codes:
📥 COLLECT request = You are PAYING (needs PIN = money GOES OUT)
📤 PAYMENT = You are RECEIVING (NO PIN needed ever)

GOLDEN RULE: You NEVER enter PIN to RECEIVE money!
If someone asks PIN to send you money = 100% SCAM always!

Recovery steps:
1. Call 1930 immediately
2. Report to your bank fraud line
3. Screenshot the UPI ID and report on cybercrime.gov.in

Recovery probability: 50-60% if reported within 4 hours!

⚠️ Limitations

  • Model is fine-tuned on a small dataset (21 examples) — responses may not always be perfectly formatted
  • Malayalam script quality depends on base model's multilingual capability
  • For real emergencies, always contact actual Kerala Police: 100 or Cyber Crime: 1930
  • Model provides educational and entertainment value — not a substitute for real legal advice
  • Responses may vary due to sampling temperature
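
If the variation from sampling is a problem, for example when comparing prompts, greedy decoding removes it. The sketch below builds the keyword arguments for `model.generate` in either style. The helper `generation_kwargs` is hypothetical; the sampling values are the ones used in Quick Start.

```python
def generation_kwargs(deterministic: bool = False, max_new_tokens: int = 400) -> dict:
    """Return keyword arguments for model.generate()."""
    common = {"max_new_tokens": max_new_tokens, "repetition_penalty": 1.1}
    if deterministic:
        # Greedy decoding: the same prompt yields the same tokens on the same setup.
        return {**common, "do_sample": False}
    # Sampling settings copied from the Quick Start example.
    return {**common, "do_sample": True, "temperature": 0.8, "top_p": 0.9}


print(generation_kwargs(deterministic=True))
```

Usage would look like `model.generate(**inputs, **generation_kwargs(deterministic=True), pad_token_id=tokenizer.pad_token_id)`. To keep sampling but make runs repeatable, calling `transformers.set_seed` with a fixed integer before generation is another option.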

🛡️ Important Disclaimer

This model is for educational and entertainment purposes only.

For real crimes and emergencies:

  • Police Emergency: 100
  • Cyber Crime Helpline: 1930
  • Women's Helpline: 1091
  • Child Helpline: 1098
  • Cybercrime Portal: cybercrime.gov.in

🏋️ Training Details

# Fine-tuning configuration used
from trl import SFTConfig

sft_config = SFTConfig(
    max_length=1024,
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch = 16
    gradient_checkpointing=True,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,
    optim="adamw_torch_fused",
)

Hardware used: Kaggle T4 GPU (15 GB VRAM)
Training time: ~25 minutes for 5 epochs
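
The configuration values can be turned into a rough step count. The sketch below assumes a single GPU, the 21 examples quoted under Limitations, and that the trainer performs an optimizer step on the last, partially filled accumulation group; exact rounding differs between Trainer versions, so read the numbers as estimates. The helper `training_schedule` is hypothetical.

```python
import math


def training_schedule(num_examples: int, per_device_batch: int, grad_accum: int,
                      epochs: int, warmup_ratio: float, num_devices: int = 1) -> dict:
    """Estimate effective batch size, optimizer steps, and warmup steps."""
    effective_batch = per_device_batch * grad_accum * num_devices
    # Forward/backward passes per epoch on each device.
    micro_batches = math.ceil(num_examples / (per_device_batch * num_devices))
    # Assumption: a trailing partial accumulation group still triggers a step.
    steps_per_epoch = math.ceil(micro_batches / grad_accum)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


plan = training_schedule(num_examples=21, per_device_batch=2, grad_accum=8,
                         epochs=5, warmup_ratio=0.1)
print(plan)
```

Under these assumptions the run performs only a handful of optimizer updates, which is consistent with the small-dataset caveat in Limitations.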


🗺️ Roadmap

  • Expand dataset to 500+ examples
  • Add more Malayalam script examples
  • Add Manglish-only mode
  • Support for audio input (voice crime reports)
  • Add more cyber crime patterns (2024-2025 new scams)
  • Quantized version (GGUF) for local deployment
  • API endpoint for police department integration


🙏 Credits

  • Base Model: Google Gemma 3 — Thank you Google DeepMind
  • Fine-tuning: HuggingFace TRL — SFTTrainer
  • Training Platform: Kaggle — Free T4 GPU
  • Demo Framework: Gradio
  • Inspiration: Kerala Police, Kerala comedy films, and every aunty who knows everything 🙏

📜 License

This model is released under the Apache 2.0 License.

The base model Gemma 3 is subject to Google's Gemma Terms of Use.


Made with ❤️ in Kerala 🌴 | Nammude Kerala, Nammude Detective!
