Multi-Turn Audit & API Efficiency Analysis
Date: 2026-02-02
Test: scripts/multi_turn_audit.py
Server: http://127.0.0.1:8004
1. Audit Results ✅

| Metric | Result |
|---|---|
| Exit Code | 0 (SUCCESS) |
| Turns Completed | 4+ |
| UPI Extraction | ✅ PASS |
| Memory Aggregation | ✅ PASS |
| Phishing Links | ✅ Detected |
| IFSC Codes | ✅ Detected |
Sample Replies (Human-Like Hinglish)
| Turn | Bot Reply | Realism |
|---|---|---|
| 1 | na.. bas ek minute main check karu? ("well.. just a minute, shall I check?") | ✅ Authentic |
| 2 | (UPI extraction turn) reply included the 👴 emoji | ✅ Elderly persona |
| 3 | arre.. wait. | ✅ Natural hesitation |
2. API Call Analysis Per Message
Single-message processing (steps run top to bottom):
- STEP 1: SAFEGUARD CHECK (API-1): llm_client.check_safeguard(). ⚡ Blocks prompt injection.
- STEP 2: PARALLEL DETECTION & EXTRACTION: scam_detector.detect() (API-2, maybe) and intel_extractor.extract() (API-3, maybe) run in parallel. ⚡ FAST-PATH: skips the LLM if regex confidence > 0.85.
- STEP 3: ADAPTIVE BEHAVIOR ANALYSIS: adaptive_agent.analyze_scammer_behavior() (local, no API call).
- STEP 4: PERSONA SELECTION: persona_engine.select_persona() (local mapping, no API call).
- STEP 5: RESPONSE GENERATION (API-4): persona_engine.generate_response(). ✅ FAST_CHAT role (llama-3.1-8b-instant).
- STEP 6: ENRICHMENT (BACKGROUND, API-5): enrichment_service.enrich_intelligence() (compound system, async).
- STEP 7: XAI REASONING (CONDITIONAL): xai_explainer.generate_explanation() (only if ENABLE_LLM_RESPONSES=true).
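The seven steps above can be sketched as a single async handler. This is a minimal illustration, not the project's orchestrator: every function below is a hypothetical stand-in that only records, in a calls list, which API call it would make; the "upi" keyword test is an assumed substitute for the real regex confidence score; and Step 7 (XAI) is omitted. It shows the three properties the pipeline claims: the safeguard runs first, detection and extraction share one asyncio.gather, and enrichment is scheduled with asyncio.create_task so the reply is ready before enrichment finishes.

```python
from __future__ import annotations

import asyncio
import re

# Hypothetical stand-ins for llm_client, scam_detector, intel_extractor,
# persona_engine and enrichment_service. Each "API" call is only recorded
# in `calls`; no real model is contacted.
UPI_RE = re.compile(r"\b[\w.\-]+@[a-z]+\b", re.IGNORECASE)


async def check_safeguard(msg: str, calls: list[str]) -> bool:
    calls.append("safeguard")  # API-1: always made
    return "ignore previous instructions" not in msg.lower()


async def detect(msg: str, calls: list[str]) -> str:
    if "upi" in msg.lower():  # assumed stand-in for "regex confidence > 0.85"
        return "upi_fraud"  # FAST-PATH: no LLM call
    calls.append("detect_llm")  # API-2: only when regex is unsure
    return "unknown"


async def extract(msg: str, calls: list[str]) -> list[str]:
    found = UPI_RE.findall(msg)  # regex first
    if not found:
        calls.append("extract_llm")  # API-3: only when regex finds nothing
    return found


async def generate_reply(scam_type: str, intel: list[str], calls: list[str]) -> str:
    calls.append("reply")  # API-4: always made
    return f"arre.. wait. [{scam_type}, {len(intel)} intel item(s)]"


async def enrich(intel: list[str], calls: list[str]) -> None:
    await asyncio.sleep(0.01)  # simulated slow background work
    calls.append("enrich")  # API-5: background


async def process_message(
    msg: str, calls: list[str], enable_enrichment: bool = False
) -> tuple[str, asyncio.Task | None]:
    if not await check_safeguard(msg, calls):  # Step 1
        return "blocked", None
    # Step 2: detection and extraction run concurrently
    scam_type, intel = await asyncio.gather(detect(msg, calls), extract(msg, calls))
    # Steps 3-4 (behavior analysis, persona selection) are local: no API calls
    reply = await generate_reply(scam_type, intel, calls)  # Step 5
    # Step 6: enrichment is scheduled but not awaited, so the reply is not delayed
    task = asyncio.create_task(enrich(intel, calls)) if enable_enrichment else None
    return reply, task


async def main() -> None:
    calls: list[str] = []
    reply, task = await process_message(
        "Send Rs 500 by UPI to refund@okaxis", calls, enable_enrichment=True
    )
    print(reply, calls)  # enrichment has not run yet at this point
    if task is not None:
        await task  # let background enrichment finish before exiting
    print(calls)


if __name__ == "__main__":
    asyncio.run(main())
```

In the demo message both the keyword check and the regex succeed, so the reply is returned while calls is still ['safeguard', 'reply']; the "enrich" entry only appears once the background task is awaited.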
3. API Call Count Summary
| Scenario | API Calls | Reason |
|---|---|---|
| Best Case (FAST-PATH) | 2 | Regex confident → skip scam LLM and intel LLM → only safeguard + reply |
| Typical Case | 3-4 | Safeguard + reply + 1-2 detection/extraction calls |
| Worst Case | 5-6 | All LLMs engaged, plus enrichment and XAI |
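The ranges in this table follow from simple arithmetic: two calls (safeguard and reply) are unconditional, and up to four more are conditional. A sketch of that count, where the four boolean parameter names are illustrative labels, not project flags:

```python
from itertools import product


def count_api_calls(
    detect_needs_llm: bool,
    extract_needs_llm: bool,
    run_enrichment: bool,
    run_xai: bool,
) -> int:
    """Per-message API calls: 2 unconditional + up to 4 conditional."""
    calls = 2  # safeguard (API-1) + reply generation (API-4) always run
    # bool is an int subclass in Python, so each True adds 1
    calls += detect_needs_llm + extract_needs_llm + run_enrichment + run_xai
    return calls


if __name__ == "__main__":
    best = count_api_calls(False, False, False, False)
    typical = {
        count_api_calls(d, e, False, False)
        for d, e in product([False, True], repeat=2)
        if d or e
    }
    worst = count_api_calls(True, True, True, True)
    print(best, sorted(typical), worst)  # 2 [3, 4] 6
```

Under this model the best case is 2, one or two detection/extraction LLM calls give 3-4, and 5 corresponds to a fully engaged message with either enrichment or XAI skipped.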
Optimization Flags
| Flag | Effect |
|---|---|
| FAST-PATH (scam_detector.py:233) | Skips LLM detection if regex confidence > 0.85 |
| Regex-First (intelligence_extractor.py:45) | Intel extraction starts with local regex |
| Parallel asyncio.gather (orchestrator.py:207) | Detection + extraction run concurrently |
| ENABLE_LLM_DETECTION | Can disable LLM detection entirely (regex only) for speed |
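The FAST-PATH and ENABLE_LLM_DETECTION rows combine into one gating decision. A minimal sketch, assuming a strict "greater than 0.85" comparison (the report writes "> 0.85" here and "< 0.85" as the switch trigger, so behaviour at exactly 0.85 is unspecified), an assumed default of "true" for the flag, and a hypothetical injected llm_classify callable in place of the real LLM client:

```python
import os
from dataclasses import dataclass
from typing import Callable

FAST_PATH_THRESHOLD = 0.85  # value from the report; strict ">" is an assumption


@dataclass
class Detection:
    scam_type: str
    confidence: float
    used_llm: bool


def llm_detection_enabled() -> bool:
    # Assumed parsing of the ENABLE_LLM_DETECTION env flag; the "true" default is a guess
    return os.getenv("ENABLE_LLM_DETECTION", "true").strip().lower() == "true"


def detect_with_fast_path(
    regex_type: str,
    regex_confidence: float,
    llm_available: bool,
    llm_classify: Callable[[], tuple[str, float]],
) -> Detection:
    # FAST-PATH: a confident regex result is final, so no API call is made
    if regex_confidence > FAST_PATH_THRESHOLD:
        return Detection(regex_type, regex_confidence, used_llm=False)
    # Conditional LLM: only if the flag is on and the client is available
    if not (llm_detection_enabled() and llm_available):
        return Detection(regex_type, regex_confidence, used_llm=False)
    scam_type, confidence = llm_classify()  # API-2
    return Detection(scam_type, confidence, used_llm=True)
```

When the LLM is disabled or unavailable, this sketch degrades to the regex result instead of failing, which matches the "regex only for speed" reading of the flag.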
4. Decision Flow: Think-Before-Reply ✅
Scam message arrives, then:
- THINK: 1. detect scam type, 2. extract intel, 3. analyze behavior, 4. select persona
- ACT: 5. generate reply
- POST-PROCESS: 6. enrich (async), 7. log + callback
Conclusion: The system THINKS before replying. Intelligence is extracted BEFORE response generation, so the reply can incorporate the analysis results (selected persona, scam type, keywords).
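Think-before-reply amounts to an ordering constraint: the generator runs only once the analysis exists, and the analysis is passed into its prompt. A sketch under assumptions: TurnAnalysis, build_reply_prompt and think_then_act are hypothetical names, and the prompt wording is illustrative, not the project's template.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnAnalysis:
    """Output of the THINK steps (1-4); field names are hypothetical."""

    scam_type: str
    persona: str
    keywords: list[str] = field(default_factory=list)


def build_reply_prompt(analysis: TurnAnalysis, scammer_message: str) -> str:
    # Fold the THINK results into the prompt that the ACT step will use
    lines = [
        f"Persona: {analysis.persona}",
        f"Suspected scam type: {analysis.scam_type}",
    ]
    if analysis.keywords:
        lines.append("Scammer keywords: " + ", ".join(analysis.keywords))
    lines.append(f"Scammer said: {scammer_message}")
    return "\n".join(lines)


def think_then_act(
    message: str,
    analyze: Callable[[str], TurnAnalysis],
    generate: Callable[[str], str],
) -> str:
    analysis = analyze(message)  # THINK: detection, extraction, behavior, persona
    prompt = build_reply_prompt(analysis, message)
    return generate(prompt)  # ACT: the generator sees everything THINK produced
```

Injecting analyze and generate as callables keeps the ordering testable without any model: a stub generator can simply record the prompt it receives.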
5. Model Switching Analysis
| Component | Primary Model | Fallback | Switch Trigger |
|---|---|---|---|
| Safeguard | gpt-oss-safeguard-20b | N/A (mandatory) | - |
| Scam Detection | Regex FAST-PATH | llama-3.1-8b-instant | confidence < 0.85 |
| Intel Extraction | Regex patterns | generate_verified | needs semantic context |
| Response Generation | llama-3.1-8b-instant | llama-3.3-70b-versatile | context > 8K or failure |
| Enrichment | groq/compound | groq/compound-mini | latency priority |
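The Response Generation row describes a two-tier fallback. A minimal sketch, assuming the "8K" trigger means roughly 8,000 tokens and using a crude four-characters-per-token estimate; call_model(model, prompt) and ModelCallError are hypothetical stand-ins for the real client, which this sketch does not call.

```python
from typing import Callable

PRIMARY_MODEL = "llama-3.1-8b-instant"
FALLBACK_MODEL = "llama-3.3-70b-versatile"
CONTEXT_SWITCH_TOKENS = 8_000  # "8K" from the report; exact cut-off assumed


class ModelCallError(Exception):
    """Hypothetical error raised by the injected model client."""


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough chars-per-token heuristic (assumption)


def generate_with_fallback(
    prompt: str, call_model: Callable[[str, str], str]
) -> tuple[str, str]:
    """Return (model_used, reply); call_model(model, prompt) is injected."""
    if estimate_tokens(prompt) > CONTEXT_SWITCH_TOKENS:
        # Large context: go straight to the bigger model
        return FALLBACK_MODEL, call_model(FALLBACK_MODEL, prompt)
    try:
        return PRIMARY_MODEL, call_model(PRIMARY_MODEL, prompt)
    except ModelCallError:
        # Failure on the fast model: retry once on the fallback
        return FALLBACK_MODEL, call_model(FALLBACK_MODEL, prompt)
```

Catching only a dedicated error type keeps genuine bugs (for example a TypeError in the caller) from being silently retried on the larger, slower model.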
6. Wasteful API Calls? No
Optimizations Already Present:
- FAST-PATH: Regex confidence > 0.85 skips LLM detection entirely (scam_detector.py:233)
- Parallel Execution: Detection + Extraction run concurrently (orchestrator.py:207)
- Regex-First Intel: Local patterns run before LLM (intelligence_extractor.py:45)
- Conditional LLM: The LLM is called only if ENABLE_LLM_DETECTION=true and is_available is true
- Background Enrichment: Runs async, so it does not block the reply
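Regex-first intel extraction can be sketched as follows. The patterns are illustrative assumptions, not the project's: UPI IDs are matched as handle@provider, IFSC codes as the standard 11-character form (four-letter bank code, a literal 0, then six alphanumerics), and links as http(s) URLs. The hypothetical llm_extract callable is invoked only when no pattern matches, which is the cost-saving behaviour described above.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

# Illustrative patterns only; the project's real patterns are not shown in this report.
UPI_RE = re.compile(r"\b[a-zA-Z0-9._-]{2,256}@[a-zA-Z]{2,64}\b")  # handle@provider
IFSC_RE = re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b")  # bank code + '0' + branch code
URL_RE = re.compile(r"https?://\S+")

Intel = dict[str, list[str]]


def regex_first_extract(
    text: str, llm_extract: Optional[Callable[[str], Intel]] = None
) -> Intel:
    intel: Intel = {
        "upi_ids": UPI_RE.findall(text),
        "ifsc_codes": IFSC_RE.findall(text),
        "links": URL_RE.findall(text),
    }
    # Fall back to the (costly) LLM only when the local regexes found nothing
    if not any(intel.values()) and llm_extract is not None:
        return llm_extract(text)
    return intel
```

The UPI pattern as written also matches part of an email address (user@gmail in user@gmail.com), so a production extractor would need extra filtering.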
Cost Per Message:
- Minimum: 2 API calls (safeguard + reply)
- Typical: 3-4 API calls
- Maximum: 6 API calls (fully analyzed high-risk message)
7. Reply Realism Verification ✅

| Feature | Implementation |
|---|---|
| Hinglish Mixing | na.., arre.., karu? in replies |
| Human Hesitation | Ellipsis (...), short phrases |
| Typos | TypingSimulator adds intentional errors |
| Emoji Use | 👴 elderly persona marker |
| Delayed Response | AsyncIO latency simulation |
| Filler Words | hmm, okay, wait injected |
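The typo, filler-word and delay rows can be illustrated with a small, seeded humanizer. This is not TypingSimulator's implementation, which the report does not show; the neighbouring-letter swap, the 30% filler probability and the 6 characters-per-second typing speed are all assumptions. Passing a seeded random.Random makes the output reproducible in tests.

```python
import asyncio
import random
from typing import Callable

FILLERS = ["hmm", "okay", "wait"]  # filler words listed in the report


def add_typos(text: str, rng: random.Random, rate: float = 0.05) -> str:
    """Swap neighbouring letters with probability `rate` (a common typing slip)."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


def humanize(
    reply: str, rng: random.Random, typo_rate: float = 0.05, filler_prob: float = 0.3
) -> str:
    if rng.random() < filler_prob:
        reply = f"{rng.choice(FILLERS)}.. {reply}"  # e.g. "hmm.. ok"
    return add_typos(reply, rng, typo_rate)


def typing_delay_seconds(text: str, chars_per_second: float = 6.0) -> float:
    return len(text) / chars_per_second  # assumed typing speed


async def send_humanized(
    reply: str,
    send: Callable[[str], None],
    rng: random.Random,
    chars_per_second: float = 6.0,
) -> None:
    text = humanize(reply, rng)
    # asyncio.sleep yields to the event loop, so other sessions keep running
    await asyncio.sleep(typing_delay_seconds(text, chars_per_second))
    send(text)
```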
Summary
| Metric | Status |
|---|---|
| Multi-turn Memory | ✅ Working |
| API Efficiency | ✅ Optimized (2 calls best case, 3-4 typical) |
| Model Switching | ✅ Working (regex FAST-PATH with LLM fallback) |
| Think-Before-Reply | ✅ Yes (THINK → ACT flow) |
| Reply Realism | ✅ Hinglish + Typos + Hesitation |
| Wasteful Calls | None detected |