# Sakhi (सखी): Judge Brief

*One-page version of the README. Full detail in [README.md](README.md). 3-min demo video: [youtu.be/n-u7J1lljUg](https://youtu.be/n-u7J1lljUg).*

## Problem

India's 1 million+ ASHA health workers conduct 50M+ maternal and child home visits every year; every visit ends with a hand-filled paper form carried to the PHC. Danger signs observed in the field (preeclampsia, postpartum hemorrhage, neonatal distress) often don't reach the clinical system in time for intervention.

## What Sakhi does

Sakhi converts Hindi home-visit conversations (voice on a shared health-center workstation, text on the ASHA's phone offline) into structured NHM/MCTS forms + a function-calling-powered danger-sign triage that flags referrals with verbatim utterance evidence. Same pipeline, same anti-hallucination validation, two deployment modes: Whisper-Large + Gemma 4 E4B via Ollama on a workstation for accuracy, and Gemma 4 E2B via Cactus SDK on an Android phone for offline resilience.

## Numbers a judge can check

| Measurement | Value | Source |
|---|---|---|
| Text extraction pass rate (base Gemma 4 E4B) | **15 / 15** | `scripts/test_ollama_quality.py` (per-case rubric; one under-specified trap documented in [FAILURES.md](FAILURES.md)) |
| End-to-end audio pipeline pass rate | **13 / 15** | `scripts/test_pipeline_e2e.py` (2 TTS→ASR artifacts, documented in FAILURES.md) |
| Hindi number / medical-term normalization | **133 / 133** | `scripts/test_asr.py` |
| On-device JS pipeline port (engine-agnostic) | **72 / 72** | `cd frontend && node --test src/lib/__tests__/` |
| False-alarm rate on routine visits | **0** | Strict evidence-grounding + 6-layer validation |
| Workstation pipeline latency (audio → form) | ~15–25 s | RTX 5070 Ti, warm Ollama |
| On-device pipeline latency (Hindi text → form) | ~5 min | OnePlus 11R / Snapdragon 8+ Gen 1, Gemma 4 E2B INT4 on Cactus |

The 5-minute on-device figure is reproducible via the **Load ANC example** button in Field Mode (Field Mode tab → On-device text → form card → "Load ANC example"). On OnePlus 11R / Snapdragon 8+ Gen 1, the on-device pipeline extracts BP 155/100, verbatim Hindi symptoms (`सिरदर्द, आँखों के सामने धुंधला दिखना, चेहरे पर सूजन, पैरों में सूजन`), Counseling `PHC जाने की सलाह`, and flags three danger signs (`high_bp_with_symptoms`, `swelling_face`, `swelling_legs`), all with verbatim Hindi `utterance_evidence` and `category: immediate_referral`. Total 320.7 s end-to-end (Form 231.8 s + Danger 88.9 s + normalize + detect). For comparison: the paper-form baseline is 15–20 min of hand-filling plus travel to the PHC.

## Why this is submitted to four tracks

| Track | What Sakhi brings |
|---|---|
| **Health & Sciences** | A clinical-decision-support tool with explicit human-in-the-loop design, 6-layer anti-hallucination, strict-evidence danger-sign grounding, demographics entered as a typed header (the way every clinical EMR does it, so identifiers don't depend on ASR), and a workflow matched to how ASHA workers actually operate (health-center mode + field mode with later sync). |
| **Ollama** | Native function calling via `tools=` parameter for `extract_form` + `flag_danger_sign` + `issue_referral` in a single inference pass, quantized Gemma 4 E4B Q4_K_M served on LAN to any phone on the same WiFi. One command (`python api.py`) starts the full stack. |
| **Unsloth** | One-command LoRA pipeline (`scripts/train_unsloth.py`): data prep → train → GGUF export → Ollama register → A/B eval vs base. Includes a Windows GGUF-export workaround (`scripts/export_merge.py`) for Unsloth's Gemma 4 mmap failure: manual delta-merge + `llama.cpp/convert_hf_to_gguf.py` + `llama-quantize Q4_K_M`, no WSL needed. Fine-tune pass rate 14/15 vs base 15/15. Base is in the live pipeline; fine-tune is published to Ollama as [`tusharbrisingr9802/sakhi`](https://ollama.com/tusharbrisingr9802/sakhi) (`ollama pull tusharbrisingr9802/sakhi` to verify A/B locally) for deployments preferring English schema-label normalization (`दस्त` → `Diarrhea`) over raw Hindi. Field-coverage diff in `FIELD_COVERAGE_DIFF.md`. |
| **Cactus** | On-device integration: custom Capacitor plugin bridging JS ↔ Cactus Kotlin SDK, JS pipeline port that drives either the Cactus engine or the workstation engine through a single `engine.complete()` contract, null-filled instance template prompting pattern that sidesteps E2B INT4's schema-echo failure mode, in-app SAF zip-import so a judge can install the 4.4 GB model without adb or developer tooling (single-pass extract with 1%/heartbeat progress events; auto-evicts stale model dirs on re-import), and a Developer-view toggle that shows raw per-stage model output for verifiable extraction. On-device voice-in via `cactusTranscribe` + Gemma was investigated; the README documents why it's not shipped (Gemma 4 doesn't serve Cactus's ASR path, and off-the-shelf Whisper-Hindi INT4 has 27–70% WER on rural/clinical Hindi per [Kumar et al. 2025](https://arxiv.org/abs/2512.10967) and the Vistaar / Gramvaani benchmarks, with deletion-dominant errors on numbers; not in this submission). |

## Reproduce in under 10 minutes

**3-min demo video:** [youtu.be/n-u7J1lljUg](https://youtu.be/n-u7J1lljUg) (workstation voice-to-form path, on-device Hindi text-to-form on a phone in airplane mode, four tracks claimed).

**Live demo (no install):** [https://huggingface.co/spaces/Tushar9802/sakhi](https://huggingface.co/spaces/Tushar9802/sakhi). Same stack as a local install on a T4. ~5 min cold-boot wait after idle (Space runs on ephemeral disk). For instant evaluation, use the demo video or run locally below.


**Pull the Unsloth fine-tune:** [`ollama pull tusharbrisingr9802/sakhi`](https://ollama.com/tusharbrisingr9802/sakhi). The LoRA-fine-tuned Gemma 4 E4B is on the Ollama registry. Run `python scripts/test_ollama_quality.py` against base + fine-tune to reproduce the 15/15 vs 14/15 A/B locally.

**Health-center mode (workstation only):**

```bash
pip install -r requirements-runtime.txt && ollama pull gemma4:e4b-it-q4_K_M
cd frontend && npm install && npm run build && cd ..
python api.py   # browser: http://localhost:8000
```

**Field mode (phone + Cactus):**

> **Sakhi does not redistribute the Cactus-Compute model**; it is gated under a custom Cactus license. Reviewers verifying the Cactus track follow the documented path below. Most reviewers can verify the engineering claims via the workstation path above without ever installing on-device; the [3-minute demo video](https://youtu.be/n-u7J1lljUg) shows the full on-device flow on a real phone.

```bash
# Build + install the APK once. After this the model install is in-app, no adb.
cd frontend && npm run build && npx cap sync android && \
  cd android && ./gradlew assembleDebug && \
  adb install -r app/build/outputs/apk/debug/app-debug.apk

# Model install (primary path, no developer tooling needed):
#   1. Accept terms at huggingface.co/Cactus-Compute/gemma-4-E2B-it
#   2. Download gemma-4-e2b-it-int4.zip (~4.4 GB) to the PHONE'S Downloads
#      folder (USB MTP from PC, OTG drive, or direct Drive download to local).
#   3. Open Sakhi → Field Mode → On-Device Probe → Import model (.zip)
#      → pick the zip. Progress bar fills in ~3-5 min.
#   4. Tap Load Model → Test Hindi.
#
# Re-imports auto-evict the previous model: one model on disk at a time.

# Developer alternative (adb-based, no manual file picking):
#   export HF_TOKEN=hf_... && bash scripts/setup_cactus_model.sh
```

A sample Hindi transcript ready to paste is at `data/processed/train.jsonl` (line 1 = ANC preeclampsia case) or in the main README.

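To pull that first record out for pasting, a minimal Python sketch is given here. It assumes only that `data/processed/train.jsonl` holds one JSON object per line; the field names inside each record are not documented in this brief, so the sketch lists the record's keys instead of guessing which one holds the transcript. The helper name `first_record` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def first_record(path):
    """Return the first non-empty line of a JSONL file, parsed as JSON."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                return json.loads(line)
    raise ValueError(f"{path} contains no records")


if __name__ == "__main__":
    sample = Path("data/processed/train.jsonl")  # path quoted in the brief
    if sample.exists():
        record = first_record(sample)
        # Field names are an assumption-free unknown: list them before picking one.
        if isinstance(record, dict):
            print("fields:", sorted(record))
        # ensure_ascii=False keeps Devanagari readable instead of \u escapes.
        print(json.dumps(record, ensure_ascii=False, indent=2))
    else:
        print(f"{sample} not found; run this from the repository root")
```

Run it from the repository root, then copy the Hindi transcript field from the printed record into the Field Mode text box or the workstation UI.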

## Privacy & data handling

Audio and transcripts never leave the institution that owns them. Workstation mode keeps everything on the PHC's local network (Whisper + Ollama on local GPU; no OpenAI / Anthropic / Google API). Field mode runs on-device via Cactus SDK; airplane mode does not break it. Patient demographics enter as a typed header rather than being extracted from audio, so identifiers are minimised at the boundary. This posture is compatible with India's Digital Personal Data Protection Act, 2023: data fiduciary stays within the institution, no cross-border transfer, purpose limitation enforced by architecture rather than by policy.

## What's next with $10K and six more months

- Partner with an ASHA training institute (Santosh Medical College / IIT Madras Bhashini) to collect 100+ hours of *real* ASHA home-visit audio under field conditions. Current evaluation covers 4 real-voice recordings (2 speakers: 1 female Bareilly reader + 1 male self-record, across 3 of 4 role-play scripts) plus the 15-case synthetic test suite; full-corpus rural-female accent + field-noise validation is the next step.
- Fine-tune an IndicWhisper variant on that real audio for the on-device voice-in path not shipped here.
- Harden integration with the official MCTS API so forms post directly into the NHM system instead of being exported as JSON/CSV.
- Pilot with 10–20 ASHA workers in one block (Muradnagar / Loni-adjacent) with before/after time-and-accuracy measurement.

## Contact

Tushar J, tusharbrisingr9802@gmail.com, GitHub: [Tushar-9802/Sakhi](https://github.com/Tushar-9802/Sakhi)