---
language:
- en
- ta
license: mit
tags:
- automatic-speech-recognition
- tamil
- english
- tanglish
- medical-asr
- clinical-dialogue
- edge-ai
- smartphone
- polyglot-lion
pipeline_tag: automatic-speech-recognition
base_model: knoveleng/polyglot-lion-1.7b-v1.5
---

# OHM Tanglish MedASR 1.7B v152

**OHM Tanglish MedASR 1.7B v152 is the first checkpoint from our one-Indic-language + English medical ASR effort: a Tamil + English clinical speech transcription model at compact 1.7B scale, aimed at smartphone-class edge medical transcription.**

This checkpoint is a research collaboration between [osmAPI](https://osmapi.com), [OHM - Open Holistic Medicine](https://ohm.doctor), and the [Terv Student Research Team](https://terv.pro).

We use **Tanglish** here to mean the Tamil + English speech environment common in many clinical conversations: Tamil, English, transliterated terms, medical vocabulary, patient symptom descriptions, and doctor-patient dialogue.

The model is designed as the first language-pair checkpoint in a repeatable path toward one-Indic-language + English medical ASR systems that can eventually run near the patient on edge devices.

## Base Model

- **Base model:** [knoveleng/polyglot-lion-1.7b-v1.5](https://huggingface.co/knoveleng/polyglot-lion-1.7b-v1.5)
- **Released checkpoint:** OHM Tanglish MedASR 1.7B v152
- **Parameter scale:** 1.7B
- **Training method:** full-parameter finetune
- **Adapter type:** none
- **Primary target:** Tamil + English medical ASR
- **Deployment direction:** smartphone-class edge transcription after quantization and runtime packaging

## Why This Size

Medical ASR should not always require a cloud-scale model. Larger ASR systems can be stronger on broad benchmarks, but they are not the natural first target for phone-side or clinic-side transcription.

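To make the size argument concrete, the sketch below estimates weight-only storage for a 1.7B-parameter model at a few precisions. It assumes a nominal count of exactly 1.7e9 parameters and a uniform bit width, and it ignores activations, caches, quantization scales, and runtime overhead, so real on-device footprints will be higher. The function name `weight_footprint_gb` is illustrative and not part of any released tooling.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Return weight-only storage in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Assumption: nominal parameter count taken from the "1.7B" in the model name.
NUM_PARAMS = 1.7e9

for bits in (16, 8, 4):
    size_gb = weight_footprint_gb(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size_gb:.2f} GB")
```

Under these assumptions the weights alone come to about 3.40 GB at 16-bit, 1.70 GB at 8-bit, and 0.85 GB at 4-bit, which is why quantization matters for phone-class memory budgets.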
We target 1.7B parameters because this size is a practical middle ground:

- large enough to preserve multilingual ASR behavior
- small enough to target 4-bit or 8-bit quantized inference
- plausible for offline or low-connectivity medical transcription workflows
- suitable for privacy-preserving edge research
- repeatable for future one-Indic-language + English medical ASR checkpoints

This repository contains the research checkpoint. Actual "runs on any smartphone" support requires mobile export, quantization, runtime integration, and device-level benchmarking.

## Finetune Data

v152 was selected from a mixed Tamil + English + medical training recipe designed to preserve all three capabilities at once.

| Training Source | Purpose | Rows | Hours |
|---|---|---:|---:|
| FLEURS English train wide | English speech coverage | 6,000 | 17.3986 |
| IISc-MILE Tamil train wide | Tamil ASR robustness | 7,200 | 18.5397 |
| FLEURS Tamil train | Tamil read-speech coverage | 1,800 | 7.2667 |
| PriMock57 train | clinical dialogue speech | 1,200 | 0.8805 |
| Medical Speech Intent train | medical symptom utterances | 120 | 0.1835 |
| **Total** | | **16,320** | **44.2690** |

Validation used held-out English, Tamil, medical-intent, and clinical-dialogue data:

| Validation Source | Rows | Hours |
|---|---:|---:|
| FLEURS English validation | 60 | 0.1693 |
| Medical Speech Intent validation | 120 | 0.1634 |
| PriMock57 validation | 80 | 0.0864 |
| FLEURS Tamil validation | 60 | 0.2128 |
| **Total** | **320** | **0.6320** |

All test sets were held out for evaluation only.

## Finetuning

OHM Tanglish MedASR 1.7B v152 was trained as a full-parameter finetune from Polyglot-Lion 1.7B.

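As a rough view of how the training mixture in the Finetune Data table is balanced, the sketch below computes each group's share of training hours. The hours are copied from the table; the grouping into English, Tamil, and Medical and the helper name `group_shares` are assumptions made for illustration, not part of the training code.

```python
# Hours per training source, copied from the Finetune Data table.
TRAIN_HOURS = {
    "FLEURS English train wide": 17.3986,
    "IISc-MILE Tamil train wide": 18.5397,
    "FLEURS Tamil train": 7.2667,
    "PriMock57 train": 0.8805,
    "Medical Speech Intent train": 0.1835,
}

# Illustrative grouping (an assumption of this sketch, not an official split).
GROUPS = {
    "English": ["FLEURS English train wide"],
    "Tamil": ["IISc-MILE Tamil train wide", "FLEURS Tamil train"],
    "Medical": ["PriMock57 train", "Medical Speech Intent train"],
}


def group_shares(hours: dict, groups: dict) -> dict:
    """Return each group's fraction of the total training hours."""
    total = sum(hours.values())
    return {
        name: sum(hours[source] for source in sources) / total
        for name, sources in groups.items()
    }


shares = group_shares(TRAIN_HOURS, GROUPS)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under this grouping, general Tamil and English speech make up almost all of the audio, while the two medical sources together account for roughly 2.4% of training hours.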
We selected the checkpoint with a conservative guard-suite approach:

- train on a Tamil + English + medical mixture
- evaluate on medical symptom speech, clinical dialogue, English speech, Tamil FLEURS, and IISc-MILE Tamil
- retain only checkpoints that improve without introducing regressions across the mixed guard suite

v152 remained the local champion after additional dataset-focused passes using public Tamil and medical-dialogue data. Later candidates either tied v152 or regressed on at least one guard gate.

## Results

Lower is better. WER and CER are reported in percent, and "Vanilla" is the base Polyglot-Lion model without finetuning. Bold marks values that improve on the vanilla baseline.

| Evaluation Gate | Samples | Vanilla WER / CER | OHM Tanglish MedASR WER / CER | WER Delta |
|---|---:|---:|---:|---:|
| Medical Speech Intent | 500 | 4.57 / 1.47 | **4.55 / 1.42** | -0.02 |
| PriMock57 full | 969 | 12.79 / 8.26 | **12.66 / 8.15** | -0.13 |
| FLEURS English | 100 | 6.08 / 3.02 | **5.73** / 3.26 | -0.35 |
| FLEURS Tamil | 100 | 33.07 / 12.09 | **31.70 / 11.56** | -1.37 |
| IISc-MILE Tamil / SLR127 | 100 | 37.03 / 10.92 | **36.95 / 10.79** | -0.08 |

WER improves on all five gates. CER improves on four of them; on FLEURS English it rises from 3.02 to 3.26.

Macro WER across the five gates: **18.32**. Sample-weighted WER across 1,769 evaluation samples: **12.43**.

## Claim

OHM Tanglish MedASR 1.7B v152 is our first validated Tamil + English medical ASR checkpoint for the one-Indic-language + English medical ASR direction.

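The aggregate figures reported under Results can be reproduced from the per-gate numbers, and the same data supports a simple guard check. The sketch below is a minimal illustration with two assumptions: the sample-weighted figure weights each gate's WER by its sample count rather than by word count, and the guard rule shown (finetuned WER no worse than vanilla on every gate) is a simplified stand-in for the actual selection criteria, which are not specified in detail here. The function names are hypothetical.

```python
# (gate, samples, vanilla WER, finetuned WER), copied from the Results table.
GATES = [
    ("Medical Speech Intent", 500, 4.57, 4.55),
    ("PriMock57 full", 969, 12.79, 12.66),
    ("FLEURS English", 100, 6.08, 5.73),
    ("FLEURS Tamil", 100, 33.07, 31.70),
    ("IISc-MILE Tamil / SLR127", 100, 37.03, 36.95),
]


def macro_wer(gates) -> float:
    """Unweighted mean of the finetuned WER across gates."""
    return sum(finetuned for _, _, _, finetuned in gates) / len(gates)


def sample_weighted_wer(gates) -> float:
    """Mean of the finetuned WER weighted by each gate's sample count."""
    total_samples = sum(samples for _, samples, _, _ in gates)
    weighted_sum = sum(samples * finetuned for _, samples, _, finetuned in gates)
    return weighted_sum / total_samples


def passes_wer_guard(gates) -> bool:
    """Assumed guard rule: finetuned WER is no worse than vanilla on every gate."""
    return all(finetuned <= vanilla for _, _, vanilla, finetuned in gates)


print(f"macro WER: {macro_wer(GATES):.2f}")
print(f"sample-weighted WER: {sample_weighted_wer(GATES):.2f}")
print(f"passes WER guard: {passes_wer_guard(GATES)}")
```

With the table values this yields a macro WER of 18.32 and a sample-weighted WER of 12.43, and the WER guard passes on all five gates. The gap between the two averages reflects how heavily the 969 PriMock57 samples and 500 Medical Speech Intent samples weigh against the 100-sample Tamil gates.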
We claim that this checkpoint:

- improves WER over vanilla Polyglot-Lion on all five local Tamil + English + medical evaluation gates
- is a compact 1.7B full-finetuned model selected for mixed clinical, English, and Tamil ASR
- is sized for smartphone-class edge deployment research after quantization and runtime packaging
- establishes the first checkpoint in a repeatable path toward other Indic-language + English medical ASR models

We do not claim that this checkpoint is:

- global Tamil ASR state of the art
- a certified medical device
- ready for unsupervised clinical documentation
- universally runnable on every smartphone without further mobile packaging and validation
- superior to Whisper, Qwen3-ASR, MERaLiON, IndicWhisper, Canary, or commercial medical ASR systems without apples-to-apples evaluation on the same manifests

## Intended Use

This model is intended for:

- Tamil + English medical ASR research
- clinical speech transcription prototypes
- patient symptom transcription experiments
- medical dialogue ASR benchmarking
- offline and edge ASR studies
- smartphone-oriented ASR quantization and packaging experiments
- future one-Indic-language + English medical ASR research

## Safety And Limitations

This checkpoint is for research and prototyping. It is not a certified medical device and should not be used as the sole source for clinical care, diagnosis, treatment, billing, or legal documentation.

Known limitations:

- The benchmark suite is local and not directly comparable to every public ASR leaderboard.
- Public Tamil-specialized systems may perform better on other Tamil benchmarks.
- English performance improves over our local vanilla baseline but is not claimed as state of the art.
- Medical-dialogue evaluation used public/simulated benchmark material, not private real-world deployment audio.
- Mobile deployment still requires export, quantization, runtime integration, and device testing.
- Human review is required for any clinical workflow.

## Collaboration

OHM Tanglish MedASR 1.7B v152 is a research collaboration between:

- [osmAPI](https://osmapi.com)
- [OHM - Open Holistic Medicine](https://ohm.doctor)
- [Terv Student Research Team](https://terv.pro)

## Acknowledgements

Built from the Polyglot-Lion/Qwen3-ASR ecosystem and evaluated with public and local ASR research manifests. Please cite the upstream model, datasets, and this checkpoint when using it in research.