Farhang87 committed on
Commit
c783336
·
0 Parent(s):

Reset repository history to current release state

.gitattributes ADDED
@@ -0,0 +1,37 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ omi-med-stt-v1-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ omi-med-stt-v1-f16.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,95 @@
+ ---
+ license: cc-by-4.0
+ language:
+ - en
+ library_name: gguf
+ tags:
+ - automatic-speech-recognition
+ - medical
+ - parakeet
+ - gguf
+ - parakeet.cpp
+ - omi-med-stt
+ pipeline_tag: automatic-speech-recognition
+ base_model: nvidia/parakeet-tdt-0.6b-v2
+ ---
+
+ # Omi Med STT v1 GGUF
+
+ GGUF export of [Omi Med STT v1](https://huggingface.co/omi-health/omi-med-stt-v1)
+ for Linux and Windows CPU use through the `omi-med-stt` CLI.
+
+ This is the portability path. If you have Apple Silicon, use the MLX q8 repo. If
+ you have an NVIDIA GPU, use the canonical NeMo checkpoint.
+
+ ## Quickstart
+
+ ```bash
+ pip install -U omi-med-stt
+ omi-med-stt install-cpp --cpp-backend cpu
+ omi-med-stt audio.wav --runtime cpp
+ ```
+
+ ## Files
+
+ | File | Status |
+ |---|---|
+ | `omi-med-stt-v1-q8_0.gguf` | Default CPU artifact, benchmarked |
+ | `omi-med-stt-v1-f16.gguf` | Provided for conversion/experimentation; not independently benchmarked |
+
+ ## Evaluation
+
+ Full evaluation details: [omi.health/research/omi-med-stt](https://omi.health/research/omi-med-stt/).
+ Benchmark: 7.18h of real and synthetic clinical speech across dialogue, dictation, medication review, procedures/devices/tests, and general speech. Speed is shown as time to process one hour of audio; lower is faster.
+
+ ### NeMo vs Open / Local Models
+
+ Local GPU baselines were run on A10 where applicable; VibeVoice-ASR 9B used H100.
+
+ | Model | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
+ |---|---:|---:|---:|---:|---:|
+ | VibeVoice-ASR 9B | 11.10% | 1.78% | 1.36% | 98.71% | 5m 20s (11.2x) |
+ | **Omi Med STT v1 NeMo** | **8.30%** | **2.37%** | **4.75%** | **97.95%** | **25s (146.3x)** |
+ | Qwen3 ASR 1.7B | 10.72% | 3.13% | 6.11% | 97.21% | 44s (81.1x) |
+ | Whisper Large v3 Turbo (A10) | 11.98% | 3.93% | 5.88% | 96.45% | 1m 19s (45.8x) |
+ | Cohere Transcribe 03-2026 | 14.88% | 5.05% | 11.09% | 95.16% | 25s (146.3x) |
+ | Parakeet TDT 0.6B v3 | 15.26% | 8.01% | 9.50% | 96.34% | 23s (157.9x) |
+ | Parakeet TDT 0.6B v2 base | 16.45% | 8.36% | 8.60% | 96.20% | 23s (153.8x) |
+
+ ### Runtime Artifacts
+
+ Same internal evaluation as the canonical checkpoint.
+
+ | Artifact | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
+ |---|---:|---:|---:|---:|---:|
+ | NeMo canonical | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
+ | MLX q8 | 8.61% | 2.75% | 5.20% | 97.63% | 53s (67.4x) |
+ | **GGUF q8_0** | **9.12%** | **3.20%** | **6.33%** | **97.53%** | **2m 53s (20.8x)** |
+
+ The GGUF q8_0 build is useful when CPU portability matters. It is not the
+ quality-leading artifact.
+
+ ## Compatibility
+
+ These files are **not llama.cpp text-model GGUF files**. They require a Parakeet
+ ASR runtime. The supported path is:
+
+ ```bash
+ omi-med-stt audio.wav --runtime cpp
+ ```
+
+ The CLI installs the patched `parakeet.cpp` runtime needed for Omi Med STT v1.
+
+ ## Links
+
+ - Canonical model: [`omi-health/omi-med-stt-v1`](https://huggingface.co/omi-health/omi-med-stt-v1)
+ - Mac q8 default: [`omi-health/omi-med-stt-v1-mlx-q8`](https://huggingface.co/omi-health/omi-med-stt-v1-mlx-q8)
+ - Runtime CLI: [`Omi-Health/omi-med-stt-runtime`](https://github.com/Omi-Health/omi-med-stt-runtime)
+ - Broader evaluation and product context: [omi.health/research/omi-med-stt](https://omi.health/research/omi-med-stt/)
+ - parakeet.cpp: [`mudler/parakeet.cpp`](https://github.com/mudler/parakeet.cpp)
+
+ ## Safety
+
+ Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage,
+ prescribing, or clinical decision model, and it is not clinically validated.
+ Transcripts must be reviewed before any clinical use.
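The README labels its speed column "formula-derived x realtime" without stating the formula. A minimal sketch, assuming the factor is simply audio duration divided by processing time; the function name `duration_for_factor` is hypothetical and not part of the `omi-med-stt` CLI:

```python
def duration_for_factor(x_realtime, audio_seconds=3600.0):
    """Convert an 'x realtime' factor into wall-clock time for the given audio length.

    Assumption: x_realtime = audio_seconds / processing_seconds.
    """
    seconds = round(audio_seconds / x_realtime)
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


# Factors taken from the Runtime Artifacts table in the README.
for label, factor in [("NeMo canonical", 146.3), ("MLX q8", 67.4), ("GGUF q8_0", 20.8)]:
    print(label, duration_for_factor(factor))
```

Because the published factors are themselves rounded, a value recomputed this way can differ from the table by about a second.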
SHA256SUMS ADDED
@@ -0,0 +1,2 @@
+ 813fb59fdaae8784203685a7607d29f6f12202d7606ef49f5932f72a0ee04f86 omi-med-stt-v1-f16.gguf
+ c4f364a730df7aa9bb0714cda1b1ad5e3104331db9919bb0e2a379d0fb64dbab omi-med-stt-v1-q8_0.gguf
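One way to check downloaded files against `SHA256SUMS` is sketched below using only the Python standard library. The helper names (`sha256_of`, `parse_sha256sums`, `verify_sha256sums`) are hypothetical, and the sketch assumes the `.gguf` files sit next to the `SHA256SUMS` file:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte GGUF files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sha256sums(text):
    """Map file name -> expected hex digest for '<digest> <name>' lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        entries[name.lstrip("*")] = digest.lower()
    return entries


def verify_sha256sums(sums_path):
    """Return {file name: True/False}; missing files count as failures."""
    sums_path = Path(sums_path)
    results = {}
    for name, expected in parse_sha256sums(sums_path.read_text()).items():
        target = sums_path.parent / name
        results[name] = target.exists() and sha256_of(target) == expected
    return results
```

Calling `verify_sha256sums("SHA256SUMS")` from the download directory would then report each listed file as matching or not.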
omi-med-stt-v1-f16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:813fb59fdaae8784203685a7607d29f6f12202d7606ef49f5932f72a0ee04f86
+ size 1429588608
omi-med-stt-v1-q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4f364a730df7aa9bb0714cda1b1ad5e3104331db9919bb0e2a379d0fb64dbab
+ size 929205888
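The two `.gguf` entries in this commit are Git LFS pointer files, not the model weights themselves. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and it only handles the three keys shown in this commit:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer with 'version', 'oid' and 'size' lines."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:c4f364a730df7aa9bb0714cda1b1ad5e3104331db9919bb0e2a379d0fb64dbab\n"
    "size 929205888\n"
)
print(pointer["hash_algo"], pointer["size_bytes"])
```

The `oid` digest in each pointer is the same value listed for that file in `SHA256SUMS`, so either source can be used to check a download.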
parakeet-cpp-omi-adapter.patch ADDED
@@ -0,0 +1,248 @@
+ diff --git a/scripts/convert_parakeet_to_gguf.py b/scripts/convert_parakeet_to_gguf.py
+ index 9e2462d..a41f368 100644
+ --- a/scripts/convert_parakeet_to_gguf.py
+ +++ b/scripts/convert_parakeet_to_gguf.py
+ @@ -2,10 +2,12 @@
+  """Convert a NeMo Parakeet checkpoint to GGUF (f32 / f16 / q8_0).
+
+  The GGUF is fully metadata-driven: all config lives in KV, and tensor names are
+ -kept **verbatim** from the NeMo ``state_dict`` (no renaming) so the C++ port is a
+ -1:1 mapping. The two featurizer buffers (``preprocessor.featurizer.fb`` and
+ -``preprocessor.featurizer.window``) are lifted directly from the checkpoint so the
+ -C++ side never re-derives the mel filterbank with librosa.
+ +kept **verbatim** from the NeMo ``state_dict`` for upstream Parakeet checkpoints.
+ +Omi Med STT adapter tensors are the one exception: they are written with compact
+ +``omi_adapter`` names because the C GGUF reader rejects tensor names >=64 bytes.
+ +The two featurizer buffers (``preprocessor.featurizer.fb`` and
+ +``preprocessor.featurizer.window``) are lifted directly from the checkpoint so
+ +the C++ side never re-derives the mel filterbank with librosa.
+
+  Quantization (``--dtype f16|q8_0``) is applied **only** to the large linear
+  weights that the C++ engine consumes directly via ``ggml_mul_mat`` (the encoder
+ @@ -120,6 +122,47 @@ def should_quantize(name, shape, dtype):
+      return None
+
+
+ +_OMI_ADAPTER_RE = re.compile(
+ +    r"^encoder\.layers\.(\d+)\.adapter_layer\.medical_v1d_rank128"
+ +    r"\.module\.(0|1|3)\.(weight|bias)$"
+ +)
+ +
+ +
+ +def compact_omi_adapter_name(name):
+ +    """Return the GGUF tensor name for Omi's post-Conformer adapter.
+ +
+ +    NeMo stores names like
+ +    ``encoder.layers.0.adapter_layer.medical_v1d_rank128.module.0.weight``.
+ +    Those exceed the C GGUF tensor-name limit used by parakeet.cpp, so Omi's
+ +    adapter extension writes them as compact names and the C++ runtime looks up
+ +    this compact schema.
+ +    """
+ +    m = _OMI_ADAPTER_RE.match(name)
+ +    if not m:
+ +        return name
+ +    layer, module, suffix = m.groups()
+ +    if module == "0":
+ +        part = f"norm.{suffix}"
+ +    elif module == "1" and suffix == "weight":
+ +        part = "down.weight"
+ +    elif module == "3" and suffix == "weight":
+ +        part = "up.weight"
+ +    else:
+ +        return name
+ +    return f"encoder.layers.{layer}.omi_adapter.{part}"
+ +
+ +
+ +def detect_omi_adapter(sd):
+ +    down_key = "encoder.layers.0.adapter_layer.medical_v1d_rank128.module.1.weight"
+ +    if not any(".adapter_layer.medical_v1d_rank128." in name for name in sd):
+ +        return False, 0
+ +    rank = 0
+ +    t = sd.get(down_key)
+ +    if t is not None and hasattr(t, "shape") and len(t.shape) >= 1:
+ +        rank = int(t.shape[0])
+ +    return True, rank
+ +
+ +
+  def main():
+      ap = argparse.ArgumentParser()
+      ap.add_argument("--model", required=True, help="HF id or local .nemo")
+ @@ -147,6 +190,7 @@ def main():
+      cfg = m.cfg
+      enc = cfg.encoder
+      feat = m.preprocessor.featurizer  # effective runtime values live here
+ +    sd = m.state_dict()
+
+      w = gguf.GGUFWriter(args.output, "parakeet")
+      w.add_string("general.name", args.model)
+ @@ -170,6 +214,19 @@ def main():
+      w.add_uint32("parakeet.encoder.pos_emb_max_len",
+                   int(_get(enc, "pos_emb_max_len", 5000)))
+
+ +    # Optional Omi Med STT post-Conformer adapter. This is absent from NVIDIA
+ +    # Parakeet checkpoints and present in Omi's H4/v1 checkpoint. We detect it
+ +    # from the state_dict because NeMo adapter metadata is not guaranteed to
+ +    # expose a simple encoder.medical_adapter_rank config value after restore.
+ +    adapter_rank = int(_get(enc, "medical_adapter_rank", 0) or 0)
+ +    has_omi_adapter, inferred_adapter_rank = detect_omi_adapter(sd)
+ +    if adapter_rank <= 0:
+ +        adapter_rank = inferred_adapter_rank
+ +    if has_omi_adapter and adapter_rank > 0:
+ +        w.add_bool("parakeet.omi_med_adapter.enabled", True)
+ +        w.add_uint32("parakeet.omi_med_adapter.rank", adapter_rank)
+ +        w.add_string("parakeet.omi_med_adapter.name", "omi_adapter")
+ +
+      # --- Cache-aware streaming / causal config (Phase 5) ---------------------
+      # These KVs describe the chunked-limited attention + causal conv that the
+      # streaming FastConformer (e.g. parakeet_realtime_eou_120m-v1) uses. They are
+ @@ -270,10 +327,10 @@ def main():
+      )
+      w.add_array("parakeet.tdt.durations", [int(d) for d in durs])
+
+ -    # tensors: verbatim names. Allowlisted linear weights are quantized per
+ -    # --dtype (ggml dequantizes them on the fly inside ggml_mul_mat); everything
+ -    # else stays f32. Include featurizer buffers explicitly.
+ -    sd = m.state_dict()
+ +    # tensors: verbatim names except Omi adapter compact aliases. Allowlisted
+ +    # linear weights are quantized per --dtype (ggml dequantizes them on the fly
+ +    # inside ggml_mul_mat); everything else stays f32. Include featurizer buffers
+ +    # explicitly.
+      written = 0
+      quantized = 0
+      keep_buffers = {"preprocessor.featurizer.fb", "preprocessor.featurizer.window"}
+ @@ -289,14 +346,15 @@ def main():
+          # ggml ne is the reverse of the numpy/torch shape; ne[0] is the leading
+          # (contraction) axis q8_0 blocks along.
+          ggml_ne = list(arr.shape[::-1])
+ +        out_name = compact_omi_adapter_name(name)
+          qtype = should_quantize(name, ggml_ne, args.dtype)
+          if qtype is None:
+ -            w.add_tensor(name, arr)
+ +            w.add_tensor(out_name, arr)
+          else:
+              raw = gguf.quantize(arr, qtype)
+              # gguf expects raw_shape to be the *byte* shape of the quantized
+              # buffer; it derives the element shape from it via raw_dtype.
+ -            w.add_tensor(name, raw, raw_shape=raw.shape, raw_dtype=qtype)
+ +            w.add_tensor(out_name, raw, raw_shape=raw.shape, raw_dtype=qtype)
+              quantized += 1
+          written += 1
+
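To illustrate why the converter renames the adapter tensors, the sketch below copies the regex and mapping from the patch and checks names against the 64-byte limit mentioned in its docstring. The constant `NAME_LIMIT` and the helper `fits_name_limit` are illustrative additions, not part of the patch:

```python
import re

# Regex and mapping copied from the patch; the limit check below is illustrative.
_OMI_ADAPTER_RE = re.compile(
    r"^encoder\.layers\.(\d+)\.adapter_layer\.medical_v1d_rank128"
    r"\.module\.(0|1|3)\.(weight|bias)$"
)

NAME_LIMIT = 64  # per the patch docstring: names of 64 bytes or more are rejected


def compact_omi_adapter_name(name):
    """Map a NeMo adapter tensor name to its compact GGUF name, else return it unchanged."""
    m = _OMI_ADAPTER_RE.match(name)
    if not m:
        return name
    layer, module, suffix = m.groups()
    if module == "0":
        part = f"norm.{suffix}"
    elif module == "1" and suffix == "weight":
        part = "down.weight"
    elif module == "3" and suffix == "weight":
        part = "up.weight"
    else:
        return name
    return f"encoder.layers.{layer}.omi_adapter.{part}"


def fits_name_limit(name):
    """Hypothetical helper: True when the UTF-8 name is shorter than NAME_LIMIT bytes."""
    return len(name.encode("utf-8")) < NAME_LIMIT


nemo_name = "encoder.layers.0.adapter_layer.medical_v1d_rank128.module.1.weight"
gguf_name = compact_omi_adapter_name(nemo_name)
print(nemo_name, fits_name_limit(nemo_name))
print(gguf_name, fits_name_limit(gguf_name))
```

Note that the mapping covers only the norm weight and bias and the two projection weights; any other name, including a projection bias, is returned unchanged.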
+ diff --git a/src/conformer.cpp b/src/conformer.cpp
+ index 8ef6645..8e80a15 100644
+ --- a/src/conformer.cpp
+ +++ b/src/conformer.cpp
+ @@ -276,6 +276,8 @@ ConformerLayer::ConformerLayer(const ModelLoader& ml, int layer_idx)
+      conv_kernel_ = (int)ml.config().conv_kernel;
+      conv_norm_type_ = ml.config().conv_norm_type;
+      conv_causal_ = ml.config().conv_causal;
+ +    omi_med_adapter_ = ml.config().omi_med_adapter;
+ +    omi_med_adapter_name_ = ml.config().omi_med_adapter_name;
+      assert((conv_norm_type_ == "batch_norm" || conv_norm_type_ == "layer_norm") &&
+             "ConformerLayer supports conv_norm_type in {batch_norm, layer_norm}");
+      assert(n_heads_ > 0 && d_model_ % n_heads_ == 0);
+ @@ -322,6 +324,21 @@ ggml_tensor* ConformerLayer::build_graph_batched(ggml_context* ctx,
+          h = linear(h, ff + ".linear2", /*bias*/true); // [D, T, B]
+          return h;
+      };
+ +    auto omi_med_adapter = [&](ggml_tensor* in) {
+ +        if (!omi_med_adapter_) return in;
+ +        const std::string ap = pre + "omi_adapter.";
+ +        ggml_tensor* g = clone_weight(ctx, ml, ap + "norm.weight");
+ +        ggml_tensor* b = clone_weight(ctx, ml, ap + "norm.bias");
+ +        ggml_tensor* w_down = clone_weight(ctx, ml, ap + "down.weight");
+ +        ggml_tensor* w_up = clone_weight(ctx, ml, ap + "up.weight");
+ +        ggml_tensor* y = ggml_norm(ctx, in, ln_eps);
+ +        y = ggml_mul(ctx, y, g);
+ +        y = ggml_add(ctx, y, b);
+ +        y = ggml_mul_mat(ctx, w_down, y);
+ +        y = ggml_silu(ctx, y);
+ +        y = ggml_mul_mat(ctx, w_up, y);
+ +        return ggml_add(ctx, in, y);
+ +    };
+
+      // === Stage A: r = x + 0.5 * FFN1(norm_ff1(x)). ===
+      ggml_tensor* h1 = layer_norm(xt, "norm_feed_forward1");
+ @@ -349,6 +366,7 @@ ggml_tensor* ConformerLayer::build_graph_batched(ggml_context* ctx,
+      h2 = ggml_scale(ctx, h2, 0.5f);
+      r = ggml_add(ctx, r, h2);
+      r = layer_norm(r, "norm_out");
+ +    r = omi_med_adapter(r);
+      return r; // [D, T, B] -> per item row-major [T, D]
+  }
+
+ @@ -394,6 +412,21 @@ ggml_tensor* ConformerLayer::build_graph(ggml_context* ctx, ggml_tensor* xt,
+          h = linear(h, ff + ".linear2", /*bias*/true); // [D, T]
+          return h;
+      };
+ +    auto omi_med_adapter = [&](ggml_tensor* in) {
+ +        if (!omi_med_adapter_) return in;
+ +        const std::string ap = pre + "omi_adapter.";
+ +        ggml_tensor* g = clone_weight(ctx, ml, ap + "norm.weight");
+ +        ggml_tensor* b = clone_weight(ctx, ml, ap + "norm.bias");
+ +        ggml_tensor* w_down = clone_weight(ctx, ml, ap + "down.weight");
+ +        ggml_tensor* w_up = clone_weight(ctx, ml, ap + "up.weight");
+ +        ggml_tensor* y = ggml_norm(ctx, in, ln_eps);
+ +        y = ggml_mul(ctx, y, g);
+ +        y = ggml_add(ctx, y, b);
+ +        y = ggml_mul_mat(ctx, w_down, y);
+ +        y = ggml_silu(ctx, y);
+ +        y = ggml_mul_mat(ctx, w_up, y);
+ +        return ggml_add(ctx, in, y);
+ +    };
+
+      // === Stage A: r = x + 0.5 * FFN1(norm_ff1(x)). ===
+      ggml_tensor* h1 = layer_norm(xt, "norm_feed_forward1");
+ @@ -420,6 +453,7 @@ ggml_tensor* ConformerLayer::build_graph(ggml_context* ctx, ggml_tensor* xt,
+      h2 = ggml_scale(ctx, h2, 0.5f);
+      r = ggml_add(ctx, r, h2);
+      r = layer_norm(r, "norm_out");
+ +    r = omi_med_adapter(r);
+      return r; // [D, T] -> row-major [T, D]
+  }
+
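Read as plain arithmetic, the `omi_med_adapter` lambda added to `src/conformer.cpp` is a residual bottleneck applied to each frame after `norm_out`: layer norm with scale and bias, a bias-free down projection, SiLU, a bias-free up projection, then a residual add. A pure-Python sketch for a single frame follows. The function names are hypothetical, the default `eps` of 1e-5 is an assumption (the patch uses `ln_eps`, whose value is not shown), and the weight layout (one row per output feature) is chosen for readability rather than taken from the runtime:

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one frame over its feature axis, then apply scale and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]


def matvec(weight, x):
    """weight holds one row per output feature; no bias term."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def silu(v):
    return v / (1.0 + math.exp(-v))


def omi_adapter_frame(x, gamma, beta, w_down, w_up, eps=1e-5):
    """x + up(silu(down(layer_norm(x)))) for one frame of D features."""
    y = layer_norm(x, gamma, beta, eps)
    y = matvec(w_down, y)            # D -> rank
    y = [silu(v) for v in y]
    y = matvec(w_up, y)              # rank -> D
    return [xi + yi for xi, yi in zip(x, y)]
```

With `w_up` set to all zeros the function returns its input unchanged, which matches the runtime's behaviour of returning the input as-is when the adapter is disabled.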
+ diff --git a/src/conformer.hpp b/src/conformer.hpp
+ index 23402fb..d6a07f6 100644
+ --- a/src/conformer.hpp
+ +++ b/src/conformer.hpp
+ @@ -99,6 +99,8 @@ private:
+      int conv_kernel_;
+      std::string conv_norm_type_; // "batch_norm" (offline) or "layer_norm" (streaming)
+      bool conv_causal_ = false; // causal depthwise conv pad (left k-1, right 0)
+ +    bool omi_med_adapter_ = false;
+ +    std::string omi_med_adapter_name_;
+  };
+
+  } // namespace pk
+ diff --git a/src/model_loader.cpp b/src/model_loader.cpp
+ index 218dc91..3ad5df5 100644
+ --- a/src/model_loader.cpp
+ +++ b/src/model_loader.cpp
+ @@ -112,6 +112,10 @@ bool ModelLoader::load(const std::string& path){
+      cfg_.subsampling_conv_channels = kv_u32(gguf_, "parakeet.encoder.subsampling_conv_channels");
+      cfg_.xscaling = kv_bool(gguf_, "parakeet.encoder.xscaling", true);
+      cfg_.pos_emb_max_len = kv_u32(gguf_, "parakeet.encoder.pos_emb_max_len", 5000);
+ +    cfg_.omi_med_adapter = kv_bool(gguf_, "parakeet.omi_med_adapter.enabled", false);
+ +    cfg_.omi_med_adapter_rank = kv_u32(gguf_, "parakeet.omi_med_adapter.rank", 0);
+ +    cfg_.omi_med_adapter_name = kv_str(
+ +        gguf_, "parakeet.omi_med_adapter.name", "medical_v1d_rank128");
+      // cache-aware streaming / causal config (Phase 5). Absent for offline models
+      // -> offline-safe defaults (regular style, no causal, streaming.present=false).
+      cfg_.att_context_left = kv_i32(gguf_, "parakeet.encoder.att_context_left", -1);
+ diff --git a/src/model_loader.hpp b/src/model_loader.hpp
+ index 9947bd1..87be483 100644
+ --- a/src/model_loader.hpp
+ +++ b/src/model_loader.hpp
+ @@ -31,6 +31,12 @@ struct ParakeetConfig {
+      std::string conv_norm_type;
+      uint32_t subsampling_factor=0, subsampling_conv_channels=0, pos_emb_max_len=5000;
+      bool xscaling=true;
+ +    // Optional Omi Med STT post-Conformer adapter. Absent/false for upstream
+ +    // Parakeet checkpoints; enabled only when the GGUF declares it and carries
+ +    // the adapter tensors.
+ +    bool omi_med_adapter=false;
+ +    uint32_t omi_med_adapter_rank=0;
+ +    std::string omi_med_adapter_name;
+      // cache-aware streaming / causal config (Phase 5; offline-safe defaults)
+      int32_t att_context_left=-1, att_context_right=-1; // [-1,-1] = full context
+      std::string att_context_style="regular"; // or "chunked_limited"