mjbommar/mimelens-001-medium-bpe-16k-s1


mjbommar committed 28 days ago · Commit 91b72cd (verified) · 1 parent: fcee90a

README: surface ONNX bundle for users

Files changed (2)

README.md +3 -0
onnx/README.md +4 -4

README.md CHANGED

@@ -70,6 +70,9 @@ The family ships 28 parent cells (3 sizes × 4 vocabs × 2-3 seeds at seq_len=1
 > **Short-sequence sibling available.** If your inputs are sub-KB (DNS payloads, sub-MTU packets, small forensic fragments), use `mjbommar/mimelens-001-medium-bpe-16k-s1-seq256` instead. Same architecture, 4× shorter context, ~5× lower CPU latency, BPE-cell accuracy ties or beats this cell on the magic-files probe-fit. See paper Appendix B.5.
+> **ONNX bundled.** This cell ships `onnx/model_fp32.onnx` + `onnx/model_int8.onnx` (dynamic int8 of MatMul/Gemm) for direct ONNX Runtime inference. See `onnx/README.md` in this repo for input/output shapes and the latency profile.
 ---
 ## Overview

onnx/README.md CHANGED

@@ -1,8 +1,8 @@
-# ONNX exports for MimeLens-medium-bpe-16k-s1
 Two ONNX exports are bundled here:
-- `model_fp32.onnx` + `model_fp32.onnx.data` — float32 export via the legacy torch.onnx exporter; ~185 MB total. Load with `onnxruntime.InferenceSession`.
-- `model_int8.onnx` — dynamic int8 quantization via `onnxruntime.quantization.quantize_dynamic`; ~47 MB. Dynamic int8 was slower than fp32 on the CPU we measured (no AVX-VNNI); static (calibrated) quantization on modern int8-GEMM hardware should narrow the gap. See `data/p1/cpu_latency.json` in the GitHub repo for measured single-sample latencies.
-The inputs are `(input_ids: int64 [B, 1024], attention_mask: int64 [B, 1024])` and the output is `mean_pool_embedding: float32 [B, 512]`.

+# ONNX exports for MimeLens-medium-bpe-16k-s1 (seq_len=1024)
 Two ONNX exports are bundled here:
+- `model_fp32.onnx` (+ `model_fp32.onnx.data` if exported with external tensors) via the legacy torch.onnx exporter. Load with `onnxruntime.InferenceSession`.
+- `model_int8.onnx` via `onnxruntime.quantization.quantize_dynamic`; dynamic int8 is slower than fp32 on this CPU (no AVX-VNNI; fp32 392 ms / int8 547 ms p50). Static (calibrated) quantization on modern int8-GEMM hardware should narrow the gap further.
+Input shapes are `(input_ids: int64 [B, 1024], attention_mask: int64 [B, 1024])` and the output is `mean_pool_embedding: float32 [B, 512]`.
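
The shapes stated in `onnx/README.md` can be turned into a small inference sketch. This is a minimal example under several assumptions, not the author's reference code: token ids are assumed to come from the repo's BPE tokenizer (not shown here), `PAD_ID = 0` and right-padding are hypothetical choices, and the graph input/output names are assumed to match the names given in the README (`input_ids`, `attention_mask`, `mean_pool_embedding`). The helper names `pad_batch` and `embed` are illustrative only.

```python
from typing import List, Sequence, Tuple

SEQ_LEN = 1024   # fixed sequence length stated in onnx/README.md
EMBED_DIM = 512  # width of mean_pool_embedding stated in onnx/README.md
PAD_ID = 0       # ASSUMPTION: the real pad token id is not given in this diff


def pad_batch(
    token_ids: Sequence[Sequence[int]],
    seq_len: int = SEQ_LEN,
    pad_id: int = PAD_ID,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Truncate or right-pad each sequence to seq_len and build its attention mask."""
    input_ids, attention_mask = [], []
    for ids in token_ids:
        kept = list(ids[:seq_len])          # drop anything past seq_len
        n_pad = seq_len - len(kept)         # number of pad positions to append
        input_ids.append(kept + [pad_id] * n_pad)
        attention_mask.append([1] * len(kept) + [0] * n_pad)
    return input_ids, attention_mask


def embed(token_ids, model_path: str = "onnx/model_fp32.onnx"):
    """Run the exported model on CPU; needs numpy, onnxruntime, and the model file."""
    import numpy as np
    import onnxruntime as ort

    ids, mask = pad_batch(token_ids)
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    outputs = session.run(
        ["mean_pool_embedding"],  # ASSUMPTION: output name as written in the README
        {
            "input_ids": np.asarray(ids, dtype=np.int64),
            "attention_mask": np.asarray(mask, dtype=np.int64),
        },
    )
    return outputs[0]  # per the README, expected shape is [B, 512]
```

Calling `embed([[5, 6, 7]])` would pad the three tokens to 1024 positions before running the session. Swapping `model_path` to `onnx/model_int8.onnx` should exercise the quantized export with the same inputs; if the actual graph names differ, `session.get_inputs()` and `session.get_outputs()` report them.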