mszymanska committed
Commit e296167 · verified · 1 Parent(s): ef53aee

docs: add Validation section with pipeline test data (M3 Ultra, 2026-04-22)

Files changed (1)
  1. README.md +14 -1
@@ -84,11 +84,24 @@ This card reports metadata present in the Hugging Face repository, existing fron
 
 Use the library instructions above, or run this checkpoint through the tested local serving path: [`LibraxisAI/mlx-batch-server`](https://github.com/LibraxisAI/mlx-batch-server)
 
+## Validation
+
+End-to-end pipeline test 2026-04-22 on M3 Ultra (load → text → vision → unload), served via `mlx-batch-server`:
+
+| Probe | TTFT | Output chars | Notes |
+|---|---|---|---|
+| Cold load | — | — | **21 s** from cold to ready |
+| Text — simple greeting (PL) | 0.51 s | 601 | Clean output, abliterated behaviour |
+| Text — canonical (PL, literary) | 0.29 s | 718 | Concise reasoning trace |
+| Vision — JPEG (Monument Valley) | 6.50 s | 873 | Accurate scene description |
+
+3/3 probes passed. `has_reasoning=True` on all probes — this model emits reasoning traces via `<think>` markers.
+
 ## Limitations
 
-- No public benchmark results are declared in this card.
 - Validate outputs on your own domain data before relying on this checkpoint.
 - Memory use and speed depend heavily on Apple Silicon generation, unified-memory size, prompt length, and runtime configuration.
+- Validation data above reflects M3 Ultra; expect different timings on other hardware.
 
 ## License
 
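
The Validation table added in this commit reports a TTFT, an output length in characters, and a `has_reasoning` flag for each probe. The commit does not show how those figures were collected, so the following is only a minimal sketch of how similar per-probe numbers could be derived from a streamed response. It assumes the reasoning trace is wrapped as `<think>...</think>` (the README only mentions `<think>` markers; the closing tag is a guess), and the names `measure_ttft`, `split_reasoning`, and `summarize_probe` are hypothetical helpers, not part of `mlx-batch-server`.

```python
import re
import time
from typing import Iterable, Optional, Tuple

# Assumption: traces are delimited as <think>...</think>. The README only
# states that the model emits "<think> markers".
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def measure_ttft(chunks: Iterable[str]) -> Tuple[Optional[float], str]:
    """Consume a stream of text chunks.

    Returns (seconds until the first non-empty chunk, full concatenated text).
    The first value is None if the stream never yields any text.
    """
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None and chunk:
            ttft = time.perf_counter() - start
        parts.append(chunk)
    return ttft, "".join(parts)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer): the trace contents and the text without them."""
    traces = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return "\n".join(traces), answer


def summarize_probe(chunks: Iterable[str]) -> dict:
    """Hypothetical per-probe summary mirroring the table's columns."""
    ttft, text = measure_ttft(chunks)
    reasoning, _answer = split_reasoning(text)
    return {
        "ttft_s": ttft,
        "output_chars": len(text),
        "has_reasoning": bool(reasoning),
    }
```

Any iterable of strings works as input, for example the text deltas of a streaming client. Whether the README's "Output chars" column counts the reasoning trace is not stated in the commit; this sketch counts the full streamed text.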