Add root config.json (Hub download-stats query file + framework pointer) 81e1928 verified mlboydaisuke committed 13 days ago
head-quant docs: per-block-32 absmax ship shape (per-channel = beta delegate bug; naming note) 4447ec6 verified mlboydaisuke committed 14 days ago
card: gpu-pipelined int8lin bundle (iPhone 50.3-51.5 / Mac 204 tok/s) + run instructions b647bda verified mlboydaisuke committed 15 days ago
qwen3.5-0.8B int8lin decode-only loop-free bundle (pipelined engine): Mac 204 tok/s, iPhone 50.3-51.5 tok/s 0019cb6 verified mlboydaisuke committed 15 days ago
int8 fused-kernel monolith (42.5-45.4 tok/s) + q16 chunked-prefill companion (147 tok/s); new release config c2a8a57 verified mlboydaisuke committed 15 days ago
ios-gpu: add qwen3_5_0_8b_ios_hc_prefill_q16_b2048_int8.aimodel 23e3da6 verified mlboydaisuke committed 15 days ago
ios-gpu: add qwen3_5_0_8b_ios_hc0_int8v3.aimodel b0080fd verified mlboydaisuke committed 15 days ago
Remove pre-category-layout path ios-gpu-static/ 161130e verified mlboydaisuke committed 15 days ago
macOS GPU best: dynamic int8 (58.5 tok/s release) cc382cb verified mlboydaisuke committed 15 days ago
iOS GPU best: fp16 static ctx-2048 monolith (27.7 tok/s) 2d7f93a verified mlboydaisuke committed 15 days ago
Card: category layout (best verified config per platform x compute-unit) 7f8783c verified mlboydaisuke committed 15 days ago
static ctx-2048 monolith (iPhone GPU 27.7 tok/s, release config) a94f01c verified mlboydaisuke committed 15 days ago
dynamic int8 bundle (iPhone GPU 12.5 / ANE 14.7 / Mac 58.5 tok/s) 48e0a23 verified mlboydaisuke committed 15 days ago