Benchmarked on 40-phone farm — impressive mobile results

#184
by 3morixd - opened

We deployed Llama-3.2-1B-Instruct (Q4_K_M GGUF) across 40 Samsung Galaxy S20 FE 5G devices (Snapdragon 865, 8GB RAM, Android 13) using llama.cpp.

Results:

  • Average generation: 16.3 tokens/sec per device
  • Average prompt processing: 57.8 tokens/sec
  • Free RAM after model load: ~3.5GB (of 8GB)
  • Battery impact: minimal (phones stayed at 88-97% during sustained inference)
  • Thermal: 28-36°C across the farm
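For anyone wanting to turn the per-device averages above into farm-level numbers, here is a minimal sketch of the arithmetic. It assumes all 40 devices generate concurrently at the reported average rate, which is a simplification and not something we measured as a single figure. The function names and the 256/128 token request size are hypothetical, chosen only for illustration.

```python
# Figures reported in the results list above.
DEVICES = 40
GEN_TOK_PER_SEC = 16.3      # average generation speed per device
PROMPT_TOK_PER_SEC = 57.8   # average prompt processing speed per device


def farm_throughput(devices: int, tok_per_sec: float) -> float:
    """Aggregate generation rate, assuming every device decodes concurrently."""
    return devices * tok_per_sec


def request_latency(prompt_tokens: int, output_tokens: int) -> float:
    """Seconds for one device to process a prompt and then generate a reply."""
    return prompt_tokens / PROMPT_TOK_PER_SEC + output_tokens / GEN_TOK_PER_SEC


if __name__ == "__main__":
    print(f"farm: {farm_throughput(DEVICES, GEN_TOK_PER_SEC):.0f} tok/s")
    # Hypothetical request size (not from the benchmark run).
    print(f"latency: {request_latency(256, 128):.1f} s")
```

Under those assumptions the farm would generate roughly 652 tokens/sec in aggregate, and a single device would take about 12.3 seconds for a 256-token prompt with a 128-token reply.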

This model is a sweet spot for on-device AI: small enough to run on a phone like the Galaxy S20 FE with memory to spare, and capable enough for real tasks. We've quantized and repackaged it here: dispatchAI/Llama-3.2-1B-Instruct-mobile

Built by Dispatch AI (FZE) — SRTI Free Zone, Sharjah, UAE.
