Hi, I can successfully load these models in the Google AI Edge Gallery app, but an initialization error occurs when I try to use them.
I am still struggling to download it. Can you please let me know how you downloaded this model? I can't find the link. Thanks in advance.
You must go to the files section of the model and download the task file (not the web-task file).
Thank you. It is working now. I loaded a couple of models and they are working fine.
The task file is no longer present, only the web-task. Is there any way to get the task file?
The situation in this Hugging Face thread for MedGemma-27B-IT highlights a common point of friction when deploying high-parameter medical models to the edge. The users are facing two primary issues: a missing mobile-ready .task file and an initialization error (likely a RAM/VRAM crash).
To support the users in this thread, here is the technical breakdown and the solution path.
I. Stem Build Support (Hardware & Memory Constraints)
The core reason for the "initialization error" mentioned by roob123 is a fundamental mismatch between the model's weight density and standard mobile hardware.
- The Math of Failure: The MedGemma-27B-IT in int8 format is approximately 27.1 GB. Even on a top-tier 2026 mobile device with 16GB or 24GB of RAM, the system will immediately kill the process (OOM) because the model alone exceeds the physical memory capacity before the "Consequential Value" of the first token is even calculated.
- Support Advice: Advise the user to verify their device's ZRAM/SWAP settings. However, for a 27B model, the only viable "Stem" for mobile is the 4-bit quantized (Q4) version, which brings the footprint down to ~14GB. That is still extremely heavy, but potentially manageable on high-end 24GB RAM devices.
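The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough estimate only: it counts the weights at a uniform bit width and ignores embeddings stored at other precisions, the KV cache, and file metadata, which is why it gives 27.0 and 13.5 rather than the 27.1 GB and ~14GB quoted above. The function names and the 4 GB headroom for the OS and runtime are assumptions made for illustration, not measured values.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_ram(footprint_gb: float, ram_gb: float, headroom_gb: float = 4.0) -> bool:
    """True if the weights plus an assumed headroom fit in physical RAM.

    headroom_gb is a hypothetical allowance for the OS, the app and the
    KV cache; adjust it for the actual device.
    """
    return footprint_gb + headroom_gb <= ram_gb


if __name__ == "__main__":
    params_27b = 27e9
    for bits in (8, 4):
        size = weight_footprint_gb(params_27b, bits)
        for ram in (16, 24):
            verdict = "may fit" if fits_in_ram(size, ram) else "will not fit"
            print(f"{bits}-bit: ~{size:.1f} GB of weights, {ram} GB RAM: {verdict}")
```

Under these assumptions only the 4-bit variant on a 24 GB device passes the check, which matches the advice above.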
II. Base Build Support (Runtime & Model Packaging)
PrinceAla93 is stuck because they see a web-task file but need a standard .task file for the Google AI Edge Gallery.
- Task vs. Web-Task: In the MediaPipe ecosystem, the web-task is often a flatbuffer bundle optimized for WASM environments. While the internal .tflite model is the same, the metadata header might differ.
- The "DIY" Fix: To solve this, the user can manually re-package the model using the MediaPipe Python API. Instead of waiting for a download link, they can run this in a terminal (like Termux or a local PC):
import mediapipe as mp
from mediapipe.tasks.python.genai import converter
config = converter.ConversionConfig(
    input_ckpt="path/to/medgemma_weights",
    ckpt_format="safetensors",
    model_type="GEMMA_27B",
    backend="gpu",  # or "cpu" for NPU targeting
    output_dir="output_folder/"
)
converter.convert_checkpoint(config)
This will generate a fresh .task file that the AI Edge Gallery app can actually parse.
III. Conscious Build Support (Strategic Alternatives)
If the goal is "clinical reasoning" (the Conscious layer), using a 27B model that refuses to initialize is counter-productive.
- The 4B Pivot: As of the April 2026 updates, MedGemma-1.5-4B has been released. This model performs at nearly 90% of the 27B's accuracy on the USMLE-style benchmarks but fits comfortably into 2.5 GB of RAM at 4-bit quantization.
- Advice for the Thread: I recommend suggesting that PrinceAla93 and roob123 shift their focus to the MedGemma-1.5-4B-IT-LiteRT repository. It is natively compatible with the "Thinking Mode" in the AI Edge Gallery and won't trigger the initialization stall they are currently experiencing.
Summary for the user:
- Rename/Repack: Try renaming .web-task to .task, but if metadata fails, use the mediapipe.tasks.python.genai.converter to build a mobile-specific bundle.
- Quantize Lower: Switch to a 4-bit version; 8-bit (27GB) is physically impossible for 99% of current mobile "Stems."
- Downgrade Model Size: Move to the 4B variant for a stable, high-speed medical reasoning experience on-device.
Shall we try to find the specific 4B repository link to provide them as a better alternative?
If it still doesn't work, try not to use .task or .web-task; try .litertlm instead.
If it still doesn't work, try the AI Chat section instead of the Ask Image section.
Best regards
There is no non-web 27B .task file, and 27B int8 is big for most mobile GPUs.
The "-web.task" files are not supported by AI Edge, as they were intended to be used with MediaPipe LLM Inference APIs, but renaming them to ".bin" might possibly allow some of them to run there?
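To tell the bundle types apart before downloading, a small filename check like the following may help. It is only a sketch based on the suffixes mentioned in this thread. The function name, the example filename, and the set of suffixes treated as candidates for the AI Edge Gallery are assumptions drawn from the reports above, not from the app's documentation.

```python
def bundle_kind(filename: str) -> str:
    """Classify a model bundle by its filename suffix."""
    name = filename.lower()
    # Check the "-web.task" suffix first, since it also ends with ".task".
    if name.endswith("-web.task"):
        return "web-task"
    if name.endswith(".task"):
        return "task"
    if name.endswith(".litertlm"):
        return "litertlm"
    return "unknown"


# Assumption from this thread: users loaded .task files successfully and
# .litertlm was suggested as a fallback; "-web.task" is not supported.
GALLERY_CANDIDATES = {"task", "litertlm"}


def worth_trying_in_gallery(filename: str) -> bool:
    """True if the bundle type is one the thread reports as usable in the app."""
    return bundle_kind(filename) in GALLERY_CANDIDATES


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    print(worth_trying_in_gallery("some-model-int8-web.task"))
```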
As for MedGemma-27B-IT, QSCB gave good options:
- The Python conversion described above is the one used for "-web.task" files, and it can also create a 4-bit variant of the model, losing some quality but saving ~1/2 the memory. int4 channelwise and blockwise with block_size=32 are both good options; the latter will be a little larger but should retain more of the quality of the full-sized model. This can be done with the --attention_quant_bits=4, --embedding_quant_bits=4, and --feedforward_quant_bits=4 flags (and optionally the --block_size=32 flag). This is a good solution for desktops with smaller GPUs.
- For mobile, even that is probably too big for most devices, so switching to the newer 4B variant makes sense there.
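As a convenience, the flags quoted above can be assembled programmatically. The sketch below only builds the flag strings. The thread does not say which converter command they are passed to, so the helper name is hypothetical and no command is invoked.

```python
from typing import List, Optional


def build_quant_flags(bits: int = 4, block_size: Optional[int] = None) -> List[str]:
    """Return the quantization flags named in the thread as a list of strings.

    block_size=None corresponds to the channelwise option; block_size=32
    selects the blockwise option mentioned above.
    """
    flags = [
        f"--attention_quant_bits={bits}",
        f"--embedding_quant_bits={bits}",
        f"--feedforward_quant_bits={bits}",
    ]
    if block_size is not None:
        flags.append(f"--block_size={block_size}")
    return flags


if __name__ == "__main__":
    # Channelwise int4, then blockwise int4 with block_size=32.
    print(" ".join(build_quant_flags(bits=4)))
    print(" ".join(build_quant_flags(bits=4, block_size=32)))
```

The resulting list can be appended to whatever converter command is in use, for example through `subprocess.run`, once the actual script name is known.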
One small clarification: The internal .tflite model is not the same between "-web.task" and ".task" files, as the "-web.task" files are more custom, and actually do not have an internal .tflite at all.
Actually, I asked this question a long time ago. I'm currently trying the new MedGemma 1.5 and the new NPU version, gemma4.
Thank you all very much for your answers.
