Model with QAT?

#30

by 4ntoine - opened 14 days ago

Discussion

4ntoine

LiteRT Community (FKA TFLite) org 14 days ago

https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4/

How can we have it in litertlm format?

4ntoine

LiteRT Community (FKA TFLite) org 14 days ago

It's pointing to mobile models, not here: https://huggingface.co/collections/google/gemma-4-qat-mobile

developerabu

9 days ago

> It's pointing to mobile models, not here: https://huggingface.co/collections/google/gemma-4-qat-mobile

Can we convert it to litertlm?

marissaw

LiteRT Community (FKA TFLite) org 9 days ago

The .litertlm models on this card already use the QAT that is discussed in the blog post. The most popular file, gemma-4-E2B-it.litertlm, uses a mixture of int2, int4 and int8 to keep it small, fast and efficient.
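As a rough illustration of why lower bit widths trade accuracy for size, here is a minimal sketch of the quantize-then-dequantize ("fake quantization") step that QAT generally simulates during training. This is not the recipe used for these files: the symmetric per-tensor scheme, the function names, and the sample weights are all assumptions made for illustration.

```python
def fake_quantize(weights, bits):
    """Round floats to signed `bits`-bit integers, then map back to floats.

    Assumption: symmetric, per-tensor scaling. Real pipelines may use
    per-channel or per-block scales instead.
    """
    qmax = 2 ** (bits - 1) - 1          # 1 for int2, 7 for int4, 127 for int8
    qmin = -qmax - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for w in weights:
        q = max(qmin, min(qmax, round(w / scale)))  # integer code, clamped
        dequantized.append(q * scale)               # value the model actually sees
    return dequantized


def mean_abs_error(a, b):
    """Average absolute difference between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical weights, not taken from any real model.
weights = [0.9, -0.45, 0.1, -0.02, 0.3, -0.75]

for bits in (2, 4, 8):
    approx = fake_quantize(weights, bits)
    print(f"int{bits}: mean abs error = {mean_abs_error(weights, approx):.4f}")
```

The error shrinks as the bit width grows, which is one plausible reason to mix precisions: keep more bits where the model is sensitive and fewer where it is not.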

4ntoine

LiteRT Community (FKA TFLite) org 8 days ago

@marissaw Does this mean Gemma 4 E2B, without the audio engine, can run while consuming less than 1 GB?

developerabu

8 days ago

> @marissaw Does this mean Gemma 4 E2B, without the audio engine, can run while consuming less than 1 GB?

https://huggingface.co/developerabu/gemma-4-e2b-text-only-litertlm

I unpacked the model and repacked only the text weights.

marissaw

LiteRT Community (FKA TFLite) org 6 days ago

Thank you @developerabu !

> @marissaw Does this mean Gemma 4 E2B, without the audio engine, can run while consuming less than 1 GB?

The audio, vision and drafter models should all be loaded on-demand. When running the benchmarks for this model card, we ran in a text-only mode so none of the optional models should have been loaded. This means that the memory numbers in the model card should reflect what you are asking for.

The answer is that it depends on whether you are running on CPU or GPU, which operating system you are using, which CPU/GPU vendor(s) your device has, and how you define memory usage. For example, the model running on the GPU of an S26 Ultra uses only 676 MB as measured by rusage::ru_maxrss. However, other device setups and definitions of memory usage could push the memory above 1 GB. I'd recommend looking at the model card for more information.
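For anyone who wants to check the same kind of number on their own machine, here is a minimal sketch of reading peak resident set size (ru_maxrss) from Python's standard `resource` module on a Unix-like system. It is not the benchmark harness used for the model card. The `peak_rss_mb` helper is a name made up for this example, and the 50 MB buffer is only a stand-in for loading a model.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process, in megabytes.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS,
    so the divisor depends on the platform.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mb()

# Stand-in for loading a model: allocate 50 MB and touch every page
# so the memory is actually resident rather than just reserved.
buffer = bytearray(50 * 1024 * 1024)
for offset in range(0, len(buffer), 4096):
    buffer[offset] = 1

after = peak_rss_mb()
print(f"peak RSS before: {before:.1f} MB, after: {after:.1f} MB")
```

Note that ru_maxrss is a high-water mark for the whole process, and it may not count memory held by a GPU driver, which is one reason different definitions of memory usage give different answers.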

marissaw

LiteRT Community (FKA TFLite) org 6 days ago

Also, memory usage depends on the context length you want to use.
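One common reason memory grows with context length is the key/value cache, which in a plain transformer scales linearly with the number of tokens. The sketch below shows that back-of-the-envelope arithmetic. Every shape value in it is hypothetical and is not Gemma 4's real configuration. Architectures that use sliding-window attention or share caches across layers will need less than this simple formula suggests.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value):
    """Estimate KV cache size for a plain transformer with full attention.

    The factor 2 accounts for one key tensor and one value tensor per layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical shape values chosen only to make the arithmetic concrete.
HYPOTHETICAL_SHAPE = dict(
    num_layers=30,
    num_kv_heads=1,
    head_dim=256,
    bytes_per_value=2,  # e.g. 16-bit cache entries
)

for seq_len in (1024, 4096, 32768):
    size_mb = kv_cache_bytes(seq_len, **HYPOTHETICAL_SHAPE) / (1024 * 1024)
    print(f"{seq_len:>6} tokens: about {size_mb:.0f} MB of KV cache")
```

Under these assumptions the cache grows in direct proportion to the context length, so a figure measured at a short context can understate what a long conversation needs.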
