Instructions to use 0xSero/Gemma-4-19B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use 0xSero/Gemma-4-19B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="0xSero/Gemma-4-19B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("0xSero/Gemma-4-19B")
model = AutoModelForMultimodalLM.from_pretrained("0xSero/Gemma-4-19B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use 0xSero/Gemma-4-19B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "0xSero/Gemma-4-19B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "0xSero/Gemma-4-19B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/0xSero/Gemma-4-19B
- SGLang
How to use 0xSero/Gemma-4-19B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "0xSero/Gemma-4-19B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "0xSero/Gemma-4-19B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "0xSero/Gemma-4-19B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "0xSero/Gemma-4-19B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use 0xSero/Gemma-4-19B with Docker Model Runner:
docker model run hf.co/0xSero/Gemma-4-19B
Korean Multilingual is broken.
#6
by DFveloper - opened
The Korean output is quite broken.
It mixes five different languages in a single reply.
The input handling is fine, and so is the latent reasoning.
I'll provide a translation.
Original
User: 한국어로 말해라.
LLM:
서로를로(Sử-ro-lo)로,我是由 Google 提供的 AI 模型.
我可以被看作是一个智能助手(Intelligent Assistant),能够理解并处理文本、代码,并回答各种问题。
如果您希望我用韩语(Korean)进行对话,我会根据您的要求切换。
**현재モード (Current Mode):** 我正在使用中文回答,但如果您需要我用韩语对话,请告诉我。
(Nếuคุณ muốn我用韩语回答,请写 "请用韩语回答" 或 "请用韩语继续".)
Translation
User: Speak in Korean.
LLM:
As Seororeullo (Sử-ro-lo), I am an AI model provided by Google.
I can be regarded as an intelligent assistant (Intelligent Assistant) that can understand and process text and code and answer all kinds of questions.
If you would like me to converse in Korean, I will switch according to your request.
**Current Mode:** I am currently answering in Chinese, but if you need me to converse in Korean, please tell me.
(If you want me to answer in Korean, please write "Please answer in Korean" or "Please continue in Korean". [Both quoted phrases are given in Chinese.])
The content of the sentences is right, but the language filter layers are gone.
Also, the model's attitude has gone to hell 🤣
PS: Don't worry, it could be fixed with SFT.
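For anyone who wants to check this kind of language mixing on their own outputs, here is a rough sketch that counts characters per writing system using only the Python standard library. The names `SCRIPT_PREFIXES`, `script_counts`, and `script_share` are made up for this example, and the prefix matching on Unicode character names is a heuristic, not a real language detector: Latin covers both English and Vietnamese, and Han covers both Chinese and Japanese kanji.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from Unicode character-name prefixes to coarse script labels.
SCRIPT_PREFIXES = {
    "HANGUL": "Hangul",
    "CJK": "Han",
    "HIRAGANA": "Kana",
    "KATAKANA": "Kana",
    "THAI": "Thai",
    "LATIN": "Latin",
}


def script_counts(text):
    """Count characters per script; punctuation, digits, and spaces are ignored."""
    counts = Counter()
    for ch in text:
        name = unicodedata.name(ch, "")
        for prefix, script in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[script] += 1
                break
    return counts


def script_share(text, script="Hangul"):
    """Fraction of classified characters that belong to the given script."""
    counts = script_counts(text)
    total = sum(counts.values())
    return counts[script] / total if total else 0.0


# One line from the reply quoted above.
sample = "**현재モード (Current Mode):** 我正在使用中文回答"
print(script_counts(sample))
print(script_share(sample, "Hangul"))
```

A reply that really is in Korean should have a Hangul share close to 1.0; the quoted line comes out far below that because most of its characters are Han, Latin, or Kana.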
REAP knowingly removes experts with lower saliency on a targeted set of samples. I chose to protect coding and agentic
I know, but was the whole dataset in English?
If so, could it be normalized with SFT?
I didn't major in AI.
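To make the REAP comment above more concrete, here is a toy sketch of why the choice of calibration samples matters. It assumes an expert's saliency is the average of (router gate weight × norm of the expert's output) over the tokens routed to it; that formula, the function names, and all the numbers are assumptions for illustration, not the actual REAP code or the data used for this model.

```python
from collections import defaultdict


def expert_saliency(activations):
    """Average gate-weighted output norm per expert.

    activations: iterable of (expert_id, gate_weight, output_norm),
    one entry per token routed to that expert on the calibration set.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for expert_id, gate, norm in activations:
        totals[expert_id] += gate * norm
        counts[expert_id] += 1
    return {e: totals[e] / counts[e] for e in totals}


def experts_to_prune(saliency, all_experts, keep):
    """Return the experts outside the top `keep` by saliency.

    Experts that were never routed to on the calibration set get saliency 0.
    """
    ranked = sorted(all_experts, key=lambda e: saliency.get(e, 0.0), reverse=True)
    return sorted(ranked[keep:])


# Made-up calibration run on code-only samples: expert 3 is never selected.
code_only = [
    (0, 0.6, 2.0),
    (0, 0.5, 2.0),
    (1, 0.4, 1.0),
    (2, 0.5, 3.0),
]
saliency = expert_saliency(code_only)
print(experts_to_prune(saliency, all_experts=[0, 1, 2, 3], keep=3))
```

In this toy setup an expert that only fires on text absent from the calibration samples ends up at the bottom of the ranking and is the first to go, whatever it was doing for other kinds of input.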