Instructions to use Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF",
    filename="Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_F16.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Use Docker
docker model run hf.co/Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
- Ollama
How to use Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF with Ollama:
ollama run hf.co/Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
- Unsloth Studio
How to use Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF to start chatting
- Docker Model Runner
How to use Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF with Docker Model Runner:
docker model run hf.co/Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
- Lemonade
How to use Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF-Q4_K_M
List all available models
lemonade list
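Several of the options above (llama-server and vLLM) expose an OpenAI-compatible chat endpoint. The sketch below shows one way to call such an endpoint from Python using only the standard library. The helper names `build_chat_payload` and `chat` are hypothetical, and the base URL is an assumption: the vLLM example above uses `http://localhost:8000`, while llama-server typically listens on port 8080 unless configured otherwise.

```python
import json
import urllib.request

# Repository ID taken from the instructions above.
MODEL_ID = "Andycurrent/Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_GGUF"


def build_chat_payload(prompt, image_url=None, model=MODEL_ID):
    """Build an OpenAI-style chat completion body with optional image input."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(base_url, payload, timeout=120):
    """POST the payload to <base_url>/v1/chat/completions and return the reply text.

    Assumes the server returns the usual OpenAI-style response shape
    (choices -> message -> content).
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server already running, a call might look like `chat("http://localhost:8000", build_chat_payload("Describe this image in one sentence.", image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"))`. Keeping payload construction separate from the network call makes the request body easy to inspect or log before sending it.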
Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking-GGUF
This repository contains Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking-GGUF, a 4B-parameter, instruction-tuned vision-language model provided in GGUF format for efficient local inference. The model targets open-ended reasoning and multimodal understanding with minimal alignment constraints, which makes it suitable for experimentation, research, and advanced local deployments.
Model Summary
- Model ID: Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking-GGUF
- Architecture: Gemma 3 (4B parameters)
- Type: Vision-Language (Text + Image)
- Format: GGUF
- Publisher: mradermacher
- License: Apache 2.0 (inherits from base model)
Key Characteristics
- Multimodal input support (text + images)
- Instruction-tuned for conversational and reasoning tasks
- Reduced content filtering and alignment constraints
- Optimized for local inference runtimes
- Suitable for research, exploration, and advanced user workflows
⚠️ This model is uncensored. Outputs may include sensitive or unfiltered content. Use responsibly.
Supported Use Cases
Text-Based
- Conversational assistants
- Creative writing and storytelling
- Summarization and rewriting
- General reasoning and analysis
Vision + Text
- Image captioning
- Visual question answering
- Scene and object understanding
- Multimodal reasoning tasks
GGUF Compatibility
This model can be used with GGUF-compatible runtimes such as:
- llama.cpp
- Ollama (GGUF-based builds)
- Other local inference engines supporting GGUF
Performance and supported features may vary depending on runtime and hardware.
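Before pointing a runtime at a downloaded file, it can help to confirm that the file really is GGUF. The following is a minimal sketch, assuming a little-endian GGUF file of format version 2 or later, where the header is the 4-byte magic `GGUF`, a 32-bit version, and two 64-bit counts. The function name `read_gguf_header` is hypothetical and is not part of any runtime listed above.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes GGUF v2+ layout, little-endian:
    magic (4 bytes) | version (uint32) | tensor_count (uint64) | kv_count (uint64)
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

A truncated download or a non-GGUF checkpoint raises `ValueError` here, which is usually a clearer failure than a runtime-specific loading error.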
Basic Usage Example
Command Line (llama.cpp-style)
./llama-cli \
-m Gemma-3-4B-VL-it-Gemini-Pro-Heretic-Uncensored-Thinking_F16.gguf \
-p "Describe the key idea behind multimodal AI models."
Usage Notes
- Provide clear, explicit prompts for best results
- When using images, ensure proper formatting and resolution
- Add moderation or filtering layers if deploying in public-facing applications
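The notes above can be sketched in Python as two small helpers: one that packages a local image as a base64 data URI for the `image_url` field, and one that applies a very simple keyword filter to model output. Both names (`image_to_data_uri`, `filter_output`) are hypothetical, and two assumptions apply: the target runtime accepts data URIs in `image_url`, and a keyword blocklist is only a placeholder for a real moderation layer.

```python
import base64
import mimetypes


def image_to_data_uri(path):
    """Encode a local image file as a data URI (data:<mime>;base64,<payload>)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"{path} is not a recognized image type")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def filter_output(text, blocked_terms, replacement="[filtered]"):
    """Return `replacement` if any blocked term appears in `text` (case-insensitive).

    A placeholder for a real moderation step, not a complete safety measure.
    """
    lowered = text.lower()
    if any(term.lower() in lowered for term in blocked_terms):
        return replacement
    return text
```

The data URI can be passed wherever the examples use a remote image URL, and `filter_output` would wrap the model's reply before it reaches end users. A production deployment would likely replace the blocklist with a dedicated moderation model or service.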
Ethical Considerations
Due to its uncensored nature:
- Not recommended for unrestricted public deployment
- Should not be used in safety-critical environments
- Users are responsible for compliance with applicable laws and policies
Acknowledgements
- Gemma base model contributors
- Open-source inference and quantization communities
- Tools and runtimes enabling efficient local LLM deployment