Instructions to use Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF",
    filename="Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_F16.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
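The llama-cpp-python call above returns a response dictionary rather than plain text. The sketch below shows one way to pull the assistant's reply out of it, assuming the OpenAI-style layout (`choices[0].message.content`) that `create_chat_completion` produces. The helper name `extract_reply` is hypothetical and not part of the library.

```python
from typing import Any, Dict


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict.

    Hypothetical helper: it only assumes the response carries a "choices"
    list whose first entry has a "message" with a "content" string.
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if content is None:
        raise ValueError("first choice has no message content")
    return content.strip()


# Usage with the `llm` object created above (requires the model download):
# response = llm.create_chat_completion(
#     messages=[{"role": "user", "content": "What is the capital of France?"}],
#     max_tokens=128,
# )
# print(extract_reply(response))
```

Keeping the extraction in a small function makes it easy to reuse the same code against any OpenAI-compatible response, including the server-based setups listed further down.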
- Local Apps Settings
- llama.cpp
How to use Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Use Docker
docker model run hf.co/Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
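The curl call in the vLLM instructions can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes a server is already listening on `http://localhost:8000` as in the curl example. The function names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant text (needs a running server)."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


# Usage, once the server is up:
# print(ask("What is the capital of France?"))
```

Splitting request construction from sending keeps the payload inspectable without a live server. Pointing `base_url` at another OpenAI-compatible endpoint should work the same way, although the port and accepted model name depend on that server.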
- Ollama
How to use Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF with Ollama:
ollama run hf.co/Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
- Unsloth Studio
How to use Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF to start chatting
- Atomic Chat
- Docker Model Runner
How to use Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF with Docker Model Runner:
docker model run hf.co/Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
- Lemonade
How to use Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Andycurrent/Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF-Q4_K_M
List all available models
lemonade list
Gemma 3 1B IT GLM-4.7 Flash Heretic Uncensored Thinking
This repository hosts Gemma 3 1B IT GLM-4.7 Flash Heretic Uncensored Thinking, a lightweight 1-billion-parameter instruction-tuned model derived from Google's Gemma 3 1B IT base.
This variant is optimized for fast inference, structured reasoning behavior, and minimal refusal patterns, while maintaining compatibility with Gemma's native instruction format.
Model Overview
- Model Name: Gemma 3 1B IT GLM-4.7 Flash Heretic Uncensored Thinking
- Parameter Count: 1 Billion (1B)
- Base Architecture: Gemma 3
- Base Model: google/gemma-3-1b-it
- Model Type: Instruction-Tuned Causal Language Model
- Context Length: Inherits the base model's context window (32K tokens for google/gemma-3-1b-it)
- Primary Language: English
- License: Gemma License (inherits from base model)
- Maintainer / Publisher: DavidAU
What Is This Model?
This model is a modified derivative of Gemma 3 1B IT, configured for:
- Reduced refusal bias compared to default IT alignment
- Enhanced direct-answer behavior
- Stronger short-form reasoning output
- Faster response latency due to compact parameter size
- "Flash"-style concise and rapid generation
The "Heretic Uncensored Thinking" configuration emphasizes:
- Minimal conversational filtering
- Direct completion behavior
- Structured reasoning patterns when prompted
No additional safety layers beyond those present in the base architecture are intentionally introduced.
Key Features & Capabilities
Core Strengths
- Fast inference on consumer GPUs and CPUs
- Low VRAM requirements
- Instruction-following compatibility
- Concise reasoning outputs
- Suitable for lightweight agent pipelines
Performance Characteristics
- Optimized for short-to-medium generation tasks
- Responsive in real-time assistant applications
- Works well in tool-driven or chain-of-thought-style prompts
- Practical for edge deployments and experimentation
Intended Use Cases
- Lightweight AI assistant
- Prompt engineering experimentation
- Tool-augmented agents
- Rapid-response chat systems
- Local inference environments
- Educational or research workflows
- Controlled "uncensored" deployment environments
Chat Template & Prompt Format
This model follows the Gemma instruction format.
For best results:
- Provide explicit system instructions
- Use structured reasoning prompts when needed
- Avoid mixing non-Gemma chat formats
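To make the Gemma instruction format concrete, here is a minimal sketch that renders a message list into Gemma-style turns for raw-completion use. It assumes the standard Gemma turn markers (`<start_of_turn>`, `<end_of_turn>`, with the assistant role named `model`) and folds any system message into the first turn, since the Gemma template has no separate system role. The function name `format_gemma_prompt` is hypothetical. Chat endpoints such as `create_chat_completion` normally apply the template stored in the GGUF file on their own, so this is only needed when building prompts by hand.

```python
from typing import Dict, List


def format_gemma_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages as Gemma-style turns (illustrative sketch).

    The leading <bos> token is left out on the assumption that the
    runtime's tokenizer adds it.
    """
    turns = list(messages)
    system_prefix = ""
    if turns and turns[0]["role"] == "system":
        # No system role in the Gemma format: prepend it to the first turn.
        system_prefix = turns[0]["content"].strip() + "\n\n"
        turns = turns[1:]

    parts = []
    for index, turn in enumerate(turns):
        role = "model" if turn["role"] == "assistant" else turn["role"]
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {turn['role']!r}")
        content = turn["content"].strip()
        if index == 0:
            content = system_prefix + content
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")

    # Leave an open model turn so generation continues as the assistant.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = format_gemma_prompt([
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is the capital of France?"},
])
print(prompt)
```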
Hardware & Deployment Notes
Due to its 1B parameter size:
- Runs efficiently on 8GB GPUs
- Suitable for CPU inference with quantization
- Ideal for edge devices and low-resource setups
- Compatible with common inference engines supporting Gemma architecture
Quantized versions (GGUF, GPTQ, AWQ, etc.) may be used depending on deployment stack.
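As a rough guide to the hardware notes above, weight storage can be estimated as parameters × bits per weight ÷ 8. The sketch below applies that arithmetic, assuming a nominal 1,000,000,000 parameters and nominal bit widths. Both are simplifications: real quantization formats typically store somewhat more than their nominal bits per weight, and the KV cache and runtime buffers add to the total.

```python
def estimate_weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB (weights only)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1024**3


# Assumption: nominal parameter count implied by the "1B" label.
NUM_PARAMS = 1_000_000_000

for bits in (2, 3, 4, 5, 6, 8, 16):
    size = estimate_weight_gib(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit: ~{size:.2f} GiB")
```

Under these assumptions the 16-bit weights come to a little under 2 GiB, which is consistent with the claim that the model fits comfortably on an 8GB GPU.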
Alignment & Safety Notice
This is an "uncensored" derivative configuration.
- Reduced refusal behavior compared to standard IT
- Users are responsible for system prompt controls
- Deployment should follow local laws and ethical guidelines
- No additional alignment layers are added by this repository
Use responsibly.
License & Usage Notes
This model inherits the Gemma License from its base model (google/gemma-3-1b-it).
- The Gemma License is a custom license provided by Google
- You must review and comply with the Gemma License terms
- This repository does not change or replace the original licensing terms
Users are responsible for ensuring compliance with all applicable regulations.
Acknowledgements
- Google for the Gemma 3 architecture and base model
- The Hugging Face ecosystem
- Open-source tooling communities supporting lightweight deployment
Community & Support
- Use the Hugging Face Discussions tab for issues and questions
- Community experimentation and benchmarking feedback is welcome