Instructions to use Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF with llama-cpp-python:
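# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the GGUF file from the Hub (cached after the first call) and load it
llm = Llama.from_pretrained(
    repo_id="Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF",
    filename="Qwen2.5-7B-Instruct-Uncensored_F16.gguf",
)

# Send a single-turn chat request and print the assistant's reply
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response["choices"][0]["message"]["content"])
- Notebooks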
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
Use Docker
docker model run hf.co/Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
Use Docker
docker model run hf.co/Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
- Ollama
How to use Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF with Ollama:
ollama run hf.co/Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
- Unsloth Studio
How to use Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF to start chatting
- Atomic Chat
- Docker Model Runner
How to use Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF with Docker Model Runner:
docker model run hf.co/Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
- Lemonade
How to use Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Qwen2.5-7B-Instruct-Uncensored_GGUF-Q4_K_M
List all available models
lemonade list
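The llama-server and vLLM commands above both start an OpenAI-compatible HTTP server, so the curl call shown for vLLM can also be made from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names `build_chat_request` and `chat` are hypothetical, and the base URL is an assumption: port 8000 matches the vLLM curl example, while llama-server is assumed to listen on its default port 8080.

```python
import json
import urllib.request

MODEL_ID = "Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF"


def build_chat_request(base_url, prompt, model=MODEL_ID):
    """Build a POST request for an OpenAI-compatible chat endpoint.

    Hypothetical helper: it mirrors the JSON body of the curl example.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, prompt, timeout=120):
    """Send the prompt to a running server and return the reply text."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server already running, `chat("http://localhost:8000", "What is the capital of France?")` would return the reply as a string. Keeping request construction separate from the network call makes the payload easy to inspect without a live server.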
Qwen2.5-7B-Instruct-Uncensored
Qwen2.5-7B-Instruct-Uncensored is a 7-billion-parameter instruction-following language model designed for open-ended interaction, research experimentation, and local deployment scenarios where users require minimal alignment constraints and maximal behavioral flexibility.
This model is intended for technically proficient users who want direct control over prompting, alignment, and downstream usage without heavy built-in moderation layers.
Model Summary
- Model Name: Qwen2.5-7B-Instruct-Uncensored
- Base Architecture: Qwen2.5-7B
- Maintainer: Orion-zhen
- Parameter Count: 7B
- Model Type: Decoder-only transformer, instruction-tuned
- License: Inherits the license terms of the original Qwen2.5 base model
- Primary Focus: Open instruction following with reduced safety filtering
Design Philosophy
This release emphasizes instruction fidelity and conversational openness over restrictive alignment. The uncensored variant is designed to:
- Respond directly to user instructions without excessive refusal patterns
- Support experimentation with prompt engineering and alignment research
- Enable private, offline, or air-gapped deployments
- Serve as a flexible base for further fine-tuning or specialization
Instruction Format
For best results, interactions should follow the ChatML-style chat format used by Qwen2.5 instruct models, in which each turn is wrapped in <|im_start|> and <|im_end|> markers:
<|im_start|>system
Optional system-level guidance or role definition<|im_end|>
<|im_start|>user
User input or task description<|im_end|>
<|im_start|>assistant
Model response<|im_end|>
Clear role separation improves consistency, especially in multi-turn conversations and complex reasoning tasks.
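In practice, chat APIs usually take the roles as a list of messages and apply the prompt template themselves, so the role markers rarely need to be typed by hand. The sketch below shows one way to keep roles separated across turns. The helper `build_messages` is hypothetical, and the rule that user and assistant turns must alternate is an assumption made for illustration, not a requirement stated by the model card.

```python
def build_messages(turns, system_prompt=None):
    """Convert (role, text) pairs into a chat-completion message list.

    Hypothetical helper. Raises ValueError when a role is unknown or when
    user and assistant turns do not alternate, starting with a user turn.
    """
    messages = []
    if system_prompt:
        # The system message is optional and always comes first
        messages.append({"role": "system", "content": system_prompt})
    expected = "user"
    for role, text in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        if role != expected:
            raise ValueError(f"expected a {expected} turn, got {role}")
        messages.append({"role": role, "content": text})
        expected = "assistant" if role == "user" else "user"
    return messages


history = build_messages(
    [
        ("user", "What is the capital of France?"),
        ("assistant", "Paris."),
        ("user", "And what is its population?"),
    ],
    system_prompt="You are a concise assistant.",
)
```

The resulting `history` list has the same shape as the `messages` argument in the llama-cpp-python and curl examples, so it could be passed to `llm.create_chat_completion(messages=history)`; the runtime is then expected to apply the chat template stored with the model.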
Core Capabilities
- Strong adherence to explicit user instructions
- Capable of multi-step reasoning and long-form responses
- Performs well in coding, analysis, writing, and ideation tasks
- Suitable for creative generation, simulations, and role-based interactions
- Stable in extended dialogues without excessive context loss
- Compatible with local inference stacks and quantized runtimes
Suggested Applications
- Local AI assistants for private workflows
- Research environments studying model behavior and alignment
- Developer tooling such as code explanation and generation
- Creative projects including storytelling and world-building
- Prompt engineering experimentation
- Offline or privacy-sensitive deployments
Responsible Usage Notice
This model intentionally minimizes automated content restrictions. Users are responsible for ensuring that their usage complies with applicable laws, regulations, and ethical standards.
It is recommended only for users who understand the implications of operating uncensored language models.
Deployment Notes
- Best suited for self-hosted or research environments
- Not recommended for unattended public-facing services
- Works well with standard transformer inference frameworks
- Supports further fine-tuning and alignment layering if desired
Acknowledgements
Thanks to the Qwen development team for releasing the base architecture and to the open-source community for providing tools, evaluations, and infrastructure that make experimentation with large language models accessible.
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
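As a rough guide for choosing among these bit widths, the storage needed for the weights can be estimated as parameters × bits per weight ÷ 8. The sketch below applies that arithmetic to the nominal 7B parameter count from the model summary. The function name is hypothetical, and the result should be read as a lower bound: real GGUF quantization types store extra scaling data and mix precisions (Q4_K_M, for example, averages somewhat more than 4 bits per weight), and the runtime also needs memory for the context.

```python
def estimate_weight_size_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Nominal "7B" from the model summary; the exact parameter count differs slightly
NUM_PARAMS = 7e9

for bits in (2, 3, 4, 5, 6, 8, 16):
    size = estimate_weight_size_gb(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit: about {size:.1f} GB")
```

By this estimate the 4-bit files need at least 3.5 GB and the 16-bit file about 14 GB, which is why the lower bit widths are the usual choice for laptops and consumer GPUs.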
Model tree for Andycurrent/Qwen2.5-7B-Instruct-Uncensored_GGUF
Base model
Qwen/Qwen2.5-7B