Instructions to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF", dtype="auto") - llama-cpp-python
How to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF", filename="Qwen2.5-14B-Base-Heretic.i1-IQ1_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
Use Docker
docker model run hf.co/mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with Ollama:
ollama run hf.co/mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
- Unsloth Studio
How to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF to start chatting
- Pi
How to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with Docker Model Runner:
docker model run hf.co/mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
- Lemonade
How to use mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull mradermacher/Qwen2.5-14B-Base-Heretic-i1-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Qwen2.5-14B-Base-Heretic-i1-GGUF-Q4_K_M
List all available models
lemonade list
can i use it?
hallo! i have already used many modes before and there was always something puting me off how they are, not the reasoning and stuff but the underlying carackter of the ai, i saw this ai here and i want to make my own version of it with training it. is it still close to the original model without much charackterisation with just obliteranting the set filters? it will take a long time i think but i just want a pretty good base model which i can use to make my own ai, i would apreciate an answewr and maybe a bit of guidance. until then i wish you a good week/ holliday today. greeting from germany,
no named
First, we quant, it aint our model, its this persons https://huggingface.co/TitleOS/Qwen2.5-14B-Base-Heretic and second, things like this are usualy allowed since hugging face is opensourced, but i can try help anyways!
Booth TitleOS/Qwen2.5-14B-Base-Heretic and Qwen/Qwen2.5-14B from which it is based are Apache 2.0 licensed and so you should legally be able to use them for further finetuning. You obviously can't finetune the GGUFs but you can finetune TitleOS/Qwen2.5-14B-Base-Heretic` with any finetuning framework you like. I personally recommend axolotl. I wish you a great holiday as well and greetings from Switzerland.
hallo! i have already used many modes before and there was always something puting me off how they are, the underlying carackter of the ai
Are you using any custom definitions or the default of whatever program you're using?
The default tends to be 'you are a great assistant and are helpful to the user' or something. There's other definitions. You can even tell it to be a lazy catgirl maid who meows a lot, and the personality and output completely changes.
thank you for the answers, i will probably not upload the model on here because i dont want to get smited with mca takedowns or something similar, so just for personal use. i'm still looking for a good framework that will work with this model, if you say axolotl will work them i will probably use that one, are there recomendations on th eparameters for training and the min/max size for the finetuning data?
Are you using any custom definitions or the default of whatever program you're using?
The default tends to be 'you are a great assistant and are helpful to the user' or something. There's other definitions. You can even tell it to be a lazy catgirl maid who meows a lot, and the personality and output completely changes.
i'm just using oobabooga and tried a few things just with instruct and then chat, with chat i use my own template of a chartackter i have made and it works fine, i just want to start getting into making my own loras and something that is a bit tailored towards myself
hallo! i have already used many modes before and there was always something puting me off how they are, the underlying carackter of the ai
Are you using any custom definitions or the default of whatever program you're using?
The default tends to be 'you are a great assistant and are helpful to the user' or something. There's other definitions. You can even tell it to be a lazy catgirl maid who meows a lot, and the personality and output completely changes.
the closing was unwanted
i'm just using oobabooga and tried a few things just with instruct and then chat, with chat i use my own template of a chartackter i have made and it works fine, i just want to start getting into making my own loras and something that is a bit tailored towards myself
Mhmm... Yeah that's where i started too. Moved to KoboldCPP which i think is a bit better, along with SillyTavern. Even KoboldCPP's default interface does okay if you don't care much about logs.
Within OobaBooga there's a tab to handle character profiles, which accepts SillyTavern png files. Save the image (it contains a json definition for a bot, you can of course also just get the raw json file too) and import it. There's plenty of places that have character/RP cards, but basic changed assistants can't hurt.
source: https://characterhub.org/characters/ShyChu/aster-a-i-assistant-c91a400eaeb5
You can also with an LLM tell it the tone and feeling and what you want, and then have it help you write a definition too. Though some models tend to be way too terse, don't follow instructions or like huihui models tend to try and be too poetic and flowery in their short 10-20 word 10 line responses. I've found Gemma4 models to be very impressive.