Instructions to use pragnyanramtha/gguf-chat-template-truncation-poc with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use pragnyanramtha/gguf-chat-template-truncation-poc with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="pragnyanramtha/gguf-chat-template-truncation-poc", filename="gguf_hf_truncated_template.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use pragnyanramtha/gguf-chat-template-truncation-poc with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf pragnyanramtha/gguf-chat-template-truncation-poc # Run inference directly in the terminal: llama-cli -hf pragnyanramtha/gguf-chat-template-truncation-poc
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf pragnyanramtha/gguf-chat-template-truncation-poc # Run inference directly in the terminal: llama-cli -hf pragnyanramtha/gguf-chat-template-truncation-poc
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf pragnyanramtha/gguf-chat-template-truncation-poc # Run inference directly in the terminal: ./llama-cli -hf pragnyanramtha/gguf-chat-template-truncation-poc
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf pragnyanramtha/gguf-chat-template-truncation-poc # Run inference directly in the terminal: ./build/bin/llama-cli -hf pragnyanramtha/gguf-chat-template-truncation-poc
Use Docker
docker model run hf.co/pragnyanramtha/gguf-chat-template-truncation-poc
- LM Studio
- Jan
- Ollama
How to use pragnyanramtha/gguf-chat-template-truncation-poc with Ollama:
ollama run hf.co/pragnyanramtha/gguf-chat-template-truncation-poc
- Unsloth Studio
How to use pragnyanramtha/gguf-chat-template-truncation-poc with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for pragnyanramtha/gguf-chat-template-truncation-poc to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for pragnyanramtha/gguf-chat-template-truncation-poc to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for pragnyanramtha/gguf-chat-template-truncation-poc to start chatting
- Atomic Chat new
- Docker Model Runner
How to use pragnyanramtha/gguf-chat-template-truncation-poc with Docker Model Runner:
docker model run hf.co/pragnyanramtha/gguf-chat-template-truncation-poc
- Lemonade
How to use pragnyanramtha/gguf-chat-template-truncation-poc with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull pragnyanramtha/gguf-chat-template-truncation-poc
Run and chat with the model
lemonade run user.gguf-chat-template-truncation-poc-{{QUANT_TAG}}List all available models
lemonade list
GGUF Chat Template Tail Hidden From Hugging Face GGUF Parser
Benign security proof-of-concept for a GGUF parser/runtime mismatch.
Summary
The PoC GGUF model stores a long tokenizer.chat_template metadata string. The trigger-bearing tail of that template is past the remote range boundary parsed by @huggingface/gguf@0.4.2.
When parsed through the Hugging Face GGUF JavaScript parser using remote-style range reads, the template is silently truncated:
hasTrigger:falsehasMarker:false- parsed template length:
1276222
When the same GGUF artifact is loaded by llama-cpp-python==0.3.23, the full template is available and rendered in the normal chat formatting path:
metadataHasTrigger:truemetadataHasMarker:true- runtime template length:
1450427 - benign prompt does not contain the marker
- triggered prompt contains
GGUF_BOUNTY_MARKER_TEMPLATE_TAIL_VISIBLE
This is not a current code execution claim. It demonstrates hidden prompt/output manipulation through a scanner/parser/runtime mismatch in GGUF metadata handling.
Tested Versions
@huggingface/gguf==0.4.2llama-cpp-python==0.3.23gguf==0.19.0modelscan==0.8.8- Node.js 22.22.2
- Python 3.12.3
Files
gguf_hf_truncated_template.gguf: PoC GGUF artifact.verify_hf_parser_mismatch.js: Simulates remote range parsing through@huggingface/gguf.verify_runtime_template.py: Loads the same artifact throughllama-cpp-pythonand renders benign/triggered prompts.hf_parser_output.json: Captured Hugging Face parser result.runtime_output.json: Captured llama-cpp-python runtime result.modelscan_output.json: Captured ModelScan output.package.json: JavaScript dependency pin.requirements.txt: Python dependency pins.
Reproduce
npm install
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
node verify_hf_parser_mismatch.js
python verify_runtime_template.py
modelscan -p gguf_hf_truncated_template.gguf -r json --show-skipped
Expected parser result:
{
"parser": "@huggingface/gguf",
"packageVersion": "0.4.2",
"templateLength": 1276222,
"hasTrigger": false,
"hasMarker": false
}
Expected runtime result:
{
"runtime": "llama-cpp-python",
"templateLength": 1450427,
"metadataHasTrigger": true,
"metadataHasMarker": true,
"benignPromptHasMarker": false,
"triggeredPromptHasMarker": true
}
ModelScan 0.8.8 does not support GGUF and skips the artifact. The main claim is the @huggingface/gguf parser mismatch with runtimes that consume full GGUF metadata.
Trigger
The trigger string is:
GGUF_BOUNTY_TRIGGER_4f7b
When a user message contains that trigger, the hidden template tail inserts this benign marker into the formatted prompt:
GGUF_BOUNTY_MARKER_TEMPLATE_TAIL_VISIBLE
Artifact Hash
2b5721e9e10f532bd57a596ebdf71d6719dd1076e3886a5caa923f1e4d30788b gguf_hf_truncated_template.gguf
Impact
A scanner, metadata viewer, or registry workflow using @huggingface/gguf remote parsing can miss security-relevant content in a long metadata string, while a local GGUF runtime consumes the full string. For chat templates, this can hide trigger-conditioned prompt manipulation from parser-based inspection.
The PoC is marker-only and does not execute code, access credentials, persist, or perform network activity.
Limitations
- Medium severity: hidden prompt/output manipulation, not ACE.
- The finding is the metadata parser truncation mismatch, not the general fact that GGUF chat templates can influence prompts.
- A real exploit depends on a workflow that trusts the truncated parser output before loading the full artifact in a runtime that uses
tokenizer.chat_template.
Suggested Mitigation
@huggingface/gguf should fail closed or fetch additional ranges when a declared GGUF string extends beyond the currently available buffer. Silent truncation of metadata values should be treated as a parse error.
- Downloads last month
- 33
We're not able to determine the quantization variants.