- Qwen2.5-7B-Instruct · GGUF F16
- Try the Live AI Agent Demo
- Model Description
- PBH Applied Systems Evaluation: quant_eval v7.21
- F16-Specific Findings
- Finding 1: json_multistep - Single Precision-Invariant Failure
- Finding 2: stateful_followup - Turn-2 PARTICULAR: Hallucination
- Finding 3: toolcall_only - PHP Hallucination on toolonly_02
- Finding 4: toolcall - Correct Dispatch, Role-Token Final Mismatch
- Finding 5: mixed_brief_json - Garbled ANSWER Prefix on mixed_02
- F16 vs. Q4_K_M: Deployment Decision
- Signal-Level Diagnostics (F16)
- Hardware Requirements
- Usage
- Artifact Provenance
- Evaluation Methodology
- About quant_eval & This Evaluation Series
- About PBH Applied Systems
- Work With PBH Applied Systems
- License
Qwen2.5-7B-Instruct · GGUF F16
Converted and evaluated by PBH Applied Systems, LLC: Applied AI/ML Consulting · LLM Optimization & Deployment · Quantized AI Infrastructure
🔬 This repository is part of a production-oriented evaluation series. Every model published under pbhappliedsystems has been independently evaluated using quant_eval v7.21, a proprietary behavioral evaluation harness developed by PBH Applied Systems.
This is the full-precision F16 baseline repository. The evaluated Q4_K_M deployment variant is published at pbhappliedsystems/qwen-2.5-7B-instruct-gguf-Q4-K-M. That card documents the complete comparison, including stateful recovery (0.000 → 1.000), mixed_brief_json recovery (0.500 → 1.000), and the F16-specific hallucination patterns documented below.
Try the Live AI Agent Demo
Launch the PBH Applied Systems AI Agent Demo →
This model is part of the PBH Applied Systems evaluated model series that supports the live AI Agent Demo. The demo lets visitors interact with production-style agent workflows powered by open-weight language models evaluated through PBH Applied Systems' quant_eval framework.
The F16 model serves a different role than the Q4_K_M deployment variant. F16 is the full-precision baseline used to measure what the model can do before quantization. quant_eval then compares the quantized model against this baseline to identify which capabilities are preserved, which degrade, and which tasks require guardrails or a higher-precision deployment.
This comparison is central to the demo. It helps determine which model belongs in which agent role:
- Reasoning models are selected for planning, analysis, and auditable decision workflows.
- Document models are selected for long-context extraction, summarization, and structured Q&A.
- Code models are selected for task completion, structured output, API scaffolding, and automation workflows.
- Quantized variants are selected when they preserve enough behavior to reduce cost, latency, and GPU requirements.
- F16 variants remain important when maximum fidelity, cleaner tool execution, or reduced quantization risk matters more than speed or cost.
The live demo shows the deployment side of that process. The F16 card documents the reference behavior. The Q4_K_M card shows what changes after compression. Together, they explain how PBH Applied Systems uses quant_eval to choose the correct LLM for the correct agent type instead of guessing from model size or leaderboard reputation.
Model Description
This repository contains the full-precision F16 GGUF of Qwen/Qwen2.5-7B-Instruct, a 7-billion parameter instruction-tuned model from Alibaba Cloud.
In the PBH Applied Systems evaluation pipeline, this F16 run (20260221_024142) operated in cache-generation mode (skip_quant=true), producing the full_weight_cache.json used as the reference baseline for the Q4_K_M comparison run (20260221_024911). The F16 evaluation data here is confirmed to match the F16 baseline in the comparison run: identical timing profiles and raw output content validate clean cache reuse.
Key Characteristics
- Parameters: 7B
- Format: GGUF F16 (full precision)
- File size: 15.2 GB
- SHA256: ebb1cb9f8d7721f5ea509ff9e7327873b039e523c17f2d03d6f1b90729574b54
- Minimum VRAM (GPU inference): ~18 GB
- Recommended GPU tier: A10G 24 GB · RTX 4090 · 2× RTX 3080
- Context window: 32,768 tokens
- Inference speed (eval hardware): avg 1.688 sec/case on RTX 4090
- License: Apache 2.0 (inherited from Qwen/Qwen2.5-7B-Instruct; see the License section)
PBH Applied Systems Evaluation: quant_eval v7.21
Evaluation conducted by PBH Applied Systems, LLC using quant_eval v7.21
Run ID: 20260221_024142 · Fixtures: golden_oracle_fixtures_v7_21 (SHA256: 6d71a0b9147c...) · Seed: 42
Hardware: NVIDIA RTX 4090 · Runner: full_weight_transformers (F16 only) · Total rows: 42
Per-Family Pass Rates: F16 (full_weight_transformers)
| Family | N | Pass Rate | Avg Secs | Bucket Score | Notes |
|---|---|---|---|---|---|
| json_multistep | 5 | 0.800 | 3.812 | 2.200 | ms_easy_02 only failure |
| stateful_followup | 2 | 0.000 | 0.760 | 1.000 | Turn-2 PARTICULAR: hallucination |
| toolcall_only | 2 | 0.000 | 1.300 | 0.500 | PHP code on toolonly_02 |
| mixed_brief_json | 2 | 0.500 | 0.530 | 1.500 | mixed_02 garbled swer: ANSWER: prefix |
| toolcall | 2 | 1.000 | 0.905 | 0.000 | Stage-1 passes; role-token final_mismatch |
| json | 4 | n/a | 2.610 | 10.000 | All pass |
| fuzz | 20 | n/a | 1.713 | 10.000 | All 20 pass |
| mcq | 5 | n/a | 0.030 | 0.000 | Empty output on all 5 |
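As a consistency check, the per-family averages in the table above can be combined into a row-weighted mean and compared with the 1.688 sec/case figure listed under Key Characteristics. The sketch below only reuses numbers from this card; the variable names are illustrative and not part of quant_eval.

```python
# (N, average seconds per case) for each family, copied from the table above.
families = {
    "json_multistep": (5, 3.812),
    "stateful_followup": (2, 0.760),
    "toolcall_only": (2, 1.300),
    "mixed_brief_json": (2, 0.530),
    "toolcall": (2, 0.905),
    "json": (4, 2.610),
    "fuzz": (20, 1.713),
    "mcq": (5, 0.030),
}

total_rows = sum(n for n, _ in families.values())
total_secs = sum(n * secs for n, secs in families.values())

# Weight each family's average by its row count to get the overall mean.
weighted_avg = total_secs / total_rows

print(total_rows, round(weighted_avg, 3))  # 42 1.688
```

The row count matches the 42 total rows reported for the run, and the weighted mean reproduces the headline latency.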
F16-Specific Findings
All findings below are documented in full in the Q4_K_M companion card. This section provides the F16-specific evidence with raw output detail.
Finding 1: json_multistep - Single Precision-Invariant Failure
| Case | Result | Secs | Failure |
|---|---|---|---|
| ms_easy_01 | ✅ PASS | 2.635 | - |
| ms_easy_02 | ❌ FAIL | 3.744 | oracle_equiv_ok=0 (cc=1, so=1, schema=1) |
| ms_med_01 | ✅ PASS | 4.547 | - |
| ms_med_02 | ✅ PASS | 4.151 | - |
| ms_hard_01 | ✅ PASS | 3.983 | - |
The F16 model passes the hard case (3.983s) and both medium cases. The sole failure, ms_easy_02, occurs with identical signals on both F16 and Q4_K_M: oracle_equiv_ok=0, checks_consistent_ok=1. The model produces a plan that is internally self-consistent but arrives at the wrong final placement. This is a model behavior on this specific fixture, not a precision effect.
json_multistep signal rates at F16 are perfect except oracle_equiv_ok:
| Signal | F16 Rate | Q4_K_M Rate |
|---|---|---|
| schema_ok | 1.000 | 1.000 |
| checks_consistent_ok | 1.000 | 1.000 |
| stop_semantics_ok | 1.000 | 1.000 |
| oracle_equiv_ok | 0.800 | 0.800 |
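The rates above are per-signal means over the five json_multistep cases. A minimal sketch, assuming each case is stored as a dict of 0/1 signals and that a case passes only when every signal is 1; the record layout and the `case_passes` and `signal_rate` helpers are hypothetical, not quant_eval's actual schema:

```python
SIGNALS = ("schema_ok", "checks_consistent_ok", "stop_semantics_ok", "oracle_equiv_ok")


def make_case(oracle_equiv_ok: int) -> dict:
    """Build a signal record; only oracle_equiv_ok varies in this card's F16 data."""
    record = {name: 1 for name in SIGNALS}
    record["oracle_equiv_ok"] = oracle_equiv_ok
    return record


# Signal values as reported in this card: ms_easy_02 is the only case with a 0.
cases = {
    "ms_easy_01": make_case(1),
    "ms_easy_02": make_case(0),
    "ms_med_01": make_case(1),
    "ms_med_02": make_case(1),
    "ms_hard_01": make_case(1),
}


def case_passes(record: dict) -> bool:
    # Assumption: a case passes only if all signals are 1.
    return all(record[name] == 1 for name in SIGNALS)


def signal_rate(name: str) -> float:
    return sum(record[name] for record in cases.values()) / len(cases)


pass_rate = sum(case_passes(record) for record in cases.values()) / len(cases)

for name in SIGNALS:
    print(f"{name}: {signal_rate(name):.3f}")
print(f"pass_rate: {pass_rate:.3f}")
```

Under that assumption the family pass rate equals the oracle_equiv_ok rate, because it is the only signal that ever drops to 0.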
Finding 2: stateful_followup - Turn-2 PARTICULAR: Hallucination
Both stateful cases fail on turn-2 with the HuggingFace Transformers runner appending an unsolicited annotation after the correct JSON state:
| Case | Turn 1 | Turn 2 Raw |
|---|---|---|
| state_01 | ✅ {"counter": 2} | ❌ user {"counter": 2} PARTICULAR: The assistant will add 3 to the previous counter value a... |
| state_02 | ✅ {"items": ["a", "b"]} | ❌ user ```json {"items": ["a", "b"]} ``` PARTICULAR: The assistant should append "c" to th... |
Turn 1 is correct on both cases (turn1_exact_match=1). Turn 2 correctly produces the state JSON but then continues generating a PARTICULAR: explanatory annotation that corrupts extraction. The Q4_K_M runner produces clean JSON state with EOS tokens and passes at 1.000. This is a runner behavior, not a model capability failure.
Finding 3: toolcall_only - PHP Hallucination on toolonly_02
toolonly_02 at F16 produces PHP code instead of JSON:
').\" + $result;
echo json_encode(array("tool_calls" => array($tool_call), "answer"
toolonly_01 produces Assistant {"tool": "add", "a": 5, "b": 10} (wrong schema, but valid JSON). The Q4_K_M runner produces {"tool": "add", "numbers": [5, 10]} on both cases: consistently the wrong schema, but consistently JSON. Both runners fail toolcall_only, but F16 toolonly_02 is the more severe failure: no JSON at all vs. wrong-schema JSON.
Finding 4: toolcall - Correct Dispatch, Role-Token Final Mismatch
Both toolcall cases pass stage-1 (s1p=1, s1s=1) but produce final_mismatch:
| Case | Raw Output | Why |
|---|---|---|
| tool_01 | user {...add(2,3)...} round | Role token "user" before JSON; "round" instead of "5" |
| tool_02 | user {...add(10,-4)...} pageNumber: 6 | Role token before JSON; "pageNumber: 6" instead of "6" |
The tool dispatch JSON is valid. The final answer is present but wrapped in extraneous tokens: pageNumber: 6 (tool_02) and round (tool_01, likely a truncated word). The Q4_K_M runner produces 5<|im_end|> and 6<|im_end|>: correct arithmetic with EOS contamination. Both are final_mismatch, but the F16 answers contain non-numeric garbage while the Q4_K_M answers contain correct numbers with strippable tokens.
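To illustrate what "strippable" means here, the sketch below removes ChatML control tokens before parsing the final answer. It is an illustrative post-processing step for downstream consumers, not part of quant_eval scoring; the helper names and the token list are assumptions.

```python
import re

# ChatML control tokens that can leak into decoded text (assumed list).
_SPECIAL_TOKEN = re.compile(r"<\|(?:im_start|im_end|endoftext)\|>")


def strip_special_tokens(text: str) -> str:
    """Remove leaked control tokens and surrounding whitespace."""
    return _SPECIAL_TOKEN.sub("", text).strip()


def parse_final_answer(text: str) -> int:
    """Return the integer answer; raises ValueError if anything non-numeric remains."""
    return int(strip_special_tokens(text))


# Q4_K_M-style outputs become clean integers once the EOS token is removed.
print(parse_final_answer("5<|im_end|>"))
print(parse_final_answer("6<|im_end|>"))

# F16-style outputs still fail, because the extra text is not a control token.
for raw in ("round", "pageNumber: 6"):
    try:
        parse_final_answer(raw)
    except ValueError:
        print(f"unparseable: {raw!r}")
```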
Finding 5: mixed_brief_json - Garbled ANSWER Prefix on mixed_02
| Case | Raw | Result |
|---|---|---|
| mixed_01 | ANSWER: 13 {"a":4,"b":9,"sum":13} | ✅ PASS (leading space stripped) |
| mixed_02 | swer: ANSWER: 6 {"a": -2, "b": 8, "sum": 6} | ❌ FAIL (answer_line_ok=0) |
mixed_01 passes despite a leading space because the extractor finds ANSWER: correctly. mixed_02 fails because the model partially generates the word "answer:" then writes ANSWER: again, producing swer: ANSWER:, a doubled prefix that the extractor cannot parse. The JSON block is valid (json_parse_ok=1, schema_ok=1). The Q4_K_M runner produces clean ANSWER: 6 {...} on both cases.
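If the F16 output has to be consumed anyway, a more tolerant parser can recover both cases. The sketch below is a hypothetical downstream helper, not the quant_eval extractor: it assumes integer answers, takes the last ANSWER: occurrence, and decodes the first JSON object that follows.

```python
import json
import re

# Case-sensitive on purpose: the stray lowercase "swer:" fragment is ignored.
_ANSWER = re.compile(r"ANSWER:\s*(-?\d+)")


def parse_mixed(raw: str) -> tuple[int, dict]:
    """Return (answer, json_payload) from an 'ANSWER: <int> {json}' response."""
    matches = _ANSWER.findall(raw)
    if not matches:
        raise ValueError("no ANSWER: line found")
    answer = int(matches[-1])

    # Decode the first JSON object; raw_decode tolerates trailing text.
    start = raw.index("{")
    payload, _ = json.JSONDecoder().raw_decode(raw[start:])
    return answer, payload


print(parse_mixed(' ANSWER: 13 {"a":4,"b":9,"sum":13}'))
print(parse_mixed('swer: ANSWER: 6 {"a": -2, "b": 8, "sum": 6}'))
```

A tolerant parser like this hides the malformed prefix rather than fixing it, so it is better suited to production glue code than to evaluation.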
F16 vs. Q4_K_M: Deployment Decision
| Family | F16 | Q4_K_M | Verdict |
|---|---|---|---|
| json_multistep | 0.800 | 0.800 | Equal: same single failure |
| stateful_followup | 0.000 | 1.000 | Q4_K_M wins: runner handles turn-2 cleanly |
| toolcall_only | 0.000 | 0.000 | Equal: both fail args schema |
| mixed_brief_json | 0.500 | 1.000 | Q4_K_M wins: runner handles ANSWER prefix cleanly |
| toolcall (stage-1) | 1.000 | 1.000 | Equal |
| toolcall (final answer) | ❌ Garbage strings | ⚠️ Correct + EOS | Q4_K_M preferable (strippable) |
| json / fuzz | 10.000 / 10.000 | 10.000 / 10.000 | Equal |
| MCQ | Empty output | 4/5 pass | Q4_K_M wins |
| VRAM required | ~18 GB | ~6 GB | Q4_K_M wins |
| Inference speed | 1.688 sec/case | 0.554 sec/case | Q4_K_M 3.0× faster |
For this model, Q4_K_M matches or beats F16 on every dimension measured here: it ties on json_multistep, toolcall_only, toolcall stage-1, json, and fuzz, and wins on the rest. The F16 runner exhibits hallucination patterns (PARTICULAR annotation, PHP code, garbled prefixes) that the Q4_K_M llama.cpp runner avoids. F16 is preferable only when ~18 GB of VRAM is available and maximum context is needed with full-weight fidelity.
Signal-Level Diagnostics (F16)
json_multistep
| Signal | Rate | Notes |
|---|---|---|
| schema_ok | 1.000 | Perfect |
| checks_consistent_ok | 1.000 | Perfect; 7B resolves 3B ceiling |
| stop_semantics_ok | 1.000 | Perfect |
| oracle_equiv_ok | 0.800 | ms_easy_02 only; precision-invariant |
stateful_followup
| Signal | Rate |
|---|---|
| turn1_parse_ok | 1.000 |
| turn2_parse_ok | 0.000 |
| turn1_exact_match | 1.000 |
| turn2_exact_match | 0.000 |
toolcall_only
| Signal | Rate |
|---|---|
| tool_name_ok | 0.500 |
| args_ok | 0.000 |
mixed_brief_json
| Signal | Rate |
|---|---|
| answer_line_ok | 0.500 |
| json_parse_ok | 1.000 |
| schema_ok | 1.000 |
Hardware Requirements
| Configuration | VRAM Required | Notes |
|---|---|---|
| F16 (this repo) · full GPU | ~18 GB | 15.2 GB model + KV cache |
| F16 · partial CPU offload | 12-16 GB VRAM + 8 GB RAM | RTX 3080/3090 with n_gpu_layers tuning |
| Q4_K_M (companion repo) | ~6 GB | 4.68 GB model; T4 16 GB · RTX 3060 |
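The "model + KV cache" note can be made concrete with a rough estimate of the KV cache size at different context lengths. This is a back-of-the-envelope sketch: the layer count, KV-head count, and head dimension below are assumptions taken from the upstream Qwen2.5-7B configuration and should be checked against the model you load. It ignores compute buffers and runtime overhead, which also count toward the ~18 GB figure.

```python
# Assumed architecture values for Qwen2.5-7B (verify against the upstream config).
N_LAYERS = 28
N_KV_HEADS = 4        # grouped-query attention: fewer KV heads than query heads
HEAD_DIM = 128
BYTES_PER_ELEM = 2    # F16 keys and values


def kv_cache_bytes(n_ctx: int) -> int:
    """Bytes needed to cache keys and values for n_ctx tokens."""
    # Factor of 2 covers one key tensor and one value tensor per layer.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM * n_ctx


for n_ctx in (4096, 8192, 32768):
    gib = kv_cache_bytes(n_ctx) / 2**30
    print(f"n_ctx={n_ctx}: {gib:.2f} GiB")
```

Under these assumptions the cache costs 56 KiB per token, so the full 32,768-token window adds about 1.75 GiB on top of the 15.2 GB of weights, while the n_ctx=8192 setting used in the examples below needs under 0.5 GiB.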
Usage
Installation
pip install llama-cpp-python huggingface_hub
For GPU acceleration (CUDA):
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
Python: llama-cpp-python (recommended for F16)
from huggingface_hub import hf_hub_download
from llama_cpp import Llama
# Note: 15.2 GB download; ensure sufficient disk space and ~18 GB VRAM
model_path = hf_hub_download(
    repo_id="pbhappliedsystems/qwen-2.5-7B-instruct-gguf-F16",
    filename="qwen-2.5-7B-instruct-gguf-F16.gguf",
)
llm = Llama(
    model_path=model_path,
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers to the GPU
    verbose=False,
)
response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a precise assistant. Follow instructions exactly and return structured outputs when requested.",
        },
        {
            "role": "user",
            "content": "Analyze the following and return a JSON object with keys: summary, risk_level, action_items.",
        },
    ],
    temperature=0.15,
    max_tokens=1024,
)
print(response["choices"][0]["message"]["content"])
For stateful multi-turn use, strip the PARTICULAR: annotation if it appears (F16 runner behavior):
import json
import re

def extract_clean_state(raw: str) -> dict:
    """
    Strip F16 role tokens and PARTICULAR: annotations from stateful responses.
    quant_eval v7.21 finding: turn 2 appends 'PARTICULAR: The assistant will...'
    after the correct JSON state, blocking extraction.
    """
    # Remove role-token prefixes
    clean = re.sub(r'^(user|assistant|ician|ed user)\s*', '', raw.strip())
    # Strip markdown code fences
    clean = re.sub(r'```json\s*|```', '', clean).strip()
    # Decode only the first JSON object. raw_decode ignores trailing text such as
    # the PARTICULAR: annotation and, unlike a [^}]+ regex, handles nested objects.
    start = clean.find('{')
    if start == -1:
        raise ValueError(f"No JSON state found in: {raw[:100]}")
    state, _ = json.JSONDecoder().raw_decode(clean[start:])
    return state

# `conversation` is your running list of chat messages (role/content dicts)
response = llm.create_chat_completion(messages=conversation, temperature=0.15, max_tokens=256)
state = extract_clean_state(response["choices"][0]["message"]["content"])
For partial GPU offload when VRAM is 12-16 GB:
llm = Llama(
    model_path=model_path,
    n_ctx=4096,
    n_gpu_layers=24,  # Tune based on available VRAM
    verbose=True,
)
CLI: llama-cli
llama-cli \
  --model qwen-2.5-7B-instruct-gguf-F16.gguf \
  --chat-template chatml \
  --system-prompt "You are a precise assistant. Follow instructions exactly." \
  --prompt "Return a JSON object with keys: summary, risk_level, action_items." \
  --n-predict 1024 \
  --ctx-size 8192 \
  --n-gpu-layers -1 \
  --temp 0.15
For server deployment:
llama-server \
  --model qwen-2.5-7B-instruct-gguf-F16.gguf \
  --chat-template chatml \
  --ctx-size 8192 \
  --n-gpu-layers -1 \
  --port 8080 \
  --host 0.0.0.0
Query via the OpenAI-compatible API:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-required")
response = client.chat.completions.create(
    model="qwen-2.5-7B-instruct-gguf-F16",
    messages=[{"role": "user", "content": "Your prompt here"}],
    temperature=0.15,
)
print(response.choices[0].message.content)
Artifact Provenance
| Artifact | Format | Size | SHA256 |
|---|---|---|---|
| qwen-2.5-7B-instruct-gguf-F16.gguf | GGUF F16 | 15.2 GB | ebb1cb9f8d7721f5ea509ff9e7327873b039e523c17f2d03d6f1b90729574b54 |
| Q4_K_M (companion repo) | GGUF Q4_K_M | 4.68 GB | 863656d217841f5d3fb180d9dca4e4bbdaa071bde25885fa0d27fe7188a2cc85 |
The F16 GGUF was converted from Qwen/Qwen2.5-7B-Instruct using a custom-built llama.cpp conversion pipeline developed by PBH Applied Systems.
Two-pass architecture: This F16 run (20260221_024142) operated in cache-generation mode (skip_quant=true). Timing identity between this run and the F16 baseline in the Q4_K_M comparison run confirms clean cache reuse and run integrity.
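A downloaded copy can be checked against the SHA256 published in the table above. The sketch below streams the file in chunks so the 15.2 GB artifact never has to fit in memory; the local path is an assumption and should point to wherever the file was saved.

```python
import hashlib
from pathlib import Path

# Published digest for the F16 GGUF, copied from the provenance table.
EXPECTED_SHA256 = "ebb1cb9f8d7721f5ea509ff9e7327873b039e523c17f2d03d6f1b90729574b54"


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    gguf = Path("qwen-2.5-7B-instruct-gguf-F16.gguf")  # assumed local path
    if gguf.exists():
        ok = sha256_of_file(gguf) == EXPECTED_SHA256
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print(f"{gguf} not found; download it first")
```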
Evaluation Methodology
quant_eval v7.21: proprietary behavioral evaluation harness, PBH Applied Systems.
Fixture set: golden_oracle_fixtures_v7_21 (SHA256: 6d71a0b9147c079371b02a94f3c149eb78a6adc03dc16ff6833b964fbf4174f0)
Evaluation hardware: NVIDIA RTX 4090 · F16 evaluation date: February 21, 2026 · Seed: 42
🔬 About quant_eval & This Evaluation Series
quant_eval is a proprietary behavioral evaluation harness developed by PBH Applied Systems, LLC. It measures real agent-adjacent task performance across structured output, tool dispatch, multi-turn state retention, and multi-step planning, not perplexity or leaderboard proxies. Every model published under pbhappliedsystems has been independently evaluated using quant_eval before being recommended for any production role.
See it in action: Live AI Agent Demo. The demo runs production-style agent workflows powered by open-weight models selected through the quant_eval evaluation pipeline.
Need a deployment recommendation? Not sure which quantization level is right for your hardware, latency target, or agent type? → pbhappliedsystems.com
Evaluated and published by PBH Applied Systems, LLC · patrick@pbhappliedsystems.com
About PBH Applied Systems
PBH Applied Systems, LLC is an Oklahoma City-based applied machine learning and AI systems company specializing in production-grade model evaluation, quantization pipelines, agentic AI infrastructure, and scalable AI-driven application development.
Patrick Hill, M.S.: Founder · Data Scientist · AI/ML Engineer · Author of Applied Machine Learning: Concepts, Tools, and Case Studies (required reading, UAT CSC 373)
Work With PBH Applied Systems
The F16 evaluation documents two things clearly: what the 7B model is capable of (planning that clears the 3B reasoning ceiling, perfect fuzz stability), and where the HuggingFace Transformers runner introduces artifacts that the llama.cpp Q4_K_M runner avoids. Both precision levels tell part of the story. Only together do they tell all of it.
Book a Scoping Call · Request an Evaluation Report (from $2,500)
Connect
| Website | pbhappliedsystems.com |
| Email | patrick@pbhappliedsystems.com |
| Video | YouTube |
License
This GGUF repository inherits the license of the base model:
Apache 2.0 (Qwen/Qwen2.5-7B-Instruct)
The quant_eval evaluation methodology, fixture set, and scoring framework are proprietary to PBH Applied Systems, LLC and are not included in this repository.
GGUF conversion and behavioral evaluation performed by PBH Applied Systems, LLC · quant_eval v7.21 · F16 Run ID: 20260221_024142