Update qmd-query-expansion-1.7B with latest SFT weights
- README.md +6 -286
- config.json +1 -1
- generation_config.json +1 -1
- model.safetensors +1 -1
- tokenizer_config.json +16 -1
README.md
CHANGED
|
@@ -1,293 +1,13 @@
|
|
| 1 |
-
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
base_model: Qwen/Qwen3-1.7B
|
| 4 |
-
tags:
|
| 5 |
-
- query-expansion
|
| 6 |
-
- search
|
| 7 |
-
- retrieval
|
| 8 |
-
- rag
|
| 9 |
-
- hybrid-search
|
| 10 |
-
- dspy
|
| 11 |
-
- gepa
|
| 12 |
-
language:
|
| 13 |
-
- en
|
| 14 |
-
pipeline_tag: text-generation
|
| 15 |
-
datasets:
|
| 16 |
-
- custom
|
| 17 |
-
---
|
| 18 |
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
A Qwen3-1.7B model finetuned for **query expansion** in hybrid search systems (RAG). Expands user queries into retrieval-optimized variations for both sparse (BM25) and dense (vector) search.
|
| 22 |
-
|
| 23 |
-
**Repository**: [github.com/tobi/qmd](https://github.com/tobi/qmd)
|
| 24 |
-
|
| 25 |
-
## What This Model Does
|
| 26 |
-
|
| 27 |
-
Given a search query, generates 7 expansions:
|
| 28 |
-
- **1 hyde**: A hypothetical document snippet (50-200 chars) that would answer the query
|
| 29 |
-
- **3 lex**: Keyword phrases (2-5 words) optimized for BM25/sparse search
|
| 30 |
-
- **3 vec**: Natural language sentences (15-30 words) for vector/dense search
|
| 31 |
-
|
| 32 |
-
This improves recall in hybrid retrieval systems by matching both exact keywords and semantic meaning.
|
| 33 |
|
| 34 |
## Prompt Format
|
| 35 |
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
```
|
| 39 |
-
Expand this search query:
|
| 40 |
-
<query>
|
| 41 |
-
```
|
| 42 |
-
|
| 43 |
-
**Example Input**:
|
| 44 |
-
```
|
| 45 |
-
Expand this search query:
|
| 46 |
-
postgresql jsonb indexing
|
| 47 |
-
```
|
| 48 |
-
|
| 49 |
-
**Example Output**:
|
| 50 |
-
```
|
| 51 |
-
hyde: PostgreSQL JSONB supports GIN indexes for fast key lookups and containment queries with @> operator.
|
| 52 |
-
lex: postgresql jsonb gin index
|
| 53 |
-
lex: postgres json indexing strategies
|
| 54 |
-
lex: jsonb index optimization postgresql
|
| 55 |
-
vec: How do I create efficient GIN indexes on JSONB columns in PostgreSQL?
|
| 56 |
-
vec: Best practices for indexing JSON data in PostgreSQL databases.
|
| 57 |
-
vec: Performance comparison of GIN vs BTREE indexes for JSONB fields.
|
| 58 |
-
```
|
| 59 |
-
|
| 60 |
-
## Usage
|
| 61 |
-
|
| 62 |
-
### With vLLM (Recommended)
|
| 63 |
-
|
| 64 |
-
```bash
|
| 65 |
-
# Start server
|
| 66 |
-
vllm serve tobil/qmd-query-expansion-1.7B --port 8000
|
| 67 |
-
|
| 68 |
-
# Query
|
| 69 |
-
curl -s http://localhost:8000/v1/chat/completions \
|
| 70 |
-
-H "Content-Type: application/json" \
|
| 71 |
-
-d '{
|
| 72 |
-
"model": "tobil/qmd-query-expansion-1.7B",
|
| 73 |
-
"messages": [{"role": "user", "content": "Expand this search query:\npostgresql jsonb indexing"}],
|
| 74 |
-
"temperature": 0.7,
|
| 75 |
-
"max_tokens": 400
|
| 76 |
-
}' | jq -r '.choices[0].message.content'
|
| 77 |
-
```
|
| 78 |
-
|
| 79 |
-
### With Transformers
|
| 80 |
-
|
| 81 |
-
```python
|
| 82 |
-
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 83 |
-
|
| 84 |
-
model = AutoModelForCausalLM.from_pretrained("tobil/qmd-query-expansion-1.7B")
|
| 85 |
-
tokenizer = AutoTokenizer.from_pretrained("tobil/qmd-query-expansion-1.7B")
|
| 86 |
-
|
| 87 |
-
messages = [{"role": "user", "content": "Expand this search query:\nReact hooks tutorial"}]
|
| 88 |
-
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
| 89 |
-
inputs = tokenizer(text, return_tensors="pt")
|
| 90 |
-
outputs = model.generate(**inputs, max_new_tokens=400, temperature=0.7, do_sample=True)
|
| 91 |
-
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
| 92 |
-
```
|
| 93 |
-
|
| 94 |
-
### With llama.cpp (GGUF)
|
| 95 |
-
|
| 96 |
-
```bash
|
| 97 |
-
# Download GGUF (Q8_0 quantized, 2.1GB)
|
| 98 |
-
huggingface-cli download tobil/qmd-query-expansion-1.7B qmd-query-expansion-1.7B-Q8_0.gguf
|
| 99 |
-
|
| 100 |
-
# Run
|
| 101 |
-
./llama-cli -m qmd-query-expansion-1.7B-Q8_0.gguf \
|
| 102 |
-
-p "Expand this search query:\nkubernetes vs docker" \
|
| 103 |
-
--temp 0.7 -n 400
|
| 104 |
-
```
|
| 105 |
-
|
| 106 |
-
## Output Parsing
|
| 107 |
-
|
| 108 |
-
The model outputs in line format. Parse with:
|
| 109 |
-
|
| 110 |
-
```python
|
| 111 |
-
import re
|
| 112 |
-
|
| 113 |
-
def parse_expansions(text: str) -> list[dict]:
|
| 114 |
-
    """Parse line-based expansion output into structured format."""
|
| 115 |
-
    expansions = []
|
| 116 |
-
|
| 117 |
-
    # Remove thinking tags if present (Qwen3 feature)
|
| 118 |
-
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
|
| 119 |
-
|
| 120 |
-
    for line in text.strip().split('\n'):
|
| 121 |
-
        line = line.strip()
|
| 122 |
-
        match = re.match(r'^(hyde|lex|vec)\s*:\s*(.+)$', line, re.IGNORECASE)
|
| 123 |
-
        if match:
|
| 124 |
-
            expansions.append({
|
| 125 |
-
                "type": match.group(1).lower(),
|
| 126 |
-
                "value": match.group(2).strip()
|
| 127 |
-
            })
|
| 128 |
-
|
| 129 |
-
    return expansions
|
| 130 |
-
|
| 131 |
-
# Example
|
| 132 |
-
output = """hyde: PostgreSQL JSONB supports GIN indexes for fast queries.
|
| 133 |
-
lex: postgresql jsonb gin index
|
| 134 |
-
lex: postgres json indexing
|
| 135 |
-
lex: jsonb optimization
|
| 136 |
-
vec: How to create GIN indexes on JSONB columns?
|
| 137 |
-
vec: Best practices for PostgreSQL JSON indexing.
|
| 138 |
-
vec: JSONB vs JSON performance comparison."""
|
| 139 |
-
|
| 140 |
-
expansions = parse_expansions(output)
|
| 141 |
-
# [{"type": "hyde", "value": "PostgreSQL JSONB supports..."}, ...]
|
| 142 |
-
```
|
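The parser above returns a flat list of `{"type", "value"}` dictionaries. A minimal sketch of a follow-up step is shown here: grouping that list by type and checking it against the 1 hyde / 3 lex / 3 vec structure the card describes. The helper names `group_expansions` and `has_expected_structure` are hypothetical and not part of the qmd repository; the sample input is written by hand in the parser's output format.

```python
from collections import defaultdict

# Expected item counts per type, taken from the model card (1 hyde, 3 lex, 3 vec).
EXPECTED_COUNTS = {"hyde": 1, "lex": 3, "vec": 3}


def group_expansions(expansions: list[dict]) -> dict[str, list[str]]:
    """Group parsed expansions by type, preserving their original order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in expansions:
        grouped[item["type"]].append(item["value"])
    return dict(grouped)


def has_expected_structure(grouped: dict[str, list[str]]) -> bool:
    """Return True when the output has exactly 1 hyde, 3 lex, and 3 vec items."""
    return all(len(grouped.get(kind, [])) == n for kind, n in EXPECTED_COUNTS.items())


# Hand-written sample in the same shape that parse_expansions() returns.
parsed = [
    {"type": "hyde", "value": "PostgreSQL JSONB supports GIN indexes for fast queries."},
    {"type": "lex", "value": "postgresql jsonb gin index"},
    {"type": "lex", "value": "postgres json indexing"},
    {"type": "lex", "value": "jsonb optimization"},
    {"type": "vec", "value": "How to create GIN indexes on JSONB columns?"},
    {"type": "vec", "value": "Best practices for PostgreSQL JSON indexing."},
    {"type": "vec", "value": "JSONB vs JSON performance comparison."},
]

grouped = group_expansions(parsed)
print(has_expected_structure(grouped))
```

A caller could retry generation or fall back to the raw query when the structure check fails; that policy is a design choice left open here.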
| 143 |
-
|
| 144 |
-
## Training Details
|
| 145 |
-
|
| 146 |
-
### Method: GEPA Distillation
|
| 147 |
-
|
| 148 |
-
1. **Teacher Model**: GPT-4o-mini with GEPA-optimized prompt
|
| 149 |
-
2. **Prompt Optimization**: DSPy's GEPA (Genetic-Pareto) optimizer automatically evolved the teacher prompt over 34 iterations to reach 87.7% on our scoring metric
|
| 150 |
-
3. **Distillation**: Generated 500+ high-quality training examples from teacher
|
| 151 |
-
4. **Student Training**: SFT with LoRA on Qwen3-1.7B, 3 epochs
|
| 152 |
-
|
| 153 |
-
### Key Learnings
|
| 154 |
-
|
| 155 |
-
#### 1. Hyde-First Ordering Matters
|
| 156 |
-
|
| 157 |
-
Generating the hypothetical document (hyde) first provides context that improves lex and vec quality. The hyde acts as an "anchor" that grounds subsequent expansions.
|
| 158 |
-
|
| 159 |
-
```
|
| 160 |
-
✅ Good: hyde first, then lex uses hyde context
|
| 161 |
-
hyde: Kubernetes orchestrates containers at scale with auto-scaling...
|
| 162 |
-
lex: kubernetes container orchestration # informed by hyde
|
| 163 |
-
|
| 164 |
-
❌ Bad: lex without context
|
| 165 |
-
lex: container management # too generic
|
| 166 |
-
```
|
| 167 |
-
|
| 168 |
-
#### 2. Entity Preservation is Critical
|
| 169 |
-
|
| 170 |
-
Named entities (brands, products, technical terms) must appear in **every** lex expansion. Missing entities tank BM25 recall.
|
| 171 |
-
|
| 172 |
-
```
|
| 173 |
-
Query: "iPhone 15 vs Samsung S24"
|
| 174 |
-
|
| 175 |
-
✅ Good lex:
|
| 176 |
-
- "iPhone 15 Samsung S24 comparison"
|
| 177 |
-
- "iPhone 15 vs Samsung S24 specs"
|
| 178 |
-
- "Samsung S24 iPhone 15 camera"
|
| 179 |
-
|
| 180 |
-
❌ Bad lex:
|
| 181 |
-
- "smartphone comparison" # missing entities!
|
| 182 |
-
- "phone camera review" # missing entities!
|
| 183 |
-
```
|
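The entity rule above lends itself to a mechanical check. The following is a rough sketch, assuming the entity list is supplied by the caller (entity extraction is not shown) and that a case-insensitive substring match is good enough; `missing_entities` is a hypothetical helper, not something the card defines.

```python
def missing_entities(entities: list[str], lex_phrases: list[str]) -> dict[str, list[str]]:
    """Map each lex phrase to the entities it fails to mention.

    Matching is a case-insensitive substring test, which is an assumption
    of this sketch; phrases that mention every entity are left out.
    """
    report: dict[str, list[str]] = {}
    for phrase in lex_phrases:
        lowered = phrase.lower()
        missing = [entity for entity in entities if entity.lower() not in lowered]
        if missing:
            report[phrase] = missing
    return report


# Entities for the query "iPhone 15 vs Samsung S24", listed by hand.
entities = ["iPhone 15", "Samsung S24"]

good_lex = [
    "iPhone 15 Samsung S24 comparison",
    "iPhone 15 vs Samsung S24 specs",
    "Samsung S24 iPhone 15 camera",
]
bad_lex = ["smartphone comparison", "phone camera review"]

print(missing_entities(entities, good_lex))
print(missing_entities(entities, bad_lex))
```

An empty report means every lex phrase kept every entity; a non-empty one names the phrases to reject or regenerate.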
| 184 |
-
|
| 185 |
-
#### 3. Simple Prompts Win for Small Models
|
| 186 |
-
|
| 187 |
-
The teacher used a complex DSPy signature format with structured sections. But the small model performed better with the simple training format:
|
| 188 |
-
|
| 189 |
-
```
|
| 190 |
-
✅ Use this (matches training):
|
| 191 |
-
"Expand this search query:\n{query}"
|
| 192 |
-
|
| 193 |
-
❌ Not this (DSPy signature format):
|
| 194 |
-
"## Inputs\n### query\n{query}\n## Generated Outputs..."
|
| 195 |
-
```
|
| 196 |
-
|
| 197 |
-
Complex prompts caused the small model to "leak" instruction fragments into outputs.
|
| 198 |
-
|
| 199 |
-
#### 4. Line Format > JSON for Small Models
|
| 200 |
-
|
| 201 |
-
Small models struggle with reliable JSON generation. Line-based format is more robust:
|
| 202 |
-
|
| 203 |
-
```
|
| 204 |
-
✅ Reliable:
|
| 205 |
-
hyde: Some text here
|
| 206 |
-
lex: keyword phrase
|
| 207 |
-
vec: A full sentence.
|
| 208 |
-
|
| 209 |
-
❌ Unreliable for 1.7B:
|
| 210 |
-
[{"type": "hyde", "value": "..."}, ...]
|
| 211 |
-
```
|
| 212 |
-
|
| 213 |
-
#### 5. GEPA Prompt Evolution
|
| 214 |
-
|
| 215 |
-
GEPA automatically discovered these improvements to the teacher prompt:
|
| 216 |
-
- Explicit examples for edge cases (ambiguous queries like "pin")
|
| 217 |
-
- Emphasis on entity preservation with concrete failure cases
|
| 218 |
-
- Factual grounding examples (Louvre hours, GPS navigation steps)
|
| 219 |
-
- Score targets ("aim for 78-84%") to calibrate quality
|
| 220 |
-
|
| 221 |
-
### Training Configuration
|
| 222 |
-
|
| 223 |
-
```yaml
|
| 224 |
-
base_model: Qwen/Qwen3-1.7B
|
| 225 |
-
method: SFT with LoRA
|
| 226 |
-
lora_r: 64
|
| 227 |
-
lora_alpha: 128
|
| 228 |
-
learning_rate: 2e-4
|
| 229 |
-
epochs: 3
|
| 230 |
-
batch_size: 4
|
| 231 |
-
gradient_accumulation: 4
|
| 232 |
-
warmup_ratio: 0.1
|
| 233 |
-
scheduler: cosine
|
| 234 |
-
```
|
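To make the configuration above more concrete, here is a small arithmetic sketch of what those values imply. It assumes a single GPU and exactly 500 training examples (the card only says "500+"), so the step counts are illustrative rather than the actual training log.

```python
import math

# Values copied from the training configuration above.
batch_size = 4
gradient_accumulation = 4
epochs = 3
warmup_ratio = 0.1
lora_r = 64
lora_alpha = 128

# Assumption: exactly 500 examples on one GPU (the card only says "500+").
num_examples = 500

# Examples consumed per optimizer step.
effective_batch = batch_size * gradient_accumulation

# Optimizer steps per epoch and over the whole run.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Steps spent warming up the learning rate before the cosine decay begins.
warmup_steps = math.ceil(total_steps * warmup_ratio)

# Standard LoRA scaling factor applied to the low-rank update (alpha / r).
lora_scale = lora_alpha / lora_r

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"total steps:          {total_steps}")
print(f"warmup steps:         {warmup_steps}")
print(f"LoRA scale:           {lora_scale}")
```

Under these assumptions the run is under a hundred optimizer steps, which fits with the short training time reported in the metrics.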
| 235 |
-
|
| 236 |
-
### Metrics
|
| 237 |
-
|
| 238 |
-
| Metric | Value |
|
| 239 |
-
|--------|-------|
|
| 240 |
-
| Final Loss | 0.64 |
|
| 241 |
-
| Token Accuracy | 84.7% |
|
| 242 |
-
| Eval Score Range | 80-96% |
|
| 243 |
-
| Training Time | ~7 min (RTX 4090) |
|
| 244 |
-
|
| 245 |
-
## Scoring Rubric
|
| 246 |
-
|
| 247 |
-
Our evaluation metric scores expansions on:
|
| 248 |
-
|
| 249 |
-
1. **Structure** (7 items: 1 hyde, 3 lex, 3 vec)
|
| 250 |
-
2. **Entity Preservation** (all query entities in every lex)
|
| 251 |
-
3. **No Verbatim Echo** (lex shouldn't just repeat the query)
|
| 252 |
-
4. **Hyde Quality** (50-200 chars, informative)
|
| 253 |
-
5. **Vec Quality** (15-30 words, semantic variation)
|
| 254 |
-
6. **Hyde-Lex-Vec Coherence** (lex/vec should build on hyde)
|
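Several of the rubric items above are simple enough to check mechanically. The sketch below covers items 1 to 5 as pass/fail checks; it is not the project's actual scoring code, and `rubric_checks` is a hypothetical name. Coherence (item 6) needs semantic judgment and is left out. Entities are passed in by hand, and both entity matching and echo detection use plain case-insensitive string comparison, which is an assumption of this sketch.

```python
def rubric_checks(query: str, entities: list[str], grouped: dict[str, list[str]]) -> dict[str, bool]:
    """Run pass/fail versions of rubric items 1-5 on grouped expansions."""
    hyde = grouped.get("hyde", [])
    lex = grouped.get("lex", [])
    vec = grouped.get("vec", [])
    normalized_query = query.strip().lower()
    return {
        # 1. Structure: exactly 1 hyde, 3 lex, 3 vec.
        "structure": (len(hyde), len(lex), len(vec)) == (1, 3, 3),
        # 2. Entity preservation: every entity appears in every lex phrase.
        "entity_preservation": all(
            entity.lower() in phrase.lower() for phrase in lex for entity in entities
        ),
        # 3. No verbatim echo: no lex phrase is just the query repeated.
        "no_verbatim_echo": all(phrase.strip().lower() != normalized_query for phrase in lex),
        # 4. Hyde quality (length only): 50-200 characters.
        "hyde_length": all(50 <= len(text) <= 200 for text in hyde),
        # 5. Vec quality (length only): 15-30 words.
        "vec_length": all(15 <= len(text.split()) <= 30 for text in vec),
    }


# Illustrative data written for this sketch, not real model output.
query = "postgresql jsonb indexing"
entities = ["postgresql", "jsonb"]
grouped = {
    "hyde": [
        "PostgreSQL JSONB supports GIN indexes for fast key lookups "
        "and containment queries with @> operator."
    ],
    "lex": [
        "postgresql jsonb gin index",
        "postgresql jsonb index strategies",
        "jsonb index optimization postgresql",
    ],
    "vec": [
        "How do I create an efficient GIN index on a JSONB column in "
        "PostgreSQL to speed up containment queries?",
        "What are the best practices for indexing JSON data stored in "
        "PostgreSQL databases when queries filter on nested keys?",
        "How does the performance of GIN indexes compare with BTREE "
        "expression indexes for JSONB fields in large PostgreSQL tables?",
    ],
}

results = rubric_checks(query, entities, grouped)
print(results)
```

The real metric reports a percentage, so it presumably weights or grades these items; the booleans here only show which constraints a given output violates.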
| 255 |
-
|
| 256 |
-
## Limitations
|
| 257 |
-
|
| 258 |
-
- Trained on English queries only
|
| 259 |
-
- May hallucinate facts in hyde (use for retrieval, not as ground truth)
|
| 260 |
-
- Optimized for general knowledge queries; domain-specific queries may need domain-adapted models
|
| 261 |
-
- Qwen3's `<think>` tags sometimes appear (strip them in post-processing)
|
| 262 |
-
|
| 263 |
-
## Files
|
| 264 |
-
|
| 265 |
-
### Safetensors (for transformers/vLLM)
|
| 266 |
-
- `model.safetensors` - Full precision weights (4.1GB)
|
| 267 |
-
|
| 268 |
-
### GGUF Quantizations (for llama.cpp/Ollama)
|
| 269 |
-
|
| 270 |
-
| Quant | Size | BPW | Eval Score | Use Case |
|
| 271 |
-
|-------|------|-----|------------|----------|
|
| 272 |
-
| Q8_0 | 2.1GB | 8.5 | 87% | Max quality |
|
| 273 |
-
| Q6_K | 1.6GB | 6.6 | 89% | Good balance |
|
| 274 |
-
| Q5_K_M | 1.4GB | 5.7 | 89% | Recommended |
|
| 275 |
-
| Q4_K_M | 1.2GB | 4.8 | 92% | **Best value** |
|
| 276 |
-
| Q4_0 | 1.2GB | 4.5 | 95% | Smallest |
|
| 277 |
-
|
| 278 |
-
**Results:** All quantizations perform well on this structured generation task. The eval scores show no measurable degradation at lower bit widths (in the table above, Q4_0 scores 95% against 87% for Q8_0), which suggests the task (generating hyde/lex/vec expansions) is simple enough that aggressive quantization doesn't hurt. **Q4_K_M is recommended** for the best size/quality tradeoff.
|
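The sizes in the table can be roughly reproduced from the bits-per-weight (BPW) column. The sketch below assumes the `model.safetensors` file (4063515640 bytes in this commit) stores 16-bit weights, so the parameter count is about half its byte size; it ignores file headers and metadata, so treat the numbers as estimates.

```python
# Size of model.safetensors as listed in this commit's LFS pointer.
SAFETENSORS_BYTES = 4_063_515_640

# Assumption: weights are stored as 16-bit floats (2 bytes per parameter).
BYTES_PER_PARAM = 2

approx_params = SAFETENSORS_BYTES / BYTES_PER_PARAM

# Bits per weight for each quantization, copied from the table above.
BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "Q4_0": 4.5}


def estimated_size_gb(bits_per_weight: float) -> float:
    """Estimate file size in GB (10^9 bytes) as params * bits / 8."""
    return approx_params * bits_per_weight / 8 / 1e9


for name, bpw in BPW.items():
    print(f"{name}: ~{estimated_size_gb(bpw):.2f} GB")
```

The estimates come out within about 0.1 GB of the sizes listed in the table, which is a useful sanity check when choosing a quantization for a memory budget.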
| 279 |
-
|
| 280 |
-
## Citation
|
| 281 |
|
| 282 |
-
```
|
| 283 |
-
@misc{qmd-query-expansion,
|
| 284 |
-
title={QMD Query Expansion Model},
|
| 285 |
-
author={Shopify},
|
| 286 |
-
year={2025},
|
| 287 |
-
url={https://github.com/tobi/qmd}
|
| 288 |
-
}
|
| 289 |
-
```
|
| 290 |
|
| 291 |
-
##
|
| 292 |
|
| 293 |
-
|
|
|
|
| 1 |
+
# QMD Query Expansion 1.7B (SFT)
|
| 2 |
|
| 3 |
+
Updated Qwen3-1.7B model for query expansion using the production SFT pipeline.
|
| 4 |
|
| 5 |
## Prompt Format
|
| 6 |
|
| 7 |
+
The model is trained on messages formatted with the Qwen3 chat template using:
|
| 8 |
|
| 9 |
+
`/no_think Expand this search query: <query>`
|
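As a small illustration of the format above, the following sketch builds the single-turn message list a caller would hand to the chat template. `build_messages` is a hypothetical helper, and collapsing whitespace in the query is an assumption of this sketch rather than something the README specifies.

```python
def build_messages(query: str) -> list[dict[str, str]]:
    """Build a single-turn chat message using the README's prompt format."""
    # Assumption: collapse runs of whitespace so the prompt stays on one line.
    cleaned = " ".join(query.split())
    if not cleaned:
        raise ValueError("query must not be empty")
    return [
        {"role": "user", "content": f"/no_think Expand this search query: {cleaned}"}
    ]


messages = build_messages("postgresql   jsonb indexing")
print(messages[0]["content"])

# With transformers installed, the messages would then go through the chat
# template, as in the earlier README example:
#   text = tokenizer.apply_chat_template(
#       messages, tokenize=False, add_generation_prompt=True
#   )
```

Keeping the prompt construction in one function makes it harder for callers to drift from the training format, which matters for a small model trained on a single template.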
| 10 |
|
| 11 |
+
## Notes
|
| 12 |
|
| 13 |
+
This checkpoint is the SFT-only trained version (GRPO is not part of the default pipeline).
|
config.json
CHANGED
|
@@ -56,7 +56,7 @@
|
|
| 56 |
},
|
| 57 |
"sliding_window": null,
|
| 58 |
"tie_word_embeddings": true,
|
| 59 |
-
"transformers_version": "5.
|
| 60 |
"use_cache": true,
|
| 61 |
"use_sliding_window": false,
|
| 62 |
"vocab_size": 151936
|
|
|
|
| 56 |
},
|
| 57 |
"sliding_window": null,
|
| 58 |
"tie_word_embeddings": true,
|
| 59 |
+
"transformers_version": "5.2.0",
|
| 60 |
"use_cache": true,
|
| 61 |
"use_sliding_window": false,
|
| 62 |
"vocab_size": 151936
|
generation_config.json
CHANGED
|
@@ -9,5 +9,5 @@
|
|
| 9 |
"temperature": 0.6,
|
| 10 |
"top_k": 20,
|
| 11 |
"top_p": 0.95,
|
| 12 |
-
"transformers_version": "5.
|
| 13 |
}
|
|
|
|
| 9 |
"temperature": 0.6,
|
| 10 |
"top_k": 20,
|
| 11 |
"top_p": 0.95,
|
| 12 |
+
"transformers_version": "5.2.0"
|
| 13 |
}
|
model.safetensors
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 4063515640
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:79bb7eb18f29b8a8997c960ff0ba610e7fe5d17985bfea451192cdb034b8403b
|
| 3 |
size 4063515640
|
tokenizer_config.json
CHANGED
|
@@ -5,10 +5,25 @@
|
|
| 5 |
"clean_up_tokenization_spaces": false,
|
| 6 |
"eos_token": "<|im_end|>",
|
| 7 |
"errors": "replace",
|
| 8 |
"is_local": false,
|
| 9 |
"model_max_length": 131072,
|
| 10 |
"pad_token": "<|endoftext|>",
|
| 11 |
"split_special_tokens": false,
|
| 12 |
"tokenizer_class": "Qwen2Tokenizer",
|
| 13 |
"unk_token": null
|
| 14 |
-
}
|
|
|
|
| 5 |
"clean_up_tokenization_spaces": false,
|
| 6 |
"eos_token": "<|im_end|>",
|
| 7 |
"errors": "replace",
|
| 8 |
+
"extra_special_tokens": [
|
| 9 |
+
"<|im_start|>",
|
| 10 |
+
"<|im_end|>",
|
| 11 |
+
"<|object_ref_start|>",
|
| 12 |
+
"<|object_ref_end|>",
|
| 13 |
+
"<|box_start|>",
|
| 14 |
+
"<|box_end|>",
|
| 15 |
+
"<|quad_start|>",
|
| 16 |
+
"<|quad_end|>",
|
| 17 |
+
"<|vision_start|>",
|
| 18 |
+
"<|vision_end|>",
|
| 19 |
+
"<|vision_pad|>",
|
| 20 |
+
"<|image_pad|>",
|
| 21 |
+
"<|video_pad|>"
|
| 22 |
+
],
|
| 23 |
"is_local": false,
|
| 24 |
"model_max_length": 131072,
|
| 25 |
"pad_token": "<|endoftext|>",
|
| 26 |
"split_special_tokens": false,
|
| 27 |
"tokenizer_class": "Qwen2Tokenizer",
|
| 28 |
"unk_token": null
|
| 29 |
+
}
|