tobil committed on
Commit c8ff036 · verified · 1 Parent(s): 7aef2c5

Update qmd-query-expansion-1.7B with latest SFT weights

Files changed (5)
  1. README.md +6 -286
  2. config.json +1 -1
  3. generation_config.json +1 -1
  4. model.safetensors +1 -1
  5. tokenizer_config.json +16 -1
README.md CHANGED
@@ -1,293 +1,13 @@
1
- ---
2
- license: apache-2.0
3
- base_model: Qwen/Qwen3-1.7B
4
- tags:
5
- - query-expansion
6
- - search
7
- - retrieval
8
- - rag
9
- - hybrid-search
10
- - dspy
11
- - gepa
12
- language:
13
- - en
14
- pipeline_tag: text-generation
15
- datasets:
16
- - custom
17
- ---
18
 
19
- # QMD Query Expansion 1.7B
20
-
21
- A Qwen3-1.7B model finetuned for **query expansion** in hybrid search systems (RAG). Expands user queries into retrieval-optimized variations for both sparse (BM25) and dense (vector) search.
22
-
23
- **Repository**: [github.com/tobi/qmd](https://github.com/tobi/qmd)
24
-
25
- ## What This Model Does
26
-
27
- Given a search query, generates 7 expansions:
28
- - **1 hyde**: A hypothetical document snippet (50-200 chars) that would answer the query
29
- - **3 lex**: Keyword phrases (2-5 words) optimized for BM25/sparse search
30
- - **3 vec**: Natural language sentences (15-30 words) for vector/dense search
31
-
32
- This improves recall in hybrid retrieval systems by matching both exact keywords and semantic meaning.
33
 
34
  ## Prompt Format
35
 
36
- **Critical**: Use this exact format. The model was trained on this specific template.
37
-
38
- ```
39
- Expand this search query:
40
- <query>
41
- ```
42
-
43
- **Example Input**:
44
- ```
45
- Expand this search query:
46
- postgresql jsonb indexing
47
- ```
48
-
49
- **Example Output**:
50
- ```
51
- hyde: PostgreSQL JSONB supports GIN indexes for fast key lookups and containment queries with @> operator.
52
- lex: postgresql jsonb gin index
53
- lex: postgres json indexing strategies
54
- lex: jsonb index optimization postgresql
55
- vec: How do I create efficient GIN indexes on JSONB columns in PostgreSQL?
56
- vec: Best practices for indexing JSON data in PostgreSQL databases.
57
- vec: Performance comparison of GIN vs BTREE indexes for JSONB fields.
58
- ```
59
-
60
- ## Usage
61
-
62
- ### With vLLM (Recommended)
63
-
64
- ```bash
65
- # Start server
66
- vllm serve tobil/qmd-query-expansion-1.7B --port 8000
67
-
68
- # Query
69
- curl -s http://localhost:8000/v1/chat/completions \
70
- -H "Content-Type: application/json" \
71
- -d '{
72
- "model": "tobil/qmd-query-expansion-1.7B",
73
- "messages": [{"role": "user", "content": "Expand this search query:\npostgresql jsonb indexing"}],
74
- "temperature": 0.7,
75
- "max_tokens": 400
76
- }' | jq -r '.choices[0].message.content'
77
- ```
78
-
79
- ### With Transformers
80
-
81
- ```python
82
- from transformers import AutoTokenizer, AutoModelForCausalLM
83
-
84
- model = AutoModelForCausalLM.from_pretrained("tobil/qmd-query-expansion-1.7B")
85
- tokenizer = AutoTokenizer.from_pretrained("tobil/qmd-query-expansion-1.7B")
86
-
87
- messages = [{"role": "user", "content": "Expand this search query:\nReact hooks tutorial"}]
88
- text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
89
- inputs = tokenizer(text, return_tensors="pt")
90
- outputs = model.generate(**inputs, max_new_tokens=400, temperature=0.7, do_sample=True)
91
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
92
- ```
93
-
94
- ### With llama.cpp (GGUF)
95
-
96
- ```bash
97
- # Download GGUF (Q8_0 quantized, 2.1GB)
98
- huggingface-cli download tobil/qmd-query-expansion-1.7B qmd-query-expansion-1.7B-Q8_0.gguf
99
-
100
- # Run
101
- ./llama-cli -m qmd-query-expansion-1.7B-Q8_0.gguf \
102
- -p "Expand this search query:\nkubernetes vs docker" \
103
- --temp 0.7 -n 400
104
- ```
105
-
106
- ## Output Parsing
107
-
108
- The model outputs in line format. Parse with:
109
-
110
- ```python
111
- import re
112
-
113
- def parse_expansions(text: str) -> list[dict]:
114
- """Parse line-based expansion output into structured format."""
115
- expansions = []
116
-
117
- # Remove thinking tags if present (Qwen3 feature)
118
- text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
119
-
120
- for line in text.strip().split('\n'):
121
- line = line.strip()
122
- match = re.match(r'^(hyde|lex|vec)\s*:\s*(.+)$', line, re.IGNORECASE)
123
- if match:
124
- expansions.append({
125
- "type": match.group(1).lower(),
126
- "value": match.group(2).strip()
127
- })
128
-
129
- return expansions
130
-
131
- # Example
132
- output = """hyde: PostgreSQL JSONB supports GIN indexes for fast queries.
133
- lex: postgresql jsonb gin index
134
- lex: postgres json indexing
135
- lex: jsonb optimization
136
- vec: How to create GIN indexes on JSONB columns?
137
- vec: Best practices for PostgreSQL JSON indexing.
138
- vec: JSONB vs JSON performance comparison."""
139
-
140
- expansions = parse_expansions(output)
141
- # [{"type": "hyde", "value": "PostgreSQL JSONB supports..."}, ...]
142
- ```
143
-
144
- ## Training Details
145
-
146
- ### Method: GEPA Distillation
147
-
148
- 1. **Teacher Model**: GPT-4o-mini with GEPA-optimized prompt
149
- 2. **Prompt Optimization**: DSPy's GEPA (Grounded Example-based Prompt Adaptation) automatically evolved the teacher prompt over 34 iterations to reach 87.7% on our scoring metric
150
- 3. **Distillation**: Generated 500+ high-quality training examples from teacher
151
- 4. **Student Training**: SFT with LoRA on Qwen3-1.7B, 3 epochs
152
-
153
- ### Key Learnings
154
-
155
- #### 1. Hyde-First Ordering Matters
156
-
157
- Generating the hypothetical document (hyde) first provides context that improves lex and vec quality. The hyde acts as an "anchor" that grounds subsequent expansions.
158
-
159
- ```
160
- ✅ Good: hyde first, then lex uses hyde context
161
- hyde: Kubernetes orchestrates containers at scale with auto-scaling...
162
- lex: kubernetes container orchestration # informed by hyde
163
-
164
- ❌ Bad: lex without context
165
- lex: container management # too generic
166
- ```
167
-
168
- #### 2. Entity Preservation is Critical
169
-
170
- Named entities (brands, products, technical terms) must appear in **every** lex expansion. Missing entities tanks BM25 recall.
171
-
172
- ```
173
- Query: "iPhone 15 vs Samsung S24"
174
-
175
- ✅ Good lex:
176
- - "iPhone 15 Samsung S24 comparison"
177
- - "iPhone 15 vs Samsung S24 specs"
178
- - "Samsung S24 iPhone 15 camera"
179
-
180
- ❌ Bad lex:
181
- - "smartphone comparison" # missing entities!
182
- - "phone camera review" # missing entities!
183
- ```
184
-
185
- #### 3. Simple Prompts Win for Small Models
186
-
187
- The teacher used a complex DSPy signature format with structured sections. But the small model performed better with the simple training format:
188
-
189
- ```
190
- ✅ Use this (matches training):
191
- "Expand this search query:\n{query}"
192
-
193
- ❌ Not this (DSPy signature format):
194
- "## Inputs\n### query\n{query}\n## Generated Outputs..."
195
- ```
196
-
197
- Complex prompts caused the small model to "leak" instruction fragments into outputs.
198
-
199
- #### 4. Line Format > JSON for Small Models
200
-
201
- Small models struggle with reliable JSON generation. Line-based format is more robust:
202
-
203
- ```
204
- ✅ Reliable:
205
- hyde: Some text here
206
- lex: keyword phrase
207
- vec: A full sentence.
208
-
209
- ❌ Unreliable for 1.7B:
210
- [{"type": "hyde", "value": "..."}, ...]
211
- ```
212
-
213
- #### 5. GEPA Prompt Evolution
214
-
215
- GEPA automatically discovered these improvements to the teacher prompt:
216
- - Explicit examples for edge cases (ambiguous queries like "pin")
217
- - Emphasis on entity preservation with concrete failure cases
218
- - Factual grounding examples (Louvre hours, GPS navigation steps)
219
- - Score targets ("aim for 78-84%") to calibrate quality
220
-
221
- ### Training Configuration
222
-
223
- ```yaml
224
- base_model: Qwen/Qwen3-1.7B
225
- method: SFT with LoRA
226
- lora_r: 64
227
- lora_alpha: 128
228
- learning_rate: 2e-4
229
- epochs: 3
230
- batch_size: 4
231
- gradient_accumulation: 4
232
- warmup_ratio: 0.1
233
- scheduler: cosine
234
- ```
235
-
236
- ### Metrics
237
-
238
- | Metric | Value |
239
- |--------|-------|
240
- | Final Loss | 0.64 |
241
- | Token Accuracy | 84.7% |
242
- | Eval Score Range | 80-96% |
243
- | Training Time | ~7 min (RTX 4090) |
244
-
245
- ## Scoring Rubric
246
-
247
- Our evaluation metric scores expansions on:
248
-
249
- 1. **Structure** (7 items: 1 hyde, 3 lex, 3 vec)
250
- 2. **Entity Preservation** (all query entities in every lex)
251
- 3. **No Verbatim Echo** (lex shouldn't just repeat the query)
252
- 4. **Hyde Quality** (50-200 chars, informative)
253
- 5. **Vec Quality** (15-30 words, semantic variation)
254
- 6. **Hyde-Lex-Vec Coherence** (lex/vec should build on hyde)
255
-
256
- ## Limitations
257
-
258
- - Trained on English queries only
259
- - May hallucinate facts in hyde (use for retrieval, not as ground truth)
260
- - Optimized for general knowledge queries; domain-specific queries may need domain-adapted models
261
- - Qwen3's `<think>` tags sometimes appear (strip them in post-processing)
262
-
263
- ## Files
264
-
265
- ### Safetensors (for transformers/vLLM)
266
- - `model.safetensors` - Full precision weights (4.1GB)
267
-
268
- ### GGUF Quantizations (for llama.cpp/Ollama)
269
-
270
- | Quant | Size | BPW | Eval Score | Use Case |
271
- |-------|------|-----|------------|----------|
272
- | Q8_0 | 2.1GB | 8.5 | 87% | Max quality |
273
- | Q6_K | 1.6GB | 6.6 | 89% | Good balance |
274
- | Q5_K_M | 1.4GB | 5.7 | 89% | Recommended |
275
- | Q4_K_M | 1.2GB | 4.8 | 92% | **Best value** |
276
- | Q4_0 | 1.2GB | 4.5 | 95% | Smallest |
277
-
278
- **Results:** All quantizations perform excellently on this structured generation task. The eval scores show minimal quality degradation even at Q4_0 - the task (generating hyde/lex/vec expansions) is simple enough that aggressive quantization doesn't hurt. **Q4_K_M is recommended** for the best size/quality tradeoff.
279
-
280
- ## Citation
281
 
282
- ```bibtex
283
- @misc{qmd-query-expansion,
284
- title={QMD Query Expansion Model},
285
- author={Shopify},
286
- year={2025},
287
- url={https://github.com/tobi/qmd}
288
- }
289
- ```
290
 
291
- ## License
292
 
293
- Apache 2.0
 
1
+ # QMD Query Expansion 1.7B (SFT)
2
 
3
+ Updated Qwen3-1.7B model for query expansion using the production SFT pipeline.
4
 
5
  ## Prompt Format
6
 
7
+ The model is trained on messages formatted with the Qwen3 chat template using:
8
 
9
+ `/no_think Expand this search query: <query>`
10
 
11
+ ## Notes
12
 
13
+ This checkpoint is the SFT-only trained version (GRPO is not part of the default pipeline).
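
The prompt format in the updated README can be turned into a chat message roughly as follows. This is a minimal sketch, not code from the repository: the helper name `build_messages` and the constant `PROMPT_PREFIX` are hypothetical, and placing the query on the same line as the instruction (the previous README put it after a newline) is an assumption read off the one-line template in the new README.

```python
# Assumed prefix, copied from the template in the updated README.
PROMPT_PREFIX = "/no_think Expand this search query: "


def build_messages(query: str) -> list[dict]:
    """Wrap a raw search query in a single-turn user message."""
    # Strip surrounding whitespace so the prompt stays on one line.
    return [{"role": "user", "content": PROMPT_PREFIX + query.strip()}]


messages = build_messages("postgresql jsonb indexing")
print(messages[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, the same call the previous README used, so that the Qwen3 chat template is applied around the message.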
config.json CHANGED
@@ -56,7 +56,7 @@
56
  },
57
  "sliding_window": null,
58
  "tie_word_embeddings": true,
59
- "transformers_version": "5.0.0",
60
  "use_cache": true,
61
  "use_sliding_window": false,
62
  "vocab_size": 151936
 
56
  },
57
  "sliding_window": null,
58
  "tie_word_embeddings": true,
59
+ "transformers_version": "5.2.0",
60
  "use_cache": true,
61
  "use_sliding_window": false,
62
  "vocab_size": 151936
generation_config.json CHANGED
@@ -9,5 +9,5 @@
9
  "temperature": 0.6,
10
  "top_k": 20,
11
  "top_p": 0.95,
12
- "transformers_version": "5.0.0"
13
  }
 
9
  "temperature": 0.6,
10
  "top_k": 20,
11
  "top_p": 0.95,
12
+ "transformers_version": "5.2.0"
13
  }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e520db129fa6880692fe68a22d475b222f95725691c9298e9d3246e9274a3a55
3
  size 4063515640
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:79bb7eb18f29b8a8997c960ff0ba610e7fe5d17985bfea451192cdb034b8403b
3
  size 4063515640
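
The `oid sha256:` field of a Git LFS pointer is the SHA-256 digest of the file contents, and `size` is its length in bytes, so a downloaded `model.safetensors` can be checked against the new pointer. A minimal sketch, assuming the file is already on local disk; the helper names `sha256_of_file` and `matches_lfs_pointer` are hypothetical and not part of any library.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large weights need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_pointer(path: Path, expected_oid: str, expected_size: int) -> bool:
    """Compare a local file against the oid and size recorded in an LFS pointer."""
    # Check the cheap size comparison first; hash only if the size agrees.
    if path.stat().st_size != expected_size:
        return False
    return sha256_of_file(path) == expected_oid
```

For this commit, the call would be `matches_lfs_pointer(Path("model.safetensors"), "79bb7eb18f29b8a8997c960ff0ba610e7fe5d17985bfea451192cdb034b8403b", 4063515640)`, where the local path is an assumption. Because the size is unchanged between the two pointers, only the digest distinguishes the old weights from the new ones.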
tokenizer_config.json CHANGED
@@ -5,10 +5,25 @@
5
  "clean_up_tokenization_spaces": false,
6
  "eos_token": "<|im_end|>",
7
  "errors": "replace",
8
  "is_local": false,
9
  "model_max_length": 131072,
10
  "pad_token": "<|endoftext|>",
11
  "split_special_tokens": false,
12
  "tokenizer_class": "Qwen2Tokenizer",
13
  "unk_token": null
14
- }
 
5
  "clean_up_tokenization_spaces": false,
6
  "eos_token": "<|im_end|>",
7
  "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "<|im_start|>",
10
+ "<|im_end|>",
11
+ "<|object_ref_start|>",
12
+ "<|object_ref_end|>",
13
+ "<|box_start|>",
14
+ "<|box_end|>",
15
+ "<|quad_start|>",
16
+ "<|quad_end|>",
17
+ "<|vision_start|>",
18
+ "<|vision_end|>",
19
+ "<|vision_pad|>",
20
+ "<|image_pad|>",
21
+ "<|video_pad|>"
22
+ ],
23
  "is_local": false,
24
  "model_max_length": 131072,
25
  "pad_token": "<|endoftext|>",
26
  "split_special_tokens": false,
27
  "tokenizer_class": "Qwen2Tokenizer",
28
  "unk_token": null
29
+ }