---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
  - query-expansion
  - search
  - retrieval
  - rag
  - hybrid-search
  - dspy
  - gepa
language:
  - en
pipeline_tag: text-generation
datasets:
  - custom
---

# QMD Query Expansion 1.7B

A Qwen3-1.7B model fine-tuned for **query expansion** in hybrid search and RAG systems. It expands a user query into retrieval-optimized variations for both sparse (BM25) and dense (vector) search.

**Repository**: [github.com/Shopify/qmd](https://github.com/Shopify/qmd)

## What This Model Does

Given a search query, generates 7 expansions:
- **1 hyde**: A hypothetical document snippet (50-200 chars) that would answer the query
- **3 lex**: Keyword phrases (2-5 words) optimized for BM25/sparse search
- **3 vec**: Natural language sentences (15-30 words) for vector/dense search

This improves recall in hybrid retrieval systems by matching both exact keywords and semantic meaning.
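As a minimal sketch of how the expansions might feed a hybrid retriever, the snippet below groups them by target index. It assumes the `{"type", "value"}` dictionaries produced by the parser in the Output Parsing section. Sending `hyde` to the dense side is an assumption (hypothetical documents are commonly embedded), and `route_expansions` is an illustrative helper, not part of the QMD codebase.

```python
from collections import defaultdict

def route_expansions(expansions):
    """Group expansion values by the retriever they are meant for."""
    routed = defaultdict(list)
    for item in expansions:
        # Assumption: lex goes to the sparse (BM25) index; vec and hyde are embedded.
        target = "sparse" if item["type"] == "lex" else "dense"
        routed[target].append(item["value"])
    return dict(routed)

expansions = [
    {"type": "hyde", "value": "PostgreSQL JSONB supports GIN indexes for fast key lookups."},
    {"type": "lex", "value": "postgresql jsonb gin index"},
    {"type": "vec", "value": "How do I create efficient GIN indexes on JSONB columns in PostgreSQL?"},
]
routed = route_expansions(expansions)
print(routed["sparse"])  # queries for BM25
print(routed["dense"])   # texts to embed for vector search
```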

## Prompt Format

**Critical**: Use this exact format. The model was trained on this specific template.

```
Expand this search query:
<query>
```

**Example Input**:
```
Expand this search query:
postgresql jsonb indexing
```

**Example Output**:
```
hyde: PostgreSQL JSONB supports GIN indexes for fast key lookups and containment queries with @> operator.
lex: postgresql jsonb gin index
lex: postgres json indexing strategies
lex: jsonb index optimization postgresql
vec: How do I create efficient GIN indexes on JSONB columns in PostgreSQL?
vec: Best practices for indexing JSON data in PostgreSQL databases.
vec: Performance comparison of GIN vs BTREE indexes for JSONB fields.
```

## Usage

### With vLLM (Recommended)

```bash
# Start server
vllm serve tobil/qmd-query-expansion-1.7B --port 8000

# Query
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tobil/qmd-query-expansion-1.7B",
    "messages": [{"role": "user", "content": "Expand this search query:\npostgresql jsonb indexing"}],
    "temperature": 0.7,
    "max_tokens": 400
  }' | jq -r '.choices[0].message.content'
```
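
The same request can be sent from Python. This is a sketch using only the standard library, assuming the vLLM server above is listening on `localhost:8000`. `build_payload` and `expand` are illustrative helper names.

```python
import json
import urllib.request

MODEL = "tobil/qmd-query-expansion-1.7B"

def build_payload(query: str) -> dict:
    """Build the chat-completions request body using the training prompt format."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": f"Expand this search query:\n{query}"}],
        "temperature": 0.7,
        "max_tokens": 400,
    }

def expand(query: str, base_url: str = "http://localhost:8000") -> str:
    """POST the query to the OpenAI-compatible endpoint and return the raw model text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(query)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

# Requires a running server:
# print(expand("postgresql jsonb indexing"))
```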

### With Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("tobil/qmd-query-expansion-1.7B")
tokenizer = AutoTokenizer.from_pretrained("tobil/qmd-query-expansion-1.7B")

messages = [{"role": "user", "content": "Expand this search query:\nReact hooks tutorial"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=400, temperature=0.7, do_sample=True)
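# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))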
```

### With llama.cpp (GGUF)

```bash
# Download GGUF (Q8_0 quantized, 2.1GB) into the current directory
huggingface-cli download tobil/qmd-query-expansion-1.7B qmd-query-expansion-1.7B-Q8_0.gguf --local-dir .

# Run
./llama-cli -m qmd-query-expansion-1.7B-Q8_0.gguf \
  -p $'Expand this search query:\nkubernetes vs docker' \
  --temp 0.7 -n 400
```

## Output Parsing

The model emits one expansion per line in `type: value` form. Parse it with:

```python
import re

def parse_expansions(text: str) -> list[dict]:
    """Parse line-based expansion output into structured format."""
    expansions = []
    
    # Remove thinking tags if present (Qwen3 feature)
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    
    for line in text.strip().split('\n'):
        line = line.strip()
        match = re.match(r'^(hyde|lex|vec)\s*:\s*(.+)$', line, re.IGNORECASE)
        if match:
            expansions.append({
                "type": match.group(1).lower(),
                "value": match.group(2).strip()
            })
    
    return expansions

# Example
output = """hyde: PostgreSQL JSONB supports GIN indexes for fast queries.
lex: postgresql jsonb gin index
lex: postgres json indexing
lex: jsonb optimization
vec: How to create GIN indexes on JSONB columns?
vec: Best practices for PostgreSQL JSON indexing.
vec: JSONB vs JSON performance comparison."""

expansions = parse_expansions(output)
# [{"type": "hyde", "value": "PostgreSQL JSONB supports..."}, ...]
```

## Training Details

### Method: GEPA Distillation

1. **Teacher Model**: GPT-4o-mini with GEPA-optimized prompt
2. **Prompt Optimization**: DSPy's GEPA (Genetic-Pareto) optimizer automatically evolved the teacher prompt over 34 iterations to reach 87.7% on our scoring metric
3. **Distillation**: Generated 500+ high-quality training examples from the teacher
4. **Student Training**: SFT with LoRA on Qwen3-1.7B, 3 epochs

### Key Learnings

#### 1. Hyde-First Ordering Matters

Generating the hypothetical document (hyde) first provides context that improves lex and vec quality. The hyde acts as an "anchor" that grounds subsequent expansions.

```
✅ Good: hyde first, then lex uses hyde context
hyde: Kubernetes orchestrates containers at scale with auto-scaling...
lex: kubernetes container orchestration  # informed by hyde

❌ Bad: lex without context
lex: container management  # too generic
```

#### 2. Entity Preservation is Critical

Named entities (brands, products, technical terms) must appear in **every** lex expansion. Missing entities tank BM25 recall.

```
Query: "iPhone 15 vs Samsung S24"

✅ Good lex:
- "iPhone 15 Samsung S24 comparison"
- "iPhone 15 vs Samsung S24 specs"  
- "Samsung S24 iPhone 15 camera"

❌ Bad lex:
- "smartphone comparison"  # missing entities!
- "phone camera review"    # missing entities!
```
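
A check along these lines can flag lex expansions that drop an entity. This is a rough sketch, not the scoring code used in training: it assumes the caller already has the list of entities (entity extraction is out of scope here) and uses a simple case-insensitive substring match. `missing_entities` is a hypothetical helper name.

```python
def missing_entities(lex: str, entities: list[str]) -> list[str]:
    """Return the entities that do not appear in a lex expansion (case-insensitive)."""
    lowered = lex.lower()
    return [entity for entity in entities if entity.lower() not in lowered]

# Hypothetical: entities supplied by the caller for the query "iPhone 15 vs Samsung S24"
entities = ["iPhone 15", "Samsung S24"]

good = "iPhone 15 vs Samsung S24 specs"
bad = "smartphone comparison"

print(missing_entities(good, entities))
print(missing_entities(bad, entities))
```

An empty result means the lex expansion preserved every entity.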

#### 3. Simple Prompts Win for Small Models

The teacher used a complex DSPy signature format with structured sections, but the small model performed better with the simple training format:

```
✅ Use this (matches training):
"Expand this search query:\n{query}"

❌ Not this (DSPy signature format):
"## Inputs\n### query\n{query}\n## Generated Outputs..."
```

Complex prompts caused the small model to "leak" instruction fragments into outputs.

#### 4. Line Format > JSON for Small Models

Small models struggle with reliable JSON generation. Line-based format is more robust:

```
✅ Reliable:
hyde: Some text here
lex: keyword phrase
vec: A full sentence.

❌ Unreliable for 1.7B:
[{"type": "hyde", "value": "..."}, ...]
```
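
One way to see the difference is to look at what each parser recovers from a cut-off generation. The sketch below is illustrative only: the truncated strings are made up, and truncation is just one of several ways JSON output can break. A line parser still returns every line it can match, while `json.loads` rejects the whole payload.

```python
import json
import re

LINE = re.compile(r"^(hyde|lex|vec)\s*:\s*(.+)$", re.IGNORECASE)

def parse_lines(text):
    """Collect (type, value) pairs from every line that matches the format."""
    items = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            items.append((match.group(1).lower(), match.group(2).strip()))
    return items

def parse_json(text):
    """Parse a JSON array of expansions; any syntax error loses everything."""
    try:
        return [(item["type"], item["value"]) for item in json.loads(text)]
    except (json.JSONDecodeError, KeyError, TypeError):
        return []

# Made-up outputs, both cut off mid-generation
truncated_lines = "hyde: Some text here\nlex: keyword phrase\nvec: A full sen"
truncated_json = '[{"type": "hyde", "value": "Some text here"}, {"type": "lex", "value": "keyw'

line_items = parse_lines(truncated_lines)
json_items = parse_json(truncated_json)
print(len(line_items), len(json_items))
```

Note that the last line-format item is itself incomplete, so a length check on values is still worthwhile.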

#### 5. GEPA Prompt Evolution

GEPA automatically discovered these improvements to the teacher prompt:
- Explicit examples for edge cases (ambiguous queries like "pin")
- Emphasis on entity preservation with concrete failure cases
- Factual grounding examples (Louvre hours, GPS navigation steps)
- Score targets ("aim for 78-84%") to calibrate quality

### Training Configuration

```yaml
base_model: Qwen/Qwen3-1.7B
method: SFT with LoRA
lora_r: 64
lora_alpha: 128
learning_rate: 2e-4
epochs: 3
batch_size: 4
gradient_accumulation: 4
warmup_ratio: 0.1
scheduler: cosine
```
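
For orientation, the configuration implies the following rough step counts. This is back-of-the-envelope arithmetic under two assumptions that are not stated above: exactly 500 training examples (the lower bound of "500+") and a single GPU. Actual trainer step counts may differ slightly.

```python
import math

batch_size = 4
gradient_accumulation = 4
epochs = 3
warmup_ratio = 0.1
num_examples = 500  # assumption: lower bound of the "500+" distilled examples
num_devices = 1     # assumption: single-GPU training

# Examples consumed per optimizer step
effective_batch = batch_size * gradient_accumulation * num_devices
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```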

### Metrics

| Metric | Value |
|--------|-------|
| Final Loss | 0.64 |
| Token Accuracy | 84.7% |
| Eval Score Range | 80-96% |
| Training Time | ~7 min (RTX 4090) |

## Scoring Rubric

Our evaluation metric scores expansions on:

1. **Structure** (7 items: 1 hyde, 3 lex, 3 vec)
2. **Entity Preservation** (all query entities in every lex)
3. **No Verbatim Echo** (lex shouldn't just repeat the query)
4. **Hyde Quality** (50-200 chars, informative)
5. **Vec Quality** (15-30 words, semantic variation)
6. **Hyde-Lex-Vec Coherence** (lex/vec should build on hyde)
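
The mechanical parts of this rubric can be approximated with a few checks. The sketch below is not the actual evaluation metric: it covers only items 1, 3, 4 (length only) and 5 (word count only), returns pass/fail flags rather than a score, and `check_expansions` is a hypothetical helper. The sample expansions are written for illustration.

```python
def check_expansions(query: str, expansions: list[dict]) -> dict:
    """Return pass/fail flags for the structural parts of the rubric."""
    by_type = {"hyde": [], "lex": [], "vec": []}
    for item in expansions:
        by_type.setdefault(item["type"], []).append(item["value"])
    hyde, lex, vec = by_type["hyde"], by_type["lex"], by_type["vec"]
    normalized_query = query.strip().lower()
    return {
        # 1 hyde, 3 lex, 3 vec
        "structure": (len(hyde), len(lex), len(vec)) == (1, 3, 3),
        # no lex is an exact copy of the query
        "no_verbatim_echo": all(v.strip().lower() != normalized_query for v in lex),
        # hyde is 50-200 characters
        "hyde_length": all(50 <= len(v) <= 200 for v in hyde),
        # each vec is 15-30 words
        "vec_length": all(15 <= len(v.split()) <= 30 for v in vec),
    }

sample = (
    [{"type": "hyde", "value": "PostgreSQL JSONB supports GIN indexes for fast key lookups and containment queries."}]
    + [{"type": "lex", "value": v} for v in (
        "postgresql jsonb gin index",
        "postgres json indexing strategies",
        "jsonb index optimization postgresql",
    )]
    + [{"type": "vec", "value": v} for v in (
        "How do I create efficient GIN indexes on JSONB columns in PostgreSQL to speed up containment queries?",
        "What are the best practices for indexing JSON data stored in PostgreSQL databases with many nested keys?",
        "How does the performance of GIN indexes compare with BTREE indexes for JSONB fields in large tables?",
    )]
)

report = check_expansions("postgresql jsonb indexing", sample)
print(report)
```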

## Limitations

- Trained on English queries only
- May hallucinate facts in hyde (use for retrieval, not as ground truth)
- Optimized for general knowledge queries; domain-specific queries may need domain-adapted models
- Qwen3's `<think>` tags sometimes appear (strip them in post-processing)

## Files

- `model.safetensors` - Model weights (4.1GB)
- `qmd-query-expansion-1.7B-Q8_0.gguf` - GGUF format for llama.cpp (2.1GB, Q8_0 quantized)
- `tokenizer.json` - Tokenizer

## Citation

```bibtex
@misc{qmd-query-expansion,
  title={QMD Query Expansion Model},
  author={Shopify},
  year={2025},
  url={https://github.com/Shopify/qmd}
}
```

## License

Apache 2.0