---
language:
- en

license: mit

tags:
- Explorer SubAgent
- Repository Exploration


library_name: transformers
---


## 1. Model Introduction

**FastContext-1.0** is a lightweight **repository-exploration subagent** for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues **parallel read-only tool calls** (READ, GLOB, GREP), and returns **compact file paths and line ranges** as focused context.

Repository exploration is a major bottleneck in modern coding agents: locating relevant code consumes a large share of the token budget and pollutes the solver's context with irrelevant snippets. In our analysis of GPT-5.4 trajectories, reading and searching account for **56.2% of all tool-use turns** and **46.5% of the main agent's total tokens**. FastContext moves this work into a dedicated subagent so the main agent receives clean, grounded evidence rather than the long trail of exploratory reads and searches.

The model family spans **4B–30B parameters**. Each model is bootstrapped from strong reference-model trajectories via supervised fine-tuning (SFT) and then refined with task-grounded reinforcement learning (RL) for broad first-turn search, multi-turn evidence gathering, and precise citation generation.

- **Backbones:** Qwen3-4B-Instruct (4B explorer) and Qwen3-Coder-30B-A3B (30B explorer)
- **Variants:** `FC-4B-SFT`, `FC-4B-RL` (deployment targets), `FC-30B-SFT` (scaling reference)
- **Context length:** up to 262K tokens
- **Paper:** *FastContext: Training Efficient Repository Explorer for Coding Agents*
- **Code & data:** https://github.com/microsoft/fastcontext

### How it works

```
Coding Agent ──query──▶  FastContext  ──read/search──▶  Repository
     ▲                       │
     └──── file-line ────────┘
          citations
```

Internally, FastContext runs an exploration loop:

1. **Query understanding:** translate the issue into search intents.
2. **Parallel tool calling:** issue multiple `READ` / `GLOB` / `GREP` calls in a single turn to cover complementary hypotheses.
3. **Observation-driven refinement:** use tool outputs to guide the next search turn.
4. **Final citations:** return a compact `<final_answer>` block of file paths and line ranges.
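
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the FastContext implementation: `explore`, `model_step`, the step dictionary keys (`tool_calls`, `final_answer`), and the scripted model are all hypothetical names chosen for the example, and the turn budget of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def explore(query: str,
            model_step: Callable[[list], dict],
            tools: dict,
            max_turns: int = 4) -> str:
    """Run a bounded exploration loop and return the final answer text.

    model_step (hypothetical) maps the transcript so far to either
    {"tool_calls": [(tool_name, argument), ...]} or {"final_answer": "..."}.
    """
    transcript = [f"QUERY: {query}"]
    for _ in range(max_turns):
        step = model_step(transcript)
        if "final_answer" in step:
            return step["final_answer"]
        calls = step["tool_calls"]
        # Run every tool call of this turn concurrently; map() preserves order.
        with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
            outputs = list(pool.map(lambda call: tools[call[0]](call[1]), calls))
        # Feed the observations back so the next turn can refine the search.
        for (name, arg), out in zip(calls, outputs):
            transcript.append(f"{name}({arg}) -> {out}")
    return ""  # turn budget exhausted without a final answer


def scripted_model(transcript: list) -> dict:
    """Stand-in for the explorer: one parallel search turn, then citations."""
    if len(transcript) == 1:
        return {"tool_calls": [("GLOB", "**/*.py"), ("GREP", "def parse")]}
    return {"final_answer": "<final_answer>\nsrc/parser.py:10-24\n</final_answer>"}


# Fake tools that return canned observations, for illustration only.
fake_tools = {
    "GLOB": lambda pattern: "src/parser.py",
    "GREP": lambda regex: "src/parser.py:10:def parse(x):",
}

result = explore("Where is parse defined?", scripted_model, fake_tools)
print(result)
```

In a real deployment the stand-in model would be replaced by a call to the served explorer, and the fake tools by actual read-only repository tools.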


## 2. Evaluation Results

### End-to-end performance (Mini-SWE-Agent)

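Integrating FastContext into Mini-SWE-Agent is reported to improve end-to-end resolution rates by **up to 5.5%** while reducing main-agent token consumption by **up to 60%**, with only marginal overhead; among the configurations tabulated below, the largest gain is 5.0 points and the largest token reduction is 50.7%. Scores and tokens are measured on the main-agent trajectory. Score deltas are absolute differences and token deltas are relative reductions, both against `w/o Explore` for the same main agent.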

| Main Agent | Subagent | SWE-bench Multilingual | SWE-bench Pro | SWE-QA |
|---|---|---|---|---|
| **GPT-5.4** | w/o Explore | 71.7 / 457k | 46.0 / 818k | 81.3 / 418k |
| | FC-30B-SFT | **75.0** (↑3.3) / 356k (↓22.1%) | **49.0** (↑3.0) / 688k (↓15.9%) | **82.0** (↑0.7) / 206k (↓50.7%) |
| | FC-4B-SFT | 73.3 (↑1.6) / 364k (↓20.4%) | 47.0 (↑1.0) / 689k (↓15.8%) | 81.9 (↑0.6) / 213k (↓49.0%) |
| | FC-4B-RL | 74.7 (↑3.0) / 338k (↓26.0%) | 48.5 (↑2.5) / 701k (↓14.3%) | **82.0** (↑0.7) / 210k (↓49.8%) |
| **GLM-5.1** | w/o Explore | 72.3 / 2514k | 17.5 / 2692k | 72.7 / 401k |
| | FC-30B-SFT | **73.7** (↑1.4) / 1797k (↓28.5%) | 20.0 (↑2.5) / 2370k (↓12.0%) | 73.3 (↑0.6) / 292k (↓27.2%) |
| | FC-4B-SFT | 73.3 (↑1.0) / 1919k (↓23.7%) | 18.0 (↑0.5) / 2279k (↓15.3%) | 73.4 (↑0.7) / 306k (↓23.7%) |
| | FC-4B-RL | **73.7** (↑1.4) / 1971k (↓21.6%) | **22.5** (↑5.0) / 2210k (↓17.9%) | **73.5** (↑0.8) / 302k (↓24.7%) |
| **Kimi-K2.6** | w/o Explore | 76.3 / 1553k | 31.0 / 2383k | 71.6 / 510k |
| | FC-30B-SFT | 76.7 (↑0.4) / 1360k (↓12.4%) | 33.0 (↑2.0) / 2150k (↓9.8%) | **72.8** (↑1.2) / 373k (↓26.9%) |
| | FC-4B-SFT | 75.3 (↓1.0) / 1306k (↓15.9%) | 32.5 (↑1.5) / 2159k (↓9.4%) | 72.6 (↑1.0) / 402k (↓21.2%) |
| | FC-4B-RL | **78.3** (↑2.0) / 1384k (↓10.9%) | **33.5** (↑2.5) / 2158k (↓9.4%) | 72.6 (↑1.0) / 378k (↓25.9%) |

*Each cell shows score / main-agent tokens. The best score per main agent and benchmark is in bold.*
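
The deltas in the table can be reproduced from the raw cell values: score deltas are absolute differences in points, and token deltas are percentage reductions relative to the `w/o Explore` row. The helper names below are hypothetical and exist only for this worked example.

```python
def score_delta(baseline: float, score: float) -> float:
    """Absolute change in score, in points, rounded like the table."""
    return round(score - baseline, 1)


def token_reduction_pct(baseline_tokens: float, tokens: float) -> float:
    """Relative reduction in main-agent tokens, as a percentage."""
    return round(100.0 * (baseline_tokens - tokens) / baseline_tokens, 1)


# GPT-5.4 on SWE-bench Multilingual: w/o Explore (71.7 / 457k) vs. FC-30B-SFT (75.0 / 356k)
print(score_delta(71.7, 75.0))         # 3.3
print(token_reduction_pct(457, 356))   # 22.1

# GLM-5.1 on SWE-bench Pro: w/o Explore (17.5 / 2692k) vs. FC-4B-RL (22.5 / 2210k)
print(score_delta(17.5, 22.5))         # 5.0
print(token_reduction_pct(2692, 2210)) # 17.9
```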

**Highlights:**
- FastContext improves end-to-end scores in every configuration above except one (Kimi-K2.6 with FC-4B-SFT on SWE-bench Multilingual, 75.3 vs. 76.3); the largest gain in the table is on SWE-bench Pro (GLM-5.1 with FC-4B-RL, +5.0).
- The biggest token savings in the table reach **50.7%** (GPT-5.4 with FC-30B-SFT on SWE-QA).
- The compact **4B-RL** explorer can outperform the larger **30B-SFT** explorer; for example, with GLM-5.1 on SWE-bench Pro it reaches 22.5 vs. 20.0 while using fewer tokens (2210k vs. 2370k).


## 3. Quick Start

Launch the model with an OpenAI-compatible server (e.g. SGLang). The example below serves the 4B explorer:

```bash
python3 -m sglang.launch_server \
    --model-path FastContext-1.0-4B-SFT \
    --tool-call-parser qwen \
    --context-length 262144 \
    --trust-remote-code \
    --dtype bfloat16 \
    --host 0.0.0.0 \
    --port 30000 \
    --tp-size 1 \
    --mem-fraction-static 0.8
```

FastContext exposes only three read-only tools to the model:

| Tool | Purpose |
|---|---|
| `READ` | Return line-numbered file contents |
| `GLOB` | Path discovery by glob pattern |
| `GREP` | Regex search over repository text (ripgrep-style) |
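
To make the tool semantics concrete, here is a minimal sketch of what host-side implementations of the three tools might look like. The function names, signatures, and output formats (`N: text` for `READ`, `path:line:text` for `GREP`) are assumptions for illustration; the actual tool schemas are defined in the FastContext repository and may differ. The sketch also does not guard against paths that escape the repository root, which a real deployment should do.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def read_tool(root: Path, path: str, start: int = 1, end: Optional[int] = None) -> str:
    """READ: return line-numbered contents of a file (1-indexed, inclusive)."""
    lines = (root / path).read_text(encoding="utf-8", errors="replace").splitlines()
    end = len(lines) if end is None else min(end, len(lines))
    return "\n".join(f"{i}: {lines[i - 1]}" for i in range(start, end + 1))


def glob_tool(root: Path, pattern: str) -> list:
    """GLOB: list files under root whose relative path matches the pattern."""
    return sorted(p.relative_to(root).as_posix()
                  for p in root.glob(pattern) if p.is_file())


def grep_tool(root: Path, regex: str, pattern: str = "**/*") -> list:
    """GREP: return 'path:line:text' for every line that matches the regex."""
    compiled = re.compile(regex)
    hits = []
    for rel in glob_tool(root, pattern):
        text = (root / rel).read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if compiled.search(line):
                hits.append(f"{rel}:{number}:{line}")
    return hits


# Exercise the tools on a throwaway repository.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "pkg" / "parser.py").write_text(
        "import re\n\ndef parse(x):\n    return x\n", encoding="utf-8")
    found = glob_tool(root, "**/*.py")
    hits = grep_tool(root, r"def \w+")
    snippet = read_tool(root, "pkg/parser.py", 3, 4)

print(found)
print(hits)
print(snippet)
```

None of the three functions writes to disk, which mirrors the read-only contract described above.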

At each turn the explorer either issues one or more (parallel) tool calls or stops with a final `<final_answer>` evidence list. Wire FastContext into a coding agent (e.g. Mini-SWE-Agent) as an exploration subagent the main agent can invoke on demand.
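
On the main-agent side, the `<final_answer>` block has to be turned into structured citations. The sketch below assumes one `path:start-end` (or `path:line`) entry per line inside the block; that line format, the `Citation` type, and `parse_final_answer` are hypothetical, so check the repository for the format the released models actually emit.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    path: str
    start: int
    end: int


# The tag name comes from the model card; the entry format is an assumption.
_BLOCK = re.compile(r"<final_answer>(.*?)</final_answer>", re.DOTALL)
_ENTRY = re.compile(r"^\s*(?P<path>[^\s:]+):(?P<start>\d+)(?:-(?P<end>\d+))?\s*$")


def parse_final_answer(text: str) -> list:
    """Extract (path, start, end) citations from an explorer reply."""
    block = _BLOCK.search(text)
    if block is None:
        return []
    citations = []
    for line in block.group(1).splitlines():
        match = _ENTRY.match(line)
        if match is None:
            continue  # skip blank lines and anything that is not a citation
        start = int(match["start"])
        end = int(match["end"]) if match["end"] else start
        if start <= end:
            citations.append(Citation(match["path"], start, end))
    return citations


reply = (
    "Search complete.\n"
    "<final_answer>\n"
    "src/app.py:10-42\n"
    "src/util.py:7\n"
    "not a citation\n"
    "</final_answer>"
)
citations = parse_final_answer(reply)
print(citations)
```

Skipping malformed lines rather than raising keeps the main agent robust to occasional formatting slips; a stricter integration could reject the whole reply instead.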

## 4. Training Recipe

FastContext is trained in two stages:

- **Supervised fine-tuning (SFT):** Exploration traces are split into three sources that match the runtime behavior of the subagent: `parallel_toolcalls` (broad first-turn search), `multiturn_traj` (multi-turn evidence gathering), and `linerange` (precise citation generation).
- **Reinforcement learning (RL):** The model is rolled out as the actual subagent and optimized with **GRPO** using a deterministic reward combining file- and line-level F1, a bonus for bounded parallel exploration, and format penalties.
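
As a rough illustration of the reward described above, the sketch below combines file-level F1, line-level F1, a bonus for bounded parallel tool use, and a format penalty. Only those four ingredients come from the description; the equal weighting, the bonus and penalty values, the parallelism bound, and every function name are assumptions made for this example, not the published recipe.

```python
def f1(predicted: set, gold: set) -> float:
    """Set-based F1; 0.0 when either side is empty or nothing overlaps."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def expand(citations: list) -> set:
    """Turn (path, start, end) citations into a set of (path, line) pairs."""
    return {(path, n) for path, start, end in citations
            for n in range(start, end + 1)}


def reward(pred: list, gold: list, calls_per_turn: list, well_formed: bool,
           max_parallel: int = 8, bonus: float = 0.1, penalty: float = 0.5) -> float:
    """Deterministic reward; all weights here are illustrative assumptions."""
    if not well_formed:
        return -penalty  # format penalty: unparseable final answer
    file_f1 = f1({p for p, _, _ in pred}, {p for p, _, _ in gold})
    line_f1 = f1(expand(pred), expand(gold))
    # Bonus only if some turn used parallel calls and none exceeded the bound.
    parallel_ok = bool(calls_per_turn) and 1 < max(calls_per_turn) <= max_parallel
    return 0.5 * file_f1 + 0.5 * line_f1 + (bonus if parallel_ok else 0.0)


# Right file, half-overlapping line range, one turn with three parallel calls.
score = reward(pred=[("a.py", 1, 10)], gold=[("a.py", 6, 15)],
               calls_per_turn=[3, 1], well_formed=True)
print(round(score, 2))  # 0.85
```

Because the reward depends only on the citations, the call counts, and a format check, it needs no learned judge, which is what makes it deterministic.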

## 5. License

This project is licensed under the MIT License.