---
language:
- en
license: mit
tags:
- Explorer SubAgent
- Repository Exploration
library_name: transformers
---
## 1. Model Introduction
**FastContext-1.0** is a lightweight **repository-exploration subagent** for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues **parallel read-only tool calls** (READ, GLOB, GREP), and returns **compact file paths and line ranges** as focused context.
Repository exploration is a major bottleneck in modern coding agents: locating relevant code consumes a large share of the token budget and pollutes the solver's context with irrelevant snippets. In our analysis of GPT-5.4 trajectories, reading and searching account for **56.2% of all tool-use turns** and **46.5% of the main agent's total tokens**. FastContext moves this work into a dedicated subagent so the main agent receives clean, grounded evidence rather than the long trail of exploratory reads and searches.
The model family spans **4B to 30B parameters**, bootstrapped from strong reference-model trajectories via supervised fine-tuning (SFT) and refined with task-grounded reinforcement learning (RL) for broad first-turn search, multi-turn evidence gathering, and precise citation generation.
- **Backbones:** Qwen3-4B-Instruct (4B explorer) and Qwen3-Coder-30B-A3B (30B explorer)
- **Variants:** `FC-4B-SFT`, `FC-4B-RL` (deployment targets), `FC-30B-SFT` (scaling reference)
- **Context length:** up to 262K tokens
- **Paper:** *FastContext: Training Efficient Repository Explorer for Coding Agents*
- **Code & data:** https://github.com/microsoft/fastcontext
### How it works
```
Coding Agent ──query──▶ FastContext ──read/search──▶ Repository
      ▲                      │
      └───── file-line ──────┘
             citations
```
Internally, FastContext runs an exploration loop:
1. **Query understanding:** translate the issue into search intents.
2. **Parallel tool calling:** issue multiple `READ` / `GLOB` / `GREP` calls in a single turn to cover complementary hypotheses.
3. **Observation-driven refinement:** use tool outputs to guide the next search turn.
4. **Final citations:** return a compact `<final_answer>` block of file paths and line ranges.
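The four steps above can be sketched as a small driver loop. This is only an illustration under stated assumptions: `model_step`, the `path:start-end` citation format, and the reply dictionaries are hypothetical stand-ins, not the released interface. The model is replaced by a scripted function so the sketch runs without a server.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumed citation format (hypothetical): one "path:start-end" entry per line.
CITATION = re.compile(r"^(?P<path>\S+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_final_answer(text):
    """Extract (path, start, end) citations from a <final_answer> block, or None."""
    match = re.search(r"<final_answer>(.*?)</final_answer>", text, re.DOTALL)
    if match is None:
        return None
    citations = []
    for line in match.group(1).splitlines():
        hit = CITATION.match(line.strip())
        if hit:
            citations.append((hit["path"], int(hit["start"]), int(hit["end"])))
    return citations


def explore(query, model_step, tools, max_turns=8):
    """Call tools in parallel each turn until the model emits a final answer.

    `model_step(history)` is a hypothetical callable returning either
    {"tool_calls": [(name, kwargs), ...]} or {"content": "<final_answer>..."}.
    """
    history = [{"role": "user", "content": query}]
    for _ in range(max_turns):
        reply = model_step(history)
        if "content" in reply:
            citations = parse_final_answer(reply["content"])
            if citations is not None:
                return citations
        calls = reply.get("tool_calls", [])
        # The tools are read-only, so calls from one turn can run concurrently.
        with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
            results = list(pool.map(lambda call: tools[call[0]](**call[1]), calls))
        history.append({"role": "assistant", "tool_calls": calls})
        history.extend(
            {"role": "tool", "name": call[0], "content": result}
            for call, result in zip(calls, results)
        )
    return []


def scripted_model(history):
    """Stand-in for the explorer: one parallel search turn, then a final answer."""
    if len(history) == 1:
        return {"tool_calls": [("GREP", {"regex": "def parse"}), ("GLOB", {"pattern": "*.py"})]}
    return {"content": "<final_answer>\nsrc/parser.py:10-42\n</final_answer>"}


demo_tools = {
    "GREP": lambda regex: "src/parser.py:10:def parse(source):",
    "GLOB": lambda pattern: "src/parser.py",
}

print(explore("Where is parsing implemented?", scripted_model, demo_tools))
# prints [('src/parser.py', 10, 42)]
```

Capping the loop with `max_turns` is a design choice of this sketch; it keeps a misbehaving explorer from searching indefinitely.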
## 2. Evaluation Results
### End-to-end performance (Mini-SWE-Agent)
Integrating FastContext into Mini-SWE-Agent improves end-to-end resolution rates by **up to 5.5%** while reducing main-agent token consumption by **up to 60%**, with only marginal overhead. Scores, tokens, and turns are measured on the main-agent trajectory; deltas are relative to `w/o Explore` for the same main agent.
| Main Agent | Subagent | SWE-bench Multilingual | SWE-bench Pro | SWE-QA |
|---|---|---|---|---|
| **GPT-5.4** | w/o Explore | 71.7 / 457k | 46.0 / 818k | 81.3 / 418k |
| | FC-30B-SFT | **75.0** (↑3.3) / 356k (↓22.1%) | 49.0 (↑3.0) / 688k (↓15.9%) | **82.0** (↑0.7) / 206k (↓50.7%) |
| | FC-4B-SFT | 73.3 (↑1.6) / 364k (↓20.4%) | 47.0 (↑1.0) / 689k (↓15.8%) | 81.9 (↑0.6) / 213k (↓49.0%) |
| | FC-4B-RL | 74.7 (↑3.0) / 338k (↓26.0%) | 48.5 (↑2.5) / 701k (↓14.3%) | **82.0** (↑0.7) / 210k (↓49.8%) |
| **GLM-5.1** | w/o Explore | 72.3 / 2514k | 17.5 / 2692k | 72.7 / 401k |
| | FC-30B-SFT | 73.7 (↑1.4) / 1797k (↓28.5%) | 20.0 (↑2.5) / 2370k (↓12.0%) | 73.3 (↑0.6) / 292k (↓27.2%) |
| | FC-4B-SFT | 73.3 (↑1.0) / 1919k (↓23.7%) | 18.0 (↑0.5) / 2279k (↓15.3%) | 73.4 (↑0.7) / 306k (↓23.7%) |
| | FC-4B-RL | 73.7 (↑1.4) / 1971k (↓21.6%) | **22.5** (↑5.0) / 2210k (↓17.9%) | 73.5 (↑0.8) / 302k (↓24.7%) |
| **Kimi-K2.6** | w/o Explore | 76.3 / 1553k | 31.0 / 2383k | 71.6 / 510k |
| | FC-30B-SFT | 76.7 (↑0.4) / 1360k (↓12.4%) | 33.0 (↑2.0) / 2150k (↓9.8%) | 72.8 (↑1.2) / 373k (↓26.9%) |
| | FC-4B-SFT | 75.3 (↓1.0) / 1306k (↓15.9%) | 32.5 (↑1.5) / 2159k (↓9.4%) | 72.6 (↑1.0) / 402k (↓21.2%) |
| | FC-4B-RL | **78.3** (↑2.0) / 1384k (↓10.9%) | **33.5** (↑2.5) / 2158k (↓9.4%) | 72.6 (↑1.0) / 378k (↓25.9%) |
*Score / Tokens shown per cell. Best result per main-agent block in bold.*
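The parenthesized deltas in each cell can be reproduced from the `w/o Explore` row of the same main agent. The helper below is a small illustrative sketch (the function name is ours, not part of the release): score deltas are absolute point differences, and token deltas are relative changes in percent.

```python
def deltas(baseline, with_explorer):
    """Return (score change in points, token change in percent) versus the baseline.

    Each argument is a (score, tokens_in_thousands) pair read from one table cell.
    """
    base_score, base_tokens = baseline
    score, tokens = with_explorer
    score_delta = round(score - base_score, 1)
    token_delta_pct = round(100.0 * (tokens - base_tokens) / base_tokens, 1)
    return score_delta, token_delta_pct


# GPT-5.4 on SWE-bench Multilingual: w/o Explore (71.7 / 457k) vs. FC-30B-SFT (75.0 / 356k)
print(deltas((71.7, 457), (75.0, 356)))  # prints (3.3, -22.1)
```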
**Highlights:**
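- At least one FastContext variant improves end-to-end accuracy for **every main agent and benchmark**; the one regression in the table is FC-4B-SFT with Kimi-K2.6 on SWE-bench Multilingual (75.3 vs. 76.3). The largest gains appear on SWE-bench Pro (e.g. GPT-5.4 +5.5, GLM-5.1 +5.0).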
- The biggest token savings reach **60.3%** (GPT-5.4 on SWE-QA).
- The compact **4B-RL** explorer can outperform the larger **30B-SFT** explorer: on GLM-5.1 SWE-bench Pro, for example, it reaches 22.5 vs. 20.0 while using fewer tokens.
## 3. Quick Start
Launch the model with an OpenAI-compatible server (e.g. SGLang). The example below serves the 4B explorer:
```bash
python3 -m sglang.launch_server \
--model-path FastContext-1.0-4B-SFT \
--tool-call-parser qwen \
--context-length 262144 \
--trust-remote-code \
--dtype bfloat16 \
--host 0.0.0.0 \
--port 30000 \
--tp-size 1 \
--mem-fraction-static 0.8
```
FastContext exposes only three read-only tools to the model:
| Tool | Purpose |
|---|---|
| `READ` | Return line-numbered file contents |
| `GLOB` | Path discovery by glob pattern |
| `GREP` | Regex search over repository text (ripgrep-style) |
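To make the tool semantics concrete, here is a minimal standard-library sketch of what the host side of the three tools might look like. The function names, arguments, and output formats are assumptions made for illustration; the actual harness lives in the linked repository and may differ.

```python
import fnmatch
import os
import re


def read(path, start=1, end=None):
    """READ: return file contents with 1-based line numbers prepended."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        lines = handle.read().splitlines()
    start = max(1, start)
    end = len(lines) if end is None else min(end, len(lines))
    return "\n".join(f"{n}: {lines[n - 1]}" for n in range(start, end + 1))


def glob_paths(root, pattern):
    """GLOB: list repository-relative file paths matching a glob pattern.

    Note: with fnmatch, "*" also matches "/" so "*.py" finds nested files too.
    """
    matches = []
    for directory, _, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(directory, name), root).replace(os.sep, "/")
            if fnmatch.fnmatch(rel, pattern):
                matches.append(rel)
    return sorted(matches)


def grep(root, regex, pattern="*"):
    """GREP: return 'path:line:text' hits for a regex, in a ripgrep-like layout."""
    compiled = re.compile(regex)
    hits = []
    for rel in glob_paths(root, pattern):
        with open(os.path.join(root, rel), encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if compiled.search(line):
                    hits.append(f"{rel}:{number}:{line.rstrip()}")
    return hits
```

None of the three functions writes to disk, which is what makes it safe to run several of them concurrently within a single turn.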
At each turn the explorer either issues one or more (parallel) tool calls or stops with a final `<final_answer>` evidence list. Wire FastContext into a coding agent (e.g. Mini-SWE-Agent) as an exploration subagent the main agent can invoke on demand.
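As a starting point for wiring the explorer into an agent, the sketch below prepares a chat-completions request against the server from the launch command above, offering the three tools as OpenAI-style function definitions. The tool parameter names (`path`, `start`, `end`, `pattern`, `regex`), the example query, and the `model` value are illustrative assumptions rather than the released schema. The request is built but not sent, so the snippet runs without a live server.

```python
import json
import urllib.request


def tool(name, description, properties, required):
    """Build one OpenAI-style function tool definition."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


# Parameter names below are illustrative assumptions, not the released schema.
TOOLS = [
    tool("READ", "Return line-numbered file contents",
         {"path": {"type": "string"}, "start": {"type": "integer"}, "end": {"type": "integer"}},
         ["path"]),
    tool("GLOB", "Path discovery by glob pattern",
         {"pattern": {"type": "string"}}, ["pattern"]),
    tool("GREP", "Regex search over repository text",
         {"regex": {"type": "string"}, "pattern": {"type": "string"}}, ["regex"]),
]


def build_request(query, base_url="http://localhost:30000", model="FastContext-1.0-4B-SFT"):
    """Prepare (but do not send) a chat-completions request that offers the three tools."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": query}],
        "tools": TOOLS,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Where is the retry logic for HTTP requests implemented?")
print(request.full_url)  # prints http://localhost:30000/v1/chat/completions
# Once the server is running, send it with: urllib.request.urlopen(request)
```

The response's tool calls would then be executed by the host and returned to the model as tool messages until it replies with a `<final_answer>` block.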
## 4. Training Recipe
FastContext is trained in two stages:
- **Supervised fine-tuning (SFT):** The exploration traces are split into three sources matching the runtime behavior of the subagent: `parallel_toolcalls` (broad first-turn search), `multiturn_traj` (multi-turn evidence gathering), and `linerange` (precise citation generation).
- **Reinforcement learning (RL):** The model is rolled out as the actual subagent and optimized with **GRPO** using a deterministic reward combining file- and line-level F1, a bonus for bounded parallel exploration, and format penalties.
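The reward described in the RL stage might look roughly like the sketch below. Only the ingredients (file-level F1, line-level F1, a bounded parallelism bonus, a format penalty) come from the description above; the weights, the bonus rule, the penalty value, and the `(path, start, end)` citation tuples are hypothetical placeholders chosen for illustration.

```python
def f1(predicted, gold):
    """Set-level F1 between predicted and gold items."""
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def expand(citations):
    """Turn (path, start, end) ranges into a set of (path, line) pairs."""
    return {(path, line) for path, start, end in citations for line in range(start, end + 1)}


def reward(predicted, gold, parallel_calls, well_formed,
           w_file=0.5, w_line=0.5, bonus=0.1, max_parallel=8, format_penalty=1.0):
    """Combine file F1, line F1, a bounded-parallelism bonus, and a format penalty.

    All weights and thresholds are hypothetical placeholders, not the paper's values.
    """
    if not well_formed:
        return -format_penalty
    file_f1 = f1({c[0] for c in predicted}, {c[0] for c in gold})
    line_f1 = f1(expand(predicted), expand(gold))
    # The bonus applies only to parallel turns that stay within the cap.
    parallel_bonus = bonus if 1 < parallel_calls <= max_parallel else 0.0
    return w_file * file_f1 + w_line * line_f1 + parallel_bonus
```

For example, predicting `a.py` lines 1-10 against a gold range of lines 6-15 yields a file F1 of 1.0 and a line F1 of 0.5, since five of the ten lines overlap on each side. Because every term is computed from the citations alone, the reward stays deterministic for a given rollout.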
## License
This project is licensed under the MIT License.