Context Sphere Projector

This repository contains the Context Projection Model v3 checkpoint used by the Context Sphere artifact.

The Projector is a persona-conditioned routing model. It runs after the Master Context Sphere has been assembled and scores each candidate context node separately for the Product Manager (PM), Worker, and Reviewer personas. The goal is to reduce each persona's input-token load while preserving enough structural evidence for the downstream repair.
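A minimal sketch of persona-conditioned routing follows. It assumes that each candidate node is scored once per persona, using a query string in the `Persona: <NAME> | Task: <task>` format from the Usage example below. The function `route_by_persona`, the `score_pairs` callable, the toy scorer, and the persona labels `PM` and `REVIEWER` are illustrative assumptions. `CrossEncoder.predict` could serve as `score_pairs`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical persona labels: "WORKER" appears in the usage example,
# "PM" and "REVIEWER" follow the names used in the training summary.
PERSONAS = ("PM", "WORKER", "REVIEWER")

# Any function mapping [query, candidate] pairs to one score per pair.
ScoreFn = Callable[[List[List[str]]], Sequence[float]]


def route_by_persona(
    task: str,
    candidates: Sequence[str],
    score_pairs: ScoreFn,
    personas: Sequence[str] = PERSONAS,
) -> Dict[str, List[Tuple[int, float]]]:
    """Score every candidate separately for each persona and rank them.

    Returns, per persona, (candidate_index, score) tuples sorted from most
    to least relevant. Ties keep their original candidate order.
    """
    ranked: Dict[str, List[Tuple[int, float]]] = {}
    for persona in personas:
        query = f"Persona: {persona} | Task: {task}"
        # One [query, candidate] pair per node; the same node can receive
        # a different score for each persona.
        pairs = [[query, text] for text in candidates]
        scores = score_pairs(pairs)
        order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        ranked[persona] = [(i, float(scores[i])) for i in order]
    return ranked


def toy_scorer(pairs: List[List[str]]) -> List[float]:
    # Toy stand-in for a cross-encoder, for illustration only:
    # REVIEWER prefers test code, every other persona prefers definitions.
    scores = []
    for query, text in pairs:
        if "REVIEWER" in query:
            scores.append(1.0 if "test" in text else 0.0)
        else:
            scores.append(1.0 if "def " in text else 0.0)
    return scores


nodes = ["README text", "def parse(x): return x", "assert parse(1) == 1  # test"]
ranked = route_by_persona("fix the issue", nodes, toy_scorer)
```

Because the query string changes with the persona, the same node can rank high for one persona and low for another. This is what lets each persona receive a different projected view of the same sphere.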

Files

model.safetensors: trained projection model weights.
config.json: model architecture configuration.
tokenizer.json, tokenizer_config.json, special_tokens_map.json, vocab.txt: tokenizer assets.
best_worker_margin.json: metadata for the checkpoint selected by the Worker Margin criterion.
context_projector_v3_training_report.json: training report.
context_projector_v3_persona_thresholds.json: calibrated persona threshold report.

Training Summary

The projection model was trained from a cross-encoder/ms-marco-MiniLM-L-6-v2 backbone on 7,299 persona-conditioned samples with an 888-row validation split. Training used persona-stratified oversampling and asymmetric BCE loss with positive weights PM=8, REVIEWER=10, and WORKER=18. The final checkpoint was selected at epoch 1 using the Worker Margin criterion.
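The card does not give the exact loss formula. One common reading of "BCE with positive weights" is the `pos_weight` form used by PyTorch's `BCEWithLogitsLoss`, in which the positive-label term is multiplied by a weight. The plain-Python sketch below assumes that form, applied per persona with the weights listed above. It illustrates the idea and is not the training code.

```python
import math
from typing import Sequence

# Per-persona positive-class weights from the training summary.
POS_WEIGHT = {"PM": 8.0, "REVIEWER": 10.0, "WORKER": 18.0}


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce(logit: float, label: int, persona: str) -> float:
    """pos_weight-style BCE on a raw logit (assumed form, see lead-in).

    loss = -[w * y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]
    using log(sigmoid(z)) = -softplus(-z) and log(1 - sigmoid(z)) = -softplus(z).
    """
    w = POS_WEIGHT[persona]
    return w * label * softplus(-logit) + (1 - label) * softplus(logit)


def batch_loss(logits: Sequence[float], labels: Sequence[int], personas: Sequence[str]) -> float:
    # Mean loss over a batch that mixes samples from several personas.
    terms = [weighted_bce(z, y, p) for z, y, p in zip(logits, labels, personas)]
    return sum(terms) / len(terms)
```

Under this form, at a logit of 0 a missed WORKER positive costs 18 times as much as a false positive. This tends to push the Worker scorer toward recall.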

In the paper's 10-case projection smoke test, the min_k=2 safety-floor configuration preserved 9/10 known Context Sphere successes while reducing input tokens by 71.5% and estimated inference cost by 58.4%.
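The card does not define the min_k safety floor precisely. This sketch assumes one plausible reading: keep every node whose score clears the persona's calibrated threshold, but if fewer than `min_k` nodes clear it, fall back to the top `min_k` nodes by score. `select_with_floor` and `token_reduction` are hypothetical helpers. The threshold values they would receive are not taken from `context_projector_v3_persona_thresholds.json`.

```python
from typing import List, Sequence


def select_with_floor(scores: Sequence[float], threshold: float, min_k: int = 2) -> List[int]:
    """Indices of selected nodes, best first (assumed floor semantics)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    passed = [i for i in order if scores[i] >= threshold]
    if len(passed) >= min_k:
        return passed
    # Safety floor: never hand a persona fewer than min_k nodes
    # (or all nodes, if there are fewer than min_k candidates).
    return order[:min_k]


def token_reduction(baseline_tokens: int, projected_tokens: int) -> float:
    # Percentage of input tokens removed by projection.
    return 100.0 * (baseline_tokens - projected_tokens) / baseline_tokens
```

Read this way, the reported 71.5% token reduction means the projected prompts carry roughly 28.5% of the baseline input tokens.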

Usage

The companion artifact repository contains the Context Sphere inference code, projection integration, reproduction scripts, and evaluation artifacts:

https://github.com/johnZYW/context-sphere

Download this model into the default projection path used by scripts/orchestrate_resolution.py:

```bash
python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Zywdd/context-sphere-projector",
    repo_type="model",
    local_dir="models/context_projector_v3",
    allow_patterns=[
        "model.safetensors",
        "config.json",
        "tokenizer.json",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "vocab.txt",
        "best_worker_margin.json",
        "context_projector_v3_training_report.json",
        "context_projector_v3_persona_thresholds.json",
    ],
)
PY
```
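After the download, a quick check can confirm that every file in the allow-list above landed in `models/context_projector_v3`. The helper `missing_files` is hypothetical and uses only the standard library.

```python
from pathlib import Path
from typing import List

# Same file list as the allow_patterns in the download snippet.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.txt",
    "best_worker_margin.json",
    "context_projector_v3_training_report.json",
    "context_projector_v3_persona_thresholds.json",
]


def missing_files(local_dir: str = "models/context_projector_v3") -> List[str]:
    """Return the expected files that are absent from local_dir."""
    root = Path(local_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

If the download succeeded, calling `missing_files()` from the artifact root should return an empty list.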

The Context Sphere pipeline loads the projector through sentence_transformers.CrossEncoder:

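```python
from sentence_transformers import CrossEncoder

# Load the downloaded checkpoint on CPU.
model = CrossEncoder("models/context_projector_v3", device="cpu")
# Each input is a [query, candidate] pair; the query encodes persona and task.
scores = model.predict([
    ["Persona: WORKER | Task: fix the issue", "candidate file text"]
])
# scores holds one relevance score per input pair.
```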

In the full artifact repository, the 10-case projection smoke test with the min_k=2 safety floor runs in projection mode with:

```bash
python scripts/run_benchmarks.py \
  --cases-file artifacts/cases/projection_smoke_context_passed_10.json \
  --retrieval-mode projection \
  --projection-min-k 2 \
  --model-strategy fallback \
  --max-file-chars 60000 \
  --out outputs/projection_smoke_10_floor_repro \
  --run-verify
```

Citation

```bibtex
@misc{zhang2026contextsphere,
  title        = {Context Sphere: Topology-Aware Context Orchestration for Cost-Efficient LLM Repository Repair},
  author       = {Zhang, Yuwen},
  year         = {2026},
  howpublished = {arXiv preprint and artifact release}
}
```

Model size: 22.7M parameters (F32 tensors, stored as Safetensors)

Model tree for Zywdd/context-sphere-projector:
Base model: microsoft/MiniLM-L12-H384-uncased
Quantized: cross-encoder/ms-marco-MiniLM-L12-v2
Quantized: cross-encoder/ms-marco-MiniLM-L6-v2
Fine-tuned: this model