How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="dcostenco/prism-coder-1.7b",
	filename="",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

prism-coder:1b7 β€” 17-Tool Memory Agent (Always-Fits Tier)

Fine-tuned Qwen3-1.7B for full Prism Memory tool routing in the Prism Coder system. Primary deployment: any device via llama.cpp GGUF β€” the ultra-lightweight tier.

eval_300 Benchmark β€” swe43 (Current)

300/300 Γ— 3 shuffled runs = 100.0%, 0 flaky

Category Count Description Accuracy
natural_phrasing 50 Natural language β†’ correct tool 100%
adversarial_trap 70 Coding/CS questions β†’ plain text (no tool) 100%
disambiguation 40 Ambiguous session vs knowledge ops 100%
edge_case 25 Self-description, capability queries β†’ plain text 100%
verifier 25 Verify-then-act chains 100%
param_extraction 25 Extract project/query from prompt 100%
cascade 25 Multi-step tool chains 100%
multi_intent 20 Compound instructions 100%
abstention 20 Greetings, math, creative requests β†’ plain text 100%

300 test cases, 3 shuffled runs, temperature=0, 0 hallucinations across all runs.

Tools

Routes to 17 Prism Memory tools + knows when NOT to call any tool:

Tool Trigger
session_load_context Load/resume project context, "starting fresh"
session_save_ledger Log/record completed work
session_save_handoff Create handoff note for next session
session_search_memory Recall prior discussions
session_forget_memory Delete a memory entry
session_health_check Check session system health
session_compact_ledger Compact/prune session ledger
session_export_memory Export session data
session_task_route Route task: local vs cloud
session_save_experience Save a notable experience
session_synthesize_edges Build session graph edges
session_backfill_links Repair dangling session links
knowledge_search Search stored knowledge base
knowledge_forget Remove a knowledge entry
knowledge_upvote Upvote knowledge entry
knowledge_downvote Downvote knowledge entry
knowledge_set_retention Set retention policy

Abstains (plain text) for: coding questions, CS concepts, arithmetic, greetings, capability queries, creative requests, general knowledge.

Version History

Version eval_300 Notes
swe43 300/300 Γ— 3 runs = 100.0% Fresh rank=32 LoRA + <think> routing, Q8_0 GGUF
swe30 280/300 = 93.3% Q8_0 first round (fixed Q4KM quantization erasure)
v43l 203/300 = 67.7% Baseline before SWE training
v42 100% BFCL 6-tool Previous 6-tool routing model

Key Training Insights

  • Q8_0 quantization required β€” Q4KM erased LoRA deltas for soft abstain patterns (87%β†’93% at R30)
  • Adapter saturation β€” After 39 cumulative rounds at rank=8, adapter was saturated. Fresh rank=32 on R39-merged base broke plateau in one round (93.3%β†’99.7%)
  • <think> routing blocks β€” Added CoT reasoning to abstain examples activates Qwen3's pretrained thinking circuit, providing explicit gradient path for the routing decision

Model Details

  • Base: Qwen/Qwen3-1.7B β†’ merged through 43 SWE training rounds
  • Format: GGUF Q8_0 (2.2 GB)
  • Context: 8,192 tokens
  • Final adapter: MLX LoRA rank=32, all 28 layers, LR=3e-6β†’8e-7, 1,267 train rows/round
  • Total training: 43 rounds of cumulative SFT + 4 fresh rank=32 rounds

Usage

ollama pull dcostenco/prism-coder:1b7
ollama run dcostenco/prism-coder:1b7

Or via the Synalux Prism MCP server which routes tool calls automatically.

Downloads last month
2,234
GGUF
Model size
2B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for dcostenco/prism-coder-1.7b

Finetuned
Qwen/Qwen3-1.7B
Quantized
(276)
this model