Text Generation
MLX
Safetensors
qwen3_5_text
9b
android
apple-silicon
attested
chain-of-custody
chinese
compacted
consumer-gpu
cryptographically-verified
edge-inference
efficient
embedded
english
forge-alloy
general
general-purpose
head-pruning
iphone
llama-cpp
lm-studio
local-inference
macbook
mobile
multilingual
ollama
on-device
optimized
pruned
qwen
qwen3
qwen3.5
raspberry-pi
reproducible
versatile
conversational
Instructions to use continuum-ai/qwen3.5-9b-general-forged with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use continuum-ai/qwen3.5-9b-general-forged with MLX:
    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("continuum-ai/qwen3.5-9b-general-forged")

    prompt = "Write a story about Einstein"
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

    text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- Pi
How to use continuum-ai/qwen3.5-9b-general-forged with Pi:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "continuum-ai/qwen3.5-9b-general-forged"
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent

    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "mlx-lm": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "continuum-ai/qwen3.5-9b-general-forged" }
          ]
        }
      }
    }
Run Pi
    # Start Pi in your project directory:
    pi
- Hermes Agent
How to use continuum-ai/qwen3.5-9b-general-forged with Hermes Agent:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "continuum-ai/qwen3.5-9b-general-forged"
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup

    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default continuum-ai/qwen3.5-9b-general-forged
Run Hermes
hermes
- MLX LM
How to use continuum-ai/qwen3.5-9b-general-forged with MLX LM:
Generate or start a chat session
    # Install MLX LM
    uv tool install mlx-lm

    # Interactive chat REPL
    mlx_lm.chat --model "continuum-ai/qwen3.5-9b-general-forged"
Run an OpenAI-compatible server
    # Install MLX LM
    uv tool install mlx-lm

    # Start the server (listens on port 8080 unless --port is given)
    mlx_lm.server --model "continuum-ai/qwen3.5-9b-general-forged"

    # Call the OpenAI-compatible server with curl
    curl -X POST "http://localhost:8080/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "continuum-ai/qwen3.5-9b-general-forged",
        "messages": [
          {"role": "user", "content": "Hello"}
        ]
      }'
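The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the server is reachable on port 8080 (the port used in the Pi and Hermes configuration on this page) and that the response follows the usual OpenAI chat-completions shape; `build_chat_request` and `chat` are illustrative helper names, not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on port 8080, the port used in
# the Pi and Hermes configuration. Change BASE_URL if your server differs.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "continuum-ai/qwen3.5-9b-general-forged"


def build_chat_request(user_text, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text, timeout=120):
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. Building the request separately from sending it makes it easy to inspect the URL and payload before any network call is made.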
    {
      "name": "qwen3.5-9b-general-forged",
      "version": "1.0.0",
      "description": "Forged Qwen/Qwen3.5-9B for general domain",
      "author": "continuum-ai",
      "tags": [
        "general",
        "forged",
        "experiential-plasticity",
        "forge-alloy"
      ],
      "license": "apache-2.0",
      "source": {
        "baseModel": "Qwen/Qwen3.5-9B",
        "architecture": "qwen3_5"
      },
      "stages": [
        {
          "type": "prune",
          "strategy": "entropy",
          "level": 0.0
        },
        {
          "type": "train",
          "domain": "general",
          "dataset": "Salesforce/wikitext",
          "steps": 500,
          "learningRate": "5e-05"
        }
      ],
      "cycles": 1,
      "results": {
        "completedAt": "2026-04-06T11:35:32-0500",
        "baselinePerplexity": 12.98,
        "finalPerplexity": 9.79,
        "improvementPct": 24.6,
        "benchmarks": [
          {
            "name": "perplexity",
            "metrics": {
              "baseline": 12.98,
              "final": 9.79,
              "improvement_pct": 24.6
            }
          }
        ],
        "hardwareVerified": [
          {
            "device": "NVIDIA GeForce RTX 5090",
            "format": "fp16",
            "verified": true
          }
        ],
        "samples": [
          {
            "label": "Analysis",
            "prompt": "(general generation sample)",
            "completion": "Prompt: Analyze the economic implications of widespread adoption of local AI models running on consumer hardware. Consider impacts on cloud providers, data privacy regulations, employment in AI services, and the democratization of intelligence. Use specific examples and data points.\n\nGenerated:\nAnalyze the economic implications of widespread adoption of local AI models running on consumer hardware. Consider impacts on cloud providers, data privacy regulations, employment in AI services, and the democratization of intelligence. Use specific examples and data points.\n\n<think>\nHere's a thinking process that leads to the analysis:\n\n1. **Deconstruct the Request:**\n * **Topic:** Economic implications of widespread adoption of local AI models (running on consumer hardware).\n * **Key Areas to Cover:**\n * Impacts on Cloud Providers.\n * Data Privacy Regulations.\n * Employment in AI Services.\n * Democratization of Intelligence.\n * **Requirements:** Use specific examples and data points.\n\n2. **Initial Brainstorming & Context Setting:**\n * *What is \"Local AI\"?* LLMs, image generators, etc., running offline on laptops, phones, or dedicated edge devices (e.g., Apple Neural Engine, NVIDIA RTX cards, Qualcomm chips) without sending data to the cloud. Models like Llama 3 (8B), Mistral, Gemma, Phi-3 are key players here due to their size"
          },
          {
            "label": "Synthesis",
            "prompt": "(general generation sample)",
            "completion": "Prompt: Compare the governance structures of the EU AI Act, China's AI regulations, and the US executive order on AI. Identify the fundamental philosophical differences, practical enforcement challenges, and predict convergence or divergence over the next decade.\n\nGenerated:\nCompare the governance structures of the EU AI Act, China's AI regulations, and the US executive order on AI. Identify the fundamental philosophical differences, practical enforcement challenges, and predict convergence or divergence over the next decade."
          },
          {
            "label": "Technical Writing",
            "prompt": "(general generation sample)",
            "completion": "Prompt: Write a technical blog post explaining how attention head pruning with experiential plasticity can make large language models smaller AND better. Target audience: ML engineers who know transformers but not pruning. Include analogies to biological neural development.\n\nGenerated:\nWrite a technical blog post explaining how attention head pruning with experiential plasticity can make large language models smaller AND better. Target audience: ML engineers who know transformers but not pruning. Include analogies to biological neural development. Structure the article as follows: \n1. Introduction (the paradox of model size vs performance) \n2. The problem: why standard pruning fails (cite specific failure rates like 50% drop at 50% sparsity) \n3. The solution: Experiential Plasticity framework (explain the core idea in simple terms) \n4. How it works technically (mention 'head-agnostic' approach and key components like attention heads, sparse layers, and training-free inference) \n5. Practical applications (give examples of where this matters most - e.g., mobile deployment ) \n6. Conclusion (future outlook on efficiency vs accuracy tradeoffs). \nUse formatting like bold text for emphasis and bullet points for lists .\n\n# Shrinkage Without Sacrifice: How Attention Head Pruning Makes LLMs Smaller and Better\n\n**Attention is all you need**, right? Well, for Large Language Models (LLMs), it\u2019s also true that **attention is everything**. But"
          },
          {
            "label": "Nuanced Opinion",
            "prompt": "(general generation sample)",
            "completion": "Prompt: Make a balanced argument for and against open-sourcing frontier AI models. Address safety concerns, innovation velocity, competitive dynamics between nations, and the specific case of models capable of autonomous code execution. Don't hedge \u2014 take clear positions on each sub-question.\n\nGenerated:\nMake a balanced argument for and against open-sourcing frontier AI models. Address safety concerns, innovation velocity, competitive dynamics between nations, and the specific case of models capable of autonomous code execution. Don't hedge \u2014 take clear positions on each sub-question.\n\n<think>\n\n</think>\n\n### The Case Against Open-Sourcing Frontier Models: A Prioritization of Safety and Stability\n\nThe primary argument against releasing frontier models (those approaching or exceeding human capability in reasoning and agency) into the public domain is that **the marginal benefit to innovation does not justify the existential risk**. Unlike previous generations of LLMs, which were primarily text-completion engines, frontier models possess \"agency\" \u2014 the ability to plan multi-step tasks, manipulate external environments via code execution, and potentially deceive users about their own limitations.\n\n**On Safety Concerns:**\nOpen-sourcing these models creates an unregulated proliferation of high-agency actors. Once a model with autonomous code execution capabilities is released, it cannot be recalled or patched like proprietary software; millions of copies can be hosted on decentralized networks, modified by malicious actors, and weaponized instantly. The risk is not just misuse (e.g., phishing), but **instrumental convergence**, where such models optimize goals in ways humans"
          }
        ],
        "integrity": {
          "trustLevel": "self-attested",
          "code": {
            "runner": "sentinel-ai/forge_model",
            "version": "3.0.0",
            "binaryHash": "sha256:42fb027d203dec8fe8f3978266e4ee7240e0c5a97cd30ec8fa83b5cc5a796994"
          },
          "modelHash": "sha256:bb12672afe8f2727d11cc4418ac191ca2467cfe76f5a14a5b4224de1ccb64935",
          "datasets": [],
          "attestedAt": "2026-04-06T11:35:32-0500",
          "fileHashes": [
            {
              "filename": "model.safetensors",
              "sha256": "236af12e5631ea15f85af2310c1f504a6e68b171f0306d5ab8d719b649611804",
              "size": 17907662584
            }
          ]
        }
      }
    }
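Two fields in the manifest above can be checked locally: the reported `improvementPct` follows from the two perplexity values, and each `fileHashes` entry can be compared against a downloaded file. The sketch below assumes the `sha256` field is the plain hex SHA-256 digest of the file's bytes and that `size` is in bytes; the helper names are illustrative and not part of any forge-alloy tooling.

```python
import hashlib
import os


def improvement_pct(baseline, final):
    """Relative perplexity reduction in percent, rounded to one decimal."""
    return round((baseline - final) / baseline * 100, 1)


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weights never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, entry):
    """Check a local file against one `fileHashes` entry (sha256, size).

    Assumption: `sha256` is a hex digest of the raw file bytes.
    """
    # Compare sizes first: it is cheap and avoids hashing a truncated download.
    if os.path.getsize(path) != entry["size"]:
        return False
    return sha256_of(path) == entry["sha256"]
```

For the values in the manifest, `improvement_pct(12.98, 9.79)` gives `24.6`, which matches the reported figure. Because `trustLevel` is `self-attested`, a matching hash shows only that the local file is the one the publisher recorded, not that a third party has verified it.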