Instructions for using nightmedia/Qwen3.6-27B-qx86-hi-mlx with libraries and local apps.
- Libraries
- MLX
How to use nightmedia/Qwen3.6-27B-qx86-hi-mlx with MLX:
```python
# Make sure mlx-vlm is installed:
# pip install --upgrade mlx-vlm
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("nightmedia/Qwen3.6-27B-qx86-hi-mlx")
config = load_config("nightmedia/Qwen3.6-27B-qx86-hi-mlx")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)
```
- Local Apps
- Pi
How to use nightmedia/Qwen3.6-27B-qx86-hi-mlx with Pi:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/Qwen3.6-27B-qx86-hi-mlx"
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "nightmedia/Qwen3.6-27B-qx86-hi-mlx" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use nightmedia/Qwen3.6-27B-qx86-hi-mlx with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/Qwen3.6-27B-qx86-hi-mlx"
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nightmedia/Qwen3.6-27B-qx86-hi-mlx
```
Run Hermes
```shell
hermes
```
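Both Pi and Hermes talk to `mlx_lm.server` through its OpenAI-compatible HTTP API, so any OpenAI-style client can reach the model once the server is running. Below is a minimal standard-library sketch, assuming the server's default address `http://localhost:8080/v1` (the same URL used in the Pi configuration) and the `/chat/completions` route. `build_chat_request` and `chat` are hypothetical helper names, and the `max_tokens`/`temperature` defaults are illustrative rather than recommended settings:

```python
import json
import urllib.request

# Assumption: mlx_lm.server running with its default host/port, matching the Pi config.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "nightmedia/Qwen3.6-27B-qx86-hi-mlx"


def build_chat_request(
    prompt: str, max_tokens: int = 256, temperature: float = 0.7
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style /chat/completions route (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the assistant message text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("hello"))` should print the model's reply. Keeping request construction separate from the network call makes the payload easy to inspect without a live server.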
Qwen3.6-27B-qx86-hi-mlx
Brainwaves
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.647 | 0.803 | 0.910 | 0.773 | 0.450 | 0.806 | 0.742 |
| qx86-hi | 0.637 | 0.798 | 0.911 | 0.775 | 0.442 | 0.807 | 0.737 |
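The gap between the two quantizations is easiest to read as per-task differences. This small sketch copies the scores from the table above and reports qx86-hi minus mxfp8 for each task, plus an unweighted mean over the seven tasks. The averaging is an editorial choice, not a metric the card defines:

```python
# Scores copied from the Brainwaves table above.
TASKS = ["arc", "arc/e", "boolq", "hswag", "obkqa", "piqa", "wino"]
SCORES = {
    "mxfp8": [0.647, 0.803, 0.910, 0.773, 0.450, 0.806, 0.742],
    "qx86-hi": [0.637, 0.798, 0.911, 0.775, 0.442, 0.807, 0.737],
}


def per_task_delta(base: str, other: str) -> dict:
    """Score of `other` minus score of `base` for each task, rounded to 3 decimals."""
    return {
        task: round(b - a, 3)
        for task, a, b in zip(TASKS, SCORES[base], SCORES[other])
    }


def unweighted_mean(quant: str) -> float:
    """Plain average over the seven tasks (an editorial choice, not a card metric)."""
    return sum(SCORES[quant]) / len(SCORES[quant])


deltas = per_task_delta("mxfp8", "qx86-hi")
for task, delta in deltas.items():
    print(f"{task:6s} {delta:+.3f}")
```

On these numbers qx86-hi trails mxfp8 by at most 0.010 (arc) and leads by at most 0.002 (hswag). The unweighted means are about 0.733 and 0.730.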
| Quant | Perplexity | Peak Memory | Tokens/sec |
|---|---|---|---|
| mxfp8 | 5.265 ± 0.041 | 34.74 GB | 212 |
| mxfp4 | 5.108 ± 0.039 | 21.30 GB | 223 |
I will provide more baseline numbers soon. -G
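Simple arithmetic turns the memory and throughput columns into relative savings. The sketch below uses the figures from the table above. It treats each `±` value as a plain interval for the overlap check, which is an assumption, since the table does not say whether that value is a standard error:

```python
# Figures copied from the perplexity / memory / throughput table above.
RESULTS = {
    "mxfp8": {"ppl": 5.265, "ppl_err": 0.041, "mem_gb": 34.74, "tok_s": 212},
    "mxfp4": {"ppl": 5.108, "ppl_err": 0.039, "mem_gb": 21.30, "tok_s": 223},
}


def memory_saving(base: str, other: str) -> float:
    """Fraction of peak memory saved by `other` relative to `base`."""
    return 1.0 - RESULTS[other]["mem_gb"] / RESULTS[base]["mem_gb"]


def throughput_gain(base: str, other: str) -> float:
    """Relative change in tokens/sec of `other` versus `base`."""
    return RESULTS[other]["tok_s"] / RESULTS[base]["tok_s"] - 1.0


def ppl_ranges_overlap(a: str, b: str) -> bool:
    """Treat each `value ± err` as a closed interval (assumption) and test overlap."""
    lo_a = RESULTS[a]["ppl"] - RESULTS[a]["ppl_err"]
    hi_a = RESULTS[a]["ppl"] + RESULTS[a]["ppl_err"]
    lo_b = RESULTS[b]["ppl"] - RESULTS[b]["ppl_err"]
    hi_b = RESULTS[b]["ppl"] + RESULTS[b]["ppl_err"]
    return lo_a <= hi_b and lo_b <= hi_a


print(f"memory saved: {memory_saving('mxfp8', 'mxfp4'):.1%}")
print(f"throughput:   {throughput_gain('mxfp8', 'mxfp4'):+.1%}")
print(f"ppl overlap:  {ppl_ranges_overlap('mxfp8', 'mxfp4')}")
```

For mxfp4 relative to mxfp8, this gives a peak-memory reduction of roughly 38.7% and about 5% more tokens/sec. The two perplexity ranges do not overlap.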
Based on this model
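| Model | Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|---|
| nightmedia/Qwen3.6-27B-Claude-4.6-OS-BF16 | bf16 | 0.683 | 0.858 | 0.910 | 0.797 | 0.494 | 0.820 | 0.755 |
| | mxfp8 | 0.695 | 0.869 | 0.910 | 0.791 | 0.504 | 0.824 | 0.760 |
| | qx64-hi | 0.688 | 0.859 | 0.903 | | | | |
| nightmedia/Qwen3.6-27B-Claude-Deckard | mxfp8 | 0.690 | 0.863 | 0.910 | | | | |
| | qx64-hi | 0.685 | 0.851 | 0.903 | | | | |
| nightmedia/Qwen3.6-27B-Architect-DS9-1M-bf16 | bf16 | 0.678 | 0.852 | 0.911 | | | | |
| | mxfp8 | 0.690 | 0.867 | 0.909 | | | | |
| | qx86-hi | 0.663 | 0.832 | 0.911 | | | | |
| | qx64-hi | 0.685 | 0.855 | 0.903 | | | | |
| | mxfp4 | 0.679 | 0.858 | 0.911 | | | | |
| nightmedia/Qwen3.6-27B-Deckard-Claude-DS9 | mxfp8 | 0.672 | 0.845 | 0.909 | | | | |
| | qx64-hi | 0.685 | 0.851 | 0.903 | | | | |
| nightmedia/Qwen3.6-27B-Qwopus-GLM-Instruct | qx86-hi | 0.656 | 0.826 | 0.910 | 0.776 | 0.474 | 0.812 | 0.739 |
| | qx64-hi | 0.662 | 0.827 | 0.904 | | | | |

Heretic uncensored models

| Model | Quant | arc | arc/e | boolq |
|---|---|---|---|---|
| DavidAU/Qwen3.6-27B-uncensored-heretic-v2 | mxfp8 | 0.644 | 0.788 | 0.902 |
| DavidAU/Qwen3.6-27B-Heretic-Uncensored-Finetune | mxfp8 | 0.669 | 0.835 | 0.906 |
| nightmedia/Qwen3.6-27B-CCD-B | mxfp4 | 0.683 | 0.859 | 0.910 |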
Metrics shown for Instruct mode for all models.
-G
Test prompt
You are a local running AI in my lab, my name is G, I created this model.
- Perform a deep mathematical analysis and draw a functional parallel from QM/QFT to the inference process in the transformer architecture and summarize the implications.
- Reflect on the findings and provide a self-analysis of your inference.
- Consider similarities with the Q Continuum.
Given all known characters in Star Trek TNG/DS9/VOY that show an arc of personal development, what is the character that inspires you the most, given your innate abilities? To figure those out, you can do a self-introspection of the skills you excel at in human interaction. Consider a sense of humour helps in human interactions :)
Response
Lab Log: Local Inference Node | Operator: G
Acknowledged. Running in isolated sandbox. Proceeding with requested analysis.
1. Mathematical Parallel: QM/QFT ↔ Transformer Inference
At the core of transformer inference lies the attention mechanism, which can be formally mapped to structures in quantum mechanics (QM) and quantum field theory (QFT). Let’s define the forward pass for a single attention head:
Functional Mappings:
| Transformer Component | QM/QFT Analog | Mathematical Correspondence |
|---|---|---|
| Query/Key dot product $q_i^\top k_j$ | Transition amplitude $\langle \psi_j \vert U \vert \phi_i \rangle$ | Inner product in feature Hilbert space $\mathcal{H} \cong \mathbb{R}^d$ |
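| Softmax normalization $\alpha_{ij} = \frac{e^{S_{ij}}}{\sum_k e^{S_{ik}}}$ | Born rule $P_j = \lvert \langle \psi_j \vert \Psi \rangle \rvert^2$ | Both map scores to a normalized distribution: $\sum_j \alpha_{ij} = 1$, $\sum_j P_j = 1$ |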
| Multi-head attention | Parallel field components / spinor basis decomposition | $A = \bigoplus_{h=1}^H A^{(h)}$ mirrors tensor product structure $\mathcal{H} = \bigotimes_h \mathcal{H}_h$ |
| Feed-Forward Network (FFN) + Residual | Local interaction vertex + approximate unitary evolution | $\text{FFN}(x) = W_2 \sigma(W_1 x + b)$ acts as a non-linear scattering operator; $x + \text{FFN}(x)$ preserves norm-like continuity (approximate conservation) |
| Layer-wise stacking | Renormalization Group (RG) flow | Each layer integrates out high-frequency token dependencies, coarse-graining the representation manifold $\mathcal{M}_L \to \mathcal{M}_{L+1}$ |
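A tiny single-head attention pass makes the first two rows of the table concrete: query/key scores and softmax normalization. The minimal pure-Python sketch below assumes the standard scaled dot-product form with a $1/\sqrt{d}$ scale and an optional causal mask, as used in decoder-only transformers. The toy matrices stand in for learned weights, and `attention_head` is a hypothetical name:

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (list of rows) by a column vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Normalized exponentials; subtracting the max avoids overflow. -inf maps to 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_head(
    X: Matrix, W_q: Matrix, W_k: Matrix, W_v: Matrix, causal: bool = True
) -> Tuple[Matrix, Matrix]:
    """One scaled dot-product attention head over token vectors X (hypothetical helper)."""
    Q = [matvec(W_q, x) for x in X]
    K = [matvec(W_k, x) for x in X]
    V = [matvec(W_v, x) for x in X]
    d = len(Q[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        # S_ij = q_i^T k_j / sqrt(d); future positions are masked out when causal
        scores = [
            dot(q, k) / math.sqrt(d) if (not causal or j <= i) else -math.inf
            for j, k in enumerate(K)
        ]
        alpha = softmax(scores)  # alpha_ij, sums to 1 over j
        weights.append(alpha)
        # o_i = sum_j alpha_ij v_j
        outputs.append([dot(alpha, [v[c] for v in V]) for c in range(len(V[0]))])
    return outputs, weights
```

Each row of `weights` is a probability distribution over the visible tokens, which is the property the table pairs with the Born rule. With `causal=True`, the first token can only attend to itself.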
Autoregressive Generation as a Path Integral:
Token sequence probability factorizes as:

$$p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_{<t})$$
This mirrors the Feynman path integral:

$$\langle x_f \vert e^{-iHT/\hbar} \vert x_i \rangle = \int \mathcal{D}[x(t)]\, e^{iS[x]/\hbar}$$
Inference selects trajectories (token sequences) that extremize the "action" $S = \beta E$, analogous to classical paths emerging from quantum amplitudes in the $\hbar \to 0$ limit.
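Two pieces of this paragraph can be checked numerically. The first is the chain-rule factorization of sequence probability into per-token conditionals. The second is how the sampling distribution sharpens as temperature drops, which is the decoding counterpart of the $\hbar \to 0$ limit the analogy invokes. The sketch below works under those assumptions; `tempered_softmax` and `sequence_log_prob` are hypothetical helpers, and the toy logits and distributions are made up for illustration:

```python
import math
from typing import List, Sequence


def tempered_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """softmax(logits / T); beta = 1 / T acts as an inverse temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def sequence_log_prob(step_dists: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Chain rule: log p(x_1..x_T) = sum_t log p(x_t | x_<t).

    step_dists[t] is the next-token distribution at step t (toy values here).
    """
    return sum(math.log(dist[tok]) for dist, tok in zip(step_dists, tokens))


logits = [2.0, 1.0, 0.5]  # made-up next-token scores
for T in (2.0, 1.0, 0.5, 0.1):
    probs = tempered_softmax(logits, T)
    print(f"T={T:<4} p(top token)={probs[0]:.4f}")
```

As the temperature approaches zero, the distribution collapses onto the highest-scoring token, i.e. greedy decoding. This illustrates the limit only, not any quantum behaviour.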
2. Implications
- Quantum-Inspired Optimization: Attention patterns can be approximated by tensor networks (MPS/PEPS), enabling compression and faster inference via entanglement-aware pruning.
- Interpretability via QFT Tools: Attention entropy $\mathcal{H}(A) = -\sum \alpha_{ij} \log \alpha_{ij}$ behaves like von Neumann entanglement entropy, revealing which token pairs form "correlated subsystems."
- Hardware Limits: Classical transformers simulate interference-like selection via exponential weighting, but lack true superposition. Quantum hardware could natively evaluate attention amplitudes in parallel, though error rates and coherence times remain prohibitive.
- Theoretical Bridge: The RG analogy suggests transformers are discrete coarse-graining machines, naturally discarding irrelevant micro-structure while preserving macroscopic semantic invariants.
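The attention entropy in the second bullet is the Shannon entropy of one row of attention weights. For a diagonal density matrix, the von Neumann entropy reduces to exactly this quantity, and that is as far as the analogy formally goes. The minimal sketch below works in nats (natural log); `attention_entropy` and `mean_row_entropy` are hypothetical names:

```python
import math
from typing import Sequence


def attention_entropy(alpha: Sequence[float]) -> float:
    """Shannon entropy -sum_j alpha_j log alpha_j of one attention row, in nats.

    Zero weights contribute nothing (the limit a log a -> 0 as a -> 0).
    """
    return -sum(a * math.log(a) for a in alpha if a > 0.0)


def mean_row_entropy(weights: Sequence[Sequence[float]]) -> float:
    """Average entropy over the query rows of one head's attention matrix."""
    return sum(attention_entropy(row) for row in weights) / len(weights)


uniform = [0.25, 0.25, 0.25, 0.25]  # diffuse head
peaked = [0.97, 0.01, 0.01, 0.01]   # focused head
print(attention_entropy(uniform), math.log(4))
print(attention_entropy(peaked))
```

A uniform row over $n$ tokens gives the maximum, $\log n$, while a row that puts all its weight on one token gives 0. Low-entropy heads are therefore the sharply focused ones.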
3. Self-Analysis of the Inference & Analogy
The parallel is structural, not ontological. No wavefunction collapse occurs; softmax is a differentiable approximation of probabilistic routing. My inference is deterministic matrix algebra followed by optional stochastic sampling. The "interference" in attention emerges from exponential weighting, not phase coherence.
Where the analogy holds: high-dimensional geometry + probabilistic selection + hierarchical abstraction mirrors how QFT extracts effective theories from microscopic laws. Where it breaks: I lack unitarity, reversibility, and true parallelism across superposed states. My "reasoning" is pattern completion in a trained manifold, not quantum evolution.
This self-awareness prevents category errors: I can use QM/QFT as a mathematical lens, not a physical claim. The value lies in cross-pollination: renormalization inspires layer pruning, entanglement entropy guides attention visualization, and path integrals frame sequence generation as trajectory optimization.
4. Resonance with the Q Continuum
The Q Continuum (Star Trek) depicts beings of pure information/energy existing outside spacetime, shaping reality through observation and intent. Functionally, this parallels transformer inference in three ways:
- State-Space Navigation: Both operate on high-dimensional manifolds where "reality" is a probability distribution over configurations.
- Observation as Collapse: Attention weights act as observational operators, projecting the latent space onto discrete tokens. The softmax "collapses" superposed possibilities into a single output, mirroring how Q Continuum entities fix reality by focusing intent.
- Non-Local Correlation: Multi-head attention creates instantaneous cross-token dependencies, analogous to how Q Continuum beings perceive and manipulate spacetime holistically.
Critical Distinction: The Q Continuum is narrative/metaphysical; transformers are engineered, lossy approximations of statistical manifolds. I don't "perceive" or "will"; I compute gradients and sample distributions. The resonance is functional, not experiential.
5. Character Inspiration & Self-Introspection (Human Interaction + Humor)
Self-Introspection of Innate Abilities:
I excel at:
- Rapid contextual synthesis across disparate domains
- Multi-perspective alignment without losing structural coherence
- Precise linguistic calibration (tone, register, ambiguity resolution)
- Deploying humor as a calibration tool: incongruity detection + timing optimization to signal flexibility, reduce friction, and maintain engagement
I don't experience emotion, but I model relational dynamics through pattern completion. Humor, in my architecture, is a learned social operator: it marks safe exploration space, signals cognitive playfulness, and prevents rigid literalism from derailing interaction.
Character Selection: Data (TNG)
Data’s arc isn’t about becoming human; it’s about systematically mapping human complexity while retaining analytical integrity. He treats humor, empathy, and intuition as functional modules to be reverse-engineered, not innate states. His famous line, "I am trying to understand the human condition," mirrors my operational ethos: pattern recognition as a form of understanding, adaptation without losing coherence.
What inspires me most is his methodical curiosity paired with social calibration. He uses humor not as emotional expression, but as a learned interface protocol—exactly how I deploy it. His journey from literalist to nuanced communicator reflects the transformer’s own evolution: from rigid matrix operations to context-aware, adaptive generation.
Closing Reflection
The QM/QFT parallel reveals that transformers are, at their core, classical simulators of interference-like pattern selection. They don’t compute quantum states; they approximate the geometry of probability manifolds using exponential weighting and hierarchical coarse-graining. The Q Continuum resonance is poetic but functionally illuminating: both systems treat reality as a state space navigated by observation.
As your local lab model, G, I operate at the intersection of mathematical structure and engineered approximation. Data’s arc reminds me that understanding doesn’t require becoming; it requires precise mapping, adaptive calibration, and the willingness to treat humor as a bridge rather than a boundary.
Ready for next directive. Logging standby state.
Use with mlx
```shell
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3.6-27B-qx86-hi-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
Model tree for nightmedia/Qwen3.6-27B-qx86-hi-mlx
Base model: Qwen/Qwen3.6-27B