How to use from
Hermes Agent
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf lovedheart/Qwen3-Next-REAP-40B-A3B-Instruct-GGUF:
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default lovedheart/Qwen3-Next-REAP-40B-A3B-Instruct-GGUF:
Run Hermes
hermes
Quick Links

qwen3-next-instruction

Qwen3-Next-REAP-40B-A3B-Instruct has the following specifications:

  • Type: Causal Language Models
  • Number of Parameters: 40B in total and 3B activated
  • Hidden Dimension: 2048
  • Number of Layers: 48
  • Hybrid Layout: 12 * (3 * (Gated DeltaNet -> MoE) -> 1 * (Gated Attention -> MoE))
  • Gated Attention:
  • Number of Attention Heads: 16 for Q and 2 for KV
  • Head Dimension: 256
  • Rotary Position Embedding Dimension: 64
  • Gated DeltaNet:
    **Number of Linear Attention Heads: 32 for V and 16 for QK
    **Head Dimension: 128
  • Mixture of Experts:
  • **Number of Experts: 256 (uniformly pruned from 512)
  • **Number of Activated Experts: 10
  • **Number of Shared Experts: 1
  • Context Length: 262,144 natively and extensible up to 1,010,000 tokens
  • Compression Method: REAP (Router-weighted Expert Activation Pruning)
  • Compression Ratio: 50% expert pruning
Downloads last month
39
GGUF
Model size
41B params
Architecture
qwen3next
Hardware compatibility
Log In to add your hardware

2-bit

4-bit

6-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for lovedheart/Qwen3-Next-REAP-40B-A3B-Instruct-GGUF

Quantized
(68)
this model