Instructions to use john-broadway/SmolLM2-135M-RYS-18-22-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use john-broadway/SmolLM2-135M-RYS-18-22-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="john-broadway/SmolLM2-135M-RYS-18-22-GGUF", filename="SmolLM2-135M-RYS-18-22-Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use john-broadway/SmolLM2-135M-RYS-18-22-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M
Use Docker
docker model run hf.co/john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use john-broadway/SmolLM2-135M-RYS-18-22-GGUF with Ollama:
ollama run hf.co/john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M
- Unsloth Studio
How to use john-broadway/SmolLM2-135M-RYS-18-22-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for john-broadway/SmolLM2-135M-RYS-18-22-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for john-broadway/SmolLM2-135M-RYS-18-22-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for john-broadway/SmolLM2-135M-RYS-18-22-GGUF to start chatting
- Atomic Chat new
- Docker Model Runner
How to use john-broadway/SmolLM2-135M-RYS-18-22-GGUF with Docker Model Runner:
docker model run hf.co/john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M
- Lemonade
How to use john-broadway/SmolLM2-135M-RYS-18-22-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull john-broadway/SmolLM2-135M-RYS-18-22-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.SmolLM2-135M-RYS-18-22-GGUF-Q4_K_M
List all available models
lemonade list
Remove broken reference to deleted -RYS-eval companion repo (consolidated into this weights repo)
5c9e14a verified | license: apache-2.0 | |
| base_model: HuggingFaceTB/SmolLM2-135M-Instruct | |
| tags: | |
| - rys | |
| - layer-duplication | |
| - reasoning-circuits | |
| - gguf | |
| - sovereign-collection-v2 | |
| # SmolLM2-135M-RYS-18-22 | |
| SmolLM2-135M-Instruct with layers 18-21 duplicated. The late-stack reasoning + EQ circuit runs twice on every forward pass. | |
| 30 base layers β 34 after duplication. No training, no merging, no weight changes. | |
| **Reasoning 17.65% β 35.30% (+17.65). EQ 44.53 β 57.58 (+13.05). Math 0.315 β 0.303 (β1.20).** | |
| ## Results | |
| | Metric | Baseline | RYS (18,22) | Delta | | |
| |--------|----------|-------------|-------| | |
| | Math | 0.315 | 0.303 | β1.20 | | |
| | EQ | 44.53 | 57.58 | +13.05 | | |
| | Reasoning | 17.65% | 35.30% | +17.65 | | |
| **The tiniest responder.** SmolLM2-135M is the smallest model in the v2 corpus by an order of magnitude. RYS lifts both reasoning (+17.65 absolute) AND EQ (+13.05) simultaneously β the response is unremarkable on its own, but the comparison to sibling SmolLM2-1.7B (which lifts **zero percent on reasoning**) makes this card load-bearing: it falsifies the "SmolLM2 training recipe doesn't work with RYS" hypothesis. **The 1.7B negative result is uniquely anomalous within the family, not architectural.** | |
| Pick this when you want the smallest possible model with reasoning + EQ lift. At 110MB Q4_K_M, this is the lightest RYS-applied checkpoint in the collection. | |
| ## Usage | |
| ``` | |
| llama-server -m SmolLM2-135M-RYS-18-22-Q4_K_M.gguf -ngl 99 | |
| ``` | |
| ## Full sweep data | |
| 40 configurations tested. (18,22) block-4 is the best-combined pick. Full per-config sweep + cross-architecture analysis: [v2 dataset](https://huggingface.co/datasets/john-broadway/rys-sovereign-collection-v2). | |
| Part of the RYS Sovereign Collection v2. | |
| --- | |
| ## Where this sits in the Sovereign Collection | |
| **v1 β Qwen2.5 cross-scale + Qwen3-32B headline crossover.** 5 model repos. | |
| **v2 β cross-architecture corpus.** 21 model variants across 10 architecture families. Inverse correlation (r = β0.726): weak baselines lift more, in their weakest dimension. 13 deployable RYS-applied weight repos covering every non-zero-lift variant. | |
| **SmolLM2 family picture** (all Q4_K_M): | |
| - 135M (this card) β baseline reasoning 17.65%, peak Ξ **+17.65%** (responds) | |
| - [`SmolLM2-360M-RYS-12-15-GGUF`](https://huggingface.co/john-broadway/SmolLM2-360M-RYS-12-15-GGUF) β baseline reasoning 29.41%, peak Ξ **+23.53%** (responds) | |
| - [`SmolLM2-1.7B-RYS-eval`](https://huggingface.co/john-broadway/SmolLM2-1.7B-RYS-eval) β baseline reasoning 58.82%, peak Ξ **+0.00%** (does NOT respond; eval-only β no RYS-applied weights since no lift to deliver) | |
| **Credit** | |
| John Broadway, with collaboration from Claude (Opus 4.6 in April 2026 sweep generation and build pipeline; Opus 4.7 in May 2026 cross-architecture analysis and publication). Original RYS method by [David Ng](https://dnhkng.github.io/posts/rys/) on Qwen2-72B; sweep + probe toolkit by [alainnothere](https://github.com/alainnothere/llm-circuit-finder). | |