Text Generation
Transformers
Safetensors
English
echo_hybrid
trl
fft
rnn
ssm
conversational
custom_code
Instructions to use mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1", trust_remote_code=True) messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1
- SGLang
How to use mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1 with Docker Model Runner:
docker model run hf.co/mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1
| """ | |
| echo_hybrid/configuration_hybrid.py | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| HybridEchoConfig: extends Qwen2Config with DSRN memory-injector parameters. | |
| Design rationale | |
| ββββββββββββββββ | |
| Rather than inventing an entirely new config, we subclass Qwen2Config so that | |
| every existing Qwen2 hyper-parameter (hidden_size, num_hidden_layers, etc.) is | |
| available without duplication. The only additions are the four DSRN-specific | |
| fields documented below. | |
| CRITICAL NOTES (from AGENTS.md) | |
| βββββββββββββββββββββββββββββββββ | |
| β’ model_type MUST be "echo_hybrid" so AutoConfig routing works after | |
| AutoConfig.register("echo_hybrid", HybridEchoConfig). | |
| β’ Do NOT use this config with EchoForCausalLM β that model expects EchoConfig. | |
| """ | |
| from transformers import Qwen2Config | |
| class HybridEchoConfig(Qwen2Config): | |
| """ | |
| Qwen2Config subclass that adds DSRN memory-injector fields. | |
| New fields | |
| ββββββββββ | |
| dsrn_state_dim : int | |
| Dimension of the c_t slow-state vector maintained by each | |
| DSRNMemoryInjector. Defaults to 512. Can be set equal to | |
| hidden_size (896 for Qwen2-0.5B) for a richer slow-state, at the | |
| cost of extra parameters per injector. | |
| dsrn_injection_stride : int | |
| Insert one DSRNMemoryInjector after every N transformer layers. | |
| For Qwen2-0.5B (24 layers) the default of 4 yields 6 injectors. | |
| dsrn_use_triton : bool | |
| Route the parallel scan to the custom Triton kernel defined in | |
| echo_hf/triton_scan.py. Disabled by default because the Triton | |
| kernel targets CUDA/ROCm and is not available everywhere. | |
| gate_bias_init : float | |
| Initial value of linear_gate.bias in every injector. A positive | |
| value (~1.0) keeps memory gates open at init, allowing gradients to | |
| flow into c_t immediately. Increase to 2.0 if c_t norms do not | |
| grow beyond ~0.0 after Phase-1 warm-up. | |
| use_kv_cache : bool | |
| Controls the Qwen2 backbone KV-cache. Independent of use_cache | |
| (DSRN state return). | |
| - True (default / recommended): Standard Hybrid mode β mode 2. | |
| Backbone KV-cache active; attention handles fast-state, DSRN handles | |
| slow-state. Best quality and lowest peak VRAM. | |
| - False: Ablation / stateless mode β mode 1. | |
| Backbone KV-cache disabled; every forward re-feeds the full growing | |
| context so attention stays coherent. DSRN slow-state is the sole | |
| cross-step memory. Useful for ablation studies and "Attention Tax" | |
| vs "Recurrent Gain" benchmarks. | |
| """ | |
| model_type = "echo_hybrid" | |
| def __init__( | |
| self, | |
| dsrn_state_dim: int = 512, | |
| dsrn_injection_stride: int = 4, | |
| dsrn_use_triton: bool = False, | |
| gate_bias_init: float = 1.0, | |
| use_kv_cache: bool = True, # Kill-switch: False = DSRN-only ablation | |
| **kwargs, | |
| ): | |
| super().__init__(**kwargs) | |
| self.dsrn_state_dim = dsrn_state_dim | |
| self.dsrn_injection_stride = dsrn_injection_stride | |
| self.dsrn_use_triton = dsrn_use_triton | |
| self.gate_bias_init = gate_bias_init | |
| self.use_kv_cache = use_kv_cache | |
| self.auto_map = { | |
| "AutoConfig": "configuration_hybrid.HybridEchoConfig", | |
| "AutoModel": "modeling_hybrid.HybridEchoModel", | |
| "AutoModelForCausalLM": "modeling_hybrid.HybridEchoForCausalLM", | |
| } | |