vLLM support is...
#4
by mrs83 - opened
Inference Benchmark — echo-hybrid
Inference benchmark on a single AMD Radeon AI Pro R9700
Endpoint: http://localhost:8001/v1
Prompt: Explain the concept of recursion briefly.
Max tokens: 64 · Warmup: 2 req · Date: 2026-05-16 13:15 UTC
| Concurrency | Throughput (req/s) | TTFT p50 | TTFT p95 | Latency p50 | Latency p95 | Errors |
|---|---|---|---|---|---|---|
| 1 | 0.88 | 48 ms | 57 ms | 1137 ms | 1152 ms | 0 |
| 2 | 1.69 | 58 ms | 83 ms | 1187 ms | 1199 ms | 0 |
| 4 | 3.35 | 74 ms | 86 ms | 1193 ms | 1203 ms | 0 |
| 8 | 5.19 | 82 ms | 91 ms | 1540 ms | 1548 ms | 0 |
| 16 | 10.21 | 74 ms | 97 ms | 1565 ms | 1584 ms | 0 |
| 32 | 20.34 | 94 ms | 109 ms | 1570 ms | 1584 ms | 0 |
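As a sanity check on the table above, throughput in a closed-loop benchmark should roughly equal concurrency divided by per-request latency (Little's law). The sketch below applies that to the R9700 rows. It uses p50 latency as a stand-in for mean latency, which is an assumption, since the post does not report means; the helper name `predicted_throughput` is illustrative only.

```python
# Rows copied from the R9700 table: (concurrency, measured req/s, latency p50 in ms).
ROWS = [
    (1, 0.88, 1137),
    (2, 1.69, 1187),
    (4, 3.35, 1193),
    (8, 5.19, 1540),
    (16, 10.21, 1565),
    (32, 20.34, 1570),
]


def predicted_throughput(concurrency: int, latency_ms: float) -> float:
    """Little's law estimate: requests in flight divided by time per request (s)."""
    return concurrency / (latency_ms / 1000.0)


for concurrency, measured, p50_ms in ROWS:
    estimate = predicted_throughput(concurrency, p50_ms)
    print(f"c={concurrency:>2}  measured={measured:6.2f} req/s  estimated={estimate:6.2f} req/s")
```

Under that assumption the estimates stay within about 0.05 req/s of the measured values, so the throughput and latency columns are mutually consistent. The same check is looser on tables where p95 latency sits far above p50, because the median then understates the mean.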
Inference benchmark on a single AMD Instinct MI300X VF, vLLM on ROCm 7.2.0
Inference Benchmark — echo-hybrid
Endpoint: http://127.0.0.1:8001/v1
Prompt: Explain the concept of recursion briefly.
Max tokens: 256 · Warmup: 2 req · Date: 2026-05-16 15:58 UTC
| Concurrency | Throughput (req/s) | TTFT p50 | TTFT p95 | Latency p50 | Latency p95 | Errors |
|---|---|---|---|---|---|---|
| 1 | 0.24 | 73 ms | 83 ms | 3826 ms | 7277 ms | 0 |
| 2 | 0.45 | 98 ms | 115 ms | 3832 ms | 7711 ms | 0 |
| 4 | 0.76 | 101 ms | 114 ms | 4439 ms | 7897 ms | 0 |
| 8 | 1.15 | 108 ms | 124 ms | 5527 ms | 9148 ms | 0 |
| 16 | 2.10 | 113 ms | 133 ms | 6180 ms | 9307 ms | 0 |
| 32 | 4.20 | 113 ms | 131 ms | 5889 ms | 9339 ms | 0 |
| 64 | 7.87 | 120 ms | 140 ms | 6598 ms | 10047 ms | 0 |
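The post does not say which tool produced these numbers. A minimal sketch of how TTFT and total latency could be measured for one request against an OpenAI-compatible endpoint is shown below, using only the Python standard library. The function names `build_request` and `timed_request` are hypothetical, and treating the first streamed `data:` chunk as the first token is an assumption (some servers send a role-only chunk first).

```python
import json
import time
import urllib.request


def build_request(base_url: str, model: str, prompt: str, max_tokens: int) -> urllib.request.Request:
    """Build a streaming chat-completions request for an OpenAI-compatible server."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": True,  # stream so the first chunk can be timed separately
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_request(req: urllib.request.Request):
    """Send the request; return (ttft_seconds or None, total_latency_seconds)."""
    start = time.perf_counter()
    ttft = None
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # server-sent events arrive line by line
            line = raw.decode("utf-8").strip()
            if not line.startswith("data:") or line.endswith("[DONE]"):
                continue
            if ttft is None:
                ttft = time.perf_counter() - start
    return ttft, time.perf_counter() - start


# Example usage (requires a running server; the model name is an assumption):
# req = build_request("http://localhost:8001/v1", "echo-hybrid",
#                     "Explain the concept of recursion briefly.", 64)
# print(timed_request(req))
```

Running `timed_request` from several threads at once, after a couple of warmup calls, would give per-concurrency samples from which p50 and p95 can be computed.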
