Instructions to use ethicalabs/Echo-DSRN-114M-v0.1.2 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use ethicalabs/Echo-DSRN-114M-v0.1.2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ethicalabs/Echo-DSRN-114M-v0.1.2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ethicalabs/Echo-DSRN-114M-v0.1.2", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use ethicalabs/Echo-DSRN-114M-v0.1.2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ethicalabs/Echo-DSRN-114M-v0.1.2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ethicalabs/Echo-DSRN-114M-v0.1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
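
The same request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM, and the response layout is assumed to follow the OpenAI chat-completions format that the server advertises.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the text of the first choice (assumed OpenAI layout)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "ethicalabs/Echo-DSRN-114M-v0.1.2",
    "What is the capital of France?",
)
# With the server running locally, uncomment to send the request:
# print(send_chat_request(request))
```

For a server listening on a different port (for example 30000), only the `base_url` argument should need to change.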

SGLang

How to use ethicalabs/Echo-DSRN-114M-v0.1.2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ethicalabs/Echo-DSRN-114M-v0.1.2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ethicalabs/Echo-DSRN-114M-v0.1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ethicalabs/Echo-DSRN-114M-v0.1.2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ethicalabs/Echo-DSRN-114M-v0.1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ethicalabs/Echo-DSRN-114M-v0.1.2 with Docker Model Runner:
```
docker model run hf.co/ethicalabs/Echo-DSRN-114M-v0.1.2
```

Model Card for ethicalabs/Echo-DSRN-114M-v0.1.2

Echo-DSRN(N) (Dual State Recurrent Neural Network; short name Echo-DSRN, also known as echo) is a novel architecture designed as a viable alternative to Large Language Models (LLMs) for low-resource tasks, where the scale of an LLM is excessive and inefficient 🌱

⚠️ Important Notice

This is a research prototype and demo model.

Not production-ready
Will hallucinate and give incorrect answers
Do not use for any real-world decisions
Intended for architecture experimentation only

What Works

Text generation is fluent
Memory usage is constant, O(1), with respect to sequence length
Runs on CPUs, NPUs, and GPUs (tested on AMD ROCm and Apple MPS)
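
To illustrate what constant memory means in practice, the toy comparison below counts the floats held by a fixed-size recurrent state versus a standard attention KV cache as the context grows. It is not the Echo-DSRN implementation: the layer count and hidden size come from the architecture table in this card, while `STATES_PER_LAYER = 2` (one slow and one fast state vector per layer) is an assumption, and the model's hybrid attention component is ignored.

```python
# Toy memory accounting, not the Echo-DSRN implementation.
LAYERS = 8            # from the architecture table
HIDDEN = 512          # from the architecture table
STATES_PER_LAYER = 2  # assumption: one slow and one fast state vector per layer


def recurrent_state_floats(num_tokens: int) -> int:
    """Floats held by a fixed-size recurrent state; independent of num_tokens."""
    return LAYERS * STATES_PER_LAYER * HIDDEN


def kv_cache_floats(num_tokens: int) -> int:
    """Floats held by a standard KV cache: one key and one value vector per past token."""
    return LAYERS * 2 * HIDDEN * num_tokens


for n in (128, 1024, 8192):
    print(f"tokens={n:>5}  recurrent={recurrent_state_floats(n):>6}  kv_cache={kv_cache_floats(n):>9}")
```

The recurrent count stays flat while the cache grows linearly with the number of tokens, which is the property the O(1) claim refers to.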

What Doesn't Work

Factual accuracy
Instruction following
Common sense reasoning

Intended Operations: Edge-Native "Smol" Tasks

Echo-DSRN is optimized for high-frequency, low-latency edge deployment.

Intent Dispatch: routing of user prompts to APIs, scripts, or heavier cloud models (demo: Gradio App).
Semantic Compression: long-context document digestion with flat O(1) memory.
Schema Translation: deterministic conversion of unstructured text into rigid JSON or function calls.
NER & Classification: extraction of target variables from noisy text.
PII Sanitization: on-device redaction of sensitive data before it is sent over an external network.
Log Parsing: log stream monitoring and anomaly detection without cache overflow.
Local Autocomplete: next-word prediction for local scripting and queries.
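
As an illustration of the Intent Dispatch task, the sketch below shows one way a small model's label could drive routing. Everything here is hypothetical: the handler functions, the `ROUTES` table, and the label names are invented for the example, and `keyword_classifier` is a stand-in for the model call so the code runs offline.

```python
from typing import Callable, Dict


# Hypothetical handlers; in practice these would call an API, a script, or a larger model.
def handle_weather(prompt: str) -> str:
    return f"[weather API] {prompt}"


def handle_escalate(prompt: str) -> str:
    return f"[cloud model] {prompt}"


ROUTES: Dict[str, Callable[[str], str]] = {
    "weather": handle_weather,
    "escalate": handle_escalate,
}


def dispatch(prompt: str, classify: Callable[[str], str]) -> str:
    """Route the prompt by the label from classify; unknown labels are escalated."""
    label = classify(prompt).strip().lower()
    handler = ROUTES.get(label, handle_escalate)
    return handler(prompt)


# Stand-in for the model call, used only so the sketch runs without a model.
def keyword_classifier(prompt: str) -> str:
    return "weather" if "rain" in prompt.lower() else "unknown"


print(dispatch("Will it rain tomorrow?", keyword_classifier))
print(dispatch("Summarize this contract.", keyword_classifier))
```

Falling back to the heavier handler for unrecognized labels is a deliberate choice here, given that the card lists instruction following under What Doesn't Work.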

🏗️ Architecture Details

| Property | Value |
|---|---|
| Model Type | echo_dsrn |
| Layers | 8 |
| Hidden Dim | 512 |
| Attention Heads | 4 |
| MLP Ratio | 8.0 |
| Vocab Size | 32011 |
| Hybrid Attention | True |
| RMSNorm | True |

📊 Parameter Breakdown

| Component | Parameters | % of Total |
|---|---|---|
| Total | 114.69M (114,687,488) | 100% |
| Embeddings | 16.39M | 14.29% |
| DSRN Blocks (Aggregate) | 81.91M | 71.42% |
| LM Head | 16.39M | 14.29% |
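
The breakdown can be reproduced from the architecture values. The short check below assumes the embedding table and the LM head are each a bias-free 32011 × 512 matrix and that they are not weight-tied; those assumptions are not stated in this card, but they match the reported figures.

```python
VOCAB_SIZE = 32011          # from the architecture table
HIDDEN_DIM = 512            # from the architecture table
TOTAL_PARAMS = 114_687_488  # from the parameter breakdown

embedding_params = VOCAB_SIZE * HIDDEN_DIM  # one HIDDEN_DIM vector per token
lm_head_params = VOCAB_SIZE * HIDDEN_DIM    # assumption: untied, bias-free output projection
block_params = TOTAL_PARAMS - embedding_params - lm_head_params

rows = [
    ("Embeddings", embedding_params),
    ("DSRN Blocks (Aggregate)", block_params),
    ("LM Head", lm_head_params),
]
for name, count in rows:
    print(f"{name}: {count:,} ({100 * count / TOTAL_PARAMS:.2f}%)")
```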

🧩 Internal Block Structure (Per Layer)

| Sub-Component | Parameters | Description |
|---|---|---|
| MLP (Feed-Forward) | 4.20M | Upscaled hidden layers |
| DSRN Slow State | 3.15M | Constant-time memory gates |
| GRU Fast State | 1.58M | Recurrent fast path |
| Surprise Gating | 264,192 | Dynamic focus mechanism |
| Normalization | 1,024 | LayerNorm / RMSNorm |
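
A quick arithmetic check relates these per-layer figures to the aggregate. The sketch below assumes the MLP consists of two linear layers with biases and an inner width of Hidden Dim × MLP Ratio; that layout is an assumption, not something this card states.

```python
LAYERS = 8
HIDDEN_DIM = 512
MLP_RATIO = 8.0
BLOCKS_TOTAL_M = 81.91  # "DSRN Blocks (Aggregate)", in millions of parameters

# Per-layer figures from the table above, in millions of parameters.
itemized_m = {
    "MLP (Feed-Forward)": 4.20,
    "DSRN Slow State": 3.15,
    "GRU Fast State": 1.58,
    "Surprise Gating": 0.264192,
    "Normalization": 0.001024,
}

per_layer_m = BLOCKS_TOTAL_M / LAYERS
itemized_sum_m = sum(itemized_m.values())
gap_m = per_layer_m - itemized_sum_m
print(f"per layer: {per_layer_m:.2f}M, itemized: {itemized_sum_m:.2f}M, not itemized: {gap_m:.2f}M")

# Assumed MLP layout: up-projection and down-projection, each with a bias vector.
mlp_inner = int(HIDDEN_DIM * MLP_RATIO)
mlp_estimate = 2 * HIDDEN_DIM * mlp_inner + mlp_inner + HIDDEN_DIM
print(f"estimated MLP parameters: {mlp_estimate:,}")
```

Under these assumptions the MLP estimate rounds to the 4.20M in the table. The itemized rows add up to roughly 1M fewer parameters per layer than the aggregate divided by eight. The card does not say where the remainder sits. One unconfirmed possibility is the hybrid attention projections.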

Supervised Fine-Tuning (SFTTrainer)

Trained for 2 epochs on a single AMD Instinct MI300X (192 GB of HBM3 memory).

Evaluation

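| Tasks | Version | Filter | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_easy | 1 | none | acc | ↑ | 0.4289 | ± | 0.0102 |
| | | none | acc_norm | ↑ | 0.4078 | ± | 0.0101 |
| boolq | 2 | none | acc | ↑ | 0.4064 | ± | 0.0086 |
| hellaswag | 1 | none | acc | ↑ | 0.2692 | ± | 0.0044 |
| | | none | acc_norm | ↑ | 0.2757 | ± | 0.0045 |
| piqa | 1 | none | acc | ↑ | 0.5789 | ± | 0.0115 |
| | | none | acc_norm | ↑ | 0.5637 | ± | 0.0116 |
| sciq | 1 | none | acc | ↑ | 0.5980 | ± | 0.0155 |
| | | none | acc_norm | ↑ | 0.5610 | ± | 0.0157 |
| winogrande | 1 | none | acc | ↑ | 0.4957 | ± | 0.0141 |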
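| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 5 | acc | ↑ | 0.3910 | ± | 0.0100 |
| | | none | 5 | acc_norm | ↑ | 0.3645 | ± | 0.0099 |
| boolq | 2 | none | 5 | acc | ↑ | 0.5098 | ± | 0.0087 |
| hellaswag | 1 | none | 5 | acc | ↑ | 0.2717 | ± | 0.0044 |
| | | none | 5 | acc_norm | ↑ | 0.2717 | ± | 0.0044 |
| piqa | 1 | none | 5 | acc | ↑ | 0.5686 | ± | 0.0116 |
| | | none | 5 | acc_norm | ↑ | 0.5642 | ± | 0.0116 |
| sciq | 1 | none | 5 | acc | ↑ | 0.5570 | ± | 0.0157 |
| | | none | 5 | acc_norm | ↑ | 0.4970 | ± | 0.0158 |
| winogrande | 1 | none | 5 | acc | ↑ | 0.4933 | ± | 0.0141 |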
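
One way to read these numbers is to compare each accuracy with a random-guess baseline using the reported standard error. The sketch below does this for the `acc` rows of the first table. The chance levels are assumptions based on the usual number of answer options per task (two for boolq, piqa and winogrande; four for hellaswag and sciq; approximately four for arc_easy), and the 1.96 threshold is the conventional two-sided 95% cutoff.

```python
# task: (accuracy, stderr, assumed chance level); accuracies from the first table.
RESULTS = {
    "arc_easy":   (0.4289, 0.0102, 0.25),
    "boolq":      (0.4064, 0.0086, 0.50),
    "hellaswag":  (0.2692, 0.0044, 0.25),
    "piqa":       (0.5789, 0.0115, 0.50),
    "sciq":       (0.5980, 0.0155, 0.25),
    "winogrande": (0.4957, 0.0141, 0.50),
}


def z_score(acc: float, stderr: float, chance: float) -> float:
    """Number of standard errors separating the accuracy from the chance level."""
    return (acc - chance) / stderr


for task, (acc, stderr, chance) in RESULTS.items():
    z = z_score(acc, stderr, chance)
    verdict = "differs from chance" if abs(z) > 1.96 else "within noise of chance"
    print(f"{task:<11} z = {z:+.1f}  ({verdict})")
```

Under these assumed baselines, winogrande is statistically indistinguishable from guessing and boolq falls below its baseline in the first table, which is consistent with the limitations listed under What Doesn't Work.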

Citation

If you use this model in your research, please cite it as follows:

@misc{scamarcia_echo_dsrn_114m,
  title={Echo-DSRN-114M: Surprise-Gated Dual-State Recurrent Architecture for Efficient Language Modeling and Classification},
  author={Massimo Roberto Scamarcia},
  publisher={Zenodo},
  doi={10.5281/zenodo.19848279}
}