Model Card for ethicalabs/Echo-DSRN-114M-v0.1.2


The Echo-DSRN(N) (Dual-State Recurrent Neural Network; short name: Echo-DSRN, also known as echo) is a novel architecture designed as a viable alternative for low-resource tasks that are currently handled inefficiently by the excessive scale of Large Language Models (LLMs) 🌱

⚠️ Important Notice

This is a research prototype and demo model.

  • Not production-ready
  • Will hallucinate and give incorrect answers
  • Do not use for any real-world decisions
  • Intended for architecture experimentation only

What Works

  • Text generation is fluent
  • Memory usage is constant, O(1) with respect to sequence length
  • Runs on CPUs, NPUs, and GPUs (tested on AMD ROCm and Apple MPS)
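
To make the constant-memory point concrete, the sketch below compares the size of a fixed recurrent state with a Transformer-style key/value cache that grows with every token. Layers = 8 and Hidden Dim = 512 are taken from the architecture details in this card. Everything else is an assumption made for illustration: float32 storage, two state vectors per layer (one slow, one fast) of width Hidden Dim, and the function names themselves. The sketch also ignores whatever the hybrid attention component stores, so it is a rough picture rather than a measurement of this model.

```python
LAYERS = 8           # from the architecture details
HIDDEN = 512         # from the architecture details
BYTES_PER_VALUE = 4  # assumption: float32 storage


def recurrent_state_bytes(states_per_layer: int = 2) -> int:
    """Bytes held by a fixed recurrent state; independent of sequence length.

    states_per_layer=2 (one slow and one fast vector per layer) is an
    assumption made for illustration.
    """
    return LAYERS * states_per_layer * HIDDEN * BYTES_PER_VALUE


def kv_cache_bytes(seq_len: int) -> int:
    """Bytes held by a standard key/value cache: one key and one value
    vector of width HIDDEN per layer for every token seen so far."""
    return LAYERS * 2 * seq_len * HIDDEN * BYTES_PER_VALUE


# The recurrent figure stays flat while the cache grows linearly.
for seq_len in (128, 4096, 131072):
    print(seq_len, recurrent_state_bytes(), kv_cache_bytes(seq_len))
```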

What Doesn't Work

  • Factual accuracy
  • Instruction following
  • Common sense reasoning

Intended Operations: Edge-Native "Smol" Tasks

Echo-DSRN is optimized for high-frequency, low-latency edge deployment.

  • Intent Dispatch: Routing of user prompts to APIs, scripts, or heavier cloud models (Gradio App).
  • Semantic Compression: Long-context document digestion with flat O(1) memory.
  • Schema Translation: Deterministic conversion of unstructured text into rigid JSON or function calls.
  • NER & Classification: Extraction of target variables from noisy text.
  • PII Sanitization: On-device redaction of sensitive data before external network transmission.
  • Log Parsing: Log stream monitoring and anomaly detection without cache overflow.
  • Local Autocomplete: Next-word prediction for local scripting and queries.
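
As an illustration of the intent-dispatch pattern listed above, here is a minimal sketch of the routing layer that would sit around a small classifier. The model is not called here: `keyword_classifier` is a hypothetical stand-in that returns an intent label, and the intent names and handlers are invented for the example. In a real setup the classifier callable would wrap the model's prediction.

```python
from typing import Callable, Dict


def keyword_classifier(prompt: str) -> str:
    """Hypothetical stand-in for the model: maps a prompt to an intent label."""
    text = prompt.lower()
    if "weather" in text:
        return "weather_api"
    if text.startswith("run "):
        return "local_script"
    return "cloud_llm"  # anything unrecognized goes to a heavier model


def dispatch(
    prompt: str,
    classify: Callable[[str], str],
    handlers: Dict[str, Callable[[str], str]],
) -> str:
    """Route the prompt to the handler registered for its predicted intent."""
    intent = classify(prompt)
    handler = handlers.get(intent, handlers["cloud_llm"])
    return handler(prompt)


# Hypothetical handlers; each one only tags the prompt with its route.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "weather_api": lambda p: f"[weather_api] {p}",
    "local_script": lambda p: f"[local_script] {p}",
    "cloud_llm": lambda p: f"[cloud_llm] {p}",
}

print(dispatch("What's the weather in Rome?", keyword_classifier, HANDLERS))
```

Keeping the classifier behind a plain callable means the keyword stub can be swapped for a model-backed function without touching the routing code.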

🏗️ Architecture Details

| Property | Value |
| --- | --- |
| Model Type | echo_dsrn |
| Layers | 8 |
| Hidden Dim | 512 |
| Attention Heads | 4 |
| MLP Ratio | 8.0 |
| Vocab Size | 32011 |
| Hybrid Attention | True |
| RMSNorm | True |

📊 Parameter Breakdown

| Component | Parameters | % of Total |
| --- | --- | --- |
| Total | 114.69M (114,687,488) | 100% |
| Embeddings | 16.39M | 14.29% |
| DSRN Blocks (Aggregate) | 81.91M | 71.42% |
| LM Head | 16.39M | 14.29% |
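
The figures in this breakdown can be cross-checked from the architecture values. The sketch below assumes the embedding is a plain Vocab Size × Hidden Dim matrix and that the LM head is a separate (untied) matrix of the same shape, which is consistent with both being listed at 16.39M. The variable and function names are illustrative.

```python
VOCAB_SIZE = 32011          # from the architecture details
HIDDEN_DIM = 512            # from the architecture details
TOTAL_PARAMS = 114_687_488  # from the parameter breakdown

# Assumption: one HIDDEN_DIM-wide row per vocabulary entry.
embedding_params = VOCAB_SIZE * HIDDEN_DIM
# Assumption: an untied output projection with the same shape.
lm_head_params = VOCAB_SIZE * HIDDEN_DIM
# Whatever remains is attributed to the DSRN blocks.
block_params = TOTAL_PARAMS - embedding_params - lm_head_params


def share(n: int) -> float:
    """Percentage of the total parameter count, rounded to two decimals."""
    return round(100 * n / TOTAL_PARAMS, 2)


print("embeddings:", embedding_params, share(embedding_params))
print("lm head:   ", lm_head_params, share(lm_head_params))
print("blocks:    ", block_params, share(block_params))
```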

🧩 Internal Block Structure (Per Layer)

| Sub-Component | Parameters | Description |
| --- | --- | --- |
| MLP (Feed-Forward) | 4.20M | Upscaled hidden layers |
| DSRN Slow State | 3.15M | Constant-time memory gates |
| GRU Fast State | 1.58M | Recurrent fast path |
| Surprise Gating | 264,192 | Dynamic focus mechanism |
| Normalization | 1,024 | LayerNorm / RMSNorm |
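
Two of the per-layer figures can be reproduced from Hidden Dim and MLP Ratio under common layer layouts. The sketch below assumes a two-matrix feed-forward block with biases and a standard GRU parameter layout (three gates, input and hidden weights, two bias vectors per gate, as in PyTorch's `nn.GRU`). Both layouts are assumptions that happen to match the listed 4.20M and 1.58M; they are not confirmed details of the implementation.

```python
HIDDEN_DIM = 512   # from the architecture details
MLP_RATIO = 8.0    # from the architecture details
LAYERS = 8         # from the architecture details

# Assumption: up-projection and down-projection, each with a bias vector.
mlp_hidden = int(HIDDEN_DIM * MLP_RATIO)
mlp_params = (HIDDEN_DIM * mlp_hidden + mlp_hidden) + (mlp_hidden * HIDDEN_DIM + HIDDEN_DIM)

# Assumption: three gates, each with an input matrix, a hidden matrix,
# and two bias vectors, with input size equal to hidden size.
gru_params = 3 * (2 * HIDDEN_DIM * HIDDEN_DIM + 2 * HIDDEN_DIM)

# Average per layer, derived from the aggregate block count
# (total parameters minus the embedding and LM head matrices).
block_params = 114_687_488 - 2 * 32011 * HIDDEN_DIM
per_layer_average = block_params // LAYERS

# Sum of the five sub-components as listed (rounded values where given in M).
listed_sum = 4_200_000 + 3_150_000 + 1_580_000 + 264_192 + 1_024

print(f"MLP: {mlp_params / 1e6:.2f}M, GRU: {gru_params / 1e6:.2f}M")
print(f"per-layer average: {per_layer_average:,}, listed sum: ~{listed_sum:,}")
```

The listed sub-components sum to roughly 9.2M, about 1M short of the per-layer average; the table does not itemize the remainder.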

Supervised Fine-Tuning (SFTTrainer)

Trained for 2 epochs on a single AMD Instinct MI300X (192 GB HBM3 memory).


Evaluation

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
| --- | --- | --- | --- | --- | --- | --- |
| arc_easy | 1 | none | 0 | acc ↑ | 0.4289 | ± 0.0102 |
| | | none | 0 | acc_norm ↑ | 0.4078 | ± 0.0101 |
| boolq | 2 | none | 0 | acc ↑ | 0.4064 | ± 0.0086 |
| hellaswag | 1 | none | 0 | acc ↑ | 0.2692 | ± 0.0044 |
| | | none | 0 | acc_norm ↑ | 0.2757 | ± 0.0045 |
| piqa | 1 | none | 0 | acc ↑ | 0.5789 | ± 0.0115 |
| | | none | 0 | acc_norm ↑ | 0.5637 | ± 0.0116 |
| sciq | 1 | none | 0 | acc ↑ | 0.5980 | ± 0.0155 |
| | | none | 0 | acc_norm ↑ | 0.5610 | ± 0.0157 |
| winogrande | 1 | none | 0 | acc ↑ | 0.4957 | ± 0.0141 |

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
| --- | --- | --- | --- | --- | --- | --- |
| arc_easy | 1 | none | 5 | acc ↑ | 0.3910 | ± 0.0100 |
| | | none | 5 | acc_norm ↑ | 0.3645 | ± 0.0099 |
| boolq | 2 | none | 5 | acc ↑ | 0.5098 | ± 0.0087 |
| hellaswag | 1 | none | 5 | acc ↑ | 0.2717 | ± 0.0044 |
| | | none | 5 | acc_norm ↑ | 0.2717 | ± 0.0044 |
| piqa | 1 | none | 5 | acc ↑ | 0.5686 | ± 0.0116 |
| | | none | 5 | acc_norm ↑ | 0.5642 | ± 0.0116 |
| sciq | 1 | none | 5 | acc ↑ | 0.5570 | ± 0.0157 |
| | | none | 5 | acc_norm ↑ | 0.4970 | ± 0.0158 |
| winogrande | 1 | none | 5 | acc ↑ | 0.4933 | ± 0.0141 |
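
One way to read these scores is to compare each accuracy with a uniform-guess baseline, measured in units of the reported standard error. The sketch below does this for the 0-shot `acc` rows. The baselines are approximations supplied for illustration (0.25 for the four-option tasks, 0.5 for the two-option tasks); they ignore class imbalance and the few ARC questions with a different number of options.

```python
# (accuracy, stderr) copied from the 0-shot `acc` rows of the evaluation results
ZERO_SHOT = {
    "arc_easy": (0.4289, 0.0102),
    "boolq": (0.4064, 0.0086),
    "hellaswag": (0.2692, 0.0044),
    "piqa": (0.5789, 0.0115),
    "sciq": (0.5980, 0.0155),
    "winogrande": (0.4957, 0.0141),
}

# Assumption: approximate accuracy of uniform random guessing per task.
CHANCE = {
    "arc_easy": 0.25,
    "boolq": 0.5,
    "hellaswag": 0.25,
    "piqa": 0.5,
    "sciq": 0.25,
    "winogrande": 0.5,
}


def z_vs_chance(task: str) -> float:
    """Distance from the guess baseline in units of the reported stderr."""
    acc, stderr = ZERO_SHOT[task]
    return (acc - CHANCE[task]) / stderr


for task in ZERO_SHOT:
    print(f"{task:<11} {z_vs_chance(task):+.2f}")
```

A value within about ±2 means the score is not clearly distinguishable from guessing under this rough baseline, which is in line with the limitations listed under What Doesn't Work.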

Citation

If you use this model in your research, please cite it as follows:

@misc{scamarcia_echo_dsrn,
  title={Echo-DSRN-114M: Surprise-Gated Dual-State Recurrent Architecture for Efficient Language Modeling and Classification},
  author={Massimo Roberto Scamarcia},
  publisher={Zenodo},
  doi={10.5281/zenodo.19848279}
}