NoesisLab

community

AI & ML interests

NoesisLab advances machine learning research in deep contemplation and reflective reasoning to enable more profound and self-aware artificial intelligence.

Recent Activity

OzTianlu published an article about 2 hours ago

Why ResNet is Explicit Euler, and What That Tells Us About Deep Learning

OzTianlu authored a paper 5 days ago

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

OzTianlu submitted a paper 5 days ago

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development


Papers

Reasoning: From Reflection to Solution


Articles

Exploring New Frontiers of LLMs: Adaptive Dual-Search Distillation (ADS) and the 30B Model Open Beta

Mar 1 • 2


OzTianlu

posted an update about 2 hours ago

Post

ResNet is Explicit Euler. GPT is Implicit Euler. What Else is Hiding in Plain Sight?

Read online: https://datawhalechina.github.io/learning-terrain/

I wrote an open-source monograph on learning dynamics — The Terrain of Learning. Bilingual (Chinese/English), 4 volumes, 12 chapters, 30+ print-grade figures. Completely free (CC BY-NC-SA 4.0).

The core argument: gradient descent is not optimization. It's terrain motion. The loss function is a landscape. The gradient is the direction of slope. The optimizer is how you choose each step. Once you see it this way, everything clicks:

ResNet = explicit Euler integration on a vector field. The residual branch is the vector field. Each layer takes one Euler step.
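
A minimal sketch of the correspondence described above, assuming a toy vector field dx/dt = -x in place of a learned residual branch. The names vector_field, residual_block, and euler_integrate are illustrative and do not come from the monograph.

```python
import math

def vector_field(x):
    # Hypothetical stand-in for a learned residual branch f(x); here dx/dt = -x.
    return [-xi for xi in x]

def residual_block(x, h=1.0):
    # One explicit Euler step: x_{k+1} = x_k + h * f(x_k).
    # A standard residual connection is this update with step size h = 1.
    return [xi + h * fi for xi, fi in zip(x, vector_field(x))]

def euler_integrate(x0, h, steps):
    # Stacking blocks is the same as taking repeated Euler steps.
    x = list(x0)
    for _ in range(steps):
        x = residual_block(x, h)
    return x

x0 = [1.0, 2.0]
x_final = euler_integrate(x0, h=0.1, steps=10)   # ten "layers" covering t in [0, 1]
exact = [math.exp(-1.0) * v for v in x0]         # closed-form solution at t = 1
print(x_final, exact)
```

Ten steps of size 0.1 land close to the exact solution at t = 1, and the gap shrinks as the step size does, which is the usual first-order behaviour of explicit Euler.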

GPT autoregression = implicit-state Euler iteration. Stable where explicit Euler explodes. That's why transformers handle long-range dependencies.
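
The stability claim can be checked on the standard linear test equation dx/dt = -lam * x. This is a sketch of the numerical-analysis fact only; it does not model a transformer, and the parameter values are chosen for illustration.

```python
def explicit_euler(x0, lam, h, steps):
    # x_{k+1} = x_k + h * (-lam * x_k) = (1 - h*lam) * x_k
    x = x0
    for _ in range(steps):
        x = x + h * (-lam * x)
    return x

def implicit_euler(x0, lam, h, steps):
    # x_{k+1} = x_k + h * (-lam * x_{k+1}), solved for x_{k+1}:
    # x_{k+1} = x_k / (1 + h*lam)
    x = x0
    for _ in range(steps):
        x = x / (1.0 + h * lam)
    return x

lam, h = 50.0, 0.1            # h * lam = 5, outside the explicit stability bound h*lam < 2
explicit = explicit_euler(1.0, lam, h, steps=20)
implicit = implicit_euler(1.0, lam, h, steps=20)
print(explicit, implicit)     # explicit grows in magnitude, implicit decays toward 0
```

The true solution decays to zero. With this step size the explicit update multiplies by -4 every step and diverges, while the implicit update multiplies by 1/6 and stays stable for any positive h.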

DEQ = the Banach fixed-point theorem in production. The forward pass is root-finding. There are no layers to backprop through.
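
A one-dimensional sketch of the fixed-point view, assuming a toy layer z -> tanh(w*z + x) with |w| < 1 so that the map is a contraction. The function names and the weight value are hypothetical, and real DEQ models use faster root-finders than plain iteration.

```python
import math

def layer(z, x, w=0.5):
    # Hypothetical scalar "layer". Its derivative in z is bounded by |w| < 1,
    # so it is a contraction and the Banach theorem gives a unique fixed point.
    return math.tanh(w * z + x)

def solve_fixed_point(x, z0=0.0, tol=1e-10, max_iter=1000):
    # Forward pass as root-finding: iterate z <- layer(z, x) until it stops moving.
    z = z0
    for i in range(max_iter):
        z_next = layer(z, x)
        if abs(z_next - z) < tol:
            return z_next, i + 1
        z = z_next
    return z, max_iter

z_star, iters = solve_fixed_point(x=0.3)
z_other, _ = solve_fixed_point(x=0.3, z0=5.0)   # different start, same answer
print(z_star, z_other, iters)
```

Both starting points converge to the same z*, which is the uniqueness half of the theorem; the output depends on the input x and the weights, not on the initial state.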

KL divergence = a Bregman divergence on the entropy landscape. Your belief space is curved, not flat.
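
The identity can be verified numerically. The sketch below builds the Bregman divergence generated by negative entropy F(p) = sum p_i log p_i and compares it with KL divergence on two example distributions; the probability values are arbitrary and must be strictly positive.

```python
import math

def neg_entropy(p):
    # Generator F(p) = sum_i p_i * log(p_i)
    return sum(pi * math.log(pi) for pi in p)

def bregman_neg_entropy(p, q):
    # D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>, with grad F(q)_i = log(q_i) + 1
    grad_q = [math.log(qi) + 1.0 for qi in q]
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_q, p, q))
    return neg_entropy(p) - neg_entropy(q) - inner

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
print(bregman_neg_entropy(p, q), kl(p, q))   # equal up to rounding
print(kl(p, q), kl(q, p))                    # not symmetric
```

The two quantities agree whenever both vectors sum to 1, and the asymmetry between kl(p, q) and kl(q, p) is one concrete sign that the geometry is not Euclidean.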

Chain-of-thought reasoning = hidden states flowing along a reasoning field toward an attractor basin. Correct answers have wide basins. The number of reasoning steps is determined by the terrain, not by the problem.

Diffusion models = systems flowing downhill along a score vector field, from noise to structure, from high energy to low energy.

The book traces one idea across 337 years — from F=ma (Newton, 1687) to H=T+V (Hamilton, 1833) to loss landscape + gradient field (2020s). Hamilton replaced a catalog of forces with one geometric object. This book does the same for deep learning.

GitHub: https://github.com/datawhalechina/learning-terrain
Discussion: https://github.com/datawhalechina/learning-terrain/discussions/2

Convergence is not hope. Convergence is geometry. You see.

OzTianlu

published an article about 2 hours ago

Article

Why ResNet is Explicit Euler, and What That Tells Us About Deep Learning

NoesisLab • about 2 hours ago • 1

OzTianlu

authored a paper 5 days ago

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Paper • 2606.07207 • Published 9 days ago • 4

OzTianlu

submitted a paper to Daily Papers 5 days ago

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Paper • 2606.07207 • Published 9 days ago • 4

OzTianlu

authored a paper about 1 month ago

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

Paper • 2605.06741 • Published May 7 • 1

OzTianlu

submitted a paper to Daily Papers about 1 month ago

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

Paper • 2605.06741 • Published May 7 • 1

OzTianlu

posted an update 2 months ago

Post

1429

https://github.com/lizixi-0x2F/March
I just released March, an open-source high-performance KV cache sharing library for LLM inference that uses Trie-based prefix deduplication.
When you run LLM services, you often see thousands of requests sharing the same system prompt and conversation history. But traditional KV cache systems store each sequence separately — duplicating the exact same data over and over again. Pure waste.
March uses a Trie structure to automatically detect and reuse identical token prefixes. Instead of storing [system_prompt + history] 1000 times, it's stored once. Everyone shares it.
- 80-97% memory reduction in prefix-heavy workloads (tested on SmolLM2-135M with 500 multi-turn conversations)
- Zero-copy queries — returns direct pointers into the memory pool, no expensive memcpy on the hot path
- Predictable memory usage — fixed-size page pool with O(L) complexity
- Trade-off: slightly slower than dict O(1) lookup, but the memory savings are worth it in production
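
The prefix-sharing idea can be sketched with a small trie in which each node owns one cache slot. This is an illustration under assumed names (TrieNode, PrefixCache, kv_slot), not March's actual API or memory layout, and the integer slot ids stand in for real key/value pages.

```python
class TrieNode:
    __slots__ = ("children", "kv_slot")

    def __init__(self, kv_slot):
        self.children = {}      # token id -> TrieNode
        self.kv_slot = kv_slot  # index of the (hypothetical) KV entry for this token

class PrefixCache:
    def __init__(self):
        self.root = TrieNode(kv_slot=None)
        self.slots_used = 0

    def insert(self, tokens):
        """Walk the trie in O(L) steps, allocating a slot only for unseen tokens."""
        node = self.root
        slots = []
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                child = TrieNode(kv_slot=self.slots_used)
                self.slots_used += 1
                node.children[tok] = child
            slots.append(child.kv_slot)
            node = child
        return slots

# 1000 requests that share a 100-token system prompt and differ in one final token.
system_prompt = list(range(100))
requests = [system_prompt + [1000 + i] for i in range(1000)]

cache = PrefixCache()
all_slots = [cache.insert(r) for r in requests]

naive_slots = sum(len(r) for r in requests)   # one slot per token per request
saving = 1 - cache.slots_used / naive_slots
print(cache.slots_used, naive_slots, round(saving, 3))
```

In this synthetic workload the shared prefix is stored once, so the slot count falls from 101000 to 1100. The figure is specific to the toy data and is not a reproduction of the benchmark numbers quoted in the post.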

1 reply

OzTianlu

updated 2 models 3 months ago

NoesisLab/Kai-30B-Instruct

Text Generation • 33B • Updated Mar 26 • 9 • 21

NoesisLab/Arcade-3B

Text Generation • 3B • Updated Mar 16 • 11 • 8

OzTianlu

published an article 3 months ago

Article

Arcade-3B: SLM Optimization via Orthogonal Decoupling of Latent State Spaces

NoesisLab • Mar 15 • 1

OzTianlu

published an article 3 months ago

Article

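Arcade-3B: SLM Optimization Based on Orthogonal Decoupling of Hidden-Layer State Spaces (Chinese-language article; original title: Arcade-3B: 基于隐藏层状态空间正交解耦的 SLM 优化)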

NoesisLab • Mar 15 • 1

OzTianlu

posted an update 3 months ago

Post

5414

Arcade-3B — SmolReasoner
NoesisLab/Arcade-3B
Arcade-3B is a 3B instruction-following and reasoning model built on SmolLM3-3B. It is the public release from the ARCADE project at NoesisLab, which investigates the State–Constraint Orthogonality Hypothesis: standard Transformer hidden states conflate factual content and reasoning structure in the same subspace, and explicitly decoupling them improves generalization.

5 replies

OzTianlu

published a model 3 months ago

NoesisLab/Arcade-3B

Text Generation • 3B • Updated Mar 16 • 11 • 8

OzTianlu

updated a model 3 months ago

NoesisLab/Collins-Embedding-3M

OzTianlu

posted an update 3 months ago

Post

1975

We deleted the embedding layer: introducing Collins-Embedding-3M
NoesisLab/Collins-Embedding-3M
Most "small" models are just giant vocab tables in a trench coat. Collins-3M changes that. By using 2-Universal Hashing and Chernoff-bound noise suppression, we’ve collapsed the embedding space into a fixed O(1) hash-map.
* STSB: 0.7114 (Beating many 100M+ models)
* Size: 3M (Edge-ready, IoT-ready)
* Tech: Randomized Sign-Hashing + RoPE positional injection.
Built by NoesisLab
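
A rough sketch of the general technique named above, signed feature hashing with a Carter-Wegman style 2-universal hash, assuming fixed constants in place of randomly drawn ones. It is not the Collins-Embedding-3M implementation; make_hash, hashed_embedding, and the dimension are hypothetical, and the sketch ignores token order (the post mentions RoPE for positional information).

```python
P = 2_147_483_647  # Mersenne prime 2^31 - 1, assumed larger than any token id

def make_hash(a, b, m):
    # Carter-Wegman family: h(x) = ((a*x + b) mod P) mod m, with 1 <= a < P.
    # a and b would normally be drawn at random; fixed here for reproducibility.
    return lambda x: ((a * x + b) % P) % m

def hashed_embedding(token_ids, dim=16):
    bucket = make_hash(a=1_103_515_245, b=12_345, m=dim)   # which coordinate
    sign_bit = make_hash(a=214_013, b=2_531_011, m=2)      # which sign
    vec = [0.0] * dim
    for t in token_ids:
        # No lookup table is stored: the coordinate and sign are computed per token.
        vec[bucket(t)] += 1.0 if sign_bit(t) == 1 else -1.0
    return vec

v = hashed_embedding([17, 4242, 90001])
print(v)
```

The random signs make colliding tokens cancel in expectation rather than pile up, which is the usual motivation for signed hashing over plain bucket counts.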

OzTianlu

published a model 3 months ago

NoesisLab/Collins-Embedding-3M

OzTianlu

updated 2 Spaces 3 months ago

Kai 30B Instruct

🌖

Chat with Kai-30B-Instruct

README

🌖

OzTianlu

updated a model 3 months ago

NoesisLab/Kai-3B-Instruct

Text Generation • 3B • Updated Mar 4 • 9 • 5

OzTianlu

posted an update 3 months ago

Post

4793

🔥 UPGRADE in Kai: 30B Scaling! 🔥
NoesisLab/Kai-30B-Instruct
We are incredibly excited to announce that the Kai-30B-Instruct model and its official Space are now LIVE! 🚀
If you've been following the journey from Kai-0.35B to Kai-3B, you know we're rethinking how models reason. Tired of verbose, slow Chain-of-Thought (CoT) outputs that flood your screen with self-talk? So are we.
Kai-30B-Instruct scales up our Adaptive Dual-Search Distillation (ADS) framework. By bridging classical A* heuristic search with continuous gradient descent, we use an information-theoretic log-barrier to physically prune high-entropy reasoning paths during training.
The result? Pure implicit reasoning. The model executes structured logic, arithmetic carries, and branch selections as a reflex in a single forward pass—no external scaffolding required.
At 3B, we observed a phase transition where the model achieved "logical crystallization". Now, at 30B, we are giving the ADS regularizer the massive representational capacity it needs to tackle higher-order symbolic abstractions and complex reasoning tasks.
🧪 Test Kai yourself in our new Space:
NoesisLab/Kai-30B-Instruct
📦 Model Weights:
NoesisLab/Kai-30B-Instruct
Bring your hardest math, logic, and coding benchmarks. We invite the community to stress-test the limits of the penalty wall! 🧱💥
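
For readers unfamiliar with the term, a log-barrier on entropy can be sketched generically as below. The functional form, the ceiling h_cap, and the weight mu are assumptions for illustration; the post does not specify the ADS regularizer, and this is not it.

```python
import math

def entropy(p):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def log_barrier_penalty(p, h_cap, mu=0.1):
    # Generic interior-point barrier for the constraint entropy(p) < h_cap:
    # the penalty grows without bound as the entropy approaches the ceiling.
    slack = h_cap - entropy(p)
    if slack <= 0.0:
        return math.inf
    return -mu * math.log(slack)

h_cap = 1.0                              # hypothetical entropy ceiling, in nats
confident = [0.97, 0.01, 0.01, 0.01]     # low entropy, far from the wall
near_cap = [0.70, 0.10, 0.10, 0.10]      # entropy just under the ceiling
uniform = [0.25, 0.25, 0.25, 0.25]       # entropy above the ceiling

for dist in (confident, near_cap, uniform):
    print(round(entropy(dist), 4), log_barrier_penalty(dist, h_cap))
```

The penalty is small for the confident distribution, rises sharply near the ceiling, and is infinite beyond it, which is the "wall" behaviour a barrier term is meant to provide.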

1 reply

Team members 2