Habitat 3.0 Social Rearrangement — SHARM (Shared Hierarchical Recurrent Memory)

Status: training in progress. This card describes the architecture and training plan. Trained weights will be uploaded once the Stage 3/4 runs complete and have been evaluated on the fixed validation split.

This is the Pereason + Go + Fabric + SHARM multi-agent coordination model: two embodied agents (a Spot robot with an arm and a humanoid) cooperate to rearrange objects across HSSD home scenes.

SHARM extends the Fabric model with a learned, persistent, shared memory layer. Where Fabric exchanges per-step perceptual messages between agents, SHARM gives each agent a typed slot bank that accumulates state across an episode and is gossiped to the partner. The design is directly inspired by stigmergic coordination (ant pheromone trails), re-cast as a learned policy module.

This work is part of the thesis "Scalable Multi-Agent Coordination Using a Shared-Context Architecture for Embodied Robotics" by Benjamin Kubwimana. SHARM is the learned counterpart to the hand-coded CollabTime DSM evaluated earlier in the thesis.

What's in this repo

File	Description
(weights pending)	Trained Stage 3 / Stage 4 SHARM checkpoints will land here once eval is complete

Architecture

RGB+lang ─→ SmolVLM2 (frozen, 350M)         ┐
depth ────→ DepthAnythingV2 (trained, 25M) ─┴→ fused tokens (B, S, 960)
                                                       ↓
                          Fabric: encode 128-d msg, broadcast,
                          cross-attend partner msg, gated residual
                                                       ↓
                          SHARM: write to typed slot bank,
                          self-attend over slots, encode gossip,
                          cross-attend partner gossip
                                                       ↓
                          Go transformer (PPO) ─→ skill choice
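The Fabric stage of the diagram can be sketched in NumPy for a single environment, with the batch dimension dropped. This is a minimal illustration, not the ivalab implementation. Mean-pooling to form the message, the per-token sigmoid gate, and the weight names are all assumptions, and random weights stand in for learned projections. Only the 960-d fused tokens and the 128-d message size come from the card.

```python
import numpy as np

D_FUSED, MSG_DIM = 960, 128   # from the diagram / card
rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical learned projections (random stand-ins).
W_msg = rng.normal(0, 0.02, (D_FUSED, MSG_DIM))   # pooled tokens -> message
W_q = rng.normal(0, 0.02, (D_FUSED, MSG_DIM))     # own tokens -> queries
W_k = rng.normal(0, 0.02, (MSG_DIM, MSG_DIM))     # partner message -> keys
W_v = rng.normal(0, 0.02, (MSG_DIM, MSG_DIM))     # partner message -> values
W_o = rng.normal(0, 0.02, (MSG_DIM, D_FUSED))     # attended values -> token space
w_gate = rng.normal(0, 0.02, (D_FUSED, 1))        # per-token scalar gate

def encode_message(tokens):
    """Mean-pool fused tokens (S, D_FUSED) and project to one 128-d message (assumed pooling)."""
    return tokens.mean(axis=0) @ W_msg

def fabric_step(own_tokens, partner_msgs):
    """Cross-attend own tokens (S, D_FUSED) over partner messages (M, MSG_DIM), then gated residual."""
    q = own_tokens @ W_q                                   # (S, MSG_DIM)
    k = partner_msgs @ W_k                                 # (M, MSG_DIM)
    v = partner_msgs @ W_v                                 # (M, MSG_DIM)
    attn = softmax(q @ k.T / np.sqrt(MSG_DIM))             # (S, M)
    update = (attn @ v) @ W_o                              # (S, D_FUSED)
    gate = 1.0 / (1.0 + np.exp(-(own_tokens @ w_gate)))    # (S, 1), values in (0, 1)
    return own_tokens + gate * update                      # shape preserved

# Usage: each agent broadcasts one message; the partner sees it as M = 1 token.
S = 16
robot_tokens = rng.normal(size=(S, D_FUSED))
human_tokens = rng.normal(size=(S, D_FUSED))
robot_msg = encode_message(robot_tokens)[None, :]
human_msg = encode_message(human_tokens)[None, :]
robot_out = fabric_step(robot_tokens, human_msg)
human_out = fabric_step(human_tokens, robot_msg)
print(robot_out.shape)  # (16, 960)
```

With a single broadcast message, the attention weight is trivially 1. The same function handles several partner tokens without changes.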

SHARM specifics

Component	Detail
Slot bank	8 typed slots: 4 perception + 2 task + 2 intent
Slot dim	64
Memory state	520 floats per env, packed into Habitat's `recurrent_hidden_states` buffer
Write head	Sparse content-addressed: produces (key, content, gate) per slot, soft-routed by similarity to learned slot key embeddings
Decay	Stigmergic — multiplicative attenuation by slot age, half-life ~100 steps
Read	Self-attention over own slots + cross-attention over partner gossip
Gossip	`(K, slot_dim) → (K, msg_dim=128)` encoder, decoded by partner
Training	Truncated BPTT through Habitat's `rnn_build_seq_info`
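The table can be sketched row by row as a single SHARM step for one environment in NumPy. Several details are assumptions rather than facts taken from the ivalab code:

- The 520-float state is read as 8 × 64 slot contents plus 8 per-slot ages.
- Decay is applied at read time as 0.5^(age/100).
- A write refreshes a slot's age in proportion to its write strength.
- Attention is single-head, with keys equal to values.
- The parameter names in `P` are hypothetical, with random weights standing in for learned ones.

Slot types appear only as labels. Nothing in the sketch restricts which write lands in which typed slot.

```python
import numpy as np

N_SLOTS, SLOT_DIM, MSG_DIM, D_FUSED = 8, 64, 128, 960
SLOT_TYPES = ["perception"] * 4 + ["task"] * 2 + ["intent"] * 2   # labels only, not enforced
HALF_LIFE = 100.0
STATE_SIZE = N_SLOTS * SLOT_DIM + N_SLOTS   # assumed split: 512 contents + 8 ages = 520

rng = np.random.default_rng(0)
P = {  # hypothetical parameter names, random stand-ins for learned weights
    "slot_keys": rng.normal(0, 1, (N_SLOTS, SLOT_DIM)),
    "W_key": rng.normal(0, 0.02, (D_FUSED, N_SLOTS * SLOT_DIM)),
    "W_content": rng.normal(0, 0.02, (D_FUSED, N_SLOTS * SLOT_DIM)),
    "W_gate": rng.normal(0, 0.02, (D_FUSED, N_SLOTS)),
    "W_gossip": rng.normal(0, 0.02, (SLOT_DIM, MSG_DIM)),
    "W_ungossip": rng.normal(0, 0.02, (MSG_DIM, SLOT_DIM)),
}

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv):
    """Single-head scaled dot-product attention with keys == values (simplification)."""
    return softmax(q @ kv.T / np.sqrt(q.shape[-1])) @ kv

def unpack(h):
    return h[: N_SLOTS * SLOT_DIM].reshape(N_SLOTS, SLOT_DIM), h[N_SLOTS * SLOT_DIM:]

def pack(slots, ages):
    return np.concatenate([slots.ravel(), ages])

def decayed(slots, ages):
    """Stigmergic decay: a slot's strength halves every HALF_LIFE steps since its last write."""
    return slots * (0.5 ** (ages / HALF_LIFE))[:, None]

def sharm_step(h, fused_pooled, partner_gossip):
    slots, ages = unpack(h)
    ages = ages + 1.0
    # Write head: N_SLOTS candidate writes, each a (key, content, gate) triple.
    keys = (fused_pooled @ P["W_key"]).reshape(N_SLOTS, SLOT_DIM)
    content = np.tanh(fused_pooled @ P["W_content"]).reshape(N_SLOTS, SLOT_DIM)
    gate = 1.0 / (1.0 + np.exp(-(fused_pooled @ P["W_gate"])))       # (writes,)
    # Soft routing: each write spreads over slots by similarity to learned slot keys.
    route = softmax(keys @ P["slot_keys"].T / np.sqrt(SLOT_DIM))     # (writes, slots)
    weighted = gate[:, None] * route
    mass = weighted.sum(axis=0)                                       # write mass per slot
    alpha = np.clip(mass, 0.0, 1.0)
    target = weighted.T @ content / (mass[:, None] + 1e-8)            # (slots, SLOT_DIM)
    slots = (1 - alpha[:, None]) * slots + alpha[:, None] * target
    ages = (1 - alpha) * ages                                         # writes refresh the slot
    # Read: self-attention over own decayed slots, then cross-attention over partner gossip.
    mem = decayed(slots, ages)
    mem = mem + attend(mem, mem)
    partner = partner_gossip @ P["W_ungossip"]                        # decode (K, 128) -> (K, 64)
    read = mem + attend(mem, partner)
    gossip_out = mem @ P["W_gossip"]                                  # encode (K, 64) -> (K, 128)
    return pack(slots, ages), read, gossip_out

# Usage: a fresh episode starts from empty slots with age 0.
h = np.zeros(STATE_SIZE)
partner_gossip = np.zeros((N_SLOTS, MSG_DIM))
for t in range(3):
    pooled = rng.normal(size=D_FUSED)
    h, read, gossip = sharm_step(h, pooled, partner_gossip)
print(h.shape, read.shape, gossip.shape)  # (520,) (8, 64) (8, 128)
```

Keeping the whole memory in one flat vector is what lets it ride in `recurrent_hidden_states`. Habitat's existing sequence handling can then treat it like any other recurrent state during truncated BPTT.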

Auxiliary losses (training-time only)

SHARM is bootstrapped with two non-PPO gradient signals, both annealed during training:

Loss	What it pushes	Schedule
Reconstruction	Slots must encode the partner's hand-coded task state (skill, holding, target)	weight 1.0 → 0.0 over the first 50% of training
Future-latent	Slots must let the agent predict its own pooled fused features h = 4 steps ahead	weight 0.5 → 0.1, then sustained

The reconstruction loss acts as a teacher, so the write head learns what to encode. It is then released (annealed to zero) so the model can discover signals beyond the human-designed schema. The future-latent loss stays active at a reduced weight.
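The two schedules can be sketched as simple linear anneals combined with the PPO loss. The card gives only the endpoints and the 50% window for reconstruction. The linear shape, the use of frame progress as the clock, and the length of the future-latent anneal (the hypothetical `future_span`, defaulting here to the whole run) are assumptions.

```python
def linear_anneal(progress, start, end, span):
    """Move linearly from start to end over the first `span` fraction of training, then hold."""
    frac = min(max(progress / span, 0.0), 1.0)
    return start + (end - start) * frac

def aux_weights(frames_done, total_frames, future_span=1.0):
    p = frames_done / total_frames
    recon = linear_anneal(p, 1.0, 0.0, span=0.5)            # 1.0 -> 0.0 over first 50%
    future = linear_anneal(p, 0.5, 0.1, span=future_span)    # 0.5 -> 0.1; span is assumed
    return recon, future

def total_loss(ppo_loss, recon_loss, future_loss, frames_done, total_frames):
    """PPO objective plus the two weighted auxiliary terms."""
    w_recon, w_future = aux_weights(frames_done, total_frames)
    return ppo_loss + w_recon * recon_loss + w_future * future_loss

# Usage: a quarter of the way through a ~55M-frame run such as Stage 3b.
print(tuple(round(w, 3) for w in aux_weights(13_750_000, 55_000_000)))  # (0.5, 0.4)
```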

Training plan

Stage	Frames	Purpose
3a (sanity)	~5M	Verify wiring: no crashes, reconstruction loss converges
3b (full)	~55M	Headline Stage 3 result
4 (ablation)	~30M	Drop reconstruction aux, warm-start from Stage 3 best

The frame budget is capped at 60M per run. The total compute budget is ~90M frames across all stages (5M + 55M + 30M).

Evaluation (planned)

All checkpoints will be evaluated on the same fixed 100-episode validation split used for Table 4.1 of the thesis, reporting these metrics:

pddl_success (primary)
num_agents_collide
episode_steps
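Aggregating the three metrics over the 100 validation episodes can be sketched as follows. The metric names come from the card. The per-episode dict format and the binomial standard error are additions for illustration, and the episodes below are toy data, not results.

```python
import math
from statistics import mean

METRICS = ("pddl_success", "num_agents_collide", "episode_steps")

def summarize(episodes):
    """Average each metric over episodes; the per-episode dict format is assumed."""
    n = len(episodes)
    stats = {k: mean(ep[k] for ep in episodes) for k in METRICS}
    p = stats["pddl_success"]
    # Binomial standard error of the success rate, assuming pddl_success is 0 or 1 per episode.
    stats["pddl_success_se"] = math.sqrt(p * (1 - p) / n)
    return stats

# Toy episodes (not real results): every other episode succeeds, every fourth collides.
toy = [
    {"pddl_success": float(i % 2 == 0), "num_agents_collide": float(i % 4 == 0), "episode_steps": 500 + i}
    for i in range(100)
]
stats = summarize(toy)
print({k: round(v, 3) for k, v in stats.items()})
```

With 100 episodes and a success rate near 0.5, the standard error is roughly 0.05. Gaps of a few points between rows of the comparison table can therefore fall within sampling noise for a single evaluation run.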

For comparison, prior results on this benchmark:

Condition	Success	Collide	Source
Oracle baseline	0.28	0.71	Thesis Table 4.1
Oracle + CollabTime (hand-coded DSM)	0.51	0.48	Thesis Table 4.1
Trained RL baseline (ResNet-LSTM)	0.15	0.47	Thesis Table 4.1
Trained RL + CollabTime	0.37	0.32	Thesis Table 4.1
Fabric (model)	0.43	0.16	Thesis Table 4.1
SHARM Stage 3	(pending)	(pending)	—
SHARM Stage 4	(pending)	(pending)	—

How to use (once weights land)

Weights will load via the same Habitat-baselines harness as the Fabric release. See the GitHub repository for the full training and evaluation pipeline.

git clone https://github.com/bkubwimana/ivalab.git
cd ivalab
git checkout feature/fabric-dsm  # SHARM lives on this branch until merged
git submodule update --init --recursive  # run after checkout so submodules match the branch
bash scripts/eval_trained.sh pereason_go_fabric_dsm

Citation

@thesis{kubwimana_thesis_2026,
  title  = {Scalable Multi-Agent Coordination Using a Shared-Context
            Architecture for Embodied Robotics},
  author = {Kubwimana, Benjamin},
  year   = {2026}
}
