Habitat 3.0 Social Rearrangement: SHARM (Shared Hierarchical Recurrent Memory)
Status: training in progress. This card describes the architecture and training plan. Trained weights will be uploaded once the Stage 3/4 runs complete and have been evaluated on the fixed validation split.
This card covers the Pereason + Go + Fabric + SHARM multi-agent coordination model. Two embodied agents (a Spot robot arm and a humanoid) cooperate to rearrange objects across HSSD home scenes.
SHARM extends the Fabric model with a learned, persistent, shared memory layer. Where Fabric exchanges per-step perceptual messages between agents, SHARM gives each agent a typed slot bank that accumulates state across an episode and is gossiped to the partner. The design is directly inspired by stigmergic coordination (ant pheromone trails), recast as a learned policy module.
This work is part of the thesis "Scalable Multi-Agent Coordination Using a Shared-Context Architecture for Embodied Robotics" by Benjamin Kubwimana. SHARM is the learned counterpart to the hand-coded CollabTime DSM evaluated earlier in the thesis.
What's in this repo
| File | Description |
|---|---|
| (weights pending) | Trained Stage 3 / Stage 4 SHARM checkpoints will land here once eval is complete |
Architecture
RGB+lang -> SmolVLM2 (frozen, 350M) ---------+
depth ----> DepthAnythingV2 (trained, 25M) --+-> fused tokens (B, S, 960)
|
v
Fabric: encode 128-d msg, broadcast,
cross-attend partner msg, gated residual
|
v
SHARM: write to typed slot bank,
self-attend over slots, encode gossip,
cross-attend partner gossip
|
v
Go transformer (PPO) -> skill choice
SHARM specifics
| Component | Detail |
|---|---|
| Slot bank | 8 typed slots: 4 perception + 2 task + 2 intent |
| Slot dim | 64 |
| Memory state | 520 floats per env, packed into Habitat's recurrent_hidden_states buffer |
| Write head | Sparse content-addressed: produces (key, content, gate) per slot, soft-routed by similarity to learned slot key embeddings |
| Decay | Stigmergic: multiplicative attenuation by slot age, half-life ~100 steps |
| Read | Self-attention over own slots + cross-attention over partner gossip |
| Gossip | (K, slot_dim) -> (K, msg_dim=128) encoder, decoded by partner |
| Training | Truncated BPTT through Habitat's rnn_build_seq_info |
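
The slot bank, decay, and memory-state rows above can be illustrated with a minimal plain-Python sketch. It is not the SHARM implementation. The class `SlotBank` and its methods are hypothetical names, the write is a hard overwrite rather than the soft-routed write head, and the split of the 520 floats into 8 x 64 slot values plus 8 per-slot ages is an assumption that happens to match the stated size.

```python
from dataclasses import dataclass, field

NUM_SLOTS = 8            # 4 perception + 2 task + 2 intent (from the table)
SLOT_DIM = 64            # from the table
HALF_LIFE_STEPS = 100.0  # "~100 steps" in the table, treated as exact here
SLOT_TYPES = ["perception"] * 4 + ["task"] * 2 + ["intent"] * 2

# ASSUMPTION: 520 floats = 8 * 64 slot values + 8 per-slot ages.
STATE_SIZE = NUM_SLOTS * SLOT_DIM + NUM_SLOTS


def attenuation(age_steps, half_life=HALF_LIFE_STEPS):
    """Multiplicative decay factor for a slot last written `age_steps` ago."""
    return 0.5 ** (age_steps / half_life)


@dataclass
class SlotBank:
    """Hypothetical container: slot contents plus the age of each slot."""
    contents: list = field(
        default_factory=lambda: [[0.0] * SLOT_DIM for _ in range(NUM_SLOTS)]
    )
    ages: list = field(default_factory=lambda: [0.0] * NUM_SLOTS)

    def write(self, slot, values):
        # Simplified hard write; the real write head is soft-routed and gated.
        assert len(values) == SLOT_DIM
        self.contents[slot] = list(values)
        self.ages[slot] = 0.0

    def tick(self):
        # One environment step passes: every slot gets older.
        self.ages = [age + 1.0 for age in self.ages]

    def read(self, slot):
        # Stigmergic read: older content is attenuated like a fading trail.
        weight = attenuation(self.ages[slot])
        return [weight * v for v in self.contents[slot]]

    def pack(self):
        # Flatten to one vector, as if storing it in a recurrent hidden state.
        return [v for row in self.contents for v in row] + list(self.ages)

    @classmethod
    def unpack(cls, flat):
        assert len(flat) == STATE_SIZE
        n = NUM_SLOTS * SLOT_DIM
        contents = [list(flat[i:i + SLOT_DIM]) for i in range(0, n, SLOT_DIM)]
        return cls(contents=contents, ages=list(flat[n:]))
```

Under these assumptions, a slot written with ones and read back 100 steps later returns values of 0.5, and `len(SlotBank().pack())` equals 520.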
Auxiliary losses (training-time only)
SHARM is bootstrapped with two non-PPO gradient signals, both of which are annealed:
| Loss | What it pushes | Schedule |
|---|---|---|
| Reconstruction | Slots must encode partner's hand-coded task state (skill, holding, target) | weight 1.0 -> 0.0 over first 50% of training |
| Future-latent | Slots must enable predicting own pooled fused features at horizon h=4 | weight 0.5 -> 0.1, sustained |
The reconstruction loss serves as a teacher (so the write head learns what to encode) and is then annealed to zero so the model can discover signals beyond the human-designed schema.
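The two schedules can be sketched as simple functions of training progress. This is an illustration under assumptions, not the training code: the anneal shape is assumed to be linear, the future-latent weight is assumed to decay over the whole run (the table only gives its endpoints), and all function names are hypothetical.

```python
def linear_anneal(progress, start, end, until):
    """Move linearly from `start` to `end` while progress goes 0 -> `until`,
    then hold `end`. `progress` is the fraction of training completed."""
    if progress >= until:
        return end
    frac = max(progress, 0.0) / until
    return start + (end - start) * frac


def recon_weight(progress):
    # From the table: 1.0 -> 0.0 over the first 50% of training.
    return linear_anneal(progress, start=1.0, end=0.0, until=0.5)


def future_latent_weight(progress):
    # From the table: 0.5 -> 0.1, sustained.
    # ASSUMPTION: the decay spans the full run; the source gives no window.
    return linear_anneal(progress, start=0.5, end=0.1, until=1.0)


def total_loss(ppo_loss, recon_loss, future_loss, progress):
    # ASSUMPTION: the auxiliary terms are simply added to the PPO loss.
    return (
        ppo_loss
        + recon_weight(progress) * recon_loss
        + future_latent_weight(progress) * future_loss
    )
```

With these definitions the reconstruction term contributes nothing after the halfway point, while the future-latent term stays active until the end of training.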
Training plan
| Stage | Frames | Purpose |
|---|---|---|
| 3a (sanity) | ~5M | Verify wiring, no crashes, recon loss converges |
| 3b (full) | ~55M | Headline Stage 3 result |
| 4 (ablation) | ~30M | Drop reconstruction aux, warm-start from Stage 3 best |
The frame budget is capped at 60M per run; the total compute budget across all stages is ~90M frames.
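As a quick consistency check on the plan, the stage budgets can be summed and compared against the per-run cap. The sketch below treats each stage as a single run and the approximate frame counts as exact, both of which are assumptions; the variable names are illustrative.

```python
# Frame counts in millions, copied from the training plan table.
STAGE_FRAMES_M = {
    "3a (sanity)": 5,
    "3b (full)": 55,
    "4 (ablation)": 30,
}
PER_RUN_CAP_M = 60  # stated cap per run

# Total across all stages, and any stage that would exceed the cap
# if run on its own (ASSUMPTION: one run per stage).
total_frames_m = sum(STAGE_FRAMES_M.values())
runs_over_cap = [
    name for name, frames in STAGE_FRAMES_M.items() if frames > PER_RUN_CAP_M
]

print(f"total: {total_frames_m}M frames, over cap: {runs_over_cap}")
```

The sum comes to 90M frames, matching the stated total, and no single stage exceeds the 60M cap.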
Evaluation (planned)
All checkpoints will be evaluated on the same fixed 100-episode validation split used for the thesis Table 4.1, with metrics:
- pddl_success (primary)
- num_agents_collide
- episode_steps
For comparison, prior work on this benchmark:
| Condition | Success | Collide | Source |
|---|---|---|---|
| Oracle baseline | 0.28 | 0.71 | Thesis Table 4.1 |
| Oracle + CollabTime (hand-coded DSM) | 0.51 | 0.48 | Thesis Table 4.1 |
| Trained RL baseline (ResNet-LSTM) | 0.15 | 0.47 | Thesis Table 4.1 |
| Trained RL + CollabTime | 0.37 | 0.32 | Thesis Table 4.1 |
| Fabric (model) | 0.43 | 0.16 | Thesis Table 4.1 |
| SHARM Stage 3 | (pending) | (pending) | - |
| SHARM Stage 4 | (pending) | (pending) | - |
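
To read the comparison table as differences rather than absolute values, a small helper can compute the change in success and collision rate between any two rows. The numbers are copied from the table; the dictionary and the `delta` helper are illustrative names, not part of the repository.

```python
# (success, collide) pairs copied from the comparison table (thesis Table 4.1).
RESULTS = {
    "Oracle baseline": (0.28, 0.71),
    "Oracle + CollabTime": (0.51, 0.48),
    "Trained RL baseline": (0.15, 0.47),
    "Trained RL + CollabTime": (0.37, 0.32),
    "Fabric (model)": (0.43, 0.16),
}


def delta(base, variant):
    """Return (success change, collide change) of `variant` relative to `base`,
    rounded to two decimals to match the table's precision."""
    (s0, c0), (s1, c1) = RESULTS[base], RESULTS[variant]
    return round(s1 - s0, 2), round(c1 - c0, 2)


oracle_gain = delta("Oracle baseline", "Oracle + CollabTime")
rl_gain = delta("Trained RL baseline", "Trained RL + CollabTime")
print("CollabTime on oracle:", oracle_gain)
print("CollabTime on trained RL:", rl_gain)
```

For the rows above, adding CollabTime changes success by +0.23 and collisions by -0.23 on the oracle policy, and by +0.22 and -0.15 on the trained RL baseline. The same helper can be applied to the SHARM rows once their results are available.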
How to use (once weights land)
Weights will load via the same Habitat-baselines harness as the Fabric release. See the GitHub repository for the full training and evaluation pipeline.
git clone https://github.com/bkubwimana/ivalab.git
cd ivalab && git submodule update --init --recursive
git checkout feature/fabric-dsm # SHARM lives on this branch until merged
bash scripts/eval_trained.sh pereason_go_fabric_dsm
Citation
@thesis{kubwimana_thesis_2026,
title = {Scalable Multi-Agent Coordination Using a Shared-Context
Architecture for Embodied Robotics},
author = {Kubwimana, Benjamin},
year = {2026}
}