Habitat 3.0 Social Rearrangement: SHARM (Shared Hierarchical Recurrent Memory)

Status: training in progress. This card describes the architecture and training plan. Trained weights will be uploaded once the Stage 3/4 runs complete and have been evaluated on the fixed validation split.

This is the Pereason + Go + Fabric + SHARM multi-agent coordination model. Two embodied agents (a Spot robot arm and a humanoid) cooperate to rearrange objects across HSSD home scenes.

SHARM extends the Fabric model with a learned, persistent, shared memory layer. Where Fabric exchanges per-step perceptual messages between agents, SHARM gives each agent a typed slot bank that accumulates state across an episode and is gossiped to the partner. The design is directly inspired by stigmergic coordination (ant pheromone trails), re-cast as a learned policy module.

This work is part of the thesis "Scalable Multi-Agent Coordination Using a Shared-Context Architecture for Embodied Robotics" by Benjamin Kubwimana. SHARM is the learned counterpart to the hand-coded CollabTime DSM evaluated earlier in the thesis.

What's in this repo

| File | Description |
| --- | --- |
| (weights pending) | Trained Stage 3 / Stage 4 SHARM checkpoints will land here once eval is complete |

Architecture

RGB+lang ─→ SmolVLM2 (frozen, 350M)         ┐
depth ────→ DepthAnythingV2 (trained, 25M) ─┴→ fused tokens (B, S, 960)
                                                       ↓
                          Fabric: encode 128-d msg, broadcast,
                          cross-attend partner msg, gated residual
                                                       ↓
                          SHARM: write to typed slot bank,
                          self-attend over slots, encode gossip,
                          cross-attend partner gossip
                                                       ↓
                          Go transformer (PPO) ─→ skill choice

SHARM specifics

| Component | Detail |
| --- | --- |
| Slot bank | 8 typed slots: 4 perception + 2 task + 2 intent |
| Slot dim | 64 |
| Memory state | 520 floats per env, packed into Habitat's recurrent_hidden_states buffer |
| Write head | Sparse content-addressed: produces (key, content, gate) per slot, soft-routed by similarity to learned slot key embeddings |
| Decay | Stigmergic: multiplicative attenuation by slot age, half-life ~100 steps |
| Read | Self-attention over own slots + cross-attention over partner gossip |
| Gossip | (K, slot_dim) → (K, msg_dim=128) encoder, decoded by partner |
| Training | Truncated BPTT through Habitat's rnn_build_seq_info |
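
As a rough illustration of the decay and memory-state rows above, the following is a minimal sketch in plain Python, not the released implementation. It assumes the 520 floats are laid out as 8 x 64 slot values followed by 8 per-slot age counters (this layout is a guess that happens to match the stated size), and that the decay is an exact exponential with a 100-step half-life. The names `pack`, `unpack`, `decay_factor`, and `read_decayed` are hypothetical.

```python
NUM_SLOTS = 8       # 4 perception + 2 task + 2 intent
SLOT_DIM = 64
HALF_LIFE = 100.0   # steps; the card states "~100"

# ASSUMED layout: 8 * 64 slot values followed by 8 slot ages = 520 floats.
STATE_SIZE = NUM_SLOTS * SLOT_DIM + NUM_SLOTS


def decay_factor(age, half_life=HALF_LIFE):
    """Multiplicative attenuation for a slot last written `age` steps ago."""
    return 0.5 ** (age / half_life)


def pack(slots, ages):
    """Flatten slot contents and ages into one per-env state vector."""
    flat = [value for slot in slots for value in slot]
    return flat + list(ages)


def unpack(state):
    """Inverse of `pack`: recover (slots, ages) from the flat state vector."""
    split = NUM_SLOTS * SLOT_DIM
    slots = [state[i * SLOT_DIM:(i + 1) * SLOT_DIM] for i in range(NUM_SLOTS)]
    return slots, state[split:]


def read_decayed(state):
    """Return slot contents attenuated by their age, leaving `state` untouched."""
    slots, ages = unpack(state)
    return [[v * decay_factor(a) for v in slot] for slot, a in zip(slots, ages)]


if __name__ == "__main__":
    slots = [[1.0] * SLOT_DIM for _ in range(NUM_SLOTS)]
    ages = [0.0, 100.0, 200.0] + [0.0] * 5
    state = pack(slots, ages)
    decayed = read_decayed(state)
    print(len(state), decayed[0][0], decayed[1][0], decayed[2][0])
```

Under these assumptions a slot loses half of its magnitude every 100 steps without being rewritten, which is the pheromone-like behaviour the decay row describes.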

Auxiliary losses (training-time only)

SHARM is bootstrapped with two non-PPO gradient signals, both of which are annealed:

| Loss | What it pushes | Schedule |
| --- | --- | --- |
| Reconstruction | Slots must encode partner's hand-coded task state (skill, holding, target) | weight 1.0 → 0.0 over first 50% of training |
| Future-latent | Slots must enable predicting own pooled fused features at horizon h=4 | weight 0.5 → 0.1, sustained |

The reconstruction loss acts as a teacher, so that the write head learns what to encode early in training. Its weight is then annealed to zero so that the model can discover signals beyond the human-designed schema.
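
The two schedules can be sketched as follows. This is an illustrative sketch only: the card gives the endpoints but not the curve, so a linear ramp is assumed, and the future-latent weight is assumed to reach 0.1 over the same first 50% of training before being held there. The function names and the way the weights are added to the PPO loss are hypothetical.

```python
def linear_anneal(progress, start, end, anneal_fraction):
    """Move linearly from `start` to `end` over the first `anneal_fraction`
    of training, then hold `end`. `progress` is the fraction of frames done."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    t = min(progress / anneal_fraction, 1.0)
    return start + (end - start) * t


def aux_weights(progress):
    """Return (reconstruction_weight, future_latent_weight) at `progress`."""
    recon = linear_anneal(progress, 1.0, 0.0, 0.5)
    # ASSUMPTION: the card only says "0.5 -> 0.1, sustained"; the 50% window
    # used here for the future-latent ramp is a placeholder.
    future = linear_anneal(progress, 0.5, 0.1, 0.5)
    return recon, future


def total_loss(ppo_loss, recon_loss, future_loss, progress):
    """Combine the PPO loss with the two weighted auxiliary losses."""
    w_recon, w_future = aux_weights(progress)
    return ppo_loss + w_recon * recon_loss + w_future * future_loss


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 1.0):
        print(p, aux_weights(p))
```

With these assumptions the reconstruction term contributes nothing after the halfway point, while the future-latent term keeps a small, constant weight until the end of training.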

Training plan

| Stage | Frames | Purpose |
| --- | --- | --- |
| 3a (sanity) | ~5M | Verify wiring, no crashes, recon loss converges |
| 3b (full) | ~55M | Headline Stage 3 result |
| 4 (ablation) | ~30M | Drop reconstruction aux, warm-start from Stage 3 best |

The frame budget is capped at 60M per run. The total compute budget is ~90M frames across all stages.
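
The budget figures can be checked with a few lines of Python. This sketch treats each stage as one run, which is an assumption, and uses the approximate frame counts from the training plan as exact values; `check_budget` is a hypothetical helper.

```python
# Approximate frame counts from the training plan, treated as exact here.
STAGE_FRAMES = {
    "3a (sanity)": 5_000_000,
    "3b (full)": 55_000_000,
    "4 (ablation)": 30_000_000,
}
PER_RUN_CAP = 60_000_000


def check_budget(stage_frames, per_run_cap):
    """Raise if any stage exceeds the per-run cap; return the total frames."""
    over = [name for name, frames in stage_frames.items() if frames > per_run_cap]
    if over:
        raise ValueError(f"stages over the per-run cap: {over}")
    return sum(stage_frames.values())


total = check_budget(STAGE_FRAMES, PER_RUN_CAP)
print(f"total planned frames: {total / 1e6:.0f}M")
```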

Evaluation (planned)

All checkpoints will be evaluated on the same fixed 100-episode validation split used for the thesis Table 4.1, with metrics:

  • pddl_success (primary)
  • num_agents_collide
  • episode_steps
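
A minimal sketch of how the three metrics might be summarised over the validation split is shown below. It assumes each episode yields one record keyed by the metric names above and that the reported figure is a plain mean over episodes; the record format and the `summarize` helper are hypothetical and are not part of the Habitat-baselines API.

```python
from statistics import fmean

METRICS = ("pddl_success", "num_agents_collide", "episode_steps")


def summarize(episodes):
    """Average each metric over a list of per-episode metric dicts."""
    if not episodes:
        raise ValueError("no episodes to summarize")
    return {m: fmean(float(ep[m]) for ep in episodes) for m in METRICS}


if __name__ == "__main__":
    # Two made-up records, for illustration only (not real results).
    demo = [
        {"pddl_success": 1.0, "num_agents_collide": 0.0, "episode_steps": 400},
        {"pddl_success": 0.0, "num_agents_collide": 1.0, "episode_steps": 600},
    ]
    print(summarize(demo))
```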

For comparison, prior work on this benchmark:

| Condition | Success | Collide | Source |
| --- | --- | --- | --- |
| Oracle baseline | 0.28 | 0.71 | Thesis Table 4.1 |
| Oracle + CollabTime (hand-coded DSM) | 0.51 | 0.48 | Thesis Table 4.1 |
| Trained RL baseline (ResNet-LSTM) | 0.15 | 0.47 | Thesis Table 4.1 |
| Trained RL + CollabTime | 0.37 | 0.32 | Thesis Table 4.1 |
| Fabric (model) | 0.43 | 0.16 | Thesis Table 4.1 |
| SHARM Stage 3 | (pending) | (pending) | n/a |
| SHARM Stage 4 | (pending) | (pending) | n/a |

How to use (once weights land)

Weights will load via the same Habitat-baselines harness as the Fabric release. See the GitHub repository for the full training and evaluation pipeline.

git clone https://github.com/bkubwimana/ivalab.git
cd ivalab
git checkout feature/fabric-dsm  # SHARM lives on this branch until merged
git submodule update --init --recursive  # run after checkout so submodules match the branch
bash scripts/eval_trained.sh pereason_go_fabric_dsm

Citation

@thesis{kubwimana_thesis_2026,
  title  = {Scalable Multi-Agent Coordination Using a Shared-Context
            Architecture for Embodied Robotics},
  author = {Kubwimana, Benjamin},
  year   = {2026}
}