
MMM-Diffusion: Marketing Mix Modeling via Dual-Denoiser Diffusion

A generative diffusion model for Marketing Mix Modeling (MMM) that predicts time-varying coefficients for sales decomposition. Adapted from NVIDIA's Kimodo/GMD dual-denoiser architecture.

Architecture

                    ┌─────────────────────────────────────────────┐
                    │         MMM-Diffusion Architecture          │
                    │  (Adapted from Kimodo/GMD Dual-Denoiser)    │
                    └─────────────────────────────────────────────┘

┌───────────────────┐   ┌──────────────────────────────────────────┐
│  CONDITIONING     │   │  STAGE 1: Campaign/Geo Denoiser          │
│                   │   │  (≈ Kimodo Root Denoiser)                │
│  • Media Spend    │──▶│                                          │
│    (5 channels)   │   │  Denoises aggregate-level patterns       │
│  • Controls       │   │  from non-marketing vars + total sales   │
│    (3 variables)  │   │                                          │
│  • Total Sales    │   │  Transformer Encoder (4 layers, d=128)   │
│                   │   └──────────────┬───────────────────────────┘
└───────────────────┘                  │ Campaign Context
                                       ▼
                    ┌──────────────────────────────────────────────┐
                    │  STAGE 2: Channel Denoiser                   │
                    │  (≈ Kimodo Body Denoiser)                    │
                    │                                              │
                    │  Denoises per-channel time-varying β_t       │
                    │  conditioned on Stage 1 output + media spend │
                    │                                              │
                    │  Cross-Attention + Transformer (6 layers)    │
                    │                                              │
                    │  CONSTRAINT ENFORCEMENT:                     │
                    │  • Log-space for media (exp → always ≥ 0)    │
                    │  • PhysDiff-style projection every K steps   │
                    │  • Soft sign penalty loss                    │
                    └──────────────┬───────────────────────────────┘
                                   │
                                   ▼
                    ┌──────────────────────────────────────────────┐
                    │  OUTPUT: Time-Varying Coefficients           │
                    │                                              │
                    │  β_TV(t), β_Digital(t), β_Social(t),         │
                    │  β_Print(t), β_Radio(t)  [all ≥ 0]           │
                    │  β_Seasonality(t), β_Trend(t),               │
                    │  β_CompetitorPrice(t)  [unconstrained]       │
                    │                                              │
                    │  → Sales Decomposition:                      │
                    │    Sales_t = base + Σ β_m(t)·Hill(Adstock(x))│
                    │            + Σ β_c(t)·ctrl_c(t) + noise      │
                    └──────────────────────────────────────────────┘
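
The sales decomposition in the output box can be read as a per-week sum of contributions. Below is a minimal sketch in plain Python, assuming the media inputs have already been passed through the adstock and Hill transforms and leaving out the noise term. The function name `decompose_sales` and the toy numbers are illustrative and are not taken from mmm_diffusion.py.

```python
def decompose_sales(base, beta_media, media_transformed, beta_ctrl, controls):
    """Return (sales, contributions) per week (hypothetical helper).

    beta_media[t][m]        : time-varying media coefficient (>= 0)
    media_transformed[t][m] : Hill(Adstock(spend)) for channel m in week t
    beta_ctrl[t][c]         : time-varying control coefficient (any sign)
    controls[t][c]          : control variable c in week t
    """
    sales, contributions = [], []
    for t in range(len(beta_media)):
        media_part = [b * x for b, x in zip(beta_media[t], media_transformed[t])]
        ctrl_part = [b * c for b, c in zip(beta_ctrl[t], controls[t])]
        contributions.append(
            {"base": base, "media": media_part, "controls": ctrl_part}
        )
        sales.append(base + sum(media_part) + sum(ctrl_part))
    return sales, contributions


# Toy example: 2 weeks, 2 media channels, 1 control variable.
sales, parts = decompose_sales(
    base=100.0,
    beta_media=[[2.0, 1.0], [2.5, 0.5]],
    media_transformed=[[0.5, 0.2], [0.4, 0.6]],
    beta_ctrl=[[-3.0], [-3.0]],
    controls=[[1.0], [2.0]],
)
print(sales)  # approximately [98.2, 95.3]
```

Keeping the per-channel terms separate, rather than only returning the total, is what makes the result usable as a decomposition: each entry of `parts[t]["media"]` is the sales attributed to one channel in week t.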

Kimodo → MMM Mapping

| Kimodo (Motion Generation) | MMM-Diffusion (Marketing) |
|---|---|
| Text prompts | Media spend, non-marketing vars, total sales |
| Motion/position constraints | Sign constraints (β_media ≥ 0) + prior bounds |
| Root denoiser (trajectory) | Campaign/Geo denoiser (aggregate patterns) |
| Body denoiser (joint angles) | Channel denoiser (per-channel coefficients) |
| Skeleton positions/rotations | Time-varying coefficients for decomposition |
| Foot contact constraints | Media positivity constraint |
| Velocity loss | Temporal smoothness loss |

Key Design Decisions

Constraint Enforcement (3 mechanisms, belt-and-suspenders)

  1. Log-space reparametrization: Media coefficients are modeled in log-space during training. At decode time, exp() guarantees positivity. This is the primary mechanism.

  2. PhysDiff-style projection: During reverse diffusion sampling, every K=10 steps the denoised x̂₀ is projected into the feasible region (clamped to valid ranges). Based on PhysDiff.

  3. Soft sign penalty: Training loss includes L_sign = ReLU(-β_media - threshold)² to discourage extreme negative values in log-space.
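
The three mechanisms can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the code in mmm_diffusion.py: the function names, the clamp bounds, and the mean reduction are made up for the example, and the threshold of 5 is borrowed from the L_sign line in the Losses section.

```python
import math


def decode_media(log_beta):
    """Mechanism 1: map log-space values to strictly positive coefficients."""
    return [math.exp(v) for v in log_beta]


def project(values, lo, hi):
    """Mechanism 2: clamp a denoised estimate into the feasible range [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def sign_penalty(log_beta, threshold=5.0):
    """Mechanism 3: mean of ReLU(-log_beta - threshold)^2.

    Only log-space values below -threshold contribute, i.e. coefficients
    that would decode to less than exp(-threshold).
    """
    terms = [max(-v - threshold, 0.0) ** 2 for v in log_beta]
    return sum(terms) / len(terms)


log_beta = [-7.0, -1.0, 0.0, 2.0]
print(decode_media(log_beta))        # every entry is > 0
print(project(log_beta, -5.0, 3.0))  # [-5.0, -1.0, 0.0, 2.0]
print(sign_penalty(log_beta))        # (2^2 + 0 + 0 + 0) / 4 = 1.0
```

Because `exp()` already guarantees positivity, the penalty and the projection here only act on how extreme the log-space values become; they are not what makes the decoded coefficients positive.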

x₀-prediction (not ε-prediction)

Following MDM and GMD, the model predicts the clean data x₀ directly rather than the noise ε. This enables:

  • Constraint projection at each denoising step (operating on meaningful coefficient values)
  • Geometric auxiliary losses (sales reconstruction, temporal smoothness)
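
To illustrate why x₀-prediction makes projection straightforward, here is a toy reverse-diffusion loop in plain Python. It uses the standard DDPM posterior q(x_{t-1} | x_t, x₀) with a linear noise schedule. Everything specific is an assumption for illustration: `toy_denoiser` is a stand-in for the trained network, and the schedule, the bounds, and this standalone `sample` function are hypothetical rather than the model's actual `sample` method.

```python
import math
import random


def linear_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Return per-step betas and the cumulative products alpha_bar."""
    betas = [beta_start + (beta_end - beta_start) * i / (n_steps - 1)
             for i in range(n_steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def toy_denoiser(x_t, t):
    """Placeholder x0-predictor; a real model would be a trained network."""
    return [0.5 * v for v in x_t]


def clamp(values, lo, hi):
    return [min(max(v, lo), hi) for v in values]


def sample(n_dims, n_steps=200, k=10, lo=-5.0, hi=3.0, seed=0):
    rng = random.Random(seed)
    betas, alpha_bars = linear_schedule(n_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]  # start from pure noise
    for t in reversed(range(n_steps)):
        x0_hat = toy_denoiser(x, t)
        # Project the clean estimate every k steps (this includes t == 0).
        if t % k == 0:
            x0_hat = clamp(x0_hat, lo, hi)
        if t == 0:
            x = x0_hat  # final step: return the projected clean estimate
            break
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        # Mean and variance of the posterior q(x_{t-1} | x_t, x0_hat).
        c0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        ct = math.sqrt(1.0 - betas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
        x = [c0 * a + ct * b + math.sqrt(var) * rng.gauss(0.0, 1.0)
             for a, b in zip(x0_hat, x)]
    return x


coeffs = sample(n_dims=8)
print(all(-5.0 <= c <= 3.0 for c in coeffs))  # True
```

With ε-prediction the network output is noise, so a clamp on it would have no direct meaning in coefficient space; here the clamp acts on `x0_hat`, which is already an estimate of the coefficients.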

Dual-Denoiser Hierarchy

Stage 1 captures aggregate macro patterns (overall media effectiveness, seasonality), while Stage 2 specializes in per-channel coefficient dynamics conditioned on those patterns. This hierarchical decomposition mirrors the Kimodo root→body split.

Training Data

Synthetic MMM data generated with realistic patterns:

  • 5 media channels: TV, Digital, Social, Print, Radio
  • 3 control variables: Seasonality, Trend, Competitor Price
  • Adstock transformation: Geometric decay with Ξ± ~ Beta(2,2)
  • Hill saturation: With EC50 ~ LogNormal and slope ~ Uniform[0.5, 3]
  • Time-varying coefficients: Ornstein-Uhlenbeck random walk with mean reversion
  • 500 training scenarios, 104 weeks each

Losses

L_total = L_campaign + L_channel + 0.1·L_smooth + 0.01·L_sign

L_campaign = MSE(agg_pred, agg_target)        # Stage 1 x₀-prediction
L_channel  = MSE(coeff_pred, coeff_target)    # Stage 2 x₀-prediction
L_smooth   = MSE(Δcoeff_pred, Δcoeff_target)  # Temporal smoothness (≈ velocity loss)
L_sign     = ReLU(-β_media_log - 5)           # Soft positivity
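
As a reading aid, the weighted sum can be written out in plain Python on small lists. The helper names and toy inputs are illustrative only; Δ is taken to be the first difference along time, and the sign penalty is passed in as a precomputed scalar so that the sketch does not commit to a particular form of it.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def diff(series):
    """First difference along time: series[t + 1] - series[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def total_loss(agg_pred, agg_target, coeff_pred, coeff_target, l_sign,
               w_smooth=0.1, w_sign=0.01):
    l_campaign = mse(agg_pred, agg_target)                 # Stage 1
    l_channel = mse(coeff_pred, coeff_target)              # Stage 2
    l_smooth = mse(diff(coeff_pred), diff(coeff_target))   # velocity-style term
    return l_campaign + l_channel + w_smooth * l_smooth + w_sign * l_sign


# Toy single-channel example over 3 weeks.
loss = total_loss(
    agg_pred=[1.0, 2.0, 3.0], agg_target=[1.0, 2.0, 4.0],
    coeff_pred=[0.0, 1.0, 1.0], coeff_target=[0.0, 0.0, 2.0],
    l_sign=0.0,
)
print(round(loss, 4))  # 1.25
```

In this toy case the campaign and channel terms contribute 1/3 and 2/3, and the smoothness term contributes 0.1 · 2.5 = 0.25, which shows how the small weights keep the auxiliary terms from dominating the two reconstruction losses.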

Results (PoC, CPU training, 30 epochs)

  • Final training loss: 0.129
  • Media positivity constraint: βœ… 100% satisfied (all generated media coefficients > 0)
  • Model size: 2.7M parameters
  • Generation time: ~2.6s per scenario (200 diffusion steps on CPU)

Usage

from mmm_diffusion import MMMDiffusionModel, MMMDataGenerator, MMMDiffusionDataset

# Generate synthetic data
gen = MMMDataGenerator(n_weeks=104, seed=42)
samples = gen.generate_dataset(100)

# Build model
model = MMMDiffusionModel(n_media=5, n_ctrl=3, T_diff=200)

# Train
dataset = MMMDiffusionDataset(samples, normalize=True)
# ... (see mmm_diffusion.py for full training loop)

# Generate coefficients for new conditioning data
conditioning = ...  # (1, T, 9) tensor: [media_spend, controls, total_sales]
coefficients = model.sample(conditioning, n_steps=200)
decoded = dataset.decode_coefficients(coefficients)
# decoded[:, :, :5] are GUARANTEED positive (media channels)

Files

  • mmm_diffusion.py β€” Full implementation (data generation, model, training, evaluation, visualization)
  • mmm_diffusion_model.pt β€” Trained model checkpoint (PoC, 30 epochs on CPU)
  • training_history.png β€” Training loss curves
  • coeff_comparison.png β€” True vs predicted coefficients on validation sample
  • sales_decomposition.png β€” Sales decomposition visualization

References

  • GMD (arxiv:2305.12577) β€” Two-stage trajectory + body diffusion (closest public analog to Kimodo)
  • MDM (arxiv:2209.14916) β€” Transformer denoiser, xβ‚€-prediction, geometric losses
  • PhysDiff (arxiv:2212.02500) β€” Physics-based constraint projection during denoising
  • PDM (arxiv:2402.03559) β€” Projected diffusion for hard constraint satisfaction
  • NNN (arxiv:2504.06212) β€” Neural network MMM architecture (Google)
  • TabDDPM (arxiv:2209.15421) β€” Diffusion models for tabular data

License

MIT