# MMM-Diffusion: Marketing Mix Modeling via Dual-Denoiser Diffusion

A generative diffusion model for **Marketing Mix Modeling (MMM)** that predicts time-varying coefficients for sales decomposition. It adapts the dual-denoiser architecture of NVIDIA's Kimodo, using GMD as its closest public analog.

## Architecture

```
┌─────────────────────────────────────────────┐
│          MMM-Diffusion Architecture         │
│   (Adapted from Kimodo/GMD Dual-Denoiser)   │
└─────────────────────────────────────────────┘

┌──────────────────┐   ┌──────────────────────────────────────────┐
│  CONDITIONING    │   │  STAGE 1: Campaign/Geo Denoiser          │
│                  │   │  (≈ Kimodo Root Denoiser)                │
│  • Media Spend   │──▶│                                          │
│    (5 channels)  │   │  Denoises aggregate-level patterns       │
│  • Controls      │   │  from non-marketing vars + total sales   │
│    (3 variables) │   │                                          │
│  • Total Sales   │   │  Transformer Encoder (4 layers, d=128)   │
│                  │   └──────────────┬───────────────────────────┘
└──────────────────┘                  │  Campaign Context
                                      ▼
┌──────────────────────────────────────────────────┐
│  STAGE 2: Channel Denoiser                       │
│  (≈ Kimodo Body Denoiser)                        │
│                                                  │
│  Denoises per-channel time-varying β_t           │
│  conditioned on Stage 1 output + media spend     │
│                                                  │
│  Cross-Attention + Transformer (6 layers)        │
│                                                  │
│  CONSTRAINT ENFORCEMENT:                         │
│  • Log-space for media (exp → always ≥ 0)        │
│  • PhysDiff-style projection every K steps       │
│  • Soft sign penalty loss                        │
└──────────────┬───────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────┐
│  OUTPUT: Time-Varying Coefficients               │
│                                                  │
│  β_TV(t), β_Digital(t), β_Social(t),             │
│  β_Print(t), β_Radio(t)   [all ≥ 0]              │
│  β_Seasonality(t), β_Trend(t),                   │
│  β_CompetitorPrice(t)   [unconstrained]          │
│                                                  │
│  → Sales Decomposition:                          │
│    Sales_t = base                                │
│      + Σ_m β_m(t)·Hill(Adstock(x_m)_t)           │
│      + Σ_c β_c(t)·ctrl_c(t) + noise              │
└──────────────────────────────────────────────────┘
```

## Kimodo → MMM Mapping

| Kimodo (Motion Generation) | MMM-Diffusion (Marketing) |
|---|---|
| Text prompts | Media spend, non-marketing variables, total sales |
| Motion/position constraints | Sign constraints (β_media ≥ 0) + prior bounds |
| Root denoiser (trajectory) | Campaign/Geo denoiser (aggregate patterns) |
| Body denoiser (joint angles) | Channel denoiser (per-channel coefficients) |
| Skeleton positions/rotations | Time-varying coefficients for decomposition |
| Foot contact constraints | Media positivity constraint |
| Velocity loss | Temporal smoothness loss |

## Key Design Decisions

### Constraint Enforcement (3 mechanisms, belt-and-suspenders)

1. **Log-space reparameterization**: Media coefficients are modeled in log-space during training. At decode time, `exp()` guarantees positivity. This is the primary mechanism.
2. **PhysDiff-style projection**: During reverse diffusion sampling, the denoised x̂₀ is projected into the feasible region (clamped to valid ranges) every K=10 steps. Based on [PhysDiff](https://arxiv.org/abs/2212.02500).
3. **Soft sign penalty**: The training loss includes `L_sign = ReLU(-β_media_log - threshold)²`. This term discourages extremely negative log-space values, which would otherwise decode to media coefficients collapsing toward zero.

### x₀-prediction (not ε-prediction)

Following MDM and GMD, the model predicts the clean data x₀ directly rather than the noise ε. This enables:

- Constraint projection during denoising, which then operates on meaningful coefficient values rather than on noise
- Geometric auxiliary losses (sales reconstruction, temporal smoothness)

### Dual-Denoiser Hierarchy

Stage 1 captures **aggregate macro patterns** (overall media effectiveness, seasonality). Stage 2 specializes in **per-channel coefficient dynamics** conditioned on those patterns. This hierarchical decomposition mirrors Kimodo's root→body split.
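The three constraint mechanisms, and why x₀-prediction matters for them, can be illustrated with a minimal plain-Python sketch. This is not the code in `mmm_diffusion.py`, and it rests on several assumptions:

- All function names are hypothetical, and lists stand in for tensors.
- Projection is reduced to the clamp described in mechanism 2, with assumed log-space bounds `lo`/`hi`.
- The penalty follows the squared form of mechanism 3, with the threshold of 5 taken from the Losses section.
- The reverse-step update is a toy interpolation, not a real DDPM posterior step.

```python
import math
from typing import Callable, List

Vector = List[float]


def decode_media(log_beta: Vector) -> Vector:
    """Mechanism 1: log-space media coefficients; exp() maps them to strictly > 0."""
    return [math.exp(v) for v in log_beta]


def sign_penalty(log_beta: Vector, threshold: float = 5.0) -> float:
    """Mechanism 3: mean of ReLU(-log_beta - threshold)^2.

    Zero unless a log-coefficient drops below -threshold, i.e. the decoded
    coefficient is drifting toward (but can never reach) zero.
    """
    return sum(max(0.0, -v - threshold) ** 2 for v in log_beta) / len(log_beta)


def project(x0_hat: Vector, lo: float, hi: float) -> Vector:
    """Mechanism 2, simplified to a clamp of the predicted x0 into [lo, hi]."""
    return [min(max(v, lo), hi) for v in x0_hat]


def sample_with_projection(
    predict_x0: Callable[[Vector, int], Vector],
    x_T: Vector,
    n_steps: int = 200,
    k: int = 10,
    lo: float = -5.0,  # hypothetical log-space lower bound
    hi: float = 3.0,   # hypothetical log-space upper bound
) -> Vector:
    """Toy reverse loop for an x0-predicting denoiser, projecting every k steps.

    The interpolation update is a hypothetical stand-in for the DDPM posterior
    step; it is NOT the sampler used by the model.
    """
    x = list(x_T)
    for step in range(n_steps, 0, -1):
        x0_hat = predict_x0(x, step)  # the network predicts clean x0, not noise
        if step % k == 0 or step == 1:  # project every k steps and at the end
            x0_hat = project(x0_hat, lo, hi)
        if step == 1:
            return x0_hat
        w = 1.0 / step  # move a fraction of the way toward the estimate
        x = [xi + w * (x0i - xi) for xi, x0i in zip(x, x0_hat)]
    return x


def dummy_denoiser(x: Vector, step: int) -> Vector:
    """Hypothetical denoiser that always predicts the same out-of-range x0."""
    return [-10.0, 0.5, 7.0]


log_beta = sample_with_projection(dummy_denoiser, x_T=[0.0, 0.0, 0.0], n_steps=20)
print(log_beta)  # [-5.0, 0.5, 3.0]
print(all(b > 0 for b in decode_media(log_beta)))  # True
print(sign_penalty([-10.0, 0.0]))  # 12.5
```

The projection acts on the predicted x̂₀ rather than on the noisy sample, so the clamp always sees interpretable log-coefficients. The `exp()` decode then guarantees positivity whether or not a projection fired. Projecting on the final step as well is an extra assumption of this sketch; it keeps the returned estimate inside the bounds.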
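The sales decomposition at the bottom of the architecture diagram can be made concrete with a small plain-Python sketch. It is an illustration under stated assumptions, not the repository's implementation:

- Adstock is taken to be the unnormalized geometric recursion a_t = x_t + α·a_{t-1}.
- The Hill curve is a^s / (a^s + EC50^s).
- The noise term is omitted.
- All function names are hypothetical.

```python
from typing import List, Tuple

Series = List[float]


def geometric_adstock(x: Series, alpha: float) -> Series:
    """Carry-over effect: a_t = x_t + alpha * a_{t-1}, with a_{-1} = 0."""
    out, carry = [], 0.0
    for v in x:
        carry = v + alpha * carry
        out.append(carry)
    return out


def hill(a: Series, ec50: float, slope: float) -> Series:
    """Saturation a^s / (a^s + ec50^s); equals 0.5 when a == ec50."""
    return [v**slope / (v**slope + ec50**slope) for v in a]


def decompose_sales(
    base: float,
    spend: List[Series],       # spend[m][t], one row per media channel
    beta_media: List[Series],  # beta_media[m][t] >= 0
    alpha: Series,             # adstock decay per channel
    ec50: Series,              # Hill half-saturation point per channel
    slope: Series,             # Hill slope per channel
    ctrl: List[Series],        # ctrl[c][t], control variables
    beta_ctrl: List[Series],   # beta_ctrl[c][t], unconstrained sign
) -> Tuple[Series, List[Series]]:
    """Return (noise-free sales per week, per-channel media contributions)."""
    n_weeks = len(spend[0])
    contrib = []
    for m, x in enumerate(spend):
        sat = hill(geometric_adstock(x, alpha[m]), ec50[m], slope[m])
        contrib.append([beta_media[m][t] * sat[t] for t in range(n_weeks)])
    sales = []
    for t in range(n_weeks):
        media_t = sum(c[t] for c in contrib)
        ctrl_t = sum(b[t] * z[t] for b, z in zip(beta_ctrl, ctrl))
        sales.append(base + media_t + ctrl_t)
    return sales, contrib


# Toy example: one channel, one control, two weeks.
sales, contrib = decompose_sales(
    base=10.0, spend=[[1.0, 0.0]], beta_media=[[2.0, 2.0]],
    alpha=[0.5], ec50=[1.0], slope=[1.0],
    ctrl=[[1.0, 1.0]], beta_ctrl=[[3.0, -1.0]],
)
print(sales[0])  # 14.0
```

In week 2 there is no new spend, yet the channel still contributes through the adstock carry-over (0.5 of last week's spend). This lagged effect is what the time-varying β_m(t) multiplies.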
## Training Data

Synthetic MMM data is generated with the following components:

- **5 media channels**: TV, Digital, Social, Print, Radio
- **3 control variables**: Seasonality, Trend, Competitor Price
- **Adstock transformation**: Geometric decay with α ~ Beta(2,2)
- **Hill saturation**: EC50 ~ LogNormal, slope ~ Uniform[0.5, 3]
- **Time-varying coefficients**: Ornstein-Uhlenbeck (mean-reverting) process
- **500 training scenarios**, 104 weeks each

## Losses

```
L_total    = L_campaign + L_channel + 0.1·L_smooth + 0.01·L_sign

L_campaign = MSE(agg_pred, agg_target)        # Stage 1 x₀-prediction
L_channel  = MSE(coeff_pred, coeff_target)    # Stage 2 x₀-prediction
L_smooth   = MSE(Δcoeff_pred, Δcoeff_target)  # temporal smoothness (≈ velocity loss)
L_sign     = ReLU(-β_media_log - 5)           # soft penalty on very negative log-coefficients
```

## Results (PoC, CPU training, 30 epochs)

- **Final training loss**: 0.129
- **Media positivity constraint**: ✅ 100% satisfied (all generated media coefficients > 0)
- **Model size**: 2.7M parameters
- **Generation time**: ~2.6 s per scenario (200 diffusion steps on CPU)

## Usage

```python
from mmm_diffusion import MMMDiffusionModel, MMMDataGenerator, MMMDiffusionDataset

# Generate synthetic data
gen = MMMDataGenerator(n_weeks=104, seed=42)
samples = gen.generate_dataset(100)

# Build model
model = MMMDiffusionModel(n_media=5, n_ctrl=3, T_diff=200)

# Train
dataset = MMMDiffusionDataset(samples, normalize=True)
# ... (see mmm_diffusion.py for the full training loop)

# Generate coefficients for new conditioning data
conditioning = ...  # (1, T, 9) tensor: [media_spend (5), controls (3), total_sales (1)]
coefficients = model.sample(conditioning, n_steps=200)
decoded = dataset.decode_coefficients(coefficients)
# decoded[:, :, :5] (media channels) are guaranteed positive
```

## Files

- `mmm_diffusion.py`: Full implementation (data generation, model, training, evaluation, visualization)
- `mmm_diffusion_model.pt`: Trained model checkpoint (PoC, 30 epochs on CPU)
- `training_history.png`: Training loss curves
- `coeff_comparison.png`: True vs. predicted coefficients on a validation sample
- `sales_decomposition.png`: Sales decomposition visualization

## References

- **GMD** (arXiv:2305.12577): Two-stage trajectory + body diffusion (closest public analog to Kimodo)
- **MDM** (arXiv:2209.14916): Transformer denoiser, x₀-prediction, geometric losses
- **PhysDiff** (arXiv:2212.02500): Physics-based constraint projection during denoising
- **PDM** (arXiv:2402.03559): Projected diffusion for hard constraint satisfaction
- **NNN** (arXiv:2504.06212): Neural network MMM architecture (Google)
- **TabDDPM** (arXiv:2209.15421): Diffusion models for tabular data

## License

MIT