Monthly Seasonal Forecast Model (SOTA 2025)
State-of-the-art monthly seasonal forecasting that combines recent innovations from time series research. Competitive with the Chronos-Bolt-Small foundation model (47M params, per the results table) while using only ~2.5M parameters.
Results on M4 Monthly (48,000 test series)
| Model | sMAPE ↓ | MASE ↓ | OWA ↓ | Params |
|---|---|---|---|---|
| Seasonal Naive (baseline) | 15.99 | 1.260 | 1.000 | - |
| Naive (baseline) | 15.26 | 1.205 | 0.955 | - |
| CycleNet (MLP+RCF) | 13.41 | 0.989 | 0.812 | 215K |
| SeasonalPatchTST (Transformer+RCF) | 13.31 | 0.978 | 0.805 | 2.3M |
| Ensemble (Ours) | 13.15 | 0.964 | 0.794 | 2.5M |
| Chronos-Bolt-Small (SOTA foundation) | 13.03 | 0.956 | 0.787 | 47M |
Our lightweight ensemble achieves OWA=0.794, within 0.9% of the Chronos-Bolt-Small foundation model (OWA=0.787), which has roughly 19x more parameters (47M vs 2.5M).
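The OWA column can be reproduced from the sMAPE and MASE columns. The sketch below assumes the M4-style definition (the mean of sMAPE and MASE, each divided by the reference method's value) and assumes the Seasonal Naive row is the reference, which its OWA of 1.000 suggests. The function name `owa` is illustrative, not part of this repository.

```python
# Reference values taken from the Seasonal Naive row of the results table.
REF_SMAPE = 15.99
REF_MASE = 1.260


def owa(smape: float, mase: float) -> float:
    """Overall Weighted Average: mean of sMAPE and MASE relative to the reference."""
    return 0.5 * (smape / REF_SMAPE + mase / REF_MASE)


ensemble_owa = owa(13.15, 0.964)   # Ensemble (Ours) row
chronos_owa = owa(13.03, 0.956)    # Chronos-Bolt-Small row
relative_gap_pct = 100.0 * (ensemble_owa - chronos_owa) / chronos_owa

print(f"Ensemble OWA:           {ensemble_owa:.3f}")
print(f"Chronos-Bolt-Small OWA: {chronos_owa:.3f}")
print(f"Relative gap:           {relative_gap_pct:.1f}%")
```

Because the table entries are already rounded, recomputing OWA this way can differ from the reported value in the last digit for some rows.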
Architecture
CycleNet (MLP + Residual Cycle Forecasting)
Based on CycleNet:
- Learns a 12-month periodic cycle parameter
- Subtracts learned cycle → forecasts residuals → adds future cycle
- RevIN (Reversible Instance Normalization) for distribution shift
- 4-layer MLP backbone with GELU activation
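The subtract, forecast, add-back flow of Residual Cycle Forecasting can be sketched without any framework. This is a minimal illustration, not the repository's implementation: the cycle is passed in as a plain list (in the real model it is a learned parameter), and `mean_backbone` is a hypothetical stand-in for the MLP.

```python
W = 12  # cycle length in months (annual seasonality)


def rcf_forecast(context, start_phase, cycle, horizon, backbone):
    """Residual Cycle Forecasting around an arbitrary residual forecaster.

    context     : list of observed values
    start_phase : position of context[0] within the cycle (0..W-1)
    cycle       : list of W values (learned during training in the real model)
    backbone    : callable(residuals, horizon) -> list of `horizon` values
    """
    # 1) Subtract the cycle value aligned with each observed month.
    residuals = [x - cycle[(start_phase + i) % W] for i, x in enumerate(context)]
    # 2) Forecast the residuals (an MLP in CycleNet; any callable here).
    future_residuals = backbone(residuals, horizon)
    # 3) Add back the cycle values for the future months.
    offset = start_phase + len(context)
    return [r + cycle[(offset + j) % W] for j, r in enumerate(future_residuals)]


def mean_backbone(residuals, horizon):
    """Hypothetical stand-in for the MLP: repeat the mean residual."""
    mean = sum(residuals) / len(residuals)
    return [mean] * horizon


cycle = [float(m) for m in range(W)]                  # toy seasonal pattern 0..11
context = [cycle[i % W] + 100.0 for i in range(48)]   # pattern plus a constant level
forecast = rcf_forecast(context, start_phase=0, cycle=cycle, horizon=18,
                        backbone=mean_backbone)
print(forecast)
```

In this toy case the residuals are a constant level, so the forecast is simply that level plus the future part of the cycle.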
SeasonalPatchTST (Transformer + RCF)
Combines PatchTST with CycleNet innovations:
- 12-month patches aligned with annual seasonality
- CLS token + 4-layer Transformer encoder with 8 attention heads
- CycleNet RCF decomposition + RevIN normalization
- Pre-LN architecture for training stability
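To illustrate why a 12-month patch lines up with annual seasonality, the sketch below splits a 48-month context into patches. It assumes non-overlapping patches (stride equal to the patch length), which this README does not state; `seasonal_patches` is an illustrative name.

```python
PATCH_LEN = 12  # one patch per calendar year


def seasonal_patches(context, patch_len=PATCH_LEN):
    """Split a context window into non-overlapping patches of `patch_len` months."""
    if len(context) % patch_len != 0:
        raise ValueError("context length must be a multiple of the patch length")
    return [context[i:i + patch_len] for i in range(0, len(context), patch_len)]


context = list(range(48))            # 48 months of toy data (month indices)
patches = seasonal_patches(context)

# Position k inside every patch is the same calendar month, one year apart.
same_month = [p[0] for p in patches]
print(len(patches), same_month)
```

Under this assumption a 48-month context yields 4 patch tokens, so the encoder would see 5 tokens once the CLS token is prepended.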
Learned Ensemble
- Sigmoid-gated weighted average of CycleNet + PatchTST
- Weight learned on validation set
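A sigmoid-gated average needs only one scalar parameter. The sketch below shows one plausible way to fit it; the choice of plain gradient descent on validation MSE, and the names `gated_ensemble` and `fit_gate`, are assumptions for illustration, since the README only says the weight is learned on the validation set.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_ensemble(cyclenet_pred, patchtst_pred, gate_logit):
    """Blend two forecasts with weight w = sigmoid(gate_logit) on the first."""
    w = sigmoid(gate_logit)
    return [w * a + (1.0 - w) * b for a, b in zip(cyclenet_pred, patchtst_pred)]


def fit_gate(cyclenet_preds, patchtst_preds, targets, steps=500, lr=0.5):
    """Fit the single gate logit by gradient descent on validation MSE."""
    logit = 0.0  # sigmoid(0) = 0.5, i.e. start from an equal-weight average
    n = len(targets)
    for _ in range(steps):
        w = sigmoid(logit)
        grad = 0.0
        for a, b, y in zip(cyclenet_preds, patchtst_preds, targets):
            err = w * a + (1.0 - w) * b - y
            # Chain rule: d(err^2)/d(logit) = 2 * err * (a - b) * w * (1 - w)
            grad += 2.0 * err * (a - b) * w * (1.0 - w) / n
        logit -= lr * grad
    return logit


# Toy validation data where the best blend is 25% model A and 75% model B.
preds_a = [1.0, 2.0, 3.0, 4.0]
preds_b = [2.0, 1.0, 5.0, 3.0]
targets = [0.25 * a + 0.75 * b for a, b in zip(preds_a, preds_b)]

learned_logit = fit_gate(preds_a, preds_b, targets)
learned_weight = sigmoid(learned_logit)
print(f"learned weight on model A: {learned_weight:.3f}")
```

The sigmoid keeps the weight strictly between 0 and 1, so the ensemble is always a convex combination of the two forecasts.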
Key Innovations
- Residual Cycle Forecasting (RCF): From CycleNet; learns a W=12 annual cycle and forecasts the residuals
- Seasonal Patching: 12-month patch size matched to annual cycle (vs typical 16 or 32)
- RevIN Normalization: Handles diverse scales across 48K series (Macro, Finance, Demographics)
- Value-flipping + Scaling Augmentation: From Sundial (ICML 2025 Oral)
- CLS Token Aggregation: Global representation for multi-step forecasting
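The RevIN idea from the list above, normalizing each series with its own statistics and undoing that on the forecast, can be sketched as follows. This simplified version omits RevIN's learnable affine parameters, and the function names and `EPS` value are assumptions rather than this repository's code.

```python
import math

EPS = 1e-5  # numerical stabilizer (assumed value)


def revin_normalize(context):
    """Standardize one series with its own mean and std; return values and stats."""
    mean = sum(context) / len(context)
    var = sum((x - mean) ** 2 for x in context) / len(context)
    std = math.sqrt(var + EPS)
    return [(x - mean) / std for x in context], (mean, std)


def revin_denormalize(forecast, stats):
    """Map a forecast made in normalized space back to the original scale."""
    mean, std = stats
    return [y * std + mean for y in forecast]


small = [10.0, 20.0, 30.0, 40.0]
large = [1000.0, 2000.0, 3000.0, 4000.0]

small_norm, small_stats = revin_normalize(small)
large_norm, large_stats = revin_normalize(large)

# After normalization the two series look almost identical to the model.
max_diff = max(abs(a - b) for a, b in zip(small_norm, large_norm))
restored = revin_denormalize(large_norm, large_stats)
print(max_diff, restored)
```

This is what lets a single model handle series whose raw scales differ by orders of magnitude.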
Usage
Training Details
- Dataset: M4 Monthly (48,000 series from autogluon/chronos_datasets)
- Context: 48 months → Predict 18 months
- Optimizer: AdamW (lr=1e-3 CycleNet / 5e-4 PatchTST, weight_decay=0.01)
- Schedule: Cosine annealing
- Early stopping: Patience=12, best val MSE checkpoint
- Augmentation: Value-flipping (10%), random scaling ±20%
- 288K training windows from sliding window extraction
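The window extraction and augmentation steps listed above might look roughly like this. Several details are assumptions not stated in this README: a stride of 1, applying the same transform to context and target, scaling every sample, and interpreting value-flipping as a sign flip of the whole window. The function names are illustrative.

```python
import random

CONTEXT_LEN = 48
HORIZON = 18


def sliding_windows(series, stride=1):
    """Yield (context, target) pairs from one series."""
    total = CONTEXT_LEN + HORIZON
    for start in range(0, len(series) - total + 1, stride):
        window = series[start:start + total]
        yield window[:CONTEXT_LEN], window[CONTEXT_LEN:]


def augment(context, target, rng, flip_prob=0.10, scale_range=0.20):
    """Sign flip with probability `flip_prob`, then random scaling within ±20%."""
    sign = -1.0 if rng.random() < flip_prob else 1.0
    factor = sign * rng.uniform(1.0 - scale_range, 1.0 + scale_range)
    # Apply the same transform to context and target to keep them consistent.
    return [factor * x for x in context], [factor * y for y in target]


series = [float(t) for t in range(100)]   # toy series of 100 months
pairs = list(sliding_windows(series))
rng = random.Random(0)                    # seeded for reproducibility
aug_context, aug_target = augment(*pairs[0], rng)
print(len(pairs), len(aug_context), len(aug_target))
```

With a stride of 1, a series of length n contributes n - 66 + 1 windows, so series shorter than 66 months contribute none.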
Comparison with Foundation Models (2025 SOTA)
| Model | Paper | Params | Reported highlight |
|---|---|---|---|
| Chronos-2 | Amazon, Oct 2025 | 120M | 90.7% fev-bench win rate |
| Sundial | Tsinghua, ICML 2025 Oral | 128M | 1st MASE GIFT-Eval |
| Timer-S1 | Tsinghua, Mar 2025 | 8.3B MoE | Best CRPS GIFT-Eval |
| Chronos-Bolt | Amazon | 205M | 250x faster |
| Ours (Ensemble) | This work | 2.5M | Competitive OWA |
References