---
language: en
license: mit
tags:
- oceanography
- wave-forecasting
- time-series
- lightgbm
- regression
datasets:
- surfe-diem/wave-archive-USA-southwest
metrics:
- mae
library_name: lightgbm
---

# Surfe Diem: Groundswell Direction (Cos Component) Forecast v1 (USA Southwest, 24h)

## Model Description

A LightGBM regression model that predicts the **cosine component of groundswell direction** 24 hours in advance, using real-time buoy observations from NOAA's National Data Buoy Center (NDBC). Direction is decomposed into sine and cosine components to eliminate the 0/360° discontinuity; this model covers the cosine half of that decomposition.

- **Developed by:** Surfe Diem
- **Model type:** Gradient Boosted Decision Trees (LightGBM)
- **Language:** Python
- **License:** MIT

## Intended Use

### Primary Use Case
Predict the cosine component of groundswell direction at a forecast horizon of **24 hours**. Pair the output with the `ground_dir_sin` model to reconstruct the full direction in degrees.

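As a minimal sketch of the pairing described above, the two component predictions can be combined with `atan2` and wrapped to compass degrees. The function name `direction_degrees` is hypothetical, and the sketch assumes both models encode direction as the sine and cosine of the direction angle in degrees, which this card implies but does not spell out.

```python
import numpy as np

def direction_degrees(sin_pred, cos_pred):
    """Combine sin/cos component predictions into a direction wrapped to 0-360 degrees."""
    # arctan2 takes (y, x), i.e. (sin, cos), and returns an angle in (-180, 180] degrees
    angle = np.degrees(np.arctan2(sin_pred, cos_pred))
    # Wrap negative angles into the compass range
    return np.mod(angle, 360.0)

# Example: a westerly swell (270 degrees) has sin = -1 and cos = 0
west = direction_degrees(-1.0, 0.0)
```

Because `atan2` depends only on the direction of the (cos, sin) vector and not on its length, the two predictions do not need to lie exactly on the unit circle.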
### Out-of-Scope Use
- Horizons other than 24 hours (separate models exist for 6h, 12h, and 48h)
- Wave height or period prediction; this model outputs only the cosine component and must be paired with the `ground_dir_sin` model for a meaningful direction
- Regions outside the California coast (model trained on USA Southwest NDBC stations only)
- Real-time safety-critical applications without human oversight

## Training Data

**Source:** [NOAA NDBC Buoy Spectral Wave Density Data](https://huggingface.co/datasets/surfe-diem/wave-archive-USA-southwest)

**Stations:** 15 NDBC buoys along the California coast:
`46011, 46012, 46013, 46014, 46022, 46025, 46026, 46027, 46028, 46042, 46047, 46053, 46054, 46069, 46086`

**Records:** ~2.08M observations (259 Parquet files with aligned standard meteorological (stdmet) and spectral columns)

**Features:**
- Meteorological: wave height, period, direction, wind speed/direction, pressure, temperature
- **Spectral compression:** 9 physics-informed features derived from ~150 raw spectral bands
  - Groundswell energy, direction, quality (< 0.08 Hz)
  - Mid-range energy, direction, quality (0.08–0.12 Hz)
  - Wind wave energy, direction, quality (> 0.12 Hz)
- Circular decomposition: sin/cos encoding for all direction columns
- Temporal lag features: 1h, 3h, 6h, 12h lags across all features

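The circular decomposition and lag features listed above could be built roughly as follows. This is a sketch under stated assumptions, not the project's pipeline: the helper names and the `_sin`, `_cos`, and `_lag{h}h` column suffixes are hypothetical, rows are assumed to be hourly observations from a single station in time order, and whether the real pipeline keeps the raw degree columns is not stated in this card.

```python
import numpy as np
import pandas as pd

def add_circular_encoding(df, direction_cols):
    """Add sin/cos components for each direction column given in degrees."""
    out = df.copy()
    for col in direction_cols:
        radians = np.radians(out[col])
        out[f"{col}_sin"] = np.sin(radians)
        out[f"{col}_cos"] = np.cos(radians)
    return out

def add_lag_features(df, lags=(1, 3, 6, 12)):
    """Append lagged copies of every column; assumes one row per hour."""
    lagged = {
        f"{col}_lag{h}h": df[col].shift(h)  # value observed h rows (hours) earlier
        for col in df.columns
        for h in lags
    }
    return pd.concat([df, pd.DataFrame(lagged, index=df.index)], axis=1)

# Toy data: 13 hourly rows with made-up wave height (wvht) and direction (mwd)
toy = pd.DataFrame({
    "wvht": np.linspace(1.0, 2.2, 13),
    "mwd": np.linspace(250.0, 310.0, 13),
})
features = add_lag_features(add_circular_encoding(toy, ["mwd"]))
```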
**Split:** 80/20 train/test, time-series ordered (no shuffle)

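A time-ordered 80/20 split like the one described above can be sketched as below. The function name and the `timestamp` column are hypothetical, and the sketch assumes a single global cut in time; this card does not say whether the actual split was made globally or per station.

```python
import pandas as pd

def chronological_split(df, time_col="timestamp", train_frac=0.8):
    """Split rows into train/test by time, with no shuffling."""
    ordered = df.sort_values(time_col)
    cut = int(len(ordered) * train_frac)
    # Everything before the cut is training data; everything after is held out
    return ordered.iloc[:cut], ordered.iloc[cut:]
```

With this kind of split, every test row is at least as recent as every training row, so the evaluation never sees the model trained on future data.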
## Model Performance

**Test MAE: 0.1564** (cosine component, range [-1, 1])

This error is expressed in cosine units, not degrees. Combine this model's output with the sin model's output via `atan2(sin, cos)` to recover the direction in degrees.

Evaluated on held-out data with time-series validation (train on past, test on future).

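Since the reported MAE is in cosine units, an error in degrees has to be computed after the sin and cos predictions are combined. The sketch below shows one way to do that with wrap-around handled correctly; the function names are hypothetical, and this card does not report a degree-based error, so no figure is implied here.

```python
import numpy as np

def angular_error_deg(pred_deg, true_deg):
    """Smallest absolute difference between two directions, in degrees (0 to 180)."""
    diff = np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float)
    # Map the raw difference into [-180, 180) so that 350 vs 10 counts as 20, not 340
    wrapped = (diff + 180.0) % 360.0 - 180.0
    return np.abs(wrapped)

def direction_mae_deg(sin_pred, cos_pred, true_deg):
    """Mean absolute angular error after recombining sin/cos predictions."""
    pred_deg = np.degrees(np.arctan2(sin_pred, cos_pred)) % 360.0
    return float(np.mean(angular_error_deg(pred_deg, true_deg)))
```

A plain `abs(pred - true)` would overstate errors near north, which is the same 0/360° discontinuity that motivates the sin/cos decomposition.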
## Training Details

- **Algorithm:** LightGBM
- **Objective:** Regression (MAE / L1 loss)
- **Learning rate:** 0.05
- **Num leaves:** 31
- **Feature fraction:** 0.9
- **Bagging fraction:** 0.8
- **Max iterations:** 2000 (early stopping, patience=50)

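For readers who want to reproduce a similar setup, the settings above map onto LightGBM parameter names roughly as follows. This is a sketch, not the project's training script: the `train` helper is hypothetical and expects `lightgbm.Dataset` objects, and `bagging_freq` is an assumption, since this card does not state it and LightGBM only applies `bagging_fraction` when `bagging_freq` is nonzero.

```python
# Hyperparameters from this card, expressed with LightGBM parameter names
PARAMS = {
    "objective": "regression_l1",  # MAE / L1 loss
    "learning_rate": 0.05,
    "num_leaves": 31,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,  # ASSUMPTION: not stated in this card; bagging is inactive at 0
}

def train(train_set, valid_set):
    """Train for up to 2000 rounds, stopping after 50 rounds without improvement."""
    # Imported inside the function so PARAMS can be inspected without LightGBM installed
    import lightgbm as lgb

    return lgb.train(
        PARAMS,
        train_set,
        num_boost_round=2000,
        valid_sets=[valid_set],
        callbacks=[lgb.early_stopping(stopping_rounds=50)],
    )
```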
**Feature engineering:**
- Station IDs encoded as fixed `CategoricalDtype` for inference consistency
- Lag features filled with 0 for single-observation inference

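The two feature-engineering points above can be illustrated with pandas alone. This is a sketch under assumptions: the helper `prepare_row`, the `station_id` column name, the lag column names, and the choice to store station IDs as strings are all hypothetical; only the station list itself comes from this card.

```python
import pandas as pd

# Fixed category list so every station maps to the same integer code
# at training and inference time, regardless of which stations appear in a batch
STATIONS = [
    "46011", "46012", "46013", "46014", "46022", "46025", "46026", "46027",
    "46028", "46042", "46047", "46053", "46054", "46069", "46086",
]
STATION_DTYPE = pd.CategoricalDtype(categories=STATIONS, ordered=False)

def prepare_row(obs, station_id, lag_cols):
    """Build a single-row frame with a fixed-category station and zero-filled lags."""
    row = pd.DataFrame([obs])
    row["station_id"] = pd.Series([station_id], dtype=STATION_DTYPE)
    for col in lag_cols:
        row[col] = 0.0  # no history available for a single observation
    return row

row = prepare_row({"wvht": 2.5, "dpd": 12.0}, "46025", ["wvht_lag1h", "dpd_lag1h"])
```

A station ID that is not in the fixed list becomes a missing category (code `-1`) rather than silently receiving a new code, which makes out-of-region inputs easier to detect.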
## How to Use

```python
import lightgbm as lgb
import pandas as pd
import numpy as np
from huggingface_hub import hf_hub_download

# Download the model file from the Hugging Face Hub and load it
model_path = hf_hub_download(
    repo_id="surfe-diem/surfe-diem-v1-usa-southwest-ground-dir-cos-24h-model",
    filename="surfe_diem_v1_usa_southwest_ground_dir_cos_24h_model.txt",
)
model = lgb.Booster(model_file=model_path)

# Prepare a single observation with engineered features + lags + station_id.
# The row must contain every feature the model was trained on;
# model.feature_name() lists the expected columns.
# See the full inference pipeline in the GitHub repo.
obs = pd.DataFrame({
    'wvht': [2.5], 'dpd': [12.0], 'apd': [8.5],
    'mwd': [270], 'wspd': [15.0], 'wdir': [280],
    'pres': [1013.0], 'atmp': [18.0], 'wtmp': [16.0],
    # ... + spectral band features + lag features + station_id
})

prediction = model.predict(obs)[0]  # cosine component, nominally in [-1, 1]
```

Full inference pipeline available in the [GitHub repo](https://github.com/crubio/surfe-diem-api).

## Limitations

- **No history for single observations:** Lag features are set to 0 for real-time single-row inference, which slightly degrades accuracy compared with buffered inference
- **Regional specificity:** Trained only on California coast buoys
- **Forecast horizon:** 24 hours only; separate models cover the 6h, 12h, and 48h horizons
- **Spectral dependency:** Full accuracy requires spectral band data; older buoy files without spectral data contribute only standard meteorological features

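The buffered inference mentioned in the first limitation could look something like the sketch below, which keeps recent observations in memory so that lag features come from real history when it exists. The `LagBuffer` class and the `_lag{h}h` naming are hypothetical, observations are assumed to arrive once per hour, and the zero fallback mirrors the single-row behaviour described in this card.

```python
from collections import deque

class LagBuffer:
    """Keep recent hourly observations and derive lag features from them."""

    def __init__(self, lags=(1, 3, 6, 12)):
        self.lags = lags
        self.history = deque(maxlen=max(lags))  # oldest entries drop off automatically

    def features(self, obs):
        """Return obs plus lagged values; lags without history fall back to 0."""
        row = dict(obs)
        for h in self.lags:
            # history[-h] is the observation from h steps before the current one
            past = self.history[-h] if len(self.history) >= h else None
            for key in obs:
                row[f"{key}_lag{h}h"] = past[key] if past is not None else 0.0
        self.history.append(dict(obs))
        return row

buffer = LagBuffer()
first = buffer.features({"wvht": 1.0})
second = buffer.features({"wvht": 2.0})
```

After twelve hourly observations the buffer can supply every lag from real data, at which point the zero-fill limitation no longer applies.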
## Citation

```bibtex
@misc{surfediem2026wave,
  author = {Surfe Diem},
  title = {Wave Forecasting Models v1 - USA Southwest},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/surfe-diem}}
}
```

## Model Card Contact

For questions or issues, please open an issue in the [GitHub repository](https://github.com/crubio/surfe-diem-api).