---
language: en
license: mit
tags:
- oceanography
- wave-forecasting
- time-series
- lightgbm
- regression
datasets:
- surfe-diem/wave-archive-USA-southwest
metrics:
- mae
library_name: lightgbm
---

# Surfe Diem: Dominant Wave Period Forecast v1 (USA Southwest, 24h)

## Model Description

A LightGBM regression model trained to predict **dominant wave period in seconds** 24 hours in advance using real-time buoy observations from NOAA's National Data Buoy Center (NDBC).

- **Developed by:** Surfe Diem
- **Model type:** Gradient Boosted Decision Trees (LightGBM)
- **Language:** Python
- **License:** MIT

## Intended Use

### Primary Use Case

Predict dominant wave period (seconds) **24 hours** ahead for surf forecasting applications along the California coast.

### Out-of-Scope Use

- Horizons other than 24 hours (separate models exist for 6h, 12h, and 48h)
- Wave height or direction
- Regions outside the California coast (model trained on USA Southwest NDBC stations only)
- Real-time safety-critical applications without human oversight

## Training Data

**Source:** [NOAA NDBC Buoy Spectral Wave Density Data](https://huggingface.co/datasets/surfe-diem/wave-archive-USA-southwest)

**Stations:** 15 NDBC buoys along the California coast

`46011, 46012, 46013, 46014, 46022, 46025, 46026, 46027, 46028, 46042, 46047, 46053, 46054, 46069, 46086`

**Records:** ~2.08M observations (259 Parquet files with aligned standard meteorological (stdmet) and spectral columns)

**Features:**

- Meteorological: wave height, period, direction, wind speed/direction, pressure, temperature
- **Spectral compression:** 9 physics-informed features derived from ~150 raw spectral bands
  - Ground swell energy, direction, quality (< 0.08 Hz)
  - Mid-range energy, direction, quality (0.08–0.12 Hz)
  - Wind wave energy, direction, quality (> 0.12 Hz)
- Circular decomposition: sin/cos encoding for all direction columns
- Temporal lag features: 1h, 3h, 6h, 12h lags across all features

**Split:** 80/20
train/test, time-series ordered (no shuffle)

## Model Performance

**Test MAE: 2.1645 seconds**

MAE is reported in **seconds**. For context, dominant period typically ranges from 5 to 20 s. The model was evaluated on held-out data using a chronological split (train on past, test on future).

## Training Details

- **Algorithm:** LightGBM
- **Objective:** Regression (MAE / L1 loss)
- **Learning rate:** 0.05
- **Num leaves:** 31
- **Feature fraction:** 0.9
- **Bagging fraction:** 0.8
- **Max iterations:** 2000 (early stopping, patience=50)

**Feature engineering:**

- Station IDs encoded as fixed `CategoricalDtype` for inference consistency
- Lag features filled with 0 for single-observation inference

## How to Use

```python
import lightgbm as lgb
import pandas as pd
from huggingface_hub import hf_hub_download

# Download the model file from the Hugging Face Hub and load it
model_path = hf_hub_download(
    repo_id="surfe-diem/surfe-diem-v1-usa-southwest-dpd-24h-model",
    filename="surfe_diem_v1_usa_southwest_dpd_24h_model.txt",
)
model = lgb.Booster(model_file=model_path)

# Prepare observation with engineered features + lags + station_id.
# The frame below is abbreviated: a real call needs every training feature
# (model.feature_name() lists them in order).
# See full inference pipeline in the GitHub repo
obs = pd.DataFrame({
    'wvht': [2.5],
    'dpd': [12.0],
    'apd': [8.5],
    'mwd': [270],
    'wspd': [15.0],
    'wdir': [280],
    'pres': [1013.0],
    'atmp': [18.0],
    'wtmp': [16.0],
    # ... + spectral band features + lag features + station_id
})

prediction = model.predict(obs)[0]  # seconds
```

Full inference pipeline available in the [GitHub repo](https://github.com/crubio/surfe-diem-api).

## Limitations

- **No history for single observations:** Lag features set to 0 for real-time single-row inference (slight accuracy degradation vs.
buffered inference)
- **Regional specificity:** Trained only on California coast buoys
- **Forecast horizon:** 24 hours only; separate models cover 6h, 12h, and 48h
- **Spectral dependency:** Full accuracy requires spectral band data; older buoy files without spectral data contribute only standard met features

## Citation

```bibtex
@misc{surfediem2026wave,
  author = {Surfe Diem},
  title = {Wave Forecasting Models v1 - USA Southwest},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/surfe-diem}}
}
```

## Model Card Contact

For questions or issues, please open an issue in the [GitHub repository](https://github.com/crubio/surfe-diem-api).
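
## Appendix: Feature Engineering Sketch

The circular decomposition, lag features, and time-ordered 80/20 split described under Training Data can be illustrated roughly as follows. This is a minimal sketch, not the pipeline from the GitHub repo: the function names, the `station_id` and `time` column names, and the assumption of one row per station per hour are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def add_direction_encoding(df, direction_cols):
    """Replace each direction column (degrees) with sin/cos components."""
    out = df.copy()
    for col in direction_cols:
        radians = np.deg2rad(out[col])
        out[f"{col}_sin"] = np.sin(radians)
        out[f"{col}_cos"] = np.cos(radians)
    return out.drop(columns=direction_cols)


def add_lag_features(df, feature_cols, lags_hours=(1, 3, 6, 12)):
    """Add lagged copies of each feature per station.

    Assumes (hypothetically) one row per station per hour, so shifting
    by N rows within a station corresponds to an N-hour lag.
    """
    out = df.sort_values(["station_id", "time"]).copy()
    lag_cols = []
    for col in feature_cols:
        for lag in lags_hours:
            name = f"{col}_lag{lag}h"
            # Shift within each station so values never leak across buoys
            out[name] = out.groupby("station_id")[col].shift(lag)
            lag_cols.append(name)
    # Missing history is filled with 0, as the card states for lag features
    out[lag_cols] = out[lag_cols].fillna(0.0)
    return out


def chronological_split(df, train_fraction=0.8):
    """Split by time without shuffling: earliest rows train, latest rows test."""
    out = df.sort_values("time")
    cut = int(len(out) * train_fraction)
    return out.iloc[:cut], out.iloc[cut:]


# Tiny synthetic example: two stations, five hourly observations each
times = pd.Timestamp("2024-01-01") + pd.to_timedelta(np.arange(5), unit="h")
demo = pd.DataFrame({
    "station_id": ["46011"] * 5 + ["46012"] * 5,
    "time": list(times) * 2,
    "dpd": [10.0, 11.0, 12.0, 13.0, 14.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    "mwd": [270.0, 280.0, 290.0, 300.0, 310.0, 180.0, 190.0, 200.0, 210.0, 220.0],
})

features = add_lag_features(add_direction_encoding(demo, ["mwd"]), ["dpd"])
train, test = chronological_split(features)
```

Encoding directions as sin/cos pairs keeps 359° and 1° close together in feature space, which a raw degree value would not. Shifting within each station group keeps one buoy's history from bleeding into another's lag columns.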