---
language: en
license: mit
tags:
- oceanography
- wave-forecasting
- time-series
- lightgbm
- regression
datasets:
- surfe-diem/wave-archive-USA-southwest
metrics:
- mae
library_name: lightgbm
---

# Surfe Diem - Groundswell Direction (Sin Component) Forecast v1 (USA Southwest, 12h)

## Model Description

A LightGBM regression model trained to predict the **sin component of groundswell direction** 12 hours in advance using real-time buoy observations from NOAA's National Data Buoy Center (NDBC). The sin component is one half of a circular (sin/cos) decomposition that eliminates the 0/360° discontinuity in direction values.

**Developed by:** Surfe Diem
**Model type:** Gradient Boosted Decision Trees (LightGBM)
**Language:** Python
**License:** MIT

## Intended Use

### Primary Use Case

Predict the sin component of groundswell direction. Pair with the `ground_dir_cos` model to reconstruct full direction in degrees. Forecast horizon: **12 hours**.

### Out-of-Scope Use

- Horizons other than 12 hours (separate models exist for 6h, 24h, and 48h)
- Wave height or period prediction
- Standalone direction estimates: the output must be paired with the `ground_dir_cos` model to yield a meaningful direction
- Regions outside the California coast (model trained on USA Southwest NDBC stations only)
- Real-time safety-critical applications without human oversight

## Training Data

**Source:** [NOAA NDBC Buoy Spectral Wave Density Data](https://huggingface.co/datasets/surfe-diem/wave-archive-USA-southwest)

**Stations:** 15 NDBC buoys along the California coast

`46011, 46012, 46013, 46014, 46022, 46025, 46026, 46027, 46028, 46042, 46047, 46053, 46054, 46069, 46086`

**Records:** ~2.08M observations (259 Parquet files with aligned stdmet and spectral columns)

**Features:**

- Meteorological: wave height, period, direction, wind speed/direction, pressure, temperature
- **Spectral compression:** 9 physics-informed features derived from ~150 raw spectral bands
  - Ground swell energy, direction, quality (< 0.08 Hz)
  - Mid-range energy, direction, quality (0.08–0.12 Hz)
  - Wind wave energy, direction, quality (> 0.12 Hz)
- Circular decomposition: sin/cos encoding for all direction columns
- Temporal lag features: 1h, 3h, 6h, 12h lags across all features

**Split:** 80/20 train/test, time-series ordered (no shuffle)

## Model Performance

**Test MAE: 0.0946** (sin component, range [-1, 1])

The MAE is measured on the sin component, which ranges over [-1, 1], not in degrees. Combine with the cos model via `atan2(sin, cos)` to recover degrees.

Evaluated on held-out data with proper time-series validation (train on past, test on future).

## Training Details

**Algorithm:** LightGBM
**Objective:** Regression (MAE / L1 loss)
**Learning rate:** 0.05
**Num leaves:** 31
**Feature fraction:** 0.9
**Bagging fraction:** 0.8
**Max iterations:** 2000 (early stopping, patience=50)

**Feature engineering:**

- Station IDs encoded as a fixed `CategoricalDtype` for inference consistency
- Lag features filled with 0 for single-observation inference

## How to Use

```python
import lightgbm as lgb
import pandas as pd
from huggingface_hub import hf_hub_download

# Download the model file from the Hugging Face Hub and load it
model_path = hf_hub_download(
    repo_id="surfe-diem/surfe-diem-v1-usa-southwest-ground-dir-sin-12h-model",
    filename="surfe_diem_v1_usa_southwest_ground_dir_sin_12h_model.txt",
)
model = lgb.Booster(model_file=model_path)

# Prepare one observation with engineered features + lags + station_id.
# The frame below is intentionally incomplete: the model expects the full
# training feature set. See the full inference pipeline in the GitHub repo.
obs = pd.DataFrame({
    'wvht': [2.5],
    'dpd': [12.0],
    'apd': [8.5],
    'mwd': [270],
    'wspd': [15.0],
    'wdir': [280],
    'pres': [1013.0],
    'atmp': [18.0],
    'wtmp': [16.0],
    # ... + spectral band features + lag features + station_id
})

prediction = model.predict(obs)[0]  # sin component, in [-1, 1]
```

Full inference pipeline available in the [GitHub repo](https://github.com/crubio/surfe-diem-api).

## Limitations

- **No history for single observations:** Lag features are set to 0 for real-time single-row inference (slight accuracy degradation vs. buffered inference)
- **Regional specificity:** Trained only on California coast buoys
- **Forecast horizon:** 12 hours only; separate models cover 6h, 24h, and 48h
- **Spectral dependency:** Full accuracy requires spectral band data; older buoy files without spectral data contribute only standard met features

## Citation

```bibtex
@misc{surfediem2026wave,
  author = {Surfe Diem},
  title = {Wave Forecasting Models v1 - USA Southwest},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/surfe-diem}}
}
```

## Model Card Contact

For questions or issues, please open an issue in the [GitHub repository](https://github.com/crubio/surfe-diem-api).
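
## Example: Reconstructing Direction in Degrees

The Model Performance section says the sin output is combined with the `ground_dir_cos` model via `atan2(sin, cos)` to recover degrees. The following is a minimal sketch of that step, not the project's actual pipeline. The function names `encode_direction` and `reconstruct_direction` and the sample prediction values are hypothetical, chosen only for illustration.

```python
import numpy as np


def encode_direction(deg):
    """Split a direction in degrees into its (sin, cos) components."""
    rad = np.deg2rad(deg)
    return np.sin(rad), np.cos(rad)


def reconstruct_direction(sin_pred, cos_pred):
    """Recover a direction in degrees, in [0, 360), from sin/cos components."""
    # arctan2 returns an angle in (-180, 180] degrees; wrap it into [0, 360).
    deg = np.rad2deg(np.arctan2(sin_pred, cos_pred))
    return np.mod(deg, 360.0)


# Hypothetical outputs of the ground_dir_sin and ground_dir_cos models
# for two observations (illustrative values, not real predictions).
sin_pred = np.array([-0.98, 0.05])
cos_pred = np.array([0.10, 0.99])

directions = reconstruct_direction(sin_pred, cos_pred)
print(directions)
```

Because the two components come from separate regressors, a predicted pair will generally not lie exactly on the unit circle. `arctan2` depends only on the ratio and signs of its arguments, so no renormalization is needed before recovering the angle.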