scroobio committed on
Commit
7897bd1
·
verified ·
1 Parent(s): bc4070d

Upload README.md with huggingface_hub

  1. README.md +128 -0
README.md ADDED
@@ -0,0 +1,128 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - oceanography
+ - wave-forecasting
+ - time-series
+ - lightgbm
+ - regression
+ datasets:
+ - surfe-diem/wave-archive-USA-southwest
+ metrics:
+ - mae
+ library_name: lightgbm
+ ---
+
+ # Surfe Diem — Groundswell Direction (Sin Component) Forecast v1 (USA Southwest, 6h)
+
+ ## Model Description
+
+ A LightGBM regression model trained to predict the **sin component of groundswell direction** 6 hours in advance, using real-time buoy observations from NOAA's National Data Buoy Center (NDBC). The sin component is one half of a circular (sin/cos) decomposition of direction, which eliminates the 0/360° discontinuity.
+
+ **Developed by:** Surfe Diem
+ **Model type:** Gradient Boosted Decision Trees (LightGBM)
+ **Language:** Python
+ **License:** MIT
+
+ ## Intended Use
+
+ ### Primary Use Case
+ Predict the sin component of groundswell direction. Pair with the `ground_dir_cos` model to reconstruct the full direction in degrees. Forecast horizon: **6 hours**.
+
+ ### Out-of-Scope Use
+ - Horizons other than 6 hours (separate models exist for 12h, 24h, and 48h)
+ - Wave height or period prediction; this model outputs only the sin component and must be paired with `ground_dir_cos` for a meaningful direction
+ - Regions outside the California coast (model trained on USA Southwest NDBC stations only)
+ - Real-time safety-critical applications without human oversight
+
+ ## Training Data
+
+ **Source:** [NOAA NDBC Buoy Spectral Wave Density Data](https://huggingface.co/datasets/surfe-diem/wave-archive-USA-southwest)
+
+ **Stations:** 15 NDBC buoys along the California coast
+ `46011, 46012, 46013, 46014, 46022, 46025, 46026, 46027, 46028, 46042, 46047, 46053, 46054, 46069, 46086`
+
+ **Records:** ~2.08M observations (259 Parquet files with aligned stdmet and spectral columns)
+
+ **Features:**
+ - Meteorological: wave height, period, direction, wind speed/direction, pressure, temperature
+ - **Spectral compression:** 9 physics-informed features derived from ~150 raw spectral bands
+   - Ground swell energy, direction, quality (< 0.08 Hz)
+   - Mid-range energy, direction, quality (0.08–0.12 Hz)
+   - Wind wave energy, direction, quality (> 0.12 Hz)
+ - Circular decomposition: sin/cos encoding for all direction columns
+ - Temporal lag features: 1h, 3h, 6h, 12h lags across all features
+
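The circular decomposition and lag features listed above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the helper names `encode_direction` and `add_lags`, the `_sin`/`_cos`/`_lag{h}h` column suffixes, the `station_id` and `time` column names, and the assumption of exactly one row per station per hour are all hypothetical.

```python
import numpy as np
import pandas as pd


def encode_direction(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Add sin/cos columns for a direction column given in degrees."""
    out = df.copy()
    radians = np.deg2rad(out[col])
    out[f"{col}_sin"] = np.sin(radians)
    out[f"{col}_cos"] = np.cos(radians)
    return out


def add_lags(df: pd.DataFrame, cols, lags_h=(1, 3, 6, 12)) -> pd.DataFrame:
    """Add lagged copies of `cols`, computed separately for each station.

    Assumes exactly one row per station per hour, so shifting by k rows
    equals a k-hour lag. Rows without enough history get NaN.
    """
    out = df.sort_values(["station_id", "time"]).copy()
    for col in cols:
        for h in lags_h:
            out[f"{col}_lag{h}h"] = out.groupby("station_id")[col].shift(h)
    return out


# Toy data: 350° and 10° are only 20° apart, but 340 apart as raw numbers.
demo = pd.DataFrame({
    "station_id": ["46011"] * 4,
    "time": pd.date_range("2024-01-01", periods=4, freq=pd.Timedelta(hours=1)),
    "mwd": [350.0, 10.0, 270.0, 90.0],
})
demo = encode_direction(demo, "mwd")
demo = add_lags(demo, ["mwd_sin"], lags_h=(1,))
```

Encoding before lagging means the lagged columns inherit the discontinuity-free representation. Real buoy data has gaps, so a production version would likely lag by timestamp rather than by row count.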
+ **Split:** 80/20 train/test, time-series ordered (no shuffle)
+
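A time-ordered 80/20 split of this kind might look like the sketch below. The function name `chronological_split` and the `time` column name are assumptions for illustration; the card does not say whether the cut is made globally or per station, and this version cuts globally.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, time_col: str = "time", train_frac: float = 0.8):
    """Split rows into (train, test) by time order, with no shuffling."""
    ordered = df.sort_values(time_col).reset_index(drop=True)
    cut = int(len(ordered) * train_frac)
    return ordered.iloc[:cut], ordered.iloc[cut:]


frame = pd.DataFrame({
    "time": pd.to_datetime(
        ["2024-01-03", "2024-01-01", "2024-01-05", "2024-01-02", "2024-01-04"]
    ),
    "y": [0.3, 0.1, 0.5, 0.2, 0.4],
})
train, test = chronological_split(frame)
```

Every training row precedes every test row, so the model is never evaluated on data older than what it was trained on.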
+ ## Model Performance
+
+ **Test MAE: 0.0879** on the sin component (range [-1, 1])
+
+ This error is in unit-circle units, not degrees. Combine the prediction with the cos model's output via `atan2(sin, cos)` to recover the direction in degrees.
+
+ Evaluated on held-out data with proper time-series validation (train on past, test on future).
+
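The `atan2(sin, cos)` recombination can be sketched with the standard library alone. The function names `direction_degrees` and `angular_error_degrees` are illustrative and do not come from the project.

```python
import math


def direction_degrees(sin_pred: float, cos_pred: float) -> float:
    """Recover a direction in degrees (0 to 360) from sin/cos predictions."""
    return math.degrees(math.atan2(sin_pred, cos_pred)) % 360.0


def angular_error_degrees(a: float, b: float) -> float:
    """Smallest absolute difference between two directions, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


# The two predictions need not lie exactly on the unit circle:
# atan2 only uses their ratio and signs.
west = direction_degrees(-0.98, 0.05)  # slightly past 270°
```

For the sin component alone, an error of 0.0879 corresponds to roughly 5° when the true direction is near 0° or 180°, and to a larger angle near 90° or 270°, where the sine curve is flat. Evaluating the reconstructed direction with a wrap-aware error such as `angular_error_degrees` avoids treating 350° and 10° as 340° apart.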
+ ## Training Details
+
+ **Algorithm:** LightGBM
+ **Objective:** Regression (MAE / L1 loss)
+ **Learning rate:** 0.05
+ **Num leaves:** 31
+ **Feature fraction:** 0.9
+ **Bagging fraction:** 0.8
+ **Max iterations:** 2000 (early stopping, patience=50)
+
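Expressed in LightGBM's own parameter names, the settings above might look like the following sketch. The `bagging_freq` value, the `metric` entry, and the `train` wrapper are assumptions; the card does not state them.

```python
# Hyperparameters from the list above, in LightGBM's parameter names.
PARAMS = {
    "objective": "regression_l1",  # MAE / L1 loss
    "metric": "l1",                # assumption: validate on the same loss
    "learning_rate": 0.05,
    "num_leaves": 31,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,             # assumption: bagging is inactive when this is 0
}
NUM_BOOST_ROUND = 2000
EARLY_STOPPING_ROUNDS = 50


def train(train_set, valid_set):
    """Train with early stopping on a held-out, later-in-time validation set.

    `train_set` and `valid_set` are expected to be `lightgbm.Dataset` objects.
    """
    import lightgbm as lgb  # imported here so the config is readable without it

    return lgb.train(
        PARAMS,
        train_set,
        num_boost_round=NUM_BOOST_ROUND,
        valid_sets=[valid_set],
        callbacks=[lgb.early_stopping(stopping_rounds=EARLY_STOPPING_ROUNDS)],
    )
```

With early stopping, 2000 is an upper bound: training halts once the validation loss has not improved for 50 consecutive rounds.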
+ **Feature engineering:**
+ - Station IDs encoded with a fixed `CategoricalDtype` so category codes stay consistent between training and inference
+ - Lag features filled with 0 for single-observation inference
+
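Both feature-engineering points can be illustrated with pandas. This is a sketch under assumptions: the station IDs are taken to be strings, and the `station_id` column name, the `prepare_single_row` helper, and the `wvht_lag1h` lag column are hypothetical.

```python
import pandas as pd

# Station list from the Training Data section, fixed once and reused everywhere.
STATIONS = [
    "46011", "46012", "46013", "46014", "46022", "46025", "46026", "46027",
    "46028", "46042", "46047", "46053", "46054", "46069", "46086",
]
STATION_DTYPE = pd.CategoricalDtype(categories=STATIONS)


def prepare_single_row(obs: pd.DataFrame, lag_cols: list[str]) -> pd.DataFrame:
    """Apply the fixed station encoding and zero-fill absent lag columns."""
    out = obs.copy()
    out["station_id"] = out["station_id"].astype(str).astype(STATION_DTYPE)
    for col in lag_cols:
        if col not in out.columns:
            out[col] = 0.0  # no history available for a single observation
    return out


row = prepare_single_row(
    pd.DataFrame({"station_id": ["46025"], "wvht": [2.5]}),
    lag_cols=["wvht_lag1h"],
)
```

Because the category list is fixed, a station maps to the same integer code whether the frame holds one row or millions. A station outside the list becomes a missing value instead of silently taking a new code.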
+ ## How to Use
+
+ ```python
+ import lightgbm as lgb
+ import pandas as pd
+ import numpy as np
+ from huggingface_hub import hf_hub_download
+
+ # Download the model file from the Hugging Face Hub and load it
+ model_path = hf_hub_download(
+     repo_id="surfe-diem/surfe-diem-v1-usa-southwest-ground-dir-sin-6h-model",
+     filename="surfe_diem_v1_usa_southwest_ground_dir_sin_6h_model.txt",
+ )
+ model = lgb.Booster(model_file=model_path)
+
+ # Prepare observation with engineered features + lags + station_id
+ # See full inference pipeline in the GitHub repo
+ obs = pd.DataFrame({
+     'wvht': [2.5], 'dpd': [12.0], 'apd': [8.5],
+     'mwd': [270], 'wspd': [15.0], 'wdir': [280],
+     'pres': [1013.0], 'atmp': [18.0], 'wtmp': [16.0],
+     # ... + spectral band features + lag features + station_id
+ })
+
+ prediction = model.predict(obs)[0]  # sin component, nominally in [-1, 1]
+ ```
+
+ The full inference pipeline is available in the [GitHub repo](https://github.com/crubio/surfe-diem-api).
+
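Since the observation frame must carry every feature the model was trained on, it is safest to select columns explicitly rather than rely on the DataFrame's own order. The sketch below shows one way to do that; `align_features` is a hypothetical helper, and the short `expected` list stands in for the real feature list, which a loaded `Booster` reports via `model.feature_name()`.

```python
import pandas as pd


def align_features(obs: pd.DataFrame, feature_names: list[str]) -> pd.DataFrame:
    """Return `obs` with exactly the model's columns, in the model's order."""
    missing = [name for name in feature_names if name not in obs.columns]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return obs[feature_names]


expected = ["wvht", "dpd", "mwd"]  # stand-in for model.feature_name()
obs = pd.DataFrame({"mwd": [270], "extra": [1], "wvht": [2.5], "dpd": [12.0]})
aligned = align_features(obs, expected)
```

Failing loudly on a missing column is a deliberate choice here: a silently absent or misordered feature would still yield a number, just not a meaningful one.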
+ ## Limitations
+
+ - **No history for single observations:** Lag features are set to 0 for real-time single-row inference (slight accuracy degradation vs. buffered inference)
+ - **Regional specificity:** Trained only on California coast buoys
+ - **Forecast horizon:** 6 hours only; separate models cover 12h, 24h, and 48h
+ - **Spectral dependency:** Full accuracy requires spectral band data; older buoy files without spectral data contribute only standard met features
+
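The "buffered inference" mentioned in the first limitation could be approximated by keeping recent observations in memory so that real lag values replace the zeros. The `LagBuffer` class below is a hypothetical sketch that assumes one value per hour with no gaps; it is not taken from the project's pipeline.

```python
from collections import deque

LAGS_H = (1, 3, 6, 12)


class LagBuffer:
    """Keep the most recent hourly values of one feature for one station."""

    def __init__(self, max_lag: int = max(LAGS_H)):
        self._values = deque(maxlen=max_lag)

    def lags(self) -> dict[int, float]:
        """Lagged values relative to the observation about to be added.

        Falls back to 0.0 when the buffer does not reach back far enough,
        matching the zero-fill used for single-row inference.
        """
        n = len(self._values)
        return {h: (self._values[-h] if h <= n else 0.0) for h in LAGS_H}

    def push(self, value: float) -> None:
        """Record the newest hourly value, dropping the oldest if full."""
        self._values.append(value)


buf = LagBuffer()
for hourly_value in [1.0, 1.5, 2.0]:
    buf.push(hourly_value)
current_lags = buf.lags()
```

After three hours of history the 1h and 3h lags are real values, while the 6h and 12h lags still fall back to 0 until the buffer fills.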
+ ## Citation
+
+ ```bibtex
+ @misc{surfediem2026wave,
+   author = {Surfe Diem},
+   title = {Wave Forecasting Models v1 - USA Southwest},
+   year = {2026},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/surfe-diem}}
+ }
+ ```
+
+ ## Model Card Contact
+
+ For questions or issues, please open an issue in the [GitHub repository](https://github.com/crubio/surfe-diem-api).