TempoWAVE
Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting
Defu Cao¹*, Zijie Lei¹*,², Muyan Weng¹, Jiao Sun¹,³, Yan Liu¹
¹University of Southern California · ²Meta · ³Google DeepMind
* Equal contribution.
IJCAI–ECAI 2026
This repository contains the model checkpoint for TempoWAVE, introduced in the paper Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting.
TempoWAVE gives an LLM a numerically grounded digit interface. Each decimal digit is routed through one of ten dedicated tokenizer tokens and initialized from a multi-wavelet, multi-scale codebook. Text, signs, decimal points, and separators continue to use the base model's standard embeddings.
TL;DR. The discrete, language-oriented token interface of LLMs is misaligned with continuous numerical values, which harms numerical ordering and forecasting reliability. TempoWAVE is a plug-and-play temporal wavelet digit interface that maps each scalar observation into digit-wise embeddings built from multi-wavelet, multi-scale coefficients. By directly overriding standard token representations, it exposes both fine-grained local fluctuations and macro global structure in a transformer-compatible form, achieving a new state of the art across five context-enriched forecasting benchmarks.
Overview of the TempoWAVE-based forecasting framework. The input prompt is tokenized once with a tokenizer augmented with dedicated digit tokens. Text and context tokens use standard embeddings, while digit tokens are routed to the TempoWAVE module, which constructs digit embeddings via multi-wavelet, multi-scale coefficients and overrides the corresponding token embeddings. The resulting sequence is fed into an unchanged LLM backbone trained via supervised fine-tuning (SFT). Generated numeric tokens are parsed, de-normalized, and evaluated as real-valued forecasts.
Model Details
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Architecture | Qwen2ForCausalLM with a multi-wavelet digit-embedding interface |
| Wavelets / scales | Haar, db4, Mexican Hat · scales 1, 2, 4 |
| Task | Context-aware time series forecasting |
| Language | English context + numeric digit tokens |
Paper Method Overview
For a fixed-precision value such as -0.5000, each digit is rendered as an individual token:
-<|digit_0|>.<|digit_5|><|digit_0|><|digit_0|><|digit_0|>
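The rendering above can be reproduced with a few lines of Python. This is a minimal sketch rather than the repository's own helper: the function name `render_digits` and the default precision of 4 are assumptions chosen to match the example, and only the `<|digit_k|>` token format comes from this card.

```python
def render_digits(value: float, precision: int = 4) -> str:
    """Render a number as fixed-precision text with one token per digit.

    Hypothetical helper: signs and decimal points are passed through
    unchanged, and every decimal digit becomes its own <|digit_k|> token.
    """
    text = f"{value:.{precision}f}"
    return "".join(f"<|digit_{ch}|>" if ch.isdigit() else ch for ch in text)


print(render_digits(-0.5))
# -<|digit_0|>.<|digit_5|><|digit_0|><|digit_0|><|digit_0|>
```

Keeping the sign and decimal point as ordinary characters matches the statement that they continue to use the base model's standard embeddings.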
For each digit d in {0,...,9}, TempoWAVE:
- Maps d to d / 9 on a fixed grid;
- Samples each scaled mother wavelet at the digit's impulse location;
- Concatenates coefficients across wavelets and scales;
- Maps that vector to the LLM embedding dimension; and
- Replaces only the corresponding digit-token embedding row.
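The first three steps can be sketched in plain Python. This is an illustration under stated assumptions, not the released implementation: the sampling rule `psi(x / s) / sqrt(s)` is one plausible reading of "samples each scaled mother wavelet at the digit's impulse location", db4 is left out because it has no closed-form expression, and the projection to the LLM embedding dimension is omitted. The names `digit_coefficients` and `codebook` are hypothetical.

```python
import math

SCALES = (1, 2, 4)  # scales listed under Model Details


def haar(t: float) -> float:
    """Haar mother wavelet on [0, 1)."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def mexican_hat(t: float) -> float:
    """Mexican Hat (Ricker) mother wavelet with unit L2 norm."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-t * t / 2.0)


WAVELETS = (haar, mexican_hat)  # db4 omitted: no closed form


def digit_coefficients(d: int) -> list[float]:
    """Concatenate wavelet samples across wavelets and scales for digit d."""
    x = d / 9  # place the digit on the fixed grid [0, 1]
    # Assumed sampling rule: evaluate the dilated wavelet at x.
    return [psi(x / s) / math.sqrt(s) for psi in WAVELETS for s in SCALES]


codebook = [digit_coefficients(d) for d in range(10)]

# The ten codewords should be pairwise distinct.
assert len({tuple(row) for row in codebook}) == 10
```

In this sketch each digit gets a 6-dimensional vector (2 wavelets × 3 scales); a linear map would then lift it to the embedding dimension before it replaces the digit token's embedding row.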
The ten digit codewords are verified to be distinct. Because Qwen ties its input and output embeddings by default, TempoWAVE separates them before freezing the input codebook, while the language-model head remains trainable so it can generate the new digit tokens.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Melady/TempoWAVE"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")
# Numeric values must be rendered as dedicated digit tokens, e.g. -0.5000 ->
# -<|digit_0|>.<|digit_5|><|digit_0|><|digit_0|><|digit_0|>
# See the GitHub repository for prompt formatting, generation, parsing, and
# de-normalization helpers used to reproduce the paper's forecasts.
For the full forecasting pipeline—prompt construction, fixed-precision generation, digit-token parsing, de-normalization, and MAE/RMSE evaluation—see the GitHub repository.
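To make the post-processing steps concrete, here is a minimal standard-library sketch of digit-token parsing, de-normalization, and MAE/RMSE. It is not the repository's code: the function names are hypothetical, and z-score normalization (multiply by a standard deviation, add a mean) is an assumption, since this card does not state which normalization the pipeline uses.

```python
import math
import re

# Matches one dedicated digit token and captures its digit.
DIGIT_TOKEN = re.compile(r"<\|digit_(\d)\|>")


def parse_value(text: str) -> float:
    """Collapse digit tokens back to plain digits and parse a float."""
    return float(DIGIT_TOKEN.sub(r"\1", text))


def denormalize(z: float, mean: float, std: float) -> float:
    """Undo z-score normalization (assumed scheme)."""
    return z * std + mean


def mae(pred: list[float], target: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def rmse(pred: list[float], target: list[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


generated = "-<|digit_0|>.<|digit_5|><|digit_0|><|digit_0|><|digit_0|>"
z = parse_value(generated)            # normalized forecast
y = denormalize(z, mean=10.0, std=2.0)  # illustrative statistics only
```

The mean and standard deviation here are placeholder values; in practice they would come from whatever statistics were used to normalize the input series.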
Resources
- Paper: Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting
- GitHub Repository: DC-research/TempoWAVE
Citation
If you use TempoWAVE, please cite our paper:
@inproceedings{cao2026tempowave,
title = {Speaking Numbers to {LLM}s: Multi-Wavelet Number Embeddings for Time Series Forecasting},
author = {Cao, Defu and Lei, Zijie and Weng, Muyan and Sun, Jiao and Liu, Yan},
booktitle = {Proceedings of the Thirty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-ECAI)},
year = {2026}
}