TempoWAVE

Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting

Defu Cao¹*, Zijie Lei¹,²*, Muyan Weng¹, Jiao Sun¹,³, Yan Liu¹

¹University of Southern California · ²Meta · ³Google DeepMind

* Equal contribution.

IJCAI–ECAI 2026

This repository contains the model checkpoint for TempoWAVE, introduced in the paper Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting.

TempoWAVE gives an LLM a numerically grounded digit interface. Each decimal digit is routed through one of ten dedicated tokenizer tokens, and the embeddings of those tokens are initialized from a multi-wavelet, multi-scale codebook. Text, signs, decimal points, and separators continue to use the base model's standard embeddings.

TL;DR. The discrete, language-oriented token interface of LLMs is misaligned with continuous numerical values, which harms numerical ordering and forecasting reliability. TempoWAVE is a plug-and-play temporal wavelet digit interface that maps each scalar observation into digit-wise embeddings built from multi-wavelet, multi-scale coefficients. By directly overriding the standard token representations of digits, it exposes both fine-grained local fluctuations and global structure in a transformer-compatible form, achieving a new state of the art across five context-enriched forecasting benchmarks.

Overview of the TempoWAVE-based forecasting framework. The input prompt is tokenized once with a tokenizer augmented with dedicated digit tokens. Text and context tokens use standard embeddings, while digit tokens are routed to the TempoWAVE module, which constructs digit embeddings via multi-wavelet, multi-scale coefficients and overrides the corresponding token embeddings. The resulting sequence is fed into an unchanged LLM backbone trained via supervised fine-tuning (SFT). Generated numeric tokens are parsed, de-normalized, and evaluated as real-valued forecasts.

Model Details


Base model	Qwen/Qwen2.5-1.5B-Instruct
Architecture	`Qwen2ForCausalLM` with a multi-wavelet digit-embedding interface
Wavelets / scales	Haar, db4, Mexican Hat · scales `1`, `2`, `4`
Task	Context-aware time series forecasting
Language	English context + numeric digit tokens

Paper Method Overview

For a fixed-precision value such as -0.5000, each digit is rendered as an individual token:

-<|digit_0|>.<|digit_5|><|digit_0|><|digit_0|><|digit_0|>
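A minimal sketch of this rendering, assuming four decimal places as in the example above. The helper name `render_digits` is hypothetical and is not taken from the released code; the GitHub repository remains the reference for the exact prompt format.

```python
def render_digits(value: float, decimals: int = 4) -> str:
    """Render a number as fixed-precision text with one token per digit.

    Hypothetical helper for illustration only.
    """
    text = f"{value:.{decimals}f}"  # e.g. -0.5 -> "-0.5000"
    # Digits become dedicated tokens; the sign and decimal point stay plain text.
    return "".join(f"<|digit_{ch}|>" if ch.isdigit() else ch for ch in text)


print(render_digits(-0.5))
# -<|digit_0|>.<|digit_5|><|digit_0|><|digit_0|><|digit_0|>
```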

For each digit d in {0,...,9}, TempoWAVE:

1. Maps d to d / 9 on a fixed grid;
2. Samples each scaled mother wavelet at the digit's impulse location;
3. Concatenates coefficients across wavelets and scales;
4. Maps that vector to the LLM embedding dimension; and
5. Replaces only the corresponding digit-token embedding row.
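The first three steps above can be sketched in plain Python. This is an illustrative approximation, not the released implementation. It assumes that "sampling at the impulse location" means evaluating psi(t / s) / sqrt(s) at t = d / 9, and it uses only Haar and Mexican Hat because db4 has no closed form and would need a wavelet library. The projection to the embedding dimension and the embedding-row replacement are not shown.

```python
import math

SCALES = (1, 2, 4)  # scales listed in the model card


def haar(x: float) -> float:
    """Haar mother wavelet supported on [0, 1)."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def mexican_hat(x: float) -> float:
    """Unnormalized Mexican Hat (Ricker) wavelet."""
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


# Assumption: db4 is left out of this sketch (no closed-form expression).
WAVELETS = (haar, mexican_hat)


def digit_codeword(d: int) -> list[float]:
    """Concatenate wavelet samples across wavelets and scales for digit d."""
    t = d / 9.0  # step 1: place the digit on a fixed grid in [0, 1]
    # Steps 2 and 3: sample each scaled wavelet at t and concatenate.
    return [psi(t / s) / math.sqrt(s) for psi in WAVELETS for s in SCALES]


codebook = [digit_codeword(d) for d in range(10)]
# The ten codewords should be pairwise distinct.
assert len({tuple(row) for row in codebook}) == 10
```

In this sketch the Haar samples alone barely separate the digits on [0, 1], so distinctness comes from the Mexican Hat coefficients, which is one reason to combine several wavelet families.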

The ten digit codewords are verified to be distinct. Because the Qwen base model ties its input and output embeddings by default, TempoWAVE unties them before freezing the input codebook; the language-model head remains trainable so that it can learn to generate the new digit tokens.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Melady/TempoWAVE"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")

# Numeric values must be rendered as dedicated digit tokens, e.g. -0.5000 ->
#   -<|digit_0|>.<|digit_5|><|digit_0|><|digit_0|><|digit_0|>
# See the GitHub repository for prompt formatting, generation, parsing, and
# de-normalization helpers used to reproduce the paper's forecasts.

For the full forecasting pipeline (prompt construction, fixed-precision generation, digit-token parsing, de-normalization, and MAE/RMSE evaluation), see the GitHub repository.
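As a rough illustration of the post-processing stages, the sketch below parses digit-token text back into a float, inverts a normalization, and computes MAE and RMSE. All function names are hypothetical, and the z-score normalization is an assumption; the scheme actually used in the paper may differ, so the repository helpers should be preferred when reproducing results.

```python
import math
import re

DIGIT_TOKEN = re.compile(r"<\|digit_(\d)\|>")


def parse_digits(text: str) -> float:
    """Replace each digit token with its digit and parse the result."""
    return float(DIGIT_TOKEN.sub(r"\1", text))


def denormalize(z: float, mean: float, std: float) -> float:
    """Invert a z-score normalization (assumed scheme, for illustration)."""
    return z * std + mean


def mae(pred: list[float], true: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred: list[float], true: list[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


z = parse_digits("-<|digit_0|>.<|digit_5|><|digit_0|><|digit_0|><|digit_0|>")
print(z)  # -0.5
```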

Resources

Paper: Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting
GitHub Repository: DC-research/TempoWAVE

Citation

If you use TempoWAVE, please cite our paper:

@inproceedings{cao2026tempowave,
  title     = {Speaking Numbers to {LLM}s: Multi-Wavelet Number Embeddings for Time Series Forecasting},
  author    = {Cao, Defu and Lei, Zijie and Weng, Muyan and Sun, Jiao and Liu, Yan},
  booktitle = {Proceedings of the Thirty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-ECAI)},
  year      = {2026}
}