---
language:
- en
license: apache-2.0
library_name: peft
tags:
- forecasting
- prediction
- reinforcement-learning
- grpo
- lora
- mixture-of-experts
- politics
- trump
- future-as-label
datasets:
- LightningRodLabs/WWTD-2025
base_model: openai/gpt-oss-120b
pipeline_tag: text-generation
model-index:
- name: Trump-Forecaster
  results:
  - task:
      type: text-generation
      name: Probabilistic Forecasting
    dataset:
      name: WWTD-2025
      type: LightningRodLabs/WWTD-2025
      split: test
    metrics:
    - type: brier_score
      value: 0.194
      name: Brier Score
    - type: ece
      value: 0.079
      name: Expected Calibration Error
---
# Trump-Forecaster

### RL-Tuned gpt-oss-120b for Predicting Trump Administration Actions

Starting from just five search queries, we used the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) to automatically generate [2,108 forecasting questions](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) from news articles, label them with real outcomes, and train this model via RL. **No expertise required. No manual labeling. No domain-specific engineering.** The resulting model beats GPT-5 on held-out questions in both Brier score and calibration error.

You can do this in any domain: just change the search queries. See [how we built the dataset](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025).

This repo contains a **LoRA adapter** for [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b). A standalone `merge.py` script is included to merge it into a full model.

---
## Results

Evaluated on 682 held-out test questions under two conditions: with news context, and without context (question only). The no-context condition tests whether the model knows what it doesn't know. Untrained models project false confidence; RL training reduces that overconfidence (no-context ECE falls from 0.190 for the base model to 0.164).

| Model | Brier (With Context) | BSS (With Context) | Brier (No Context) | BSS (No Context) | ECE (With Context) | ECE (No Context) |
|-------|:---:|:---:|:---:|:---:|:---:|:---:|
| GPT-5 | 0.200 | +0.14 | 0.258 | -0.11 | 0.091 | 0.191 |
| gpt-oss-120b (base) | 0.213 | +0.08 | 0.260 | -0.12 | 0.111 | 0.190 |
| **Trump-Forecaster** | **0.194** | **+0.16** | **0.242** | **-0.04** | **0.079** | **0.164** |

![Brier Skill Score](brier_skill_score.png)

![Brier Score Comparison](brier_score_comparison.png)

![ECE Comparison](ece_comparison.png)

### Metrics

- **Brier Score**: Mean squared error between the predicted probability and the outcome (0 or 1). Lower is better. **Brier Skill Score (BSS)** expresses this as relative improvement over always predicting the base rate: `BSS = 1 - Brier / Brier_base_rate`. A positive value means the model learned something useful beyond historical frequency.
- **Expected Calibration Error (ECE)**: Measures how closely predicted probabilities match observed frequencies: predictions of "70%" should resolve "yes" about 70% of the time. Lower is better.
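
The metric definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not the evaluation code behind the reported numbers. It assumes the base rate is taken from the same outcomes being scored and that ECE uses 10 equal-width bins; the card specifies neither. The function names are hypothetical. Note that `brier_skill_score` divides by the reference score, so it is undefined when every outcome is identical.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Relative improvement over always predicting the base rate.

    Assumption: the base rate is computed from the scored outcomes themselves.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between average confidence and observed frequency.

    Assumption: equal-width bins over [0, 1]; the bin count is illustrative.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frequency = sum(y for _, y in members) / len(members)
        ece += (len(members) / len(probs)) * abs(mean_prob - frequency)
    return ece


# Toy data, made up for illustration only.
probs = [0.9, 0.2, 0.7, 0.4]
outcomes = [1, 0, 1, 0]
print(f"Brier: {brier_score(probs, outcomes):.3f}")                # Brier: 0.075
print(f"BSS:   {brier_skill_score(probs, outcomes):.3f}")          # BSS:   0.700
print(f"ECE:   {expected_calibration_error(probs, outcomes):.3f}") # ECE:   0.250
```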

---

## Training

- **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B MoE, 5.1B active parameters, 128 experts with top-4 routing)
- **Method**: GRPO with a Brier score reward, via [Tinker](https://tinker.computer)
- **LoRA rank**: 32
- **Learning rate**: 4e-5
- **Batch size**: 32, group size 8
- **Training steps**: 50
- **Max tokens**: 16,384
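
To make "GRPO with a Brier score reward" concrete, here is a rough sketch of how a group of sampled forecasts for one question could be turned into training signals. It is an illustration under stated assumptions, not the training code used with Tinker: the card does not give the exact reward shaping (sign, scaling, handling of unparseable answers) or the normalization. The sketch assumes the reward is the negative squared error and that advantages are the rewards standardized within each group of 8 samples, which is the usual GRPO formulation. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def brier_reward(prob: float, outcome: int) -> float:
    """Assumed reward: negative squared error (0 is best, -1 is worst)."""
    return -((prob - outcome) ** 2)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of samples for the same question.

    Samples that beat the group average get a positive advantage; the
    policy gradient then pushes probability mass toward those samples.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One question that resolved "yes" (outcome = 1) and eight sampled forecasts,
# matching the group size of 8 listed above. The probabilities are made up.
sampled_probs = [0.9, 0.8, 0.6, 0.5, 0.5, 0.4, 0.3, 0.1]
rewards = [brier_reward(p, 1) for p in sampled_probs]
advantages = group_relative_advantages(rewards)

for p, a in zip(sampled_probs, advantages):
    print(f"forecast={p:.1f}  advantage={a:+.2f}")
```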

---

## Usage

This repo contains a LoRA adapter trained with [Tinker](https://tinker.computer). The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.

### Merge into full model

```bash
pip install torch transformers safetensors tqdm huggingface-hub
python merge.py --output ./trump-forecaster-merged
```

This downloads the base model, dequantizes to bf16, applies the LoRA adapter, and saves the merged model.
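
For intuition about what "applies the LoRA adapter" means, the sketch below shows the standard LoRA merge arithmetic for a single weight matrix, `W_merged = W + (alpha / rank) * B @ A`, using tiny hand-written matrices. It is not the contents of `merge.py`: the real script also has to map Tinker's module names onto the base model and handle the MoE expert weights, and the `alpha` value here is hypothetical (the card states only the rank).

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Fold a low-rank update into a dense weight.

    weight: (out_features, in_features)
    lora_a: (rank, in_features)
    lora_b: (out_features, rank)
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, rank) @ (rank, in) -> (out, in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy shapes: out_features=2, in_features=3, rank=1. Values are illustrative.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]
lora_b = [[0.5], [-1.0]]

merged = merge_lora(weight, lora_a, lora_b, alpha=2.0)
print(merged)  # [[2.0, 2.0, 3.0], [-2.0, -3.0, -6.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged checkpoint can be served as an ordinary full model.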

### Inference

```python
import sglang as sgl

# Load the merged bf16 checkpoint; the tokenizer comes from the base model repo.
engine = sgl.Engine(
    model_path="./trump-forecaster-merged",
    tokenizer_path="openai/gpt-oss-120b",
    trust_remote_code=True,
    dtype="bfloat16",
    tp_size=2,  # tensor parallelism across 2 GPUs
)

news_context = "... relevant news articles ..."

prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".
Question: Will Trump impose 25% tariffs on all goods from Canada by February 1, 2025?
Context:
{news_context}
Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""

# Stop at the closing tag so generation ends once the probability is written.
output = engine.generate(
    prompt,
    sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]},
)
print(output["text"])
```
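
Because generation stops at `</answer>`, the returned text will usually end with an opening `<answer>` tag followed by the number, possibly without the closing tag. The helper below is one way to pull the probability out. It is a sketch, and `extract_probability` is a hypothetical name, not part of this repo or of sglang. It takes the last `<answer>` tag in the text, tolerates a missing closing tag, and returns `None` when no value in [0, 1] is found, so callers can decide how to handle malformed outputs.

```python
import re
from typing import Optional

# Matches an opening tag followed by a decimal number such as 0.35, .5, or 1.
ANSWER_PATTERN = re.compile(r"<answer>\s*([0-9]*\.?[0-9]+)")


def extract_probability(text: str) -> Optional[float]:
    """Return the probability from the last <answer> tag, or None if absent or invalid."""
    matches = ANSWER_PATTERN.findall(text)
    if not matches:
        return None
    prob = float(matches[-1])
    return prob if 0.0 <= prob <= 1.0 else None


# Example strings are made up; real model output will contain longer reasoning.
print(extract_probability("Tariffs were announced but delayed. <answer>0.35"))  # 0.35
print(extract_probability("Likely. <answer> 0.8 </answer>"))                    # 0.8
print(extract_probability("No tagged answer here."))                            # None
```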

---

## Links

- **Dataset**: [LightningRodLabs/WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025)
- **Training platform**: [Tinker](https://tinker.computer)
- **Data generation**: [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk)
- **Future-as-Label paper**: [arXiv:2601.06336](https://arxiv.org/abs/2601.06336)
- **Outcome-based RL paper**: [arXiv:2505.17989](https://arxiv.org/abs/2505.17989)