July 2026

Reaching the frontier of AI forecasting with reinforcement learning

Abstract

Top AI forecasting systems, built on off-the-shelf LLMs that aren't necessarily trained for the task, perform similarly to skilled humans. We ask whether this recipe can be improved by fine-tuning models specifically for judgmental forecasting. To answer this question, we fine-tune gpt-oss-120b with reinforcement learning on roughly 10,000 binary forecasting questions, rewarding the probability the model assigns to the realized outcome. We find that fine-tuning lifts gpt-oss-120b from below frontier LLMs to marginally above them on held-out Metaculus AI Benchmark questions.
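
As a rough illustration of what "rewarding probabilities assigned to realized outcomes" can mean for a binary question, the sketch below computes the probability a forecast placed on the outcome that occurred, then turns it into a scalar reward. The abstract does not say which scoring rule is used, so the log and Brier-style rewards here are assumptions, and all function names and the `eps` clipping value are hypothetical.

```python
import math


def realized_probability(p_yes: float, outcome: bool) -> float:
    """Probability the forecast assigned to the outcome that actually occurred."""
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("p_yes must lie in [0, 1]")
    return p_yes if outcome else 1.0 - p_yes


def log_reward(p_yes: float, outcome: bool, eps: float = 1e-6) -> float:
    """Assumed variant: log score, clipped at eps so the reward stays finite."""
    p = realized_probability(p_yes, outcome)
    return math.log(max(p, eps))


def brier_reward(p_yes: float, outcome: bool) -> float:
    """Assumed variant: negative Brier score, ranging from -1 (worst) to 0 (best)."""
    p = realized_probability(p_yes, outcome)
    return -((1.0 - p) ** 2)


if __name__ == "__main__":
    # A forecast of 80% "yes" on a question that resolved "yes".
    print(realized_probability(0.8, True))
    print(log_reward(0.8, True))
    print(brier_reward(0.8, True))
```

Both variants are proper scoring rules, so under either one the expected reward is maximized by reporting one's true probability; which one (if either) the authors use is not stated here.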

Authors

Scott Jeen, Matthew Aitchison, Maximilian Anthony Hugh Clark, Toby Shevlane, Ben Day

Venue

ICML 2026 Workshop on Forecasting as a New Frontier of Intelligence