
LSTMs vs Climate: Forecasting World Temperatures
Hello everyone!
For my research project this semester I've been building a time-series model to forecast global surface temperatures. The motivation was simple — can a relatively small deep-learning model learn enough of the structure in temperature records to give useful year-ahead forecasts? And while I was at it, can I convince myself that LSTMs still deserve a seat at the table now that everyone uses transformers for everything?
The data: I pulled historical temperature records going back to 1880 from the public NASA GISTEMP dataset, plus some complementary monthly anomaly data. After normalization, each training example is a sliding window of the previous N months predicting the next k months.
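The windowing step can be sketched in a few lines. This is a minimal illustration, not my actual pipeline; the function name `make_windows` and the random stand-in series are assumptions for the example.

```python
import numpy as np

def make_windows(series, lookback=48, horizon=12):
    """Slide a (lookback -> horizon) window over a 1-D monthly series."""
    X, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i : i + lookback])
        y.append(series[i + lookback : i + lookback + horizon])
    return np.array(X), np.array(y)

# stand-in for ~140 years of normalized monthly anomalies
anoms = np.random.randn(1680).astype(np.float32)
X, y = make_windows(anoms)
print(X.shape, y.shape)  # (1621, 48) (1621, 12)
```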
The baseline: the simplest possible baseline is persistence, which at monthly resolution means "next month will look like this month." For global monthly temperature anomalies, that baseline is really hard to beat. Any model you train has to clearly improve on persistence; otherwise you're just fitting noise.
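For concreteness, here is what persistence looks like as a forecaster, along with the RMSE used to score it. A rough sketch; the helper names are mine, not from the project code.

```python
import numpy as np

def persistence_forecast(X, horizon=12):
    """Repeat each window's last observed value across the horizon."""
    return np.repeat(X[:, -1:], horizon, axis=1)

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

X = np.array([[0.1, 0.2, 0.3]])
y_hat = persistence_forecast(X, horizon=4)
print(y_hat)  # [[0.3 0.3 0.3 0.3]]
```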
The model: I went with a small 2-layer LSTM, hidden size 128, followed by a linear projection to the forecast horizon. Training target was MSE on the anomaly values, with a lookback window of 48 months and a forecast horizon of 12.
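In PyTorch, that architecture is only a few lines. This is a sketch of the described setup (2-layer LSTM, hidden 128, dropout 0.2 between layers, linear head to a 12-month horizon); the class name `AnomalyLSTM` and the forward-pass details are my reconstruction, not the author's exact code.

```python
import torch
import torch.nn as nn

class AnomalyLSTM(nn.Module):
    def __init__(self, hidden=128, layers=2, horizon=12, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=layers, batch_first=True,
                            dropout=dropout)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                    # x: (batch, lookback)
        out, _ = self.lstm(x.unsqueeze(-1))  # (batch, lookback, hidden)
        return self.head(out[:, -1])         # (batch, horizon)

model = AnomalyLSTM()
x = torch.randn(4, 48)                       # batch of 48-month windows
print(model(x).shape)  # torch.Size([4, 12])
```

Training against the MSE target is then a standard loop with `nn.MSELoss()` and any optimizer.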
I also tried a transformer baseline — a small encoder-only model with learned positional embeddings and a similar parameter count. On this dataset, the LSTM beat it consistently, which surprised me at first. My working theory is that when your training set is small (roughly 140 years of monthly data, ~1,700 samples after windowing), the LSTM's strong recurrence prior acts as useful inductive bias. The transformer wants more data to figure out the temporal structure from scratch.
What actually worked:
- Differencing the input so the model predicts month-over-month change instead of absolute value. This alone cut RMSE by about 20%.
- Z-score normalization per month, not globally — summer months have different variance than winter months.
- A small dropout between LSTM layers (0.2). The model was overfitting within 5 epochs otherwise.
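The two preprocessing tricks above can be sketched as follows. Assumptions: `per_month_zscore` is an illustrative helper of my own naming, and the random series stands in for the real anomaly data.

```python
import numpy as np

def per_month_zscore(anoms, months):
    """Normalize each calendar month separately (months: 0..11),
    since different months have different variance."""
    out = np.empty_like(anoms, dtype=np.float64)
    for m in range(12):
        mask = months == m
        mu, sd = anoms[mask].mean(), anoms[mask].std()
        out[mask] = (anoms[mask] - mu) / sd
    return out

anoms = np.random.randn(1680)
months = np.arange(1680) % 12
diffed = np.diff(anoms)                # month-over-month change as target
normed = per_month_zscore(anoms, months)
```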
What didn't work:
- Adding seasonal features (sin/cos of month-of-year) — the LSTM already learned them.
- Increasing hidden size past 128 — more overfitting, no better validation loss.
- Attention on top of the LSTM — marginal gains, lots of extra complexity.
Result: the model beat persistence by about 35% on RMSE over a held-out 2015–2023 window. Not groundbreaking, but real — and it reinforced for me that "small recurrent model" is still a sensible starting point for time-series problems where you don't have millions of examples.
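To make the "35% better than persistence" figure precise, I score it as fractional RMSE improvement. A tiny sketch with made-up numbers, just to pin down the metric:

```python
def skill_vs_persistence(rmse_model, rmse_persistence):
    """Fractional RMSE improvement over the persistence baseline."""
    return 1.0 - rmse_model / rmse_persistence

# e.g. a model RMSE of 0.65 vs a persistence RMSE of 1.0
print(round(skill_vs_persistence(0.65, 1.0), 2))  # 0.35
```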
I'm writing up a short paper summary for my course and will link it here once it's cleaned up. If anyone's working on climate forecasting at a bigger scale, I'd love to talk — especially about how to handle the uneven sampling in the older records.
See you in the next post!