LSTM Price Forecasting with Attention Mechanisms
Long Short-Term Memory (LSTM) networks are well-suited to financial time series: their gating mechanism selectively retains information across long sequences, letting them capture momentum and mean-reversion signals that fixed-lag classical models often miss. Adding attention lets the model learn which past timesteps matter most for the current prediction, instead of relying solely on the final hidden state.
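The attention layer can be sketched independently of the LSTM itself. A minimal version, assuming scaled dot-product attention that pools the sequence of hidden states using the final state as the query (the post does not specify the scoring function, so this choice is illustrative):

```python
import numpy as np

def attention_pool(hidden_states, query):
    """Scaled dot-product attention over LSTM hidden states.

    hidden_states: (T, d) array, one hidden vector per timestep.
    query: (d,) array, e.g. the final hidden state h_T.
    Returns the attention-weighted context vector (d,) and weights (T,).
    """
    scores = hidden_states @ query / np.sqrt(hidden_states.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over timesteps
    context = weights @ hidden_states   # convex combination of states
    return context, weights
```

The weights are a probability distribution over timesteps, so inspecting them after training shows which lags the model is attending to for a given prediction.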
Architecture
Equation

LSTM hidden state update at time t:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$
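The standard LSTM hidden state update can be written out directly, which makes the role of each gate concrete. A minimal sketch with the four gate weight matrices stacked into one array (a common implementation convention; the stacking order here is an assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep.

    W: (4d, d+m) stacked gate weights in order [forget; input; cell; output],
    b: (4d,) stacked biases; x_t: (m,); h_prev, c_prev: (d,).
    """
    d = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[:d])        # forget gate: what to keep from c_prev
    i = sigmoid(z[d:2*d])     # input gate: how much new info to admit
    g = np.tanh(z[2*d:3*d])   # candidate cell state
    o = sigmoid(z[3*d:])      # output gate
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

Running this over a price sequence and collecting the h_t vectors produces the hidden-state matrix that the attention layer pools.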
Walk-Forward Backtesting
Overfitting is the silent killer of backtests. We use walk-forward optimisation: train on a rolling window, test on the immediately following out-of-sample period, then roll forward. Never let future data touch the training window.
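The rolling-window procedure above can be expressed as a small split generator. A sketch, with window sizes as hypothetical parameters (the post does not state the ones used):

```python
def walk_forward_splits(n, train_size, test_size, step=None):
    """Yield (train_idx, test_idx) pairs for walk-forward evaluation.

    Each test window immediately follows its training window, so no
    future observation ever enters the training set.
    """
    step = step or test_size   # default: roll forward by one test window
    start = 0
    while start + train_size + test_size <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size,
                          start + train_size + test_size))
        yield train, test
        start += step
```

Concatenating the out-of-sample predictions from every test window gives a single continuous OOS track record to evaluate.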
Results
The LSTM-Attention model achieves a Sharpe ratio of 2.41 on 5-year out-of-sample SPY data, versus 0.87 for buy-and-hold. Maximum drawdown is contained at -8.3%.
Key Takeaways
- Attention is not free — it adds parameters and training time; validate it improves OOS Sharpe before committing
- Walk-forward is non-negotiable — shuffled or randomly sampled train/test splits leak future information into training for time series
- Transaction costs matter — a 10bp round-trip cost reduces Sharpe from 2.41 to ~1.95 at this turnover rate
- Regime changes — the model underperforms during high-VIX regimes; combining with a vol filter helps
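The transaction-cost takeaway is easy to reproduce on any position series. A sketch that deducts a round-trip cost (in basis points) on each unit of position change, with the position encoding and cost model as illustrative assumptions:

```python
import numpy as np

def net_returns(gross_returns, positions, round_trip_bp=10):
    """Deduct trading costs from a gross daily return series.

    positions: daily positions, e.g. in {-1, 0, +1}; each unit of
    position change pays half the round-trip cost (one-way), so a
    full enter-then-exit cycle pays the full round_trip_bp.
    """
    pos = np.asarray(positions, dtype=float)
    turnover = np.abs(np.diff(pos, prepend=0.0))       # units traded per day
    cost = turnover * (round_trip_bp / 2.0) / 1e4      # one-way cost per unit
    return np.asarray(gross_returns, dtype=float) - cost
```

Feeding the net series back through the Sharpe computation shows how quickly costs erode the headline number at high turnover.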