Backtesting Performance (Speed + Parity)

This page explains how to make backtests faster without changing strategy correctness. Performance issues are usually dominated by one of:

  • Startup (python import time, environment loading, first progress update)

  • Data hydration (first run downloads data; warm runs should reuse cache)

  • Compute (strategy logic, pandas transforms, option pricing)

  • Artifacts (tearsheets, plots, indicators)

If you are new to backtesting, start with How To Backtest.

Warm vs cold runs

Backtest speed depends heavily on caching:

  • Cold run: the cache is empty, so the backtest must fetch historical data.

  • Warm run: the cache already contains required data, so the backtest should be dramatically faster.

If a “warm” run is still slow, the most common causes are:

  • your cache backend is not configured (or not writable)

  • your cache namespace changed between runs

  • a request type is not being cached (so it keeps downloading)

For deeper cache semantics (engineering notes), see docs/remote_cache.md in the repository.

Quick diagnosis checklist

  1. Is the backtest downloading a lot of data?

    • Look for many “Submitted to queue” log lines (ThetaData) or repeated API calls (Polygon).

    • If yes, you are hydration-bound: fix request fanout or cache coverage first.

  2. Is the backtest slow even with near-zero downloads?

    • If yes, you are compute/IO/artifact-bound: use profiling to attribute time.

  3. Does the backtest look stuck in the UI?

    • If data is downloading, progress may not advance unless a heartbeat is enabled.

Profiling (YAPPI)

To attribute where time is spent (S3 IO vs compute vs artifacts), enable profiling:

  • Set BACKTESTING_PROFILE=yappi

  • Run the backtest

  • Inspect the produced *_profile_yappi.csv artifact

Common hotspots to look for:

  • S3 IO (many small objects can be slow even on “warm” runs)

  • pandas transforms (merge/concat/tz conversions)

  • artifact generation (tearsheet, indicators, plots)

Environment variables

Many backtesting behaviors are configurable via environment variables. See:

  • Environment Variables (public docs)

  • docs/ENV_VARS.md (engineering notes; may include contributor-specific details)

Common performance-related flags:

  • LUMIBOT_DISABLE_DOTENV: disables recursive .env discovery (reduces startup latency and avoids accidental config overrides)

  • SHOW_TEARSHEET / SHOW_PLOT / SHOW_INDICATORS: disables heavy artifact generation when you only need core results

  • BACKTESTING_PROFILE: enable profiling (yappi)

ThetaData options: common performance pitfalls

Options backtests can be slower than stock backtests because they may need:

  • option chains (expirations/strikes)

  • quote history (bid/ask) for realistic pricing

  • additional mark-to-market logic for illiquid contracts

The fastest options backtests are those that:

  • build only the chain data they need (one expiry and a narrow strike neighborhood)

  • avoid probing hundreds/thousands of strikes when searching for a delta/ATM contract

  • reuse cached quote history instead of requesting tiny windows repeatedly

For ThetaData details, see ThetaData Backtesting.