.. _backtesting_performance:

Backtesting Performance (Speed + Parity)
========================================

This page explains how to make backtests faster **without changing strategy correctness**.

Performance issues are usually dominated by one of:

- **Startup** (python import time, environment loading, first progress update)
- **Data hydration** (first run downloads data; warm runs should reuse cache)
- **Compute** (strategy logic, pandas transforms, option pricing)
- **Artifacts** (tearsheets, plots, indicators)

If you are new to backtesting, start with :doc:`backtesting.how_to_backtest`.

Warm vs cold runs
-----------------

Backtest speed depends heavily on caching:

- **Cold run:** the cache is empty, so the backtest must fetch historical data.
- **Warm run:** the cache already contains required data, so the backtest should be dramatically faster.

If a “warm” run is still slow, the most common causes are:

- your cache backend is not configured (or not writable)
- your cache namespace changed between runs
- a request type is not being cached (so it keeps downloading)

For deeper cache semantics (engineering notes), see ``docs/remote_cache.md`` in the repository.

Quick diagnosis checklist
-------------------------

1. **Is the backtest downloading a lot of data?**

   - Look for many “Submitted to queue” log lines (ThetaData) or repeated API calls (Polygon).
   - If yes, you are hydration-bound: fix request fanout or cache coverage first.

2. **Is the backtest slow even with near-zero downloads?**

   - If yes, you are compute/IO/artifact-bound: use profiling to attribute time.

3. **Does the backtest look stuck in the UI?**

   - If data is downloading, progress may not advance unless a heartbeat is enabled.

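
The first checklist item can be roughed out with a small log scan. This is a minimal sketch, not part of LumiBot: the function names, the sample log lines, and the threshold of 50 are all hypothetical, and the only detail taken from this page is the “Submitted to queue” marker text. Adjust the threshold to whatever counts as "a lot" for your strategy.

```python
# Marker text quoted in the checklist above (ThetaData queue submissions).
QUEUE_MARKER = "Submitted to queue"


def count_queue_submissions(log_lines):
    """Count the log lines that contain the queue-submission marker."""
    return sum(1 for line in log_lines if QUEUE_MARKER in line)


def looks_hydration_bound(log_lines, threshold=50):
    """Heuristic only: many submissions suggest the run is download-bound.

    The default threshold is an arbitrary assumption, not a LumiBot setting.
    """
    return count_queue_submissions(log_lines) >= threshold


# Hypothetical log excerpt; real log lines will look different.
sample_log = [
    "INFO Submitted to queue: request 1",
    "INFO Strategy iteration complete",
    "INFO Submitted to queue: request 2",
]

submissions = count_queue_submissions(sample_log)
print(f"{submissions} queue submissions; "
      f"hydration-bound: {looks_hydration_bound(sample_log, threshold=2)}")
```

To scan a real run, pass an open log file (for example ``open(path)``) as ``log_lines``; the file is read line by line, so large logs are not loaded into memory at once.
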
Profiling (YAPPI)
-----------------

To attribute where time is spent (S3 IO vs compute vs artifacts), enable profiling:

- Set ``BACKTESTING_PROFILE=yappi``
- Run the backtest
- Inspect the produced ``*_profile_yappi.csv`` artifact

Common hotspots to look for:

- S3 IO (many small objects can be slow even on “warm” runs)
- pandas transforms (merge/concat/tz conversions)
- artifact generation (tearsheet, indicators, plots)

Environment variables
---------------------

Many backtesting behaviors are configurable via environment variables. See:

- :doc:`environment_variables` (public docs)
- ``docs/ENV_VARS.md`` (engineering notes; may include contributor-specific details)

Common performance-related flags:

- ``LUMIBOT_DISABLE_DOTENV``: disables recursive ``.env`` discovery (reduces startup latency and avoids accidental config overrides)
- ``SHOW_TEARSHEET`` / ``SHOW_PLOT`` / ``SHOW_INDICATORS``: disables heavy artifact generation when you only need core results
- ``BACKTESTING_PROFILE``: enable profiling (yappi)

ThetaData options: common performance pitfalls
----------------------------------------------

Options backtests can be slower than stock backtests because they may need:

- option chains (expirations/strikes)
- quote history (bid/ask) for realistic pricing
- additional mark-to-market logic for illiquid contracts

The fastest options backtests are those that:

- build **only the chain data they need** (one expiry and a narrow strike neighborhood)
- avoid probing hundreds/thousands of strikes when searching for a delta/ATM contract
- reuse cached quote history instead of requesting tiny windows repeatedly

For ThetaData details, see :doc:`backtesting.thetadata`.
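
The "narrow strike neighborhood" idea can be sketched in plain Python. This is an illustration only: ``narrow_strikes`` and its ``width`` parameter are hypothetical names, not a LumiBot or ThetaData API, and the strike list is made up. The point is to pick candidate strikes around the underlying price *before* requesting quotes, so that only a handful of contracts are ever probed.

```python
def narrow_strikes(strikes, underlying_price, width=5):
    """Return the strikes within `width` positions of the at-the-money strike.

    Hypothetical helper: `strikes` is any iterable of numeric strikes for a
    single expiry. At most 2 * width + 1 strikes are returned, ascending.
    """
    ordered = sorted(strikes)
    if not ordered:
        return []
    # Index of the strike closest to the underlying price (the ATM strike).
    atm_index = min(
        range(len(ordered)),
        key=lambda i: abs(ordered[i] - underlying_price),
    )
    low = max(0, atm_index - width)
    high = min(len(ordered), atm_index + width + 1)
    return ordered[low:high]


# Made-up chain: strikes from 100 to 200 in steps of 5 (21 strikes).
all_strikes = list(range(100, 201, 5))
candidates = narrow_strikes(all_strikes, underlying_price=151.2, width=2)
print(candidates)  # [140, 145, 150, 155, 160]
```

With this kind of pre-filter, a delta or ATM search requests quotes for five strikes instead of all 21, which cuts the number of data requests on cold runs and the number of cache lookups on warm ones.
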