Pandas (CSV or other data)
NOTE: Please ensure you have installed the latest Lumibot version using ``pip install lumibot --upgrade`` before proceeding, as there have been some major changes to the backtesting module in the latest version.
In most situations, you will want to use the Polygon backtester or the Yahoo backtester instead, because they are much easier to set up and use. The Pandas backtester is intended for advanced users who have their own data and want to use it with Lumibot.
The Pandas backtester is named after the Python dataframe library because the user must provide a strictly formatted dataframe. You can use any CSV, Parquet, database data, etc. that you wish, but Lumibot will only accept one dataframe format.
Pandas backtester allows for intra-day and inter-day backtesting. Time frames for raw data are 1 minute and 1 day.
Additionally, with Pandas backtester, it is possible to backtest stocks, stock-like securities, futures contracts, crypto and FOREX.
The Pandas backtester is the most flexible backtester in Lumibot, but it is also the most difficult to use.
Start by importing the Pandas backtester as follows:
from lumibot.backtesting import PandasDataBacktesting, BacktestingBroker
Next, create your Strategy class as you normally would. You can use any of the built-in indicators or create your own. You can also use any of the built-in order types or create your own.
from lumibot.strategies import Strategy
class MyStrategy(Strategy):
    def on_trading_iteration(self):
        # Do something here
        pass
Lumibot will start trading at 00:00 on the first date and continue until 23:59 on the last. These times are interpreted in Lumibot's default time zone unless you change it. The default is America/New_York (US Eastern Time).
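To make that window concrete, the sketch below builds the first and last timestamps of a backtest using only the Python standard library. The helper name `backtest_window` is hypothetical and is not part of Lumibot; it only illustrates the 00:00 to 23:59 convention in the America/New_York zone, and the two dates are the first and last dates from the example table in this page.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def backtest_window(first_day: date, last_day: date, tz: ZoneInfo = NEW_YORK):
    """Return (start, end): 00:00 on first_day through 23:59 on last_day, in tz."""
    start = datetime.combine(first_day, time(0, 0), tzinfo=tz)
    end = datetime.combine(last_day, time(23, 59), tzinfo=tz)
    return start, end


start, end = backtest_window(date(2020, 1, 2), date(2020, 4, 22))
print(start.isoformat())
print(end.isoformat())
```

The two printed UTC offsets differ (-05:00 in January, -04:00 in April) because daylight saving time begins in between, so a fixed "EST" offset would not describe the whole period.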
Pandas backtester will receive a dataframe in the following format:
Index:
    name: datetime
    type: datetime64
Columns:
    names: ['open', 'high', 'low', 'close', 'volume']
    types: float
Your dataframe should look like this:
| datetime | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2020-01-02 09:31:00 | 3237.00 | 3234.75 | 3235.25 | 3237.00 | 16808 |
| 2020-01-02 09:32:00 | 3237.00 | 3234.00 | 3237.00 | 3234.75 | 10439 |
| 2020-01-02 09:33:00 | 3235.50 | 3233.75 | 3234.50 | 3234.75 | 8203 |
| … | … | … | … | … | … |
| 2020-04-22 15:56:00 | 2800.75 | 2796.25 | 2800.75 | 2796.25 | 8272 |
| 2020-04-22 15:57:00 | 2796.50 | 2794.00 | 2796.25 | 2794.00 | 7440 |
| 2020-04-22 15:58:00 | 2794.75 | 2793.00 | 2794.25 | 2793.25 | 7569 |
Other formats for dataframes will not work.
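Because the format is strict, it can help to check a dataframe before handing it to Lumibot. The following is a minimal sketch in plain pandas, not a Lumibot feature: the function `find_format_problems` is a hypothetical helper, and it only tests the index name, index type, and column names described above. It accepts any numeric column type rather than insisting on float, which is an assumption on my part.

```python
import pandas as pd

REQUIRED_COLUMNS = ["open", "high", "low", "close", "volume"]


def find_format_problems(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the frame matches the layout."""
    problems = []
    if df.index.name != "datetime":
        problems.append(f"index is named {df.index.name!r}, expected 'datetime'")
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    non_numeric = [
        c for c in REQUIRED_COLUMNS
        if c in df.columns and not pd.api.types.is_numeric_dtype(df[c])
    ]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems


# First two rows of the example table, built in the expected layout.
good = pd.DataFrame(
    {
        "open": [3237.00, 3237.00],
        "high": [3234.75, 3234.00],
        "low": [3235.25, 3237.00],
        "close": [3237.00, 3234.75],
        "volume": [16808.0, 10439.0],
    },
    index=pd.DatetimeIndex(
        ["2020-01-02 09:31:00", "2020-01-02 09:32:00"], name="datetime"
    ),
)

# A deliberately broken variant: capitalized column and a plain integer index.
bad = good.rename(columns={"open": "Open"}).reset_index()

print(find_format_problems(good))
print(find_format_problems(bad))
```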
You can download an example CSV using the yfinance library as follows:
import yfinance as yf
# Download minute data for the last 5 days for AAPL
data = yf.download("AAPL", period="5d", interval="1m")
# Save the data to a CSV file
data.to_csv("AAPL.csv")
The data objects will be collected in a dictionary called pandas_data, using the asset as the key and the data object as the value. Additional assets and their data can be added, and then the dictionary can be passed into Lumibot for backtesting.
One of the important differences when using the Pandas backtester is that you must use an Asset object for each CSV data file loaded. You may not use a symbol as you might with the Yahoo backtester.
For example, if you have a CSV file for AAPL, you must create an Asset object for AAPL and then pass that into the Data object.
from lumibot.entities import Asset
asset = Asset(
    symbol="AAPL",
    asset_type=Asset.AssetType.STOCK,
)
The next step is to load the dataframe from the CSV file.
import pandas as pd
# The column names are important. It is also important that all dates in the
# dataframe are timezone-aware before going into Lumibot.
df = pd.read_csv("AAPL.csv")
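The comment in the loading snippet asks for dates that carry a time zone. One possible way to get there with plain pandas is sketched below on an in-memory CSV, so it runs without any file. The column name `datetime` and the America/New_York zone are taken from the format and default described on this page; whether Lumibot needs this step done by hand or handles some of it internally is not stated here, so treat it as optional preparation.

```python
import io

import pandas as pd

# Two rows from the example table, inlined so the sketch needs no file on disk.
csv_text = """datetime,open,high,low,close,volume
2020-01-02 09:31:00,3237.00,3234.75,3235.25,3237.00,16808
2020-01-02 09:32:00,3237.00,3234.00,3237.00,3234.75,10439
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["datetime"],  # turn the text column into timestamps
    index_col="datetime",      # and use it as the index
)

# The parsed index is timezone-naive; attach a zone without shifting the clock time.
df.index = df.index.tz_localize("America/New_York")

print(df.index.tz)
print(df.index[0])
```

`tz_localize` labels existing wall-clock times with a zone, which fits data already recorded in exchange time. Data stored in UTC would instead need `tz_localize("UTC")` followed by `tz_convert`.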
Third, we make a data object for the asset. The data object must have at least the asset object, the dataframe, and the timestep. The timestep can be either minute or day. If you are using minute data, you must use the minute timestep. If you are using daily data, you must use the day timestep.
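If you are unsure which timestep a file contains, the spacing of its index can tell you. The sketch below is a hypothetical helper (`guess_timestep` is not a Lumibot function) that looks at the smallest gap between consecutive bars and maps it to one of the two supported values, raising an error for anything else, such as 5-minute bars.

```python
import pandas as pd


def guess_timestep(index: pd.DatetimeIndex) -> str:
    """Guess 'minute' or 'day' from the smallest gap between consecutive bars."""
    if len(index) < 2:
        raise ValueError("need at least two bars to measure spacing")
    # Sort first so that gaps are never negative, then take the smallest one.
    smallest_gap = index.sort_values().to_series().diff().min()
    if smallest_gap == pd.Timedelta(minutes=1):
        return "minute"
    if smallest_gap >= pd.Timedelta(days=1):
        return "day"
    raise ValueError(f"unsupported bar spacing: {smallest_gap}")


minute_index = pd.date_range("2020-01-02 09:31", periods=3, freq="min")
daily_index = pd.date_range("2020-01-02", periods=3, freq="D")

print(guess_timestep(minute_index))
print(guess_timestep(daily_index))
```

Using the smallest gap rather than the most common one keeps overnight and weekend gaps from confusing the result.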
from lumibot.entities import Data
data = Data(
    asset,
    df,
    timestep="minute",
)
Next, we create or add to the dictionary that will be passed into Lumibot.
pandas_data = {
    asset: data,
}
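When several symbols are involved, the same dictionary can be filled in a loop. The sketch below shows only the shape of that loop. `StandInAsset` and `StandInData` are hypothetical stand-ins written so the example runs without Lumibot installed; in real code you would use `Asset` and `Data` from `lumibot.entities` and real dataframes instead of the placeholder rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen dataclasses are hashable, so they can be dict keys
class StandInAsset:
    symbol: str
    asset_type: str = "stock"


@dataclass
class StandInData:
    asset: StandInAsset
    rows: list
    timestep: str = "minute"


# Placeholder bars per symbol; in practice each value would be a loaded dataframe.
bars_by_symbol = {
    "AAPL": [("2020-01-02 09:31:00", 1.0)],
    "MSFT": [("2020-01-02 09:31:00", 2.0)],
}

pandas_data = {}
for symbol, rows in bars_by_symbol.items():
    asset = StandInAsset(symbol=symbol)
    pandas_data[asset] = StandInData(asset, rows)  # one entry per asset

print([asset.symbol for asset in pandas_data])
```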
Finally, we can pass the pandas_data dictionary into Lumibot and run the backtest.
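# Run the backtesting
from lumibot.traders import Trader

# Use the full date range covered by the data
backtesting_start = pandas_data[asset].datetime_start
backtesting_end = pandas_data[asset].datetime_end

trader = Trader(backtest=True)
data_source = PandasDataBacktesting(
    pandas_data=pandas_data,
    datetime_start=backtesting_start,
    datetime_end=backtesting_end,
)
broker = BacktestingBroker(data_source)
strat = MyStrategy(
    broker=broker,
    budget=100000,
)
trader.add_strategy(strat)
trader.run_all()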
In Summary
Putting all of this together, and adding in the strategy information, the complete code comes in two parts.
Getting the data would look something like this (this example uses yfinance to download the data, but you can use any data source you wish):
import yfinance as yf
# Download minute data for the last 5 days for AAPL
data = yf.download("AAPL", period="5d", interval="1m")
# Save the data to a CSV file
data.to_csv("AAPL.csv")
Then, the strategy and backtesting code would look something like this:
import pandas as pd
from lumibot.backtesting import BacktestingBroker, PandasDataBacktesting
from lumibot.entities import Asset, Data
from lumibot.strategies import Strategy
# A simple strategy that buys AAPL on the first iteration
class MyStrategy(Strategy):
    def on_trading_iteration(self):
        if self.first_iteration:
            order = self.create_order("AAPL", 100, "buy")
            self.submit_order(order)
# Read the data from the CSV file (in this example you must have a file named "AAPL.csv"
# in the same directory as this script)
df = pd.read_csv("AAPL.csv")
asset = Asset(
    symbol="AAPL",
    asset_type=Asset.AssetType.STOCK,
)
pandas_data = {}
pandas_data[asset] = Data(
    asset,
    df,
    timestep="minute",
)
# Pick the date range you want to backtest
backtesting_start = pandas_data[asset].datetime_start
backtesting_end = pandas_data[asset].datetime_end
# Run the backtest
result = MyStrategy.run_backtest(
    PandasDataBacktesting,
    backtesting_start,
    backtesting_end,
    pandas_data=pandas_data,
)
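The example above backtests the full span of the data. If you would rather request a specific window, one cautious approach is to clamp it to what the data actually covers before running. The sketch below uses only the standard library. `clamp_window` is a hypothetical helper, and the two data bounds are the first and last timestamps from the example table on this page.

```python
from datetime import datetime


def clamp_window(requested_start, requested_end, data_start, data_end):
    """Shrink a requested window so it lies inside [data_start, data_end]."""
    start = max(requested_start, data_start)
    end = min(requested_end, data_end)
    if start >= end:
        raise ValueError("requested window does not overlap the available data")
    return start, end


data_start = datetime(2020, 1, 2, 9, 31)
data_end = datetime(2020, 4, 22, 15, 58)

start, end = clamp_window(
    datetime(2020, 1, 1), datetime(2020, 3, 31, 23, 59), data_start, data_end
)
print(start, end)
```

All four datetimes must be either timezone-naive or timezone-aware; Python refuses to compare a naive datetime with an aware one.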