Data

class entities.data.Data(asset, df, date_start=None, date_end=None, trading_hours_start=datetime.time(0, 0), trading_hours_end=datetime.time(23, 59), timestep='minute', quote=None, timezone=None)

Bases: object

Input and manage Pandas dataframes for backtesting.

Parameters:
  • asset (Asset Object) – Asset to which this data is attached.

  • df (dataframe) – Pandas dataframe containing OHLCV etc. trade data, loaded by the user from a CSV file. The index holds the dates and must be of pandas datetime64 type. Columns must be exactly [“open”, “high”, “low”, “close”, “volume”].

  • quote (Asset Object) – The quote asset for this data. If not provided, then the quote asset will default to USD.

  • date_start (Datetime or None) – Starting date for this data. If not provided, the first date in the dataframe is used.

  • date_end (Datetime or None) – Ending date for this data. If not provided, the last date in the dataframe is used.

  • trading_hours_start (datetime.time or None) – Start of the trading hours. If not supplied, the default is 0000 hrs.

  • trading_hours_end (datetime.time or None) – End of the trading hours. If not supplied, the default is 2359 hrs.

  • timestep (str) – Either “minute” (default) or “day”

  • timezone (str or None) – If not None, localize the dataframe to the given timezone, passed as a string. Any value supported by tz_localize is accepted, e.g. “US/Eastern” or “UTC”.
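A minimal sketch of preparing a dataframe in the shape described above, assuming pandas is installed. The CSV text, its header names, and the prices are made up for illustration; only the required result (a datetime64 index and lower-case OHLCV columns) comes from the parameter descriptions.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file the user would load from disk.
CSV_TEXT = """Datetime,Open,High,Low,Close,Volume
2023-01-03 09:30:00,100.0,101.0,99.5,100.5,1200
2023-01-03 09:31:00,100.5,100.9,100.1,100.7,800
2023-01-03 09:32:00,100.7,101.2,100.6,101.0,950
"""

# Parse the date column and use it as the index, which gives a datetime64 index.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Datetime"], index_col="Datetime")

# Lower-case the column names to match ["open", "high", "low", "close", "volume"].
df.columns = [name.lower() for name in df.columns]

print(df.index.dtype)
print(list(df.columns))

# Passing the frame to the constructor would then look roughly like this
# (not executed here; `asset` is an Asset object created elsewhere):
#   data = Data(asset, df, timestep="minute")
```

The constructor call is left as a comment because it depends on an Asset object and on the package being importable in your environment.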

asset

Asset object to which this data is attached.

Type:

Asset Object

symbol

The underlying or stock symbol as a string.

Type:

str

df

Pandas dataframe containing OHLCV etc. trade data, loaded by the user from a CSV file. The index holds the dates and must be of pandas datetime64 type. Columns must be exactly [“open”, “high”, “low”, “close”, “volume”].

Type:

dataframe

date_start

Starting date for this data. If not provided, the first date in the dataframe is used.

Type:

Datetime or None

date_end

Ending date for this data. If not provided, the last date in the dataframe is used.

Type:

Datetime or None

trading_hours_start

Start of the trading hours. If not supplied, the default is 0000 hrs.

Type:

datetime.time or None

trading_hours_end

End of the trading hours. If not supplied, the default is 2359 hrs.

Type:

datetime.time or None

timestep

Either “minute” (default) or “day”

Type:

str

datalines

Keys are column names such as datetime or close; values are numpy arrays.

Type:

dict

iter_index

Series with datetimes in the index and a running row count in the values. Used to retrieve the current dataframe iteration for this data at a given datetime.

Type:

Pandas Series
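The idea behind iter_index can be sketched with the standard library alone. This is an illustration under stated assumptions, not the class's implementation: a dict stands in for the Pandas Series, the helper name `iter_count` is hypothetical, and the "latest bar at or before the requested datetime" lookup rule is an assumption.

```python
import bisect
import datetime as dt

# Stand-in for the dataframe index: three one-minute bars.
index = [
    dt.datetime(2023, 1, 3, 9, 30),
    dt.datetime(2023, 1, 3, 9, 31),
    dt.datetime(2023, 1, 3, 9, 32),
]

# Datetime -> running row count, mirroring "datetime in the index, range count in values".
iter_index = {stamp: count for count, stamp in enumerate(index)}


def iter_count(when):
    """Return the row count of the latest bar at or before `when` (hypothetical helper)."""
    position = bisect.bisect_right(index, when) - 1
    if position < 0:
        raise ValueError("no bar at or before the requested datetime")
    return iter_index[index[position]]


print(iter_count(dt.datetime(2023, 1, 3, 9, 31)))      # exact match on the 09:31 bar
print(iter_count(dt.datetime(2023, 1, 3, 9, 31, 30)))  # falls back to the 09:31 bar
```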

set_times()

Sets the start and end time for the data.

repair_times_and_fill()

After all time series are merged, reindex the local dataframe and fill NaNs.

columns()

Adjust date and column names to lower case.

set_date_format()

Ensure the datetime is in local datetime64 format.

set_dates()

Set start and end dates.

trim_data()

Trim the dataframe to match the desired backtesting dates.

to_datalines()

Create numpy datalines from existing date index and columns.

get_iter_count()

Returns the current index number (len) for a given date.

check_data(wrapper)

Validates that the provided date, length, timeshift, and timestep will return data. Runs the wrapped function if data is available; returns None otherwise.
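The "run the function or return None" behaviour is a typical decorator pattern. The sketch below shows one possible shape for it. The names `check_data_sketch` and `last_closes`, the list-based data, and the exact bounds check are all assumptions made for illustration; the real method also considers the date and timestep.

```python
import functools


def check_data_sketch(func):
    """Run `func` only when enough rows exist; otherwise return None (illustrative)."""

    @functools.wraps(func)
    def wrapper(rows, position, length=1, timeshift=0):
        end = position + 1 - timeshift  # one past the last row of the window
        start = end - length
        if start < 0 or end > len(rows):
            return None  # the request reaches outside the available data
        return func(rows, position, length=length, timeshift=timeshift)

    return wrapper


@check_data_sketch
def last_closes(rows, position, length=1, timeshift=0):
    """Return `length` values ending `timeshift` rows before `position`."""
    end = position + 1 - timeshift
    return rows[end - length:end]


closes = [100.5, 100.7, 101.0]
print(last_closes(closes, 2, length=2))  # enough history: the wrapped function runs
print(last_closes(closes, 0, length=2))  # not enough history: the decorator returns None
```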

get_last_price()

Gets the last price from the current date.

_get_bars_dict()

Returns bars in the form of a dict.

get_bars()

Returns bars in the form of a dataframe.

MIN_TIMESTEP = 'minute'
TIMESTEP_MAPPING = [{'representations': ['1D', 'day'], 'timestep': 'day'}, {'representations': ['1M', 'minute'], 'timestep': 'minute'}]
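A short sketch of how a mapping of this shape can be used to normalise a timestep representation. The list is copied from the attribute above; the helper name `parse_timestep` is hypothetical and is not a documented method of the class.

```python
TIMESTEP_MAPPING = [
    {"representations": ["1D", "day"], "timestep": "day"},
    {"representations": ["1M", "minute"], "timestep": "minute"},
]


def parse_timestep(representation):
    """Map a representation such as "1D" to its canonical timestep (hypothetical helper)."""
    for entry in TIMESTEP_MAPPING:
        if representation in entry["representations"]:
            return entry["timestep"]
    raise ValueError(f"unsupported timestep: {representation!r}")


print(parse_timestep("1D"))
print(parse_timestep("minute"))
```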
check_data()
columns(df)
get_bars(dt, length=1, timestep='minute', timeshift=0)

Returns a dataframe of the data.

Parameters:
  • dt (datetime.datetime) – The datetime for which to get the data.

  • length (int) – The number of periods of data to return.

  • timestep (str) – The frequency of the data to return. Only minute and day are supported.

  • timeshift (int) – The number of periods by which to shift the data.

Return type:

pandas.DataFrame
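To make the interaction of length and timeshift concrete, here is a sketch under one plausible reading of those parameters: the window holds `length` bars and ends `timeshift` periods before the bar at `dt`. That reading, the helper name `window`, and the tuple-based bars are assumptions for illustration; check the behaviour against your installed version.

```python
import datetime as dt

# Stand-in bars: (timestamp, close) pairs at one-minute spacing.
start = dt.datetime(2023, 1, 3, 9, 30)
bars = [(start + dt.timedelta(minutes=i), 100.0 + i) for i in range(5)]


def window(bars, when, length=1, timeshift=0):
    """Select `length` bars ending `timeshift` periods before the bar at `when`."""
    stamps = [stamp for stamp, _ in bars]
    end = stamps.index(when) + 1 - timeshift  # one past the final bar of the window
    return bars[max(end - length, 0):end]


now = start + dt.timedelta(minutes=4)
recent = [close for _, close in window(bars, now, length=2)]
shifted = [close for _, close in window(bars, now, length=2, timeshift=1)]
print(recent)   # the two most recent closes
print(shifted)  # the same-sized window, one bar earlier
```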

get_bars_between_dates(timestep='minute', exchange=None, start_date=None, end_date=None)

Returns a dataframe of all the data available between the start and end dates.

Parameters:
  • timestep (str) – The frequency of the data to return. Only minute and day are supported.

  • exchange (str) – The exchange to get the data for.

  • start_date (datetime.datetime) – The start date to get the data for.

  • end_date (datetime.datetime) – The end date to get the data for.

Return type:

pandas.DataFrame

get_iter_count(dt)
get_last_price(*args, **kwargs)
get_quote(*args, **kwargs)
repair_times_and_fill(idx)
set_date_format(df)
set_dates(date_start, date_end)
set_times(trading_hours_start, trading_hours_end)

Set the start and end times for the data. The default is 0000 hrs to 2359 hrs.

Parameters:
  • trading_hours_start (datetime.time) – The start time of the trading hours.

  • trading_hours_end (datetime.time) – The end time of the trading hours.

Returns:

  • trading_hours_start (datetime.time) – The start time of the trading hours.

  • trading_hours_end (datetime.time) – The end time of the trading hours.
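The effect of a trading-hours window on a set of timestamps can be sketched with datetime.time comparisons. The helper name `within_trading_hours` is hypothetical, and treating both endpoints as inclusive is an assumption; the defaults mirror the constructor signature.

```python
import datetime as dt


def within_trading_hours(stamps, trading_hours_start=dt.time(0, 0), trading_hours_end=dt.time(23, 59)):
    """Keep timestamps whose time of day lies in [start, end], both ends inclusive."""
    return [s for s in stamps if trading_hours_start <= s.time() <= trading_hours_end]


stamps = [
    dt.datetime(2023, 1, 3, 9, 29),
    dt.datetime(2023, 1, 3, 9, 30),
    dt.datetime(2023, 1, 3, 16, 0),
    dt.datetime(2023, 1, 3, 16, 1),
]

# Restrict to a 09:30 to 16:00 session; the first and last timestamps fall outside it.
regular = within_trading_hours(stamps, dt.time(9, 30), dt.time(16, 0))
print(regular)
```

With the default arguments every timestamp in this sample is kept, since the default window covers 00:00 through 23:59.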

to_datalines()
trim_data(df, date_start, date_end, trading_hours_start, trading_hours_end)