Strange behavior around holidays
-
Hi there,
I am in the process of switching from Zipline to Backtrader, and I am analyzing the differences between the two backtesting engines for some algos.
While the holdings and returns are very close for some periods, I see big variations at some points in time. One source of difference I have identified occurs around holidays, where the algo seems to loop twice on the day preceding the holiday and behaves in a fashion I cannot explain shortly after.
For instance, I have identified this behavior around Good Friday 2016 (25-03-2016), where the sequence is the following:
- 24-03-2016 --> algo loops twice through next
- 25-03-2016 --> Good Friday (market closed)
- 26-03-2016 & 27-03-2016 --> weekend (market closed)
- 28-03-2016 --> nothing happens
- 29-03-2016 --> orders are submitted, accepted and executed, but no position is reported for that day in my log
- 30-03-2016 --> back to normal
As you'll see in my code, I tried two different options to add a calendar (one commented out), as that seemed to be the issue here, but it did not help.
Below is my code, which is a slightly modified version of the following script:
https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition/blob/master/08_ml4t_workflow/03_backtesting_with_backtrader.ipynb
Remarks
- My data feed does not contain any values for the holidays, but does for every other day when the market is open (28-03-2016, for instance).
- My signal (the predicted column of self.datas) is the output of a simple regression.
My knowledge of backtrader is still quite limited, and this issue has had me stuck for a while now. Any help would be highly appreciated. Please let me know if the log for the days mentioned above would help.
from pathlib import Path
import csv
from time import time
import datetime

import numpy as np
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import seaborn as sns
import pandas_market_calendars as mcal
import backtrader as bt
from backtrader.feeds import PandasData
import quantstats as qs

#nyse = mcal.get_calendar('NYSE')

class NYSE_2016(bt.TradingCalendar):
    params = dict(
        holidays=[
            datetime.date(2016, 1, 1),
            datetime.date(2016, 1, 18),
            datetime.date(2016, 2, 15),
            datetime.date(2016, 3, 25),
            datetime.date(2016, 5, 30),
            datetime.date(2016, 7, 4),
            datetime.date(2016, 9, 5),
            datetime.date(2016, 11, 24),
            datetime.date(2016, 12, 26),
        ]
    )

pd.set_option('display.expand_frame_repr', False)
np.random.seed(42)
sns.set_style('darkgrid')

def format_time(t):
    m_, s = divmod(t, 60)
    h, m = divmod(m_, 60)
    return f'{h:>02.0f}:{m:>02.0f}:{s:>02.0f}'

#----------------------------- BACKTRADER SETUP -------------------------------
class FixedCommisionScheme(bt.CommInfoBase):
    """
    Simple fixed commission scheme for demo
    """
    params = (
        ('commission', .02),
        ('stocklike', True),
        ('commtype', bt.CommInfoBase.COMM_FIXED),
    )

    def _getcommission(self, size, price, pseudoexec):
        return abs(size) * self.p.commission

#----------------- DATAFRAME LOADER -----------------
OHLCV = ['open', 'high', 'low', 'close', 'volume']

class SignalData(PandasData):
    """
    Define pandas DataFrame structure
    """
    cols = OHLCV + ['predicted']

    # create lines
    lines = tuple(cols)

    # define parameters
    params = {c: -1 for c in cols}
    params.update({'datetime': None})
    params = tuple(params.items())

#----------------- STRATEGY -----------------
class MLStrategy(bt.Strategy):
    params = (('n_positions', 20),
              ('min_positions', 10),
              ('verbose', False),
              ('log_file', 'backtest.csv'))

    def log(self, txt, dt=None):
        """ Logger for the strategy"""
        dt = dt or self.datas[0].datetime.datetime(0)
        with Path(self.p.log_file).open('a') as f:
            log_writer = csv.writer(f)
            log_writer.writerow([dt.isoformat()] + txt.split(','))

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            if order.status in [order.Submitted]:
                self.log(f'{order.data._name},SUBMITTED')
            if order.status in [order.Accepted]:
                self.log(f'{order.data._name},ACCEPTED')
            return

        if self.p.verbose:
            if order.status in [order.Completed]:
                p = order.executed.price
                if order.isbuy():
                    self.log(f'{order.data._name},BUY executed,{p:.2f}')
                elif order.issell():
                    self.log(f'{order.data._name},SELL executed,{p:.2f}')
            elif order.status in [order.Canceled]:
                self.log(f'{order.data._name},Order Canceled')
            elif order.status in [order.Margin]:
                self.log(f'{order.data._name},Order Margin')
            elif order.status in [order.Rejected]:
                self.log(f'{order.data._name},Order Rejected')

    def prenext(self):
        self.next()

    def next(self):
        self.log('next')
        today = self.datas[0].datetime.date()
        positions = [d._name for d, pos in self.getpositions().items() if pos]
        posdata = [d for d, pos in self.getpositions().items() if pos]
        up, down = {}, {}
        missing = not_missing = 0
        for data in self.datas:
            if data.datetime.date() == today:
                if data.predicted[0] > 0:
                    up[data._name] = data.predicted[0]
                elif data.predicted[0] < 0:
                    down[data._name] = data.predicted[0]

        for ticker in posdata:
            self.log(f'{ticker._name,self.getposition(data=ticker).size},POSITION')

        shorts = sorted(down, key=down.get)[:self.p.n_positions]
        longs = sorted(up, key=up.get, reverse=True)[:self.p.n_positions]
        n_shorts, n_longs = len(shorts), len(longs)

        if n_shorts < self.p.min_positions or n_longs < self.p.min_positions:
            longs, shorts = [], []
        else:
            short_target = -1 / n_shorts
            long_target = 1 / n_longs

        for ticker in positions:
            if ticker not in longs + shorts:
                self.order_target_percent(data=ticker, target=0)
                self.log(f'{ticker},CLOSING ORDER CREATED')

        for ticker in shorts:
            self.order_target_percent(data=ticker, target=short_target)
            self.log(f'{ticker},SHORT ORDER CREATED')

        for ticker in longs:
            self.order_target_percent(data=ticker, target=long_target)
            self.log(f'{ticker},LONG ORDER CREATED')

#CREATE AND CONFIGURE CEREBRO INSTANCE
cerebro = bt.Cerebro()
cash = 1000000
cerebro.broker.setcash(cash)

#------------------------------ ADD INPUT DATA --------------------------------
idx = pd.IndexSlice
data = pd.read_hdf('data.h5', 'backtest_data').sort_index()
tickers = data.index.get_level_values(0).unique()

for ticker in tickers:
    df = data.loc[idx[ticker, :], :].droplevel('ticker', axis=0)
    df.index.name = 'datetime'
    bt_data = SignalData(dataname=df)
    cerebro.adddata(bt_data, name=ticker)

#---------------------------- RUN STRATEGY BACKTEST ---------------------------
#cerebro.addcalendar(nyse)
cerebro.addcalendar(NYSE_2016)
cerebro.addanalyzer(bt.analyzers.PyFolio, _name='pyfolio')
cerebro.addstrategy(MLStrategy, n_positions=20, min_positions=10,
                    verbose=True, log_file='backtesting_backtrader_log.csv')
start = time()
results = cerebro.run()
ending_value = cerebro.broker.getvalue()
duration = time() - start

print(f'Final Portfolio Value: {ending_value:,.2f}')
print(f'Duration: {format_time(duration)}')

#GET PYFOLIO INPUTS
pyfolio_analyzer = results[0].analyzers.getbyname('pyfolio')
returns, positions, transactions, gross_lev = pyfolio_analyzer.get_pf_items()
returns.rename_axis(index={'index':'date'})
gross_lev.rename_axis(index={'index':'date'})
positions.rename_axis(index={'Datetime':'date'})

returns.to_hdf('backtest.h5', 'backtrader/returns')
positions.to_hdf('backtest.h5', 'backtrader/positions')
transactions.to_hdf('backtest.h5', 'backtrader/transactions')
gross_lev.to_hdf('backtest.h5', 'backtrader/gross_lev')

#------------------------------- RUN PYFOLIO ----------------------------------
returns = pd.read_hdf('backtest.h5', 'backtrader/returns')
positions = pd.read_hdf('backtest.h5', 'backtrader/positions')
transactions = pd.read_hdf('backtest.h5', 'backtrader/transactions')
gross_lev = pd.read_hdf('backtest.h5', 'backtrader/gross_lev')

benchmark = web.DataReader('SP500', 'fred', '2014', '2018').squeeze()
benchmark = benchmark.pct_change().tz_localize('UTC')

daily_tx = transactions.groupby(level=0)
longs = daily_tx.value.apply(lambda x: x.where(x>0).sum())
shorts = daily_tx.value.apply(lambda x: x.where(x<0).sum())

fig, axes = plt.subplots(ncols=2, figsize=(15, 5))
df = returns.to_frame('Strategy').join(benchmark.to_frame('Benchmark (S&P 500)'))
df.add(1).cumprod().sub(1).plot(ax=axes[0], title='Cumulative Return')
longs.plot(label='Long',ax=axes[1], title='Positions')
shorts.plot(ax=axes[1], label='Short')
positions.cash.plot(ax=axes[1], label='PF Value')
axes[1].legend()
sns.despine()
fig.tight_layout()
plt.show()
plt.close()
-
I don't think you need to add a trading calendar if you don't use resampling and you have no data on the holidays.
Also, there are a lot of things going on in your script (necessary or not), so I would split it into simpler pieces to debug.
It would also be useful to see the output.
-
@ab_trader Thanks for your feedback.
I removed from my code whatever was not necessary to reproduce the issue (in my environment at least):
from pathlib import Path
import csv
from time import time

import numpy as np
import pandas as pd
import seaborn as sns
import backtrader as bt
from backtrader.feeds import PandasData

pd.set_option('display.expand_frame_repr', False)
np.random.seed(42)
sns.set_style('darkgrid')

def format_time(t):
    m_, s = divmod(t, 60)
    h, m = divmod(m_, 60)
    return f'{h:>02.0f}:{m:>02.0f}:{s:>02.0f}'

#----------------- DATAFRAME LOADER -----------------
OHLCV = ['open', 'high', 'low', 'close', 'volume']

class SignalData(PandasData):
    cols = OHLCV + ['predicted']
    lines = tuple(cols)
    params = {c: -1 for c in cols}
    params.update({'datetime': None})
    params = tuple(params.items())

#----------------- STRATEGY -----------------
class MLStrategy(bt.Strategy):
    params = (('n_positions', 20),
              ('min_positions', 10),
              ('verbose', False),
              ('log_file', 'backtest.csv'))

    def log(self, txt, dt=None):
        """ Logger for the strategy"""
        dt = dt or self.datas[0].datetime.datetime(0)
        with Path(self.p.log_file).open('a') as f:
            log_writer = csv.writer(f)
            log_writer.writerow([dt.isoformat()] + txt.split(','))

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            if order.status in [order.Submitted]:
                self.log(f'{order.data._name},SUBMITTED')
            if order.status in [order.Accepted]:
                self.log(f'{order.data._name},ACCEPTED')
            return

        if self.p.verbose:
            if order.status in [order.Completed]:
                p = order.executed.price
                if order.isbuy():
                    self.log(f'{order.data._name},BUY executed,{p:.2f}')
                elif order.issell():
                    self.log(f'{order.data._name},SELL executed,{p:.2f}')
            elif order.status in [order.Canceled]:
                self.log(f'{order.data._name},Order Canceled')
            elif order.status in [order.Margin]:
                self.log(f'{order.data._name},Order Margin')
            elif order.status in [order.Rejected]:
                self.log(f'{order.data._name},Order Rejected')

    def prenext(self):
        self.next()

    def next(self):
        self.log('next')
        today = self.datas[0].datetime.date()
        positions = [d._name for d, pos in self.getpositions().items() if pos]
        posdata = [d for d, pos in self.getpositions().items() if pos]
        up, down = {}, {}
        for data in self.datas:
            if data.datetime.date() == today:
                if data.predicted[0] > 0:
                    up[data._name] = data.predicted[0]
                elif data.predicted[0] < 0:
                    down[data._name] = data.predicted[0]

        for ticker in posdata:
            self.log(f'{ticker._name,self.getposition(data=ticker).size},POSITION')

        shorts = sorted(down, key=down.get)[:self.p.n_positions]
        longs = sorted(up, key=up.get, reverse=True)[:self.p.n_positions]
        n_shorts, n_longs = len(shorts), len(longs)

        if n_shorts < self.p.min_positions or n_longs < self.p.min_positions:
            longs, shorts = [], []
        else:
            short_target = -1 / n_shorts
            long_target = 1 / n_longs

        for ticker in positions:
            if ticker not in longs + shorts:
                self.order_target_percent(data=ticker, target=0)
                self.log(f'{ticker},CLOSING ORDER CREATED')

        for ticker in shorts:
            self.order_target_percent(data=ticker, target=short_target)
            self.log(f'{ticker},SHORT ORDER CREATED')

        for ticker in longs:
            self.order_target_percent(data=ticker, target=long_target)
            self.log(f'{ticker},LONG ORDER CREATED')

cerebro = bt.Cerebro()
cash = 1000000
cerebro.broker.setcash(cash)

#------------------------------ ADD INPUT DATA --------------------------------
idx = pd.IndexSlice
data = pd.read_hdf('data.h5', 'backtest_data').sort_index()
tickers = data.index.get_level_values(0).unique()

for ticker in tickers:
    df = data.loc[idx[ticker, :], :].droplevel('ticker', axis=0)
    df.index.name = 'datetime'
    bt_data = SignalData(dataname=df)
    cerebro.adddata(bt_data, name=ticker)

#---------------------------- RUN STRATEGY BACKTEST ---------------------------
#cerebro.addcalendar(nyse)
cerebro.addanalyzer(bt.analyzers.PyFolio, _name='pyfolio')
cerebro.addstrategy(MLStrategy, n_positions=20, min_positions=10,
                    verbose=True, log_file='backtesting_backtrader_log.csv')
start = time()
results = cerebro.run()
ending_value = cerebro.broker.getvalue()
duration = time() - start
Unfortunately, the log generated from 24-03-2016 to 30-03-2016 is too long to be posted here in its entirety. I kept only the first 3 lines of each part of the log (the first 3 tickers) and replaced the rest with 3 dots. I hope it's still helpful enough.
2016-03-24T00:00:00,CNX,SUBMITTED
2016-03-24T00:00:00,DDD,SUBMITTED
2016-03-24T00:00:00,HOV,SUBMITTED
...
2016-03-24T00:00:00,CNX,ACCEPTED
2016-03-24T00:00:00,DDD,ACCEPTED
2016-03-24T00:00:00,HOV,ACCEPTED
...
2016-03-24T00:00:00,CNX,BUY executed,10.27
2016-03-24T00:00:00,DDD,BUY executed,14.14
2016-03-24T00:00:00,HOV,SELL executed,38.00
...
2016-03-24T00:00:00,next
2016-03-24T00:00:00,('AG', -6266),POSITION
2016-03-24T00:00:00,('AKS', -9768),POSITION
...
2016-03-24T00:00:00,AKS,CLOSING ORDER CREATED
2016-03-24T00:00:00,AU,CLOSING ORDER CREATED
2016-03-24T00:00:00,DNR,CLOSING ORDER CREATED
...
2016-03-24T00:00:00,HMY,SHORT ORDER CREATED
2016-03-24T00:00:00,TECK,SHORT ORDER CREATED
2016-03-24T00:00:00,VAL,SHORT ORDER CREATED
...
2016-03-24T00:00:00,BTU,LONG ORDER CREATED
2016-03-24T00:00:00,SALT,LONG ORDER CREATED
2016-03-24T00:00:00,BHC,LONG ORDER CREATED
...
2016-03-24T00:00:00,AKS,SUBMITTED
2016-03-24T00:00:00,AU,SUBMITTED
2016-03-24T00:00:00,DNR,SUBMITTED
...
2016-03-24T00:00:00,AKS,ACCEPTED
2016-03-24T00:00:00,AU,ACCEPTED
2016-03-24T00:00:00,DNR,ACCEPTED
...
2016-03-24T00:00:00,AKS,BUY executed,4.19
2016-03-24T00:00:00,AU,BUY executed,12.73
2016-03-24T00:00:00,DNR,SELL executed,2.25
...
2016-03-24T00:00:00,next
2016-03-24T00:00:00,('AG', -5817),POSITION
2016-03-24T00:00:00,('AMRX', 1200),POSITION
2016-03-24T00:00:00,('AUY', -13857),POSITION
...
2016-03-24T00:00:00,AG,CLOSING ORDER CREATED
2016-03-24T00:00:00,AMRX,CLOSING ORDER CREATED
2016-03-24T00:00:00,AUY,CLOSING ORDER CREATED
...
2016-03-29T00:00:00,AG,SUBMITTED
2016-03-29T00:00:00,AMRX,SUBMITTED
2016-03-29T00:00:00,AUY,SUBMITTED
...
2016-03-29T00:00:00,AG,ACCEPTED
2016-03-29T00:00:00,AMRX,ACCEPTED
2016-03-29T00:00:00,AUY,ACCEPTED
...
2016-03-29T00:00:00,AG,BUY executed,6.42
2016-03-29T00:00:00,AMRX,SELL executed,31.22
2016-03-29T00:00:00,AUY,BUY executed,2.80
...
2016-03-29T00:00:00,next
2016-03-29T00:00:00,VAL,SHORT ORDER CREATED
2016-03-29T00:00:00,CDE,SHORT ORDER CREATED
2016-03-29T00:00:00,AG,SHORT ORDER CREATED
...
2016-03-30T00:00:00,VAL,SUBMITTED
2016-03-30T00:00:00,CDE,SUBMITTED
2016-03-30T00:00:00,AG,SUBMITTED
...
2016-03-30T00:00:00,VAL,ACCEPTED
2016-03-30T00:00:00,CDE,ACCEPTED
2016-03-30T00:00:00,AG,ACCEPTED
...
2016-03-30T00:00:00,VAL,SELL executed,105.20
2016-03-30T00:00:00,CDE,SELL executed,5.64
2016-03-30T00:00:00,AG,SELL executed,6.84
...
2016-03-30T00:00:00,next
2016-03-30T00:00:00,('AG', -5418),POSITION
2016-03-30T00:00:00,('AMRX', 1143),POSITION
2016-03-30T00:00:00,('AUY', -12576),POSITION
...
2016-03-30T00:00:00,AMRX,CLOSING ORDER CREATED
2016-03-30T00:00:00,AZO,CLOSING ORDER CREATED
2016-03-30T00:00:00,CYH,CLOSING ORDER CREATED
...
I hope it helps.
Best,
Rapha
-
I can't use your script since it requires your data feeds, so I ran some tests based on the basic bt scripts. I was only able to get two calls of next for a single date when the data feed contained that date twice. So I would first check the data feeds around those dates. I would also check whether the 28-03-2016 date is present.
-
@ab_trader
Thanks for the heads up. As you suggested, I have investigated the data, and unfortunately it does not look like this is the root cause of the problem in my case.
On 2016-03-24, I only have unique references:
len(data.xs('2016-03-24',level = 1, drop_level = False).index)
Out: 1033
len(data.xs('2016-03-24',level = 1, drop_level = False).index.unique())
Out: 1033
On 2016-03-28, there is data as well:
len(data.xs('2016-03-28',level = 1, drop_level = False).index)
Out: 1030
len(data.xs('2016-03-28',level = 1, drop_level = False).index.unique())
Out: 1030
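The two checks above look at one date at a time. A minimal sketch of the same duplicate check across all dates at once, assuming `data` has a (ticker, date) MultiIndex like the frame loaded in the script; the helper name `duplicated_bars` and the sample frame are made up for illustration, and the dates are plain strings only to keep the example short:

```python
import pandas as pd


def duplicated_bars(data: pd.DataFrame) -> pd.MultiIndex:
    """Return every (ticker, date) pair that appears more than once in the index."""
    dupes = data.index[data.index.duplicated(keep=False)]
    return dupes.unique()


# Made-up sample: ticker 'AAA' carries 2016-03-24 twice.
index = pd.MultiIndex.from_tuples(
    [
        ('AAA', '2016-03-23'),
        ('AAA', '2016-03-24'),
        ('AAA', '2016-03-24'),
        ('BBB', '2016-03-24'),
    ],
    names=['ticker', 'date'],
)
sample = pd.DataFrame({'close': [1.0, 2.0, 2.0, 3.0]}, index=index)

dupes = duplicated_bars(sample)
print(list(dupes))
```

An empty result means no ticker has the same date twice, which rules out duplicated bars as the cause of the double call.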
-
@rapha why is the count for the 24th larger than the count for the 28th? They should be the same, right? How many rows do you have for the 23rd?
-
@ab_trader Good point. The 23rd has 1033 tickers. I have a total of 1034 tickers in my data feed; some days have all of them, but apparently not every day:
len(data.index.get_level_values(0).unique())
Out[48]: 1034
len(data.xs('2016-03-23',level = 1, drop_level = False).index.get_level_values(0).unique())
Out[49]: 1033
len(data.xs('2016-03-03',level = 1, drop_level = False).index.get_level_values(0).unique())
Out[50]: 1034
I will investigate the reason for these differences and post the explanation back here.
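One way to start that investigation is to list which tickers have no bar on a given date. This is only a sketch under the same assumption of a (ticker, date) MultiIndex; `missing_tickers` is a hypothetical helper and the sample data is invented:

```python
import pandas as pd


def missing_tickers(data: pd.DataFrame, date) -> list:
    """List the tickers that exist somewhere in `data` but have no bar on `date`."""
    all_tickers = set(data.index.get_level_values(0))
    # xs() on level 1 drops the date level, leaving the tickers present that day.
    present = set(data.xs(date, level=1).index)
    return sorted(all_tickers - present)


# Made-up sample: 'XXX' follows a different holiday calendar than 'AAA'/'BBB'.
index = pd.MultiIndex.from_tuples(
    [
        ('AAA', '2016-03-24'), ('BBB', '2016-03-24'), ('XXX', '2016-03-24'),
        ('XXX', '2016-03-25'),
        ('AAA', '2016-03-28'), ('BBB', '2016-03-28'),
    ],
    names=['ticker', 'date'],
)
sample = pd.DataFrame({'close': range(6)}, index=index)

missing_on_28 = missing_tickers(sample, '2016-03-28')
missing_on_25 = missing_tickers(sample, '2016-03-25')
print(missing_on_28, missing_on_25)
```

Note that `xs` raises a `KeyError` when no ticker at all has the requested date.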
-
@ab_trader: I have looked into my data, and while I am supposed to have only tickers traded on the NYSE in my feed, 5 of them were actually traded on European and Asian markets, which follow a slightly different holiday calendar. Removing them solved the problem regarding the two calls of next on 24-03-2016, and my holdings are now very closely aligned in Zipline and Backtrader every day! Thanks for the help.
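For anyone hitting the same issue, here is a rough sketch of how such off-calendar tickers could be flagged automatically. It assumes that a date on which fewer than half of all tickers have a bar is not a regular session for the main market; the 0.5 threshold, the helper name `off_calendar_tickers` and the sample data are all illustrative assumptions, not something taken from my actual feed:

```python
import pandas as pd


def off_calendar_tickers(data: pd.DataFrame, min_share: float = 0.5) -> list:
    """Tickers with a bar on any date where fewer than `min_share` of all tickers trade."""
    tickers = data.index.get_level_values(0)
    dates = data.index.get_level_values(1)
    n_tickers = tickers.nunique()
    # Number of distinct tickers that have a bar on each date.
    per_date = pd.Series(tickers.to_numpy(), index=dates).groupby(level=0).nunique()
    thin_dates = per_date.index[per_date < min_share * n_tickers]
    return sorted(set(tickers[dates.isin(thin_dates)]))


# Made-up sample: only 'XXX' has a bar on Good Friday (2016-03-25).
index = pd.MultiIndex.from_tuples(
    [
        ('AAA', '2016-03-24'), ('BBB', '2016-03-24'), ('XXX', '2016-03-24'),
        ('XXX', '2016-03-25'),
        ('AAA', '2016-03-28'), ('BBB', '2016-03-28'),
    ],
    names=['ticker', 'date'],
)
sample = pd.DataFrame({'close': range(6)}, index=index)

flagged = off_calendar_tickers(sample)
# Drop the flagged tickers before building the data feeds.
clean = sample[~sample.index.get_level_values(0).isin(flagged)]
print(flagged, sorted(clean.index.get_level_values(0).unique()))
```

The threshold is a judgment call: if a large share of the universe trades on another exchange, it would need to be adjusted or replaced by an explicit exchange calendar.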
I still have some discrepancies in the returns, apparently caused by uneven weighting of my positions in Backtrader, which should follow a simple 1/N weighting scheme. I have not had time to look into this yet, and it is not even related to my original question. But if you spot something odd in my code, please let me know.
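To quantify that unevenness, the intended 1/N targets can be compared with the weights actually held. The sketch below is plain Python and independent of Backtrader; the function names are hypothetical, and the position values (for example size times close on a given day) and portfolio value would have to be pulled from the backtest, which is assumed here rather than shown:

```python
def target_weights(longs, shorts):
    """Intended 1/N weights: +1/n_longs per long and -1/n_shorts per short."""
    weights = {ticker: 1 / len(longs) for ticker in longs}
    weights.update({ticker: -1 / len(shorts) for ticker in shorts})
    return weights


def weight_gaps(position_values, portfolio_value, targets):
    """Actual weight minus target weight for every ticker held or targeted."""
    tickers = sorted(set(position_values) | set(targets))
    return {
        ticker: position_values.get(ticker, 0.0) / portfolio_value
        - targets.get(ticker, 0.0)
        for ticker in tickers
    }


# Made-up numbers: 'BBB' was filled for less than its target value.
targets = target_weights(longs=['AAA', 'BBB'], shorts=['CCC', 'DDD'])
held = {'AAA': 500.0, 'BBB': 450.0, 'CCC': -500.0, 'DDD': -500.0}
gaps = weight_gaps(held, portfolio_value=1000.0, targets=targets)
print(gaps)
```

Tickers with the largest absolute gap on a given day are the first ones worth checking in the order log.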