Backtesting multiple symbols with different date ranges - backtest won't start until all data feeds have data
-
Hi, I have been researching backtesting systems and I read that Backtrader won't start a backtest until all data feeds have data.
For example, if I have AAPL from 2000-2017 and GOOG from 2012-2017, the backtest won't start until 2012 (the first day of GOOG data).
Is this true, and is there any way to fix it? Ref: https://ntguardian.wordpress.com/2017/06/12/getting-started-with-backtrader/
-
Just for some background info: I have a database of thousands of stocks (survivorship-bias free), so some of the stocks have varied date ranges and no longer trade. I want to backtest over the whole universe of stocks.
-
I believe it is possible with certain basic code improvements. I think this was also discussed on the forum, but I was not able to find the topic.
PS You will need to do operations in both the prenext() and next() functions, and you will also need to check which data has already been delivered and which has not. -
Thanks ab_trader, I'm surprised, as I thought there would be many people wanting to backtest multiple symbols at once, and naturally some symbols will have a longer history than others.
I'll look into prenext to see if there is anything that can solve this problem. -
Looks like prenext might be a good solution: run the strategy in prenext and deal with the minimum number of bars for my indicators in my own code.
I do my heavy analysis and backtesting in quantstrat (an R framework), which handles the issue I'm talking about. But backtrader looks promising, so being able to backtest and go live in the same framework would be cool. I can't see myself fully doing away with R though. -
def prenext(self):
    self.next()

def next(self):
    feeds_with_data = [(i, d) for i, d in enumerate(self.datas) if len(d)]
    ...
That should give you the data feeds which actually have data. The second step is to keep the indicators associated with the data. That's what enumerate is doing by giving you the index, which you can use to index into arrays that hold the indicators for each data feed.
You should bear in mind that even if the feed already has data, that doesn't mean the indicators are ready yet. A similar scheme (a len check) for each indicator is therefore needed. -
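The readiness check described above could be sketched roughly as follows. This is only an illustration: instead of a len check on each indicator, it compares each feed's bar count with the longest indicator period, on the assumption that an indicator of that period needs that many bars before it yields a value. The helper name feeds_ready and the sample lists are hypothetical and not part of backtrader.

```python
def feeds_ready(datas, min_bars):
    """Return (index, feed) pairs for feeds that hold at least `min_bars` bars.

    `datas` is any sequence of objects supporting len(), such as self.datas
    inside a strategy. `min_bars` is assumed to be the longest indicator
    period in use (an assumption of this sketch).
    """
    return [(i, d) for i, d in enumerate(datas) if len(d) >= min_bars]


# Plain lists stand in for data feeds here (hypothetical data).
feeds = [[1.0] * 12, [], [1.0] * 4]
ready = feeds_ready(feeds, min_bars=10)
print([i for i, _ in ready])  # only the first feed has 10 or more bars
```

Inside a strategy, the same comprehension would replace the plain len(d) filter in next(), with the index used to look up the indicators stored for that feed.
-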
Note that data0 is the master data and sets the time when trades will start (using Paska Houso's trick above). So make sure data0 is the data feed with the oldest start date.
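If data0 does act as the master feed as described here, one way to make sure it has the oldest start date is to sort the feeds by their first bar before adding them. A minimal sketch in plain Python; the function name order_by_start and the exact first-bar dates are illustrative assumptions.

```python
from datetime import date


def order_by_start(first_bar_dates):
    """Return feed names sorted so the feed with the oldest first bar comes first.

    `first_bar_dates` maps a feed name to the date of its first bar.
    """
    return sorted(first_bar_dates, key=first_bar_dates.get)


# Hypothetical first-bar dates, loosely following the AAPL/GOOG example.
starts = {'GOOG': date(2012, 1, 3), 'AAPL': date(2000, 1, 3)}
ordered = order_by_start(starts)
print(ordered)  # ['AAPL', 'GOOG']

# The feeds would then be added to cerebro in this order (for example with
# cerebro.adddata), so that the longest history ends up as data0.
```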
-
@benoît-zuber said in Backtesting multiple symbols with different date ranges - backtest won't start until all data feeds have data:
Note that data0 is the master data and sets the time when trades will start
I thought that restriction was removed some time ago in backtrader. Are you sure? -
@paska-houso Yes I have just tried again with the latest version and I am still getting the same result.
-
@kaya I realize I'm late to the party. Just wondering how you made out. Did you create several thousand datafeeds? I'm considering the same issue. Please let me know if you'd like to collaborate.
-
@scottz1 Were you able to solve this?
-
@Paska-Houso This approach may have a problem: for example, values from the shorter data feeds can be read before the dates on which they actually occur.
-
@Paska-Houso I tested this approach, but it produces a terrible result. For example:
2005-01-18, today Portfolio Value: 500000.00
2005-01-18, count:11,date:2005-01-18,data name:meta,close:1.0,volume:1.0
2005-01-18, count:11,date:2009-09-15,data name:RB0909.XSGE,close:3600.0,volume:720.0
2005-01-18, count:11,date:2009-10-15,data name:RB0910.XSGE,close:3432.0,volume:720.0
2005-01-18, count:11,date:2009-11-16,data name:RB0911.XSGE,close:3630.0,volume:240.0
2005-01-18, count:11,date:2009-12-15,data name:RB0912.XSGE,close:3703.0,volume:1260.0
2005-01-18, count:11,date:2010-01-15,data name:RB1001.XSGE,close:3975.0,volume:1140.0
2005-01-18, count:11,date:2010-02-09,data name:RB1002.XSGE,close:3820.0,volume:660.0
2005-01-18, count:11,date:2010-03-15,data name:RB1003.XSGE,close:4010.0,volume:2100.0
2005-01-18, count:11,date:2010-04-15,data name:RB1004.XSGE,close:4415.0,volume:1440.0
2005-01-18, count:11,date:2010-05-17,data name:RB1005.XSGE,close:3915.0,volume:1620.0
2005-01-18, count:11,date:2010-06-17,data name:RB1006.XSGE,close:3978.0,volume:360.0
2005-01-18, count:11,date:2010-07-15,data name:RB1007.XSGE,close:3787.0,volume:360.0
2005-01-18, count:11,date:2010-08-16,data name:RB1008.XSGE,close:4094.0,volume:240.0
2005-01-18, count:11,date:2010-09-15,data name:RB1009.XSGE,close:4440.0,volume:300.0
-
May I suggest, as an additional option, zero-padding, symmetrization, or smooth padding of the shorter data feeds out to the length of the longest one? Then all the data will start at the same bar. At the end you can simply trim the output to match each feed's actual length if you wish.
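The zero-padding variant might look roughly like this in plain Python. This is a sketch under stated assumptions: each feed is represented as a date-sorted list of (date, close) pairs, and the name zero_pad_front is hypothetical. Symmetrization or smooth padding would swap the constant fill for mirrored or extrapolated values.

```python
from datetime import date


def zero_pad_front(master_dates, feed, fill=0.0):
    """Prepend one (date, fill) bar for every master date before the feed's first bar.

    `master_dates` is the sorted date list of the longest feed; `feed` is a
    non-empty, date-sorted list of (date, close) pairs.
    """
    first = feed[0][0]
    padding = [(d, fill) for d in master_dates if d < first]
    return padding + list(feed)


# Hypothetical data: the short feed starts two bars after the master feed.
master = [date(2012, 1, 2), date(2012, 1, 3), date(2012, 1, 4)]
short = [(date(2012, 1, 4), 10.0)]
padded = zero_pad_front(master, short)
print(len(padded))  # 3
```

Note that the padded bars are synthetic: any indicator whose window overlaps them is distorted by the fill values, so results for those bars should be trimmed or ignored, as suggested above.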
-
I do this as follows:
import backtrader as bt

class MyStrategy(bt.Strategy):
    params = (('short', 5), ('long', 10))

    def __init__(self):
        # Per-feed indicators and bookkeeping, keyed by the data feed itself
        self.inds = dict()
        for i, d in enumerate(self.datas):
            self.inds[d] = dict()
            self.inds[d]['sma_s'] = bt.indicators.MovingAverageSimple(d.close, period=self.params.short)
            self.inds[d]['sma_l'] = bt.indicators.MovingAverageSimple(d.close, period=self.params.long)
            # Date from which this feed is treated as started
            self.inds[d]['start_time'] = d.datetime.date(self.params.long)
            self.inds[d]['start_flag'] = 0

    def prenext(self):
        log_str = ''  # accumulates one line per started feed
        for i in range(3):  # assumes three data feeds
            d = self.datas[i]
            ind = self.inds[d]
            if d.datetime.date() == ind['start_time']:
                ind['start_flag'] = 1
            # Log only feeds that have reached their start date
            if ind['start_flag'] == 1:
                log_str = log_str + 'i={}, date={}, sma_s={}, sma_l={}\n'.format(
                    i, d.datetime.date(), ind['sma_s'][0], ind['sma_l'][0])

prenext() processes the data while the feeds' times are not yet the same. Once the times are the same, execution goes to next().