Multiple dataseries with misaligned or missing dates?
Shaun last edited by
I'm trying to add different data sources to cerebro, so I wonder how backtrader treats time series with misaligned or missing dates.
data1 is AAPL yahoo data which is the time series of price on trading days.
data2 is the highest temperature of the day which, in most formats, is recorded for every calendar day. (continuous dates)
data3 is the top wind speed when the hurricane hit the coast. (scattered dates)
My idea is to backtest AAPL (data1) with the help of data2 and data3. So the question is: can backtrader handle these data automatically? If so, when the next() function is called, how do I find the temperature for that trading day? And for data3, if the current date has no entry (there was no hurricane), what is the value of the "topwindspeed" feed in self.datas?
run-out last edited by
@Shaun I would be inclined to use pandas to pre-manipulate the data and match indices, then load the pandas dataframe.
I think there are a couple of threads about this situation, as well as the author's blog. You can google it with keywords "backtrader multiple datafeeds synchronization".
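To make the pandas suggestion above concrete, here is a minimal sketch of aligning the two extra series to the trading-day index before anything is loaded into backtrader. The sample values, dates, and column names (close, max_temp, top_wind_speed) are made up for illustration, and leaving NaN on days without a hurricane is only one possible choice.

```python
import pandas as pd

# Hypothetical sample data; real data would come from your own sources.
prices = pd.DataFrame(
    {"close": [150.0, 152.0, 151.0]},
    # trading days only (the weekend is missing)
    index=pd.to_datetime(["2021-08-27", "2021-08-30", "2021-08-31"]),
)
temperature = pd.Series(
    [30.0, 31.0, 29.0, 28.0, 27.0],
    # one value per calendar day
    index=pd.date_range("2021-08-27", "2021-08-31", freq="D"),
)
wind = pd.Series(
    [150.0],
    # scattered dates: a single event
    index=pd.to_datetime(["2021-08-30"]),
)

# Reindex both extra series onto the trading-day index of the price data.
aligned = prices.copy()
aligned["max_temp"] = temperature.reindex(prices.index)
# Days without a hurricane become NaN (an assumption; fillna(0.0) is an alternative).
aligned["top_wind_speed"] = wind.reindex(prices.index)

print(aligned)
```

The merged frame has one row per trading day, so every column shares the same dates. It could then be loaded through backtrader's pandas data feed; exposing the extra columns as lines would likely need a custom feed subclass, which is not shown here.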
My understanding is that the last available value of a datafeed with scattered dates will be delivered to cerebro repeatedly as the clock progresses, assuming nothing special is being done. One solution is to keep a counter of the length of each datafeed in your next() or prenext() function. When data is missing, the length of the datafeed will not increase, even though the stale value is delivered again. Something like this:
def __init__(self):
    # datafeed pointer: tracks whether each datafeed has delivered new data
    self.df_pt = dict()
    for i, d in enumerate(self.datas):
        self.df_pt[d._name] = 0

def next(self):
    # check whether new data has been delivered for each datafeed
    for i, d in enumerate(self.datas):
        dt, dn = self.datetime.date(), d._name
        if len(d) <= self.df_pt[dn]:
            pass  # no new bar for this datafeed; or any other action you desire
        else:
            self.df_pt[dn] = len(d)