Resample non-continuous on daily boundaries
I have minute data that I am resampling with compression 240 to get 4H candles, but when I run the strategy the 4H resampled data skips a period at the daily boundary. The minute data around the boundary looks like this:
      date                 close
7194  2018-07-05 23:54:00  6532.96
7195  2018-07-05 23:55:00  6532.96
7196  2018-07-05 23:56:00  6532.95
7197  2018-07-05 23:57:00  6532.96
7198  2018-07-05 23:58:00  6532.96
7199  2018-07-05 23:59:00  6532.95
7200  2018-07-06 00:00:00  6532.95
7201  2018-07-06 00:01:00  6532.96
7202  2018-07-06 00:02:00  6532.96
7203  2018-07-06 00:03:00  6530.62
7204  2018-07-06 00:04:00  6528.01
7205  2018-07-06 00:05:00  6532.96
7206  2018-07-06 00:06:00  6535.80
7207  2018-07-06 00:07:00  6540.99
Below you can see a print statement from the strategy next().
txt = ','.join(
    ['%04d' % len(self),
     '%04d' % len(self.data0),
     '%04d' % len(self.data1),
     self.data.datetime.date(0).isoformat(),
     self.data.datetime.time(0).isoformat(),
     '%.2f' % self.data0.close,
     '%.2f' % self.rsi.rsi,
     '%.2f' % self.stochRsiD,
     '%.2f' % self.stochRsiK])
print(txt)
And here is the output of that print statement:
7196,7196,0034,2018-07-05,23:55:00,6532.96,46.82,30.03,39.43
7197,7197,0034,2018-07-05,23:56:00,6532.95,46.82,30.03,39.43
7198,7198,0034,2018-07-05,23:57:00,6532.96,46.82,30.03,39.43
7199,7199,0034,2018-07-05,23:58:00,6532.96,46.82,30.03,39.43
7200,7200,0034,2018-07-05,23:59:00,6532.95,46.82,30.03,39.43
7201,7201,0035,2018-07-06,00:00:00,6532.95,49.55,18.89,31.28
7202,7202,0036,2018-07-06,00:01:00,6532.96,49.55,8.97,19.30
7203,7203,0036,2018-07-06,00:02:00,6532.96,49.55,8.97,19.30
7204,7204,0036,2018-07-06,00:03:00,6530.62,49.55,8.97,19.30
7205,7205,0036,2018-07-06,00:04:00,6528.01,49.55,8.97,19.30
The third column gives the count of 4H periods. You can see that counts 34 and 36 each stay in place for 240 consecutive one-minute calls before the 4H count increments, whereas 35 only appears once.
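One way to confirm the skipped period is to count how many consecutive next() calls share each 4H bar count. The following is a minimal sketch, assuming the printed lines have been captured into a list of strings; the `lines` variable and the `count_runs` helper are illustrative names and not part of backtrader. Only the ten lines shown above are used here.

```python
from itertools import groupby

# Lines as printed by next(); the third comma-separated field is len(self.data1).
lines = [
    "7196,7196,0034,2018-07-05,23:55:00,6532.96,46.82,30.03,39.43",
    "7197,7197,0034,2018-07-05,23:56:00,6532.95,46.82,30.03,39.43",
    "7198,7198,0034,2018-07-05,23:57:00,6532.96,46.82,30.03,39.43",
    "7199,7199,0034,2018-07-05,23:58:00,6532.96,46.82,30.03,39.43",
    "7200,7200,0034,2018-07-05,23:59:00,6532.95,46.82,30.03,39.43",
    "7201,7201,0035,2018-07-06,00:00:00,6532.95,49.55,18.89,31.28",
    "7202,7202,0036,2018-07-06,00:01:00,6532.96,49.55,8.97,19.30",
    "7203,7203,0036,2018-07-06,00:02:00,6532.96,49.55,8.97,19.30",
    "7204,7204,0036,2018-07-06,00:03:00,6530.62,49.55,8.97,19.30",
    "7205,7205,0036,2018-07-06,00:04:00,6528.01,49.55,8.97,19.30",
]


def count_runs(printed_lines):
    """Return (4H bar count, number of consecutive next() calls) pairs."""
    counts = (int(line.split(",")[2]) for line in printed_lines)
    return [(key, sum(1 for _ in group)) for key, group in groupby(counts)]


runs = count_runs(lines)
print(runs)  # [(34, 5), (35, 1), (36, 4)]
```

Run over the full output, every complete 4H period would be expected to show a run of 240; a run of 1 marks the skipped period.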
This also creates discontinuities in the indicators.
All of the minute data is present and continuous with no gaps, so why does resampling cause this discontinuity at the daily boundary?
I have added the data feed import code as requested below. I'm fairly certain this is correct as I've compared it directly to candle charts online and the values and timing match exactly.
import backtrader as bt
import pandas as pd

# Raw trades with a Unix timestamp in seconds, a price and an amount
sales_raw = pd.read_csv(fname)
sales = sales_raw.copy()
sales.columns = ['timestamp', 'price', 'amount']
sales['date'] = pd.to_datetime(sales['timestamp'], unit='s')
sales = sales[['date', 'price', 'amount']].set_index('date')


def create_candles(data, period, label='left'):
    # Build OHLCV candles; ffill carries the previous value into empty periods
    candle = pd.DataFrame()
    candle['open'] = data.resample(period, label=label).first().ffill()['price']
    candle['high'] = data.resample(period, label=label).max().ffill()['price']
    candle['low'] = data.resample(period, label=label).min().ffill()['price']
    candle['close'] = data.resample(period, label=label).last().ffill()['price']
    candle['volume'] = data.resample(period, label=label).sum().ffill()['amount']
    return candle


# 'T' is the pandas alias for 1-minute frequency ('min' in newer pandas versions)
candles = create_candles(sales, 'T')
data = bt.feeds.PandasData(dataname=candles, openinterest=None)
Below is where I add the data and resample to 4H. Since the original post I have found that adding boundoff=1 solves the discontinuity problem; however, the 4H periods are now calculated at the start of the 4H period instead of at the end, meaning that the close value is looking into the future. I now need to shift the 4H period forward by one 4-hour period.
cerebro = bt.Cerebro()
cerebro.addstrategy(St, multi=True)
cerebro.adddata(data)
cerebro.resampledata(data, timeframe=bt.TimeFrame.Minutes, compression=240, boundoff=1)
Can you add the data feed import code snippet?
@ab_trader I have updated my post with the code to import the data feed and add it to cerebro
@sfkiwi Maybe the sessionstart and sessionend parameters need to be added to the original data.
@ab_trader I think I sorted it out. Using boundoff=1 does the trick, but it's a little misleading at first because the 4H periods will start at minute :59. However, this is actually correct. The 00:01:00 1-minute candle is actually the data from 00:01:00 to 00:01:59, but when next() is called you get the complete O, H, L, C, meaning that you are receiving the candle at 00:02:00, when it has closed. In other words, you see the 00:01:00 candle at time 00:02:00. Therefore the 4H candle can only be seen at 03:59:00, because its close is the same value as the close of the 03:59:00 1-minute candle. So you are seeing the previous 4H candle at 04:00:00, which is also when you see the 03:59:00 1-minute candle. This way you are only looking at past values and not into the future.
What I would have expected was for the 4-hour candle to be labelled 00:00:00 but shown at 04:00:00, as this would be consistent with the way the 1-minute candles work.
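The timing argument above can be checked with plain datetime arithmetic. This is only a sketch of the reasoning, not backtrader's resampler: it assumes 1-minute candles labelled by their start time (as produced by create_candles with label='left') and a 4H candle that is complete once its last 1-minute candle has closed. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

ONE_MINUTE = timedelta(minutes=1)
FOUR_HOURS = timedelta(hours=4)


def minute_bar_close_time(label):
    """A 1-min candle labelled 00:01:00 covers 00:01:00-00:01:59 and is complete at 00:02:00."""
    return label + ONE_MINUTE


def last_minute_label(period_start):
    """Label of the final 1-min candle inside the 4H period starting at period_start."""
    return period_start + FOUR_HOURS - ONE_MINUTE


period_start = datetime(2018, 7, 6, 0, 0)
last_label = last_minute_label(period_start)
complete_at = minute_bar_close_time(last_label)

print(last_label.time())   # 03:59:00
print(complete_at.time())  # 04:00:00
```

Under these assumptions, the 4H candle that starts at 00:00:00 takes its close from the 03:59:00 candle and cannot be known before 04:00:00, so using it from that point on involves no look-ahead.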
The convention in backtrader is to give you the time at which the candle ends. The reason is that this makes sense when looking at how the candle is constructed in real time or using replaydata. The last incoming data point is the last one adding info to the candle, not only in terms of price action and volume, but also in terms of the time at which it happened.
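To illustrate why the end time is the natural timestamp, here is a small sketch of a candle being built from incoming data points. It is not backtrader's implementation; the Candle class and the three data points are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Candle:
    dt: datetime
    open: float
    high: float
    low: float
    close: float

    def update(self, dt, price):
        """Fold a new data point into the candle; its time becomes the candle's time."""
        self.dt = dt
        self.high = max(self.high, price)
        self.low = min(self.low, price)
        self.close = price


# Hypothetical data points arriving during one 4H period
ticks = [
    (datetime(2018, 7, 6, 0, 0), 6532.95),
    (datetime(2018, 7, 6, 1, 30), 6540.99),
    (datetime(2018, 7, 6, 3, 59), 6528.01),
]

first_dt, first_price = ticks[0]
candle = Candle(first_dt, first_price, first_price, first_price, first_price)
for dt, price in ticks[1:]:
    candle.update(dt, price)

print(candle.dt, candle.close)  # 2018-07-06 03:59:00 6528.01
```

The close and the timestamp both come from the same final data point, which is why stamping the candle with its end time keeps the two consistent.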
It would be weird, imho, if a data point which happened at 00:03:59:59.9999 were to be shown in a candle with a timestamp of