
Resample non-continuous on daily boundaries



  • I have minute data that I am resampling with compression 240 to get 4H candles, but when I run the strategy the 4H resampled data skips a period at the daily boundary. The minute data around that boundary looks like this:

    	date			close
    7194	2018-07-05 23:54:00	6532.96
    7195	2018-07-05 23:55:00	6532.96
    7196	2018-07-05 23:56:00	6532.95
    7197	2018-07-05 23:57:00	6532.96
    7198	2018-07-05 23:58:00	6532.96
    7199	2018-07-05 23:59:00	6532.95
    7200	2018-07-06 00:00:00	6532.95
    7201	2018-07-06 00:01:00	6532.96
    7202	2018-07-06 00:02:00	6532.96
    7203	2018-07-06 00:03:00	6530.62
    7204	2018-07-06 00:04:00	6528.01
    7205	2018-07-06 00:05:00	6532.96
    7206	2018-07-06 00:06:00	6535.80
    7207	2018-07-06 00:07:00	6540.99
    

    Below is the print statement from the strategy's next() method:

            txt = ','.join(
                ['%04d' % len(self),
                 '%04d' % len(self.data0),
                 '%04d' % len(self.data1),
                 self.data.datetime.date(0).isoformat(),
                 self.data.datetime.time(0).isoformat(),
                 '%.2f' % self.data0.close[0],
                 '%.2f' % self.rsi.rsi[0],
                 '%.2f' % self.stochRsiD[0],
                 '%.2f' % self.stochRsiK[0]])
    
            print(txt)
    

    And here is the output of that print statement:

    7196,7196,0034,2018-07-05,23:55:00,6532.96,46.82,30.03,39.43
    7197,7197,0034,2018-07-05,23:56:00,6532.95,46.82,30.03,39.43
    7198,7198,0034,2018-07-05,23:57:00,6532.96,46.82,30.03,39.43
    7199,7199,0034,2018-07-05,23:58:00,6532.96,46.82,30.03,39.43
    7200,7200,0034,2018-07-05,23:59:00,6532.95,46.82,30.03,39.43
    7201,7201,0035,2018-07-06,00:00:00,6532.95,49.55,18.89,31.28
    7202,7202,0036,2018-07-06,00:01:00,6532.96,49.55,8.97,19.30
    7203,7203,0036,2018-07-06,00:02:00,6532.96,49.55,8.97,19.30
    7204,7204,0036,2018-07-06,00:03:00,6530.62,49.55,8.97,19.30
    7205,7205,0036,2018-07-06,00:04:00,6528.01,49.55,8.97,19.30
    

    The third column gives the count of 4H periods. You can see that 34 and 36 each get called every minute, 240 consecutive times, before the 4H count increments, but 35 gets called only once.

    This also creates discontinuities in the indicators.

    All of the minute data is present and continuous, with no gaps, so why does resampling cause this discontinuity at the daily boundary?
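    A minimal sketch of the expectation described above, using only the standard library and synthetic timestamps in place of the real feed. `find_gaps` and `four_hour_bucket` are hypothetical helpers written for this illustration, not backtrader functions, and labelling buckets by their start time is an assumption made here, not necessarily how backtrader stamps its bars. With gap-free minute data, every 4H bucket should hold 240 bars and midnight should simply be one more bucket boundary.

```python
from collections import Counter
from datetime import datetime, timedelta


def four_hour_bucket(ts):
    """Return the start of the 4H bucket (00, 04, 08, ...) that contains ts."""
    return ts.replace(hour=ts.hour - ts.hour % 4, minute=0, second=0, microsecond=0)


def find_gaps(timestamps, step=timedelta(minutes=1)):
    """Return (previous, current) pairs whose spacing differs from step."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a != step]


# Hypothetical stand-in for the real feed: two full days of 1-minute timestamps.
start = datetime(2018, 7, 5)
minutes = [start + timedelta(minutes=i) for i in range(2 * 24 * 60)]

gaps = find_gaps(minutes)
counts = Counter(four_hour_bucket(ts) for ts in minutes)

print(gaps)                          # empty list: the series is continuous
print(sorted(set(counts.values())))  # a single value: bars per 4H bucket
```

    Running the same two helpers over the index of the real minute candles is a quick way to rule out the data itself before looking at the resampler settings.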

    Update:
    I have added the data feed import code as requested below. I'm fairly certain this is correct as I've compared it directly to candle charts online and the values and timing match exactly.

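    import pandas as pd
    import backtrader as bt
    
    # fname: path to the CSV of raw trades (timestamp, price, amount)
    sales_raw = pd.read_csv(fname)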
    sales = sales_raw.copy()
    sales.columns = ['timestamp', 'price', 'amount']
    sales['date'] = pd.to_datetime(sales['timestamp'], unit='s')
    sales = sales[['date', 'price', 'amount']].set_index('date')
    
    def create_candles(data, period, label='left'):
        candle = pd.DataFrame()
        candle['open'] = data.resample(period, label=label).first().ffill()['price']
        candle['high'] = data.resample(period, label=label).max().ffill()['price']
        candle['low'] = data.resample(period, label=label).min().ffill()['price']
        candle['close'] = data.resample(period, label=label).last().ffill()['price']
        candle['volume'] = data.resample(period, label=label).sum().ffill()['amount']
        return candle
    
    candles = create_candles(sales, '1min')  # '1min' instead of 'T', which recent pandas versions deprecate
    
    data = bt.feeds.PandasData(dataname=candles, openinterest=None)
    

    Below is where I add the data and resample to 4H. Since the original post I have found that adding boundoff=1 solves the problem of the discontinuity. However, the 4H periods are now calculated at the start of the 4H period instead of at the end, meaning that the close value is looking into the future. I now need to shift the 4H period forward by one 4-hour period.

    cerebro = bt.Cerebro()
    cerebro.addstrategy(St, multi=True)
    cerebro.adddata(data)
    cerebro.resampledata(data, timeframe=bt.TimeFrame.Minutes, compression=240, boundoff=1)


  • Can you add data feed import code snippet?



  • @ab_trader I have updated my post with the code to import the data feed and add it to cerebro



  • @sfkiwi maybe sessionstart and sessionend parameters need to be added to original data.



  • @ab_trader I think I sorted it out. Using boundoff=1 does the trick, but it's a little misleading at first because the 4H periods will start at minute :59. However, this is actually correct. The 00:01:00 1-min candle is the data from 00:01:00 to 00:01:59, but when next() is called you get O, H, L, C, meaning that you receive the candle at 00:02:00, once it has closed. In other words, you see the 00:01:00 candle at time 00:02:00. Therefore the 4H candle can only be seen at 03:59:00, because it has the same close value as the 03:59:00 1-min candle. So you see the previous 4H candle at 04:00:00, which is also when you see the 03:59:00 1-min candle. This way you are only looking at past values and not looking into the future.

    What I would have expected was for the 4-hour candle to be labelled 00:00:00 but shown at 04:00:00, as this would be consistent with the way the 1-min candles work.
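    The timing argument above can be sketched with plain datetime arithmetic. This assumes candles labelled with their start time, as in the pandas import code with label='left'; `visible_at` is a hypothetical helper for illustration only and is not part of backtrader.

```python
from datetime import datetime, timedelta


def visible_at(start_label, period):
    """Time at which a candle labelled with its start time is complete."""
    return start_label + period


one_minute = timedelta(minutes=1)
four_hours = timedelta(hours=4)

# The 00:01:00 1-min candle covers 00:01:00 to 00:01:59 and closes at 00:02:00.
print(visible_at(datetime(2018, 7, 6, 0, 1), one_minute))   # 2018-07-06 00:02:00

# A 4H candle starting at 00:00:00 closes at 04:00:00 ...
print(visible_at(datetime(2018, 7, 6, 0, 0), four_hours))   # 2018-07-06 04:00:00

# ... which is the same moment the 03:59:00 1-min candle closes.
print(visible_at(datetime(2018, 7, 6, 3, 59), one_minute))  # 2018-07-06 04:00:00
```

    Under that assumption, a 4H candle and the last 1-min candle inside it become visible at the same moment, so neither exposes values from the future.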


  • administrators

    The convention in backtrader is to give you the time at which the candle ends. The reason is that this makes sense when looking at how the candle is constructed in real-time or using replaydata. The last incoming data point is the last one adding info to the candle, not only in terms of price action and volume, but also at which time it has happened.

    It would be weird, imho, if a data point which happened at 03:59:59.9999 were to be shown in a candle with a timestamp of 00:00:00.0000
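    As an illustration of the end-of-candle convention, here is a minimal sketch that maps an incoming data point to the end time of the candle it would belong to. It is not backtrader's implementation: `candle_end` is a hypothetical helper, and it assumes half-open buckets [start, end) counted from midnight of the same day.

```python
from datetime import datetime, timedelta


def candle_end(ts, period=timedelta(hours=4)):
    """End time of the candle containing ts, for buckets [start, end) from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    completed = (ts - midnight) // period  # whole periods elapsed since midnight
    return midnight + (completed + 1) * period


# A data point just before the 4H boundary lands in the candle stamped 04:00:00.
print(candle_end(datetime(2018, 7, 6, 3, 59, 59, 999900)))  # 2018-07-06 04:00:00

# The last candle of the day ends at midnight of the following day.
print(candle_end(datetime(2018, 7, 6, 23, 59, 59)))         # 2018-07-07 00:00:00
```

    With this labelling, the timestamp on a candle is never earlier than any data point it contains.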