
cerebro.resample() introduces data in the future



  • Hi guys, I found some weird behaviour when using 1min data and cerebro.resample() to create 1h data, in order to make decisions from both timeframes.

    It seems that when I'm making a decision based on the 1h data, I've already received 1min data from the future. Just take a look at the output (time, open, high, low, close):
    m 2020-03-20 01:00:00 6219.15 6222.24 6212.39 6213.54 m
    h 2020-03-20 01:00:00 6200.76 6269.24 6174.0 6219.15 h
    ---------Do strategy logic here------------
    Note: 6219.15 is the close price of the 1h data at 2020-03-20 01:00:00, while 6219.15 is also the open price of the 1min data, which means that when I access the 1h data during the "Do strategy logic here" step, self.datas[0].lines[0] gives me information I shouldn't have at that moment (when the 1h candle closes). In other words, it introduces data from the future.

    I'm not sure whether this is intended or a potential bug.

    Here is the data and code I'm using to reproduce the output:

    BTC_USDT_1m

    import datetime
    
    import pandas as pd
    import backtrader as bt
    
    
    class Test(bt.Strategy):
    
        def log_bar(self, data, mark='*'):
            time = data.datetime.datetime(0)
            o = data.open[0]
            h = data.high[0]
            l = data.low[0]
            c = data.close[0]
            print(mark, time, o, h, l, c, mark)
    
        def log_logic(self):
            print("---------Do strategy logic here------------")
    
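        def nextstart(self):
            # Runs once, in place of the first next() call:
            # initialise the count of hourly bars seen so far
            self.lendata1 = 0
    
        def next(self):
            # Only act when the resampled (1h) feed has delivered a new bar
            if len(self.data1) > self.lendata1:
                self.lendata1 = len(self.data1)
                self.log_bar(self.datas[0], 'm')  # current 1min bar
                self.log_bar(self.datas[1], 'h')  # latest resampled 1h bar
                self.log_logic()
                # do strategy logic here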
    
    
    if __name__ == '__main__':
        cerebro = bt.Cerebro()
        cerebro.addstrategy(Test)
        datapath = 'datas/BTC_USDT_1m.csv'
        data = pd.read_csv(datapath, index_col='datetime', parse_dates=True)
        datafeed = bt.feeds.PandasData(dataname=data)
        cerebro.adddata(datafeed)
        cerebro.resampledata(
            datafeed, timeframe=bt.TimeFrame.Minutes, compression=60)
        cerebro.run()
    
    

  • administrators

    @xyshell said in cerebro.resample() introduces data in the future:

    Note: 6219.15 is the close price of the 1h data at 2020-03-20 01:00:00, while 6219.15 is also the open price of the 1min data

    No. There is no such thing as the closing price of the 1-hour data, because that data doesn't exist. Naming things properly does help.

    Additionally the information you provide is wrong, which is confirmed by looking at your data.

    [Image: screenshot of the data file]

    Note for other readers: for whatever reason, this data defies all established standards and has the following format: CHLOVTimestamp (close, high, low, open, volume, timestamp)

    Your data indicates that

    • 6219.15 is the closing price of the 1-min bar at 00:59:00, hence the last bar to go into the 1-hour resampling between 00:00:00 and 00:59:00 (60 bars, if all are present), which is first delivered to you as a resampled bar at 01:00:00
    • 6219.15 is the opening price of the 1-min bar at 01:00:00

    At 01:00:00 you have two bits of information available:

    • The current 1-min bar for 01:00:00
    • The 1-hour resampled data for the period 00:00:00 to 00:59:00

    As expected.
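
    For readers who want to see the same effect outside backtrader, here is a minimal pandas sketch. It is not backtrader's resampler: the prices are synthetic, and the `label="right", closed="left"` settings are an assumption chosen to mimic the behaviour described above, where the bars from 00:00 to 00:59 are stamped and delivered as the 01:00 hourly bar.

```python
import pandas as pd

# Synthetic 1-minute bars from 00:00 to 01:00 inclusive (made-up prices).
# Each bar opens at the previous bar's close, as in a continuous market.
idx = pd.date_range("2020-03-20 00:00", periods=61, freq="min")
minute = pd.DataFrame(
    {
        "open": [99.0 + i for i in range(61)],
        "high": [100.5 + i for i in range(61)],
        "low": [98.5 + i for i in range(61)],
        "close": [100.0 + i for i in range(61)],
    },
    index=idx,
)

# Aggregate into hourly bars. closed="left" puts 00:00..00:59 in one bin,
# and label="right" stamps that bin with its end time, 01:00.
hourly = minute.resample("60min", label="right", closed="left").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last"}
)

t_0059 = pd.Timestamp("2020-03-20 00:59")
t_0100 = pd.Timestamp("2020-03-20 01:00")
hour_bar = hourly.loc[t_0100]

# The hourly close is the close of the 00:59 minute bar ...
print(hour_bar["close"], minute.loc[t_0059, "close"])
# ... which also equals the open of the 01:00 minute bar, without the
# hourly bar containing anything from 01:00 onwards.
print(minute.loc[t_0100, "open"])
```

    The matching numbers come from the 00:59 close and the 01:00 open being the same price, not from the hourly bar including the 01:00 minute bar.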



  • @backtrader said in cerebro.resample() introduces data in the future:

    At 01:00:00 you have two bits of information available:

    • The current 1-min bar for 01:00:00
    • The 1-hour resampled data for the period 00:00:00 to 00:59:00

    As expected.

    Thanks for the explanation, it's very clear. It's not introducing data from the future, but just delivering the 1-hour resampled data 1 min late, so any decision made during "# do strategy logic here" will be executed at 01:00:01 instead of 01:00:00.



  • @xyshell said in cerebro.resample() introduces data in the future:

    @backtrader said in cerebro.resample() introduces data in the future:

    At 01:00:00 you have two bits of information available:

    • The current 1-min bar for 01:00:00
    • The 1-hour resampled data for the period 00:00:00 to 00:59:00

    As expected.

    Thanks for the explanation, it's very clear. It's not introducing data from the future, but just delivering the 1-hour resampled data 1 min late, so any decision made during "# do strategy logic here" will be executed at 01:00:01 instead of 01:00:00.

    I mean it will be executed at 01:01:00 instead of 01:00:00.

