
Information leakage ?? When I use multiple data feeds



  • I found strange behavior (information leakage in next()) when I used multiple feeds as input.
    Backtrader version : 1.9.74.123

    I have two feeds: one is signal, the other is market (the signal is produced by another system).

    • market is observed every 1 second (starting from 2017.05.27 21:00:00, ending at 2018.05.28 20:59:59)
      and contains datetime, bid, ask and marketID columns.
    • signal is observed every 10 seconds (starting from 2017.05.27 21:00:00, ending at 2018.05.28 20:59:50)
      and contains datetime, up, neutral, down and signalID columns.

    I printed out datas[1] (i.e. signal), and it appears to refer to future data.
    Where did I go wrong?

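    import argparse
    
    import backtrader as bt
    import pandas as pd
    from backtrader.feeds import GenericCSVData
    
    class ThreeClassCSV(GenericCSVData):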
        linesoverride = True  # discard usual OHLC structure
        # datetime must be present and last
        lines = ('up', 'neutral', 'down', 'signalID','datetime')
        # CSV column index for each line; datetime is column 0
        params = (
            ('datetime', 0), # inherited from parent class
            ('up', 1),  
            ('neutral', 2),
            ('down', 3),
            ('signalID', 4),
        )
    
        
    class MarketCSV(GenericCSVData):
        linesoverride = True
        lines = ('bid', 'ask', 'spread', 'marketID', 'datetime')
        
        params = (
            ('datetime', 0), # inherited from parent class
            ('bid', 1),  
            ('ask', 2),
            ('spread', 3),
            ('marketID', 4),
            
        )
        
    def parse_args():
        parser = argparse.ArgumentParser(
            description='Three class probability Line Hierarchy',
            formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        )
    
        parser.add_argument('--signal', action='store',
                            required=False, default= '../data/sampleSignal.csv',
                            help='three class signal data to add to the system')
        
        parser.add_argument('--market', action='store',
                           required=False, default= '../data/sampleMarket.csv',
                           help='market data every one second to add to the system')
    
        parser.add_argument('--dtformat',
                            required=False, default='%Y-%m-%d %H:%M:%S',
                            help='Format of datetime in input')
        return parser.parse_known_args()[0]
    
    class MyStrategy(bt.Strategy):
        MIO = 1000000
    
        def __init__(self):
                    
            self.market = self.datas[0]
            self.signal = self.datas[1]
        
            self.signalID = None
            self.marketID = None
            
            self.prev_prediction = None
            self.prediction = None
            
            ## multiplier for numeric calculation
            self._multmap = {'up' : 1, "neutral" : 0, "down" : -1}
                        
            return
    
        def updateIDs(self):
            self.prev_signalID = self.signalID
            self.prev_marketID = self.marketID
            
            self.signalID = self.signal.signalID[0]
            self.marketID = self.market.marketID[0]
            
            return
        
        def get_update_trigger(self):
            if self.signalID != self.prev_signalID:
                return 'signal'
            else:
                return 'market'
            
        def prob_to_prediction(self, up, neutral, down):
            prob = {'up' : up, 'neutral' : neutral, 'down' : down}
            return max(prob, key = prob.get)
            
        def update_prediction(self):
            self.prev_prediction = self.prediction
            self.prediction = self.prob_to_prediction(self.signal.up[0], self.signal.neutral[0], self.signal.down[0])
        
        def decide_action(self):
            if self.prev_prediction is None:
                actions = {'up' : 'buy', 'neutral' : 'nothing', 'down' : 'sell'}
                return actions[self.prediction]
            elif self.prev_prediction == "neutral":
                pass
            else:
                pass
            pass
        
        def check_trading_restrictions(self):
            
            return
            
        def next(self):
            
            #print('data : {}, signalID : {}, marketID : {}'.format(len(self), self.signal.signalID[0], self.market.marketID[0]))
            
            self.updateIDs()
            trigger = self.get_update_trigger()
            if trigger == 'signal':
                print('data : {}'.format(len(self)))
                print('This time {}'.format(bt.num2date(self.signal.datetime[0])))
                print('This up : {}, neutral : {}, down : {}'.format(self.signal.up[0], self.signal.neutral[0], self.signal.down[0]))
                print('Prev time {}'.format(bt.num2date(self.signal.datetime[-1])))
                print('Prev up : {}, neutral : {}, down : {}'.format(self.signal.up[-1], self.signal.neutral[-1], self.signal.down[-1]))
                print('2Prev time {}'.format(bt.num2date(self.signal.datetime[-2])))
                print('2Prev up : {}, neutral : {}, down : {}'.format(self.signal.up[-2], self.signal.neutral[-2], self.signal.down[-2]))
                print('/n')
                self.update_prediction()
                #print('prev prediction : {}'.format(self.prev_prediction))
                #print('this prediction : {}'.format(self.prediction))
                
            else: # i.e. trigger is market
                pass
            
            return 
    
    if __name__ == "__main__":
        signal = pd.read_csv("../data/sampleSignal.csv", index_col = "datetime")
        market = pd.read_csv("../data/sampleMarket.csv", index_col = "datetime")
        args = parse_args()
        signal = ThreeClassCSV(dataname=args.signal, dtformat=args.dtformat, timeframe = bt.TimeFrame.Seconds)
        market = MarketCSV(dataname=args.market, dtformat=args.dtformat, timeframe = bt.TimeFrame.Seconds)
        cerebro = bt.Cerebro()
        cerebro.adddata(market, name = 'market')
        cerebro.adddata(signal, name = 'signal')
    
        cerebro.addstrategy(MyStrategy)
        cerebro.run()
    
    

    Running the above, I get the result below.
    The unexpected data are the "Prev" and "2Prev" lines of the first block and the "2Prev" line of the second block: at those points signal.datetime[-1] and signal.datetime[-2] do not exist yet, but future data is returned. It looks like I have information leakage here.

    data : 1
    This time 2017-05-28 21:00:00
    This up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162
    Prev time 2017-05-29 20:59:50
    Prev up : 0.3446146489, neutral : 0.2664228591, down : 0.388962492
    2Prev time 2017-05-29 20:59:40
    2Prev up : 0.6961796877, neutral : 0.262589821, down : 0.04123049131
    /n
    data : 11
    This time 2017-05-28 21:00:10
    This up : 0.3228381844, neutral : 0.4956512147, down : 0.1815106009
    Prev time 2017-05-28 21:00:00
    Prev up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162
    2Prev time 2017-05-29 20:59:50
    2Prev up : 0.3446146489, neutral : 0.2664228591, down : 0.388962492
    /n
    data : 21
    This time 2017-05-28 21:00:20
    This up : 0.3389022754, neutral : 0.3107556667, down : 0.350342058
    Prev time 2017-05-28 21:00:10
    Prev up : 0.3228381844, neutral : 0.4956512147, down : 0.1815106009
    2Prev time 2017-05-28 21:00:00
    2Prev up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162


  • administrators

    @Daichi-Sugiura said in Information leakage ?? When I use multiple data feeds:

    I printed out the datas[1] (i.e. signal) and it apparently refers to future data.

    You probably want to explain what you mean with "future"

    @Daichi-Sugiura said in Information leakage ?? When I use multiple data feeds:

    data : 1
    This time 2017-05-28 21:00:00
    This up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162
    Prev time 2017-05-29 20:59:50
    Prev up : 0.3446146489, neutral : 0.2664228591, down : 0.388962492
    2Prev time 2017-05-29 20:59:40

    The timestamps go 10-seconds into the past for each index (0, -1, -2)

    @Daichi-Sugiura said in Information leakage ?? When I use multiple data feeds:

    print('This time {}'.format(bt.num2date(self.signal.datetime[0])))
    

    Rather than doing that you may want to use the proper format

    print('This time {}'.format(self.signal.datetime.datetime(0)))
    


  • @backtrader,
    Thanks for your quick response and for the advice on the proper way of printing datetime data.

    Let me explain what I mean by "future": I mean data that is not in the past but lies beyond the current point in time.

    data : 1
    This time 2017-05-28 21:00:00
    This up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162
    Prev time 2017-05-29 20:59:50
    Prev up : 0.3446146489, neutral : 0.2664228591, down : 0.388962492
    2Prev time 2017-05-29 20:59:40

    When next() is called, the datetime is 2017-05-28 21:00:00, as shown above,
    but datetime.datetime(-1) is 2017-05-29 20:59:50, which is 10 seconds before 21:00 of the next day.
    In the same manner, datetime.datetime(-2) is 2017-05-29 20:59:40, which is 2 * 10 seconds before 21:00 of the next day.

    This is what I mean by future.
    In this case, I would expect null, because no past data exists yet when next() is called.

    Thank you very much for your kind help.


  • administrators

    I read too quickly. The issue is that you are trying to look too far into the past: when the index is too negative, Python index arithmetic wraps around and gives you the end of the array.
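
The wrap-around can be reproduced without backtrader. The sketch below is a plain-Python stand-in, not backtrader's actual buffer code: it assumes a preloaded feed behaves like a list that already holds every bar, with a hypothetical `idx` marking the current bar. The list is an abbreviated selection of the timestamps printed earlier in the thread.

```python
# Plain-Python stand-in for a fully preloaded line buffer (an assumption,
# not backtrader's real implementation).
timestamps = [
    "2017-05-28 21:00:00",  # first bar
    "2017-05-28 21:00:10",
    "2017-05-28 21:00:20",
    "2017-05-29 20:59:40",
    "2017-05-29 20:59:50",  # last bar
]

idx = 0  # position of the current bar: the very first one


def at(ago):
    """Return the value `ago` bars away from the current bar (0, -1, -2, ...)."""
    return timestamps[idx + ago]


current = at(0)
previous = at(-1)  # idx + ago == -1: Python counts from the end of the list
two_back = at(-2)  # idx + ago == -2: second-to-last element

print(current)
print(previous)
print(two_back)
```

On the first bar, `previous` and `two_back` come from the end of the list, which matches the "Prev time" and "2Prev time" values reported in the question.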

    Run cerebro with preload=False and you will see that an exception is raised (because there will be no data to access).

    You can only really access as much data as you have. For example, use len on the data feeds to check it, or introduce an indicator that adds a constraint so that next in the strategy is only reached after the needed number of bars.
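
The length check suggested above might look like the following minimal sketch. It is plain Python under stated assumptions: `has_lookback` and `value_or_none` are hypothetical helper names, `bars_delivered` plays the role of `len(feed)` inside next(), and the numbers in `feed` are illustrative only.

```python
def has_lookback(bars_delivered, ago):
    """Return True if index `ago` (0, -1, -2, ...) points at a bar already delivered.

    `bars_delivered` plays the role of len(feed) inside next().
    """
    if ago > 0:
        return False  # positive indices would look ahead of the current bar
    return -ago < bars_delivered


def value_or_none(values, bars_delivered, ago):
    """Return the value `ago` bars back, or None when that bar does not exist yet."""
    if not has_lookback(bars_delivered, ago):
        return None
    return values[bars_delivered - 1 + ago]


feed = [0.34, 0.32, 0.33]  # illustrative values only

# On the first bar only index 0 exists; on the third bar all three do.
first_bar = [value_or_none(feed, 1, ago) for ago in (0, -1, -2)]
third_bar = [value_or_none(feed, 3, ago) for ago in (0, -1, -2)]

print(first_bar)
print(third_bar)
```

In the strategy from the question, the equivalent guard would presumably be a condition such as `len(self.signal) > 2` before reading `self.signal.up[-2]`.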



  • @backtrader

    Thank you very much for your great help, that is very helpful.

