Information leakage ?? When I use multiple data feeds
-
I found strange behavior (information leakage in next()) when I used multiple feeds as input.
Backtrader version: 1.9.74.123
I have two feeds: one is signal, the other is market (signal is produced by another system).
- market is observed every 1 second (starting from 2017.05.27 21:00:00, ending at 2018.05.28 20:59:59)
and it contains datetime, bid, ask and marketID columns.
- signal is observed every 10 seconds (starting from 2017.05.27 21:00:00, ending at 2018.05.28 20:59:50)
and it contains datetime, up, neutral, down and signalID columns.
I printed out the datas[1] (i.e. signal) and it apparently refers to future data.
Where did I go wrong?

    import argparse

    import backtrader as bt
    import pandas as pd
    from backtrader.feeds import GenericCSVData


    class ThreeClassCSV(GenericCSVData):
        linesoverride = True  # discard usual OHLC structure
        # datetime must be present and last
        lines = ('up', 'neutral', 'down', 'signalID', 'datetime')
        # datetime (always 1st) and then the desired order for
        params = (
            ('datetime', 0),  # inherited from parent class
            ('up', 1),
            ('neutral', 2),
            ('down', 3),
            ('signalID', 4),
        )


    class MarketCSV(GenericCSVData):
        linesoverride = True
        lines = ('bid', 'ask', 'spread', 'marketID', 'datetime')
        params = (
            ('datetime', 0),  # inherited from parent class
            ('bid', 1),
            ('ask', 2),
            ('spread', 3),
            ('marketID', 4),
        )


    def parse_args():
        parser = argparse.ArgumentParser(
            description='Three class probability Line Hierarchy',
            formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        )
        parser.add_argument('--signal', action='store', required=False,
                            default='../data/sampleSignal.csv',
                            help='three class signal data to add to the system')
        parser.add_argument('--market', action='store', required=False,
                            default='../data/sampleMarket.csv',
                            help='market data every one second to add to the system')
        parser.add_argument('--dtformat', required=False,
                            default='%Y-%m-%d %H:%M:%S',
                            help='Format of datetime in input')
        return parser.parse_known_args()[0]


    class MyStrategy(bt.Strategy):
        MIO = 1000000

        def __init__(self):
            self.market = self.datas[0]
            self.signal = self.datas[1]
            self.signalID = None
            self.marketID = None
            self.prev_prediction = None
            self.prediction = None
            # multiplier for numeric calculation
            self._multmap = {'up': 1, 'neutral': 0, 'down': -1}

        def updateIDs(self):
            self.prev_signalID = self.signalID
            self.prev_marketID = self.marketID
            self.signalID = self.signal.signalID[0]
            self.marketID = self.market.marketID[0]

        def get_update_trigger(self):
            # a changed signalID means a new signal bar has arrived
            if self.signalID != self.prev_signalID:
                return 'signal'
            else:
                return 'market'

        def prob_to_prediction(self, up, neutral, down):
            # pick the class with the highest probability
            prob = {'up': up, 'neutral': neutral, 'down': down}
            return max(prob, key=prob.get)

        def update_prediction(self):
            self.prev_prediction = self.prediction
            self.prediction = self.prob_to_prediction(
                self.signal.up[0], self.signal.neutral[0], self.signal.down[0])

        def decide_action(self):
            if self.prev_prediction is None:
                actions = {'up': 'buy', 'neutral': 'nothing', 'down': 'sell'}
                return actions[self.prediction]
            elif self.prev_prediction == "neutral":
                pass
            else:
                pass

        def check_trading_restrictions(self):
            return

        def next(self):
            # print('data : {}, signalID : {}, marketID : {}'.format(
            #     len(self), self.signal.signalID[0], self.market.marketID[0]))
            self.updateIDs()
            trigger = self.get_update_trigger()
            if trigger == 'signal':
                print('data : {}'.format(len(self)))
                print('This time {}'.format(bt.num2date(self.signal.datetime[0])))
                print('This up : {}, neutral : {}, down : {}'.format(
                    self.signal.up[0], self.signal.neutral[0], self.signal.down[0]))
                print('Prev time {}'.format(bt.num2date(self.signal.datetime[-1])))
                print('Prev up : {}, neutral : {}, down : {}'.format(
                    self.signal.up[-1], self.signal.neutral[-1], self.signal.down[-1]))
                print('2Prev time {}'.format(bt.num2date(self.signal.datetime[-2])))
                print('2Prev up : {}, neutral : {}, down : {}'.format(
                    self.signal.up[-2], self.signal.neutral[-2], self.signal.down[-2]))
                print('/n')
                self.update_prediction()
                # print('prev prediction : {}'.format(self.prev_prediction))
                # print('this prediction : {}'.format(self.prediction))
            else:  # i.e. trigger is market
                pass


    if __name__ == "__main__":
        signal = pd.read_csv("../data/sampleSignal.csv", index_col="datetime")
        market = pd.read_csv("../data/sampleMarket.csv", index_col="datetime")
        args = parse_args()
        signal = ThreeClassCSV(dataname=args.signal, dtformat=args.dtformat,
                               timeframe=bt.TimeFrame.Seconds)
        market = MarketCSV(dataname=args.market, dtformat=args.dtformat,
                           timeframe=bt.TimeFrame.Seconds)
        cerebro = bt.Cerebro()
        cerebro.adddata(market, name='market')
        cerebro.adddata(signal, name='signal')
        cerebro.addstrategy(MyStrategy)
        cerebro.run()

Running the above, I get the result below.
The unexpected data are the "Prev" and "2Prev" entries dated 2017-05-29. In the first and second blocks, signal.datetime[-1] and signal.datetime[-2] do not exist yet, but future data is returned. It looks like I have information leakage here.
data : 1
This time 2017-05-28 21:00:00
This up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162
Prev time 2017-05-29 20:59:50
Prev up : 0.3446146489, neutral : 0.2664228591, down : 0.388962492
2Prev time 2017-05-29 20:59:40
2Prev up : 0.6961796877, neutral : 0.262589821, down : 0.04123049131
/n
data : 11
This time 2017-05-28 21:00:10
This up : 0.3228381844, neutral : 0.4956512147, down : 0.1815106009
Prev time 2017-05-28 21:00:00
Prev up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162
2Prev time 2017-05-29 20:59:50
2Prev up : 0.3446146489, neutral : 0.2664228591, down : 0.388962492
/n
data : 21
This time 2017-05-28 21:00:20
This up : 0.3389022754, neutral : 0.3107556667, down : 0.350342058
Prev time 2017-05-28 21:00:10
Prev up : 0.3228381844, neutral : 0.4956512147, down : 0.1815106009
2Prev time 2017-05-28 21:00:00
2Prev up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162
-
@Daichi-Sugiura said in Information leakage ?? When I use multiple data feeds:
I printed out the datas[1] (i.e. signal) and it apparently refers to future data.
You probably want to explain what you mean with "future"
@Daichi-Sugiura said in Information leakage ?? When I use multiple data feeds:
data : 1
This time 2017-05-28 21:00:00
This up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162
Prev time 2017-05-29 20:59:50
Prev up : 0.3446146489, neutral : 0.2664228591, down : 0.388962492
2Prev time 2017-05-29 20:59:40
The timestamps go 10 seconds into the past for each index (0, -1, -2).
@Daichi-Sugiura said in Information leakage ?? When I use multiple data feeds:
print('This time {}'.format(bt.num2date(self.signal.datetime[0])))
Rather than doing that, you may want to use the proper format:
print('This time {}'.format(self.signal.datetime.datetime(0)))
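For context on why a conversion is needed at all: as far as I know, the datetime line holds floating-point day numbers (whole days as a proleptic Gregorian ordinal, plus the fraction of the day), which is what bt.num2date decodes and what the datetime.datetime(ago) accessor hides from you. A rough stdlib-only sketch of that decoding follows; `day_number_to_datetime` is a hypothetical helper written for illustration, not backtrader's own code, and it ignores time zones.

```python
from datetime import datetime, timedelta


def day_number_to_datetime(x):
    """Decode a float day number (ordinal days + fraction of a day)."""
    whole = int(x)  # whole days select the calendar date
    # the fractional part is the time of day
    return datetime.fromordinal(whole) + timedelta(days=x - whole)


# 21:00:00 on 2017-05-28 as a day number: 21/24 of a day past midnight.
raw = datetime(2017, 5, 28).toordinal() + 21 / 24
converted = day_number_to_datetime(raw)

print(converted)  # 2017-05-28 21:00:00
```

Printing the raw float directly is rarely useful, which is why going through an accessor that returns a datetime object is the more readable choice.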
-
@backtrader,
Thanks for your quick response and for the advice on the proper way of printing datetime data.
Let me explain what I mean by "future": I mean data that is not from the past but from beyond the current point in time.
data : 1
This time 2017-05-28 21:00:00
This up : 0.3404663204, neutral : 0.4591489635, down : 0.2003847162
Prev time 2017-05-29 20:59:50
Prev up : 0.3446146489, neutral : 0.2664228591, down : 0.388962492
2Prev time 2017-05-29 20:59:40
The datetime when next() is called is 2017-05-28 21:00:00, as shown above,
but datetime.datetime(-1) is 2017-05-29 20:59:50, which is 10 seconds before 21:00 of the next day.
In the same manner, datetime.datetime(-2) is 2017-05-29 20:59:40, which is 2 * 10 seconds before 21:00 of the next day.
This is what I mean by future.
In this case, I would expect null because there is no data in the past when next() is called.
Thank you very much for your kind help.
-
I read too quickly. The issue is that you are trying to look further into the past than the data goes: once the resulting index becomes negative, Python index arithmetic wraps around and gives you the end of the array.
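The wrap-around can be reproduced without backtrader. The sketch below is a toy model only: `PreloadedLine` and `ago_or_none` are hypothetical names, and the class merely assumes that a preloaded line is one array plus a pointer to the current bar. It uses one day of 10-second timestamps like the signal feed in this thread, and shows both the wrapped access and a length check that returns None instead.

```python
from datetime import datetime, timedelta


class PreloadedLine:
    """Toy stand-in for a fully preloaded data line (not backtrader's class)."""

    def __init__(self, values):
        self.values = list(values)  # every bar is already in memory
        self.idx = -1               # current bar position; -1 means not started

    def advance(self):
        self.idx += 1               # move to the next bar

    def __len__(self):
        return self.idx + 1         # number of bars delivered so far

    def __getitem__(self, ago):
        # Relative access: 0 is the current bar, -1 the previous one.
        # Nothing stops idx + ago from going negative, and Python then
        # counts from the END of the list.
        return self.values[self.idx + ago]


def ago_or_none(line, ago):
    """Return line[ago] (ago <= 0) only if enough bars have been delivered."""
    return line[ago] if len(line) > -ago else None


start = datetime(2017, 5, 28, 21, 0, 0)
# One day of 10-second bars: 21:00:00 through 20:59:50 on the next day.
stamps = [start + timedelta(seconds=10 * i) for i in range(8640)]

line = PreloadedLine(stamps)
line.advance()  # corresponds to the first call to next()

print(line[0])                # 2017-05-28 21:00:00
print(line[-1])               # 2017-05-29 20:59:50 (wrapped to the last bar)
print(ago_or_none(line, -1))  # None
```

The same idea applies in a strategy: compare the length of the feed with the lookback you need before indexing into it.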
Run cerebro with preload=False and you will see that an exception is raised (because there will be no data to access).
You can only really access as much data as you have. Use, for example, len on the data feeds to check it. Or introduce an indicator, which adds a constraint so that next in the strategy is only called after the needed number of bars.
-
Thank you very much for your great help, that is very helpful.