Problem with loading multiple data series with the Pandas feed
-
Hi,
I have successfully loaded multiple data feeds of different lengths (futures contracts) using the GenericCSV class, but when I try the same with the PandasData feed I seem to have a synchronisation problem (bars are replayed twice).
With the CSV class, I first load a dummy reference CSV that starts before and ends after all the futures data, and then I load each data series (see below):

    # Need at least one extra line somehow - so reference data needs to start on the 17th in the file
    data = custom_data(os.path.join(oc.cfg['default']['data'], 'Reference.txt'), '2008-11-18', '2009-11-13')
    cerebro.adddata(data)

    # Add Data to Cerebro
    fc = occ.FutureChain(stem, oci.FutureType.Spread)
    fc.initialize_contracts(occ.Status.Expired)
    for ct in fc.contracts[0:17]:
        end = ctrmth[ctrmth['CtrMth'] == oci.int_maturity(ct[-3:], True)].iloc[0][ref]
        data = omega_data(ct, end)
        cerebro.adddata(data)

The omega_data function just loads the data using the GenericCSV class.
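As a rough illustration of what the dummy reference has to provide: it needs a bar on every date on which any contract trades, starting before the first contract and ending after the last one. A minimal plain-Python sketch, assuming the contract dates are already available as datetime.date values (build_reference_calendar and its pad_days parameter are hypothetical, not part of backtrader or of the code above):

```python
from datetime import date, timedelta

def build_reference_calendar(contract_dates, pad_days=1):
    """Sorted union of all contract trading dates, padded at both ends.

    contract_dates: dict mapping a ticker to an iterable of datetime.date.
    pad_days: calendar days added before the first and after the last date,
    so the reference starts before and ends after every contract.
    """
    all_dates = set()
    for dates in contract_dates.values():
        all_dates.update(dates)
    if not all_dates:
        return []
    calendar = sorted(all_dates)
    first, last = calendar[0], calendar[-1]
    before = [first - timedelta(days=n) for n in range(pad_days, 0, -1)]
    after = [last + timedelta(days=n) for n in range(1, pad_days + 1)]
    return before + calendar + after

# Toy dates, for illustration only
contracts = {
    "LBS2N09": [date(2009, 4, 30), date(2009, 5, 1)],
    "LBS2U09": [date(2009, 5, 1), date(2009, 5, 4)],
}
calendar = build_reference_calendar(contracts)
print(calendar)
```

Using a set for the union means a date shared by several contracts appears only once in the reference.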
Below is how I load the data using Pandas:

    # Reference Data
    data = custom_data(os.path.join(oc.cfg['default']['data'], 'Reference.txt'), '2008-11-18', '2009-11-14')
    cerebro.adddata(data)

    # Add Data to Cerebro
    fc = occ.FutureChain(stem, oci.FutureType.Spread)
    fc.initialize_contracts(occ.Status.Expired, initialize_data=True)
    for ct in fc.contracts[0:2]:
        print(ct)
        data = bt.feeds.PandasData(dataname=fc.data[ct])
        cerebro.adddata(data)

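With its default parameters, PandasData reads the bar datetime from the DataFrame index and autodetects the open, high, low, close, volume and openinterest columns by name, so it can help to inspect each DataFrame in fc.data before wrapping it, in particular the time of day carried by the index. A minimal pandas sketch; check_feed_frame is a hypothetical helper and the frame holds toy values:

```python
import pandas as pd

def check_feed_frame(df):
    """Basic sanity checks for a DataFrame about to be wrapped in PandasData."""
    idx = df.index
    is_dt = isinstance(idx, pd.DatetimeIndex)
    return {
        "datetime_index": is_dt,
        "sorted": bool(idx.is_monotonic_increasing),
        "unique": bool(idx.is_unique),
        # Distinct times of day in the index; date-only daily data
        # parsed by pandas usually sits at midnight (00:00:00)
        "times_of_day": sorted({ts.time() for ts in idx}) if is_dt else [],
    }

# Toy daily bars with a date-only DatetimeIndex
frame = pd.DataFrame(
    {"open": [1.0, 1.1], "high": [1.2, 1.3], "low": [0.9, 1.0],
     "close": [1.1, 1.2], "volume": [10, 12], "openinterest": [100, 101]},
    index=pd.to_datetime(["2009-04-30", "2009-05-01"]),
)
report = check_feed_frame(frame)
print(report)
```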
Below is how I use the data in next() (I also use prenext(), as suggested in another post):

    rdata = self.datas[0]  # Reference
    # Contracts that have delivered at least one bar and have not expired yet
    fdata = [(i, dta) for i, dta in enumerate(self.datas[1:])
             if len(dta) and self.chain.last_date(i) >= self.date(0)]
    # Look up the ticker by the contract's own index (idx), not its position in fdata
    self.debug('Ref: {} Series: {}'.format(
        self.date(0),
        ['(Index: {} - Ticker: {} - Date: {})'.format(idx, self.chain.ticker(idx), dta.datetime.date(0))
         for idx, dta in fdata]))

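The filter in next() keeps every contract that has delivered a bar and has not expired, but it does not check whether the contract's current bar carries the same date as the reference. A minimal plain-Python sketch of such a check. live_contracts and its tuple layout are hypothetical; inside a strategy, bar_date would come from dta.datetime.date(0) and would be None while len(dta) == 0:

```python
from datetime import date

def live_contracts(ref_date, contracts):
    """Split live contracts into those aligned with the reference date and the rest.

    contracts: list of (index, ticker, last_date, bar_date) tuples, where
    last_date is the contract's final trading date and bar_date is the date
    of its current bar (None if it has no bar yet).
    """
    aligned, misaligned = [], []
    for idx, ticker, last_date, bar_date in contracts:
        if bar_date is None or last_date < ref_date:
            continue  # no bar yet, or contract already expired
        if bar_date == ref_date:
            aligned.append((idx, ticker))
        else:
            misaligned.append((idx, ticker, bar_date))
    return aligned, misaligned

# Bar dates as in one of the pandas output lines in this post;
# the last trading dates are placeholders.
contracts = [
    (0, "LBS2N09", date(2009, 6, 30), date(2009, 5, 1)),
    (1, "LBS2U09", date(2009, 8, 31), date(2009, 5, 1)),
]
aligned, misaligned = live_contracts(date(2009, 4, 30), contracts)
print(aligned, misaligned)
```

Logging the misaligned list at each call would make a reference/contract date mismatch visible immediately.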
Below is an example of the output with CSV loading:
2018-02-22 15:07:22,561 - omega.raid.spreading - DEBUG - 2009-04-30 - Ref: 2009-04-30 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-04-30)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-04-30)']
2018-02-22 15:07:22,564 - omega.raid.spreading - DEBUG - 2009-05-01 - Ref: 2009-05-01 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-01)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-01)']
2018-02-22 15:07:22,567 - omega.raid.spreading - DEBUG - 2009-05-04 - Ref: 2009-05-04 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-04)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-04)']

And below is an example of the output with pandas loading:
2018-02-22 15:24:59,020 - omega.raid.spreading - DEBUG - 2009-04-29 - Ref: 2009-04-29 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-04-30)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-04-30)']
2018-02-22 15:24:59,024 - omega.raid.spreading - DEBUG - 2009-04-30 - Ref: 2009-04-30 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-04-30)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-04-30)']
2018-02-22 15:24:59,027 - omega.raid.spreading - DEBUG - 2009-04-30 - Ref: 2009-04-30 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-01)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-01)']
2018-02-22 15:24:59,030 - omega.raid.spreading - DEBUG - 2009-05-01 - Ref: 2009-05-01 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-01)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-01)']
2018-02-22 15:24:59,033 - omega.raid.spreading - DEBUG - 2009-05-01 - Ref: 2009-05-01 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-04)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-04)']
2018-02-22 15:24:59,037 - omega.raid.spreading - DEBUG - 2009-05-04 - Ref: 2009-05-04 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-04)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-04)']

It seems that the bars are played twice somehow.
Does anyone have any ideas?
Thanks!
-
It would seem that you simply take the available value for the data feed, even if you try not to.
@laurent-michelizza said in Problem with loading multiple dataseries with the Panda feed:
2018-02-22 15:07:22,561 - omega.raid.spreading - DEBUG - 2009-04-30 - Ref: 2009-04-30 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-04-30)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-04-30)']
Your timestamps seem to have a resolution of milliseconds, and this is your filter:

    fdata = [(i, dta) for i, dta in enumerate(self.datas[1:]) if len(dta) and self.chain.last_date(i) >= self.date(0)]

But you compare against self.date(0), which always has .000 as its milliseconds. For data feeds with a positive len, the comparison will therefore always be True (except in corner day-wrapping cases).
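The point about resolution can be reproduced in plain Python: a timestamp with a time-of-day component is always greater than or equal to midnight of the same day, so a >= filter against a midnight-stamped date only fails when the timestamp falls on an earlier day. The sketch below also truncates both sides to calendar dates so the time of day cannot influence the result. It uses datetime objects directly rather than backtrader lines, and the helper same_day_ge is hypothetical:

```python
from datetime import date, datetime, time

def same_day_ge(stamp, ref_date):
    # Compare calendar dates only, dropping any time-of-day component
    return stamp.date() >= ref_date

ref = datetime.combine(date(2009, 4, 30), time(0, 0))  # midnight, .000
stamp = datetime(2009, 4, 30, 15, 7, 22, 561000)       # millisecond resolution

raw = stamp >= ref                                     # any time that day passes
truncated = same_day_ge(stamp, ref.date())             # dates only
earlier = datetime(2009, 4, 29, 23, 59, 59, 999000) >= ref  # previous day fails
print(raw, truncated, earlier)
```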
-
Hi,
Thanks for the answer!
My filtering seems to work: I actually have 52 contracts loaded (hopefully that is not too many?), and after checking carefully I only see a contract's data when it should be there relative to the dummy reference.
The main problem is that next() seems to be triggered twice per date when I load the data using pandas instead of CSV, which is odd. Has anyone experienced similar issues?
Thanks
-
Hi,
I have finally understood what I did wrong, so I am posting it here.
To recap: I load data for a future as individual contract months (for example H8, M8, U8, etc.), and to keep everything synchronized I load a dummy reference as data0 spanning all the contract months.
I originally did this with a custom CSV loader and then switched to pandas, but only for the contract months; the reference was still loaded with the CSV loader, and that mix turned out to be the problem. Loading the reference through pandas as well fixes the issue.
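The thread does not say why mixing the two loaders doubles the calls to next(). One likely explanation, assuming backtrader stamps daily bars from CSV feeds at the end of the session (by default just before midnight) while PandasData uses the DataFrame index as-is (midnight for date-only data): the feeds then never share a timestamp, and the engine steps through every distinct timestamp, moving only the feeds whose next bar is due. The simulation below is plain Python, not backtrader code; merged_steps is hypothetical and the end-of-session time is an assumed value:

```python
from datetime import date, datetime, time

def merged_steps(feeds):
    """Step a shared clock through every distinct timestamp of all feeds.

    feeds: dict mapping a feed name to a sorted list of bar datetimes.
    Returns one dict per step, mapping each feed name to the date of its
    latest delivered bar (None before its first bar).
    """
    stamps = sorted({ts for bars in feeds.values() for ts in bars})
    pos = {name: -1 for name in feeds}
    steps = []
    for now in stamps:
        for name, bars in feeds.items():
            nxt = pos[name] + 1
            # A feed only moves forward when its next bar is due right now
            if nxt < len(bars) and bars[nxt] == now:
                pos[name] = nxt
        steps.append({name: feeds[name][p].date() if p >= 0 else None
                      for name, p in pos.items()})
    return steps

days = [date(2009, 4, 29), date(2009, 4, 30), date(2009, 5, 1)]
EOD = time(23, 59, 59, 999990)  # assumed end-of-session stamp for CSV daily bars

csv_ref = [datetime.combine(d, EOD) for d in days]           # reference via CSV
pandas_ct = [datetime.combine(d, time(0, 0)) for d in days]  # contract via pandas

mixed = merged_steps({"ref": csv_ref, "contract": pandas_ct})
same = merged_steps({"ref": pandas_ct, "contract": pandas_ct})

for step in mixed:
    print(step)
```

With mixed timestamps the clock makes six steps for three days, and every other step shows the contract one day ahead of the reference, the same pattern as the pandas output earlier in the thread; with identical timestamps it makes three.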
Wrong way:

    data = orb.custom_data(path_reference, '2008-01-01', '2010-12-31')  # CSV Loading
    cerebro.adddata(data)
    for ct in fc.contracts[0:2]:
        data = bt.feeds.PandasData(dataname=fc.data[ct])
        cerebro.adddata(data)

Output:
2018-02-28 10:42:33,585 - main - DEBUG - 2009-06-23 - ['(Index: 0 - Date: 2009-06-23)', '(Index: 1 - Date: 2009-06-23)']
2018-02-28 10:42:33,591 - main - DEBUG - 2009-06-23 - ['(Index: 0 - Date: 2009-06-24)', '(Index: 1 - Date: 2009-06-24)']
2018-02-28 10:42:33,594 - main - DEBUG - 2009-06-24 - ['(Index: 0 - Date: 2009-06-24)', '(Index: 1 - Date: 2009-06-24)']
2018-02-28 10:42:33,596 - main - DEBUG - 2009-06-24 - ['(Index: 0 - Date: 2009-06-25)', '(Index: 1 - Date: 2009-06-25)']
2018-02-28 10:42:33,598 - main - DEBUG - 2009-06-25 - ['(Index: 0 - Date: 2009-06-25)', '(Index: 1 - Date: 2009-06-25)']
2018-02-28 10:42:33,601 - main - DEBUG - 2009-06-26 - ['(Index: 0 - Date: 2009-06-25)', '(Index: 1 - Date: 2009-06-25)']

New correct way:

    # Load the reference through pandas as well, like the contract months
    dfr = pd.read_csv(path_reference, sep=',', parse_dates=True, header=None,
                      names=['Open', 'High', 'Low', 'Close', 'Volume', 'OI'])
    data = bt.feeds.PandasData(dataname=dfr)
    cerebro.adddata(data)
    for ct in fc.contracts[0:2]:
        data = bt.feeds.PandasData(dataname=fc.data[ct])
        cerebro.adddata(data)

Output:
2018-02-28 10:29:32,704 - main - DEBUG - 2009-06-23 - ['(Index: 0 - Date: 2009-06-23)', '(Index: 1 - Date: 2009-06-23)']
2018-02-28 10:29:32,706 - main - DEBUG - 2009-06-24 - ['(Index: 0 - Date: 2009-06-24)', '(Index: 1 - Date: 2009-06-24)']
2018-02-28 10:29:32,709 - main - DEBUG - 2009-06-25 - ['(Index: 0 - Date: 2009-06-25)', '(Index: 1 - Date: 2009-06-25)']

Now each bar is played only once.
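In the corrected version, pd.read_csv receives six names for what is presumably a seven-field file (a date followed by six values), so pandas appears to use the leftover first column as the index, and parse_dates=True then parses that index. A more explicit variant, assuming that file layout (no header row, date first), names the date column and passes index_col=0. The io.StringIO buffer and its dummy values stand in for the file at path_reference:

```python
import io

import pandas as pd

# Stand-in for the reference file: date, then OHLC, volume and open interest,
# no header row (layout assumed from the names= list in the post)
raw = io.StringIO(
    "2009-06-23,1,1,1,1,0,0\n"
    "2009-06-24,1,1,1,1,0,0\n"
    "2009-06-25,1,1,1,1,0,0\n"
)

dfr = pd.read_csv(
    raw,
    sep=",",
    header=None,
    names=["Date", "Open", "High", "Low", "Close", "Volume", "OI"],
    index_col=0,       # the date column becomes the index
    parse_dates=True,  # parse that index into a DatetimeIndex
)
print(dfr)
```

With its default parameters PandasData should match Open, High, Low, Close and Volume case-insensitively, but the OI column does not match the default openinterest name; if open interest is needed, it can be mapped explicitly through PandasData's openinterest parameter (for example openinterest='OI').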
PS: Is anyone interested in code to load and backtest futures contract months separately?
Thanks
-
Your approach is good, but I have one question.
- Why don't you use the rollover functionality which is already in backtrader?
I can imagine that you want to trade overlapping futures simultaneously, but if you don't, you could simply let cerebro do it for you.
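backtrader's rollover support (cerebro.rolloverdata, backed by bt.feeds.RollOver) stitches several contract feeds into one continuous feed. The plain-Python sketch below only illustrates the general idea with a simple roll-at-last-bar rule; it is not backtrader's implementation, roll_series is hypothetical, and the contract names and closes are toy values:

```python
from datetime import date

def roll_series(contracts):
    """Stitch contract bars into one continuous, ticker-tagged series.

    contracts: list of (ticker, bars) in expiry order, where bars maps
    datetime.date -> close. Each contract is used up to its last bar, and
    the next contract only supplies dates after that.
    """
    series = []
    last_used = None
    for ticker, bars in contracts:
        for d in sorted(bars):
            if last_used is None or d > last_used:
                series.append((d, ticker, bars[d]))
                last_used = d
    return series

# Toy contracts and closes
contracts = [
    ("N09", {date(2009, 6, 29): 10.0, date(2009, 6, 30): 10.5}),
    ("U09", {date(2009, 6, 29): 11.0, date(2009, 6, 30): 11.2,
             date(2009, 7, 1): 11.4}),
]
series = roll_series(contracts)
for row in series:
    print(row)
```

Only one contract supplies each date of the stitched series, which fits outright trading; a strategy that needs several contract months side by side still needs the individual feeds.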
-
Thanks.
My strategy is based on spreads, not outrights, so a rollover would not mean much in this instance; I need to trade the contracts separately.
I am actually considering adding the rolled-over outright series to replace the dummy reference, so that the outright price serves as the reference.