
Problem with loading multiple dataseries with the Panda feed



  • Hi,

    I have been experimenting with loading multiple data feeds of different lengths (futures contracts). This works with the GenericCSV class, but with the PandasData feed I seem to have a synchronisation problem (bars are replayed twice).

    With the CSV class, I first load a dummy reference CSV which starts before and ends after all the futures data, and then load each data series, see below:

        # Need at least one extra line somehow - so reference data needs to start on the 17th in the file
        data = custom_data(os.path.join(oc.cfg['default']['data'], 'Reference.txt'), '2008-11-18', '2009-11-13')
        cerebro.adddata(data)
        # Add Data to Cerebro
        fc = occ.FutureChain(stem, oci.FutureType.Spread)
        fc.initialize_contracts(occ.Status.Expired)
        for ct in fc.contracts[0:17]:
            end = ctrmth[ctrmth['CtrMth'] == oci.int_maturity(ct[-3:], True)].iloc[0][ref]
            data = omega_data(ct, end)
            cerebro.adddata(data)
    

    The omega_data function just loads the data using the GenericCSV class.

    Below is how I load the data using Pandas:

        # Reference Data
        data = custom_data(os.path.join(oc.cfg['default']['data'], 'Reference.txt'), '2008-11-18', '2009-11-14')
        cerebro.adddata(data)
        # Add Data to Cerebro
        fc = occ.FutureChain(stem, oci.FutureType.Spread)
        fc.initialize_contracts(occ.Status.Expired, initialize_data=True)
        for ct in fc.contracts[0:2]:
            print(ct)
            data = bt.feeds.PandasData(dataname=fc.data[ct])
            cerebro.adddata(data)
    

    See below how I use the data in next (I also use prenext as suggested in another post):

        rdata = self.datas[0]  # Reference
        fdata = [(i, dta) for i, dta in enumerate(self.datas[1:]) if len(dta) and self.chain.last_date(i) >= self.date(0)]
        self.debug('Ref: {} Series: {}'.format(self.date(0), ['(Index: {} - Ticker: {} - Date: {})'.format(fdata[i][0], self.chain.ticker(i), fdata[i][1].datetime.date(0)) for i, dta in enumerate(fdata)]))
    

    Below is an example output with the csv loading:

        2018-02-22 15:07:22,561 - omega.raid.spreading - DEBUG - 2009-04-30 - Ref: 2009-04-30 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-04-30)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-04-30)']
        2018-02-22 15:07:22,564 - omega.raid.spreading - DEBUG - 2009-05-01 - Ref: 2009-05-01 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-01)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-01)']
        2018-02-22 15:07:22,567 - omega.raid.spreading - DEBUG - 2009-05-04 - Ref: 2009-05-04 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-04)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-04)']

    And below is an example output with the pandas loading:

        2018-02-22 15:24:59,020 - omega.raid.spreading - DEBUG - 2009-04-29 - Ref: 2009-04-29 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-04-30)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-04-30)']
        2018-02-22 15:24:59,024 - omega.raid.spreading - DEBUG - 2009-04-30 - Ref: 2009-04-30 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-04-30)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-04-30)']
        2018-02-22 15:24:59,027 - omega.raid.spreading - DEBUG - 2009-04-30 - Ref: 2009-04-30 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-01)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-01)']
        2018-02-22 15:24:59,030 - omega.raid.spreading - DEBUG - 2009-05-01 - Ref: 2009-05-01 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-01)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-01)']
        2018-02-22 15:24:59,033 - omega.raid.spreading - DEBUG - 2009-05-01 - Ref: 2009-05-01 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-04)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-04)']
        2018-02-22 15:24:59,037 - omega.raid.spreading - DEBUG - 2009-05-04 - Ref: 2009-05-04 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-05-04)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-05-04)']

    It seems that the bars are played twice somehow.

    Does anyone have any ideas?

    Thanks!


  • administrators

    It would seem that you simply take the available value for the data feed, even if you try not to.

    @laurent-michelizza said in Problem with loading multiple dataseries with the Panda feed:

    2018-02-22 15:07:22,561 - omega.raid.spreading - DEBUG - 2009-04-30 - Ref: 2009-04-30 Series: ['(Index: 0 - Ticker: LBS2N09 - Date: 2009-04-30)', '(Index: 1 - Ticker: LBS2U09 - Date: 2009-04-30)']
    

    Your timestamps seem to have a resolution of milliseconds.

     fdata = [(i, dta) for i, dta in enumerate(self.datas[1:]) if len(dta) and self.chain.last_date(i) >= self.date(0)]
    

    But you compare against self.date(0), which always has 000 as its milliseconds. The comparison for data feeds with a positive len will therefore always (except in corner day-wrapping cases) be True.
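
To make the resolution point concrete, here is a minimal sketch of a contract filter that compares at day resolution. The helper `is_active` and the sample values are hypothetical and are not part of the original strategy; the only assumption is that both sides of the comparison can be reduced to `datetime.date` values.

```python
from datetime import date, datetime


def is_active(last_trading_day: date, current: datetime) -> bool:
    """Return True while the contract has not passed its last trading day.

    Only the calendar date of `current` is used, so the time of day
    (and any sub-second part) cannot change the result.
    """
    return last_trading_day >= current.date()


# Hypothetical values for illustration only.
last_day = date(2009, 6, 30)
midnight = datetime(2009, 6, 30, 0, 0, 0)
session_end = datetime(2009, 6, 30, 23, 59, 59, 999989)
next_day = datetime(2009, 7, 1, 0, 0, 0)

print(is_active(last_day, midnight))
print(is_active(last_day, session_end))
print(is_active(last_day, next_day))
```

Both timestamps on the last trading day give the same answer, and the contract only drops out once the calendar date moves on.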



  • Hi,

    Thanks for the answer!

    It seems that my filtering works well: I actually have 52 contracts loaded (hopefully that is not too many?), and I have checked carefully that I only see data when it should be there relative to the dummy reference.

    The main problem here is that next seems to be triggered twice when I load the data using pandas instead of CSV, which is odd. So I was wondering if anyone has experienced similar issues?

    Thanks



  • Hi,

    I have finally understood what I did wrong, so I am posting it here.

    To recap: I am loading data for a future (individual contract months, for example H8, M8, U8, etc.), and to keep everything synchronized I load a dummy reference in data0 spanning the length of all the contract months.

    I originally loaded everything with a custom CSV loader. I then switched to pandas, but only for the contract months, and left the reference on the CSV loader. That mix is what caused the issue; loading the reference with pandas as well seems to fix it.

    Wrong way:

            data = orb.custom_data(path_reference, '2008-01-01', '2010-12-31')  # CSV Loading
            cerebro.adddata(data)
            for ct in fc.contracts[0:2]:
                data = bt.feeds.PandasData(dataname=fc.data[ct])
                cerebro.adddata(data)
    

    Output:

        2018-02-28 10:42:33,585 - main - DEBUG - 2009-06-23 - ['(Index: 0 - Date: 2009-06-23)', '(Index: 1 - Date: 2009-06-23)']
        2018-02-28 10:42:33,591 - main - DEBUG - 2009-06-23 - ['(Index: 0 - Date: 2009-06-24)', '(Index: 1 - Date: 2009-06-24)']
        2018-02-28 10:42:33,594 - main - DEBUG - 2009-06-24 - ['(Index: 0 - Date: 2009-06-24)', '(Index: 1 - Date: 2009-06-24)']
        2018-02-28 10:42:33,596 - main - DEBUG - 2009-06-24 - ['(Index: 0 - Date: 2009-06-25)', '(Index: 1 - Date: 2009-06-25)']
        2018-02-28 10:42:33,598 - main - DEBUG - 2009-06-25 - ['(Index: 0 - Date: 2009-06-25)', '(Index: 1 - Date: 2009-06-25)']
        2018-02-28 10:42:33,601 - main - DEBUG - 2009-06-26 - ['(Index: 0 - Date: 2009-06-25)', '(Index: 1 - Date: 2009-06-25)']
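
One possible reading of the doubled output, offered as an assumption rather than a confirmed diagnosis: if the CSV-loaded reference stamps daily bars at the end of the session while the pandas-loaded contracts are stamped at midnight, the feeds never share a timestamp, and a scheduler that always delivers the earliest pending bar ends up calling `next` twice per day. The sketch below is a simplified stand-alone model of that delivery rule, not backtrader's code; `simulate`, the feed names, and the 23:59:59.999989 stamp are illustrative.

```python
from datetime import datetime, time


def simulate(feeds):
    """Deliver bars in timestamp order and record the dates each feed shows.

    feeds: dict mapping a feed name to a sorted list of datetimes.
    Returns one row per simulated `next` call, holding the date currently
    visible on every feed that has delivered at least one bar.
    """
    pos = {name: -1 for name in feeds}  # index of the last delivered bar
    rows = []
    while True:
        # Next undelivered timestamp of every feed that still has bars.
        pending = {
            name: bars[pos[name] + 1]
            for name, bars in feeds.items()
            if pos[name] + 1 < len(bars)
        }
        if not pending:
            break
        earliest = min(pending.values())
        # Only feeds whose next bar carries the earliest timestamp advance.
        for name, dt in pending.items():
            if dt == earliest:
                pos[name] += 1
        rows.append(
            {name: feeds[name][p].date().isoformat() for name, p in pos.items() if p >= 0}
        )
    return rows


days = [datetime(2009, 6, 23), datetime(2009, 6, 24), datetime(2009, 6, 25)]
end_of_session = time(23, 59, 59, 999989)  # assumed stamp of the CSV feed

# Mixed loaders: reference stamped at session end, contract at midnight.
mixed = simulate({
    "ref": [datetime.combine(d, end_of_session) for d in days],
    "contract": days,
})
# Same loader for both feeds: identical timestamps.
aligned = simulate({"ref": days, "contract": days})

for row in mixed:
    print(row)
print(len(mixed), len(aligned))
```

In this model the mixed timestamps produce six calls for three days, with every second call showing the contract one day ahead of the reference, while the aligned timestamps produce three calls.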

    New correct way:

            dfr = pd.read_csv(path_reference, sep=',', parse_dates=True, header=None, names=['Open', 'High', 'Low', 'Close', 'Volume', 'OI'])
            data = bt.feeds.PandasData(dataname=dfr)
            cerebro.adddata(data)
            for ct in fc.contracts[0:2]:
                data = bt.feeds.PandasData(dataname=fc.data[ct])
                cerebro.adddata(data)
    

    Output:

        2018-02-28 10:29:32,704 - main - DEBUG - 2009-06-23 - ['(Index: 0 - Date: 2009-06-23)', '(Index: 1 - Date: 2009-06-23)']
        2018-02-28 10:29:32,706 - main - DEBUG - 2009-06-24 - ['(Index: 0 - Date: 2009-06-24)', '(Index: 1 - Date: 2009-06-24)']
        2018-02-28 10:29:32,709 - main - DEBUG - 2009-06-25 - ['(Index: 0 - Date: 2009-06-25)', '(Index: 1 - Date: 2009-06-25)']

    Now each bar is played only once.
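
For completeness, a minimal sketch of loading a headerless reference file so that its index is a midnight-stamped `DatetimeIndex`, as the contract DataFrames are assumed to be. The inline sample rows and the `Date` column name are made up for illustration, and `index_col=0` is spelled out here rather than relying on pandas to infer the index from the column count.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the reference file (no header row).
sample = io.StringIO(
    "2009-06-23,1,1,1,1,0,0\n"
    "2009-06-24,1,1,1,1,0,0\n"
    "2009-06-25,1,1,1,1,0,0\n"
)

dfr = pd.read_csv(
    sample,
    sep=",",
    header=None,
    names=["Date", "Open", "High", "Low", "Close", "Volume", "OI"],
    index_col=0,       # first column becomes the index
    parse_dates=True,  # parse that index as datetimes
)

# Drop any time-of-day component so every bar is stamped at midnight.
dfr.index = dfr.index.normalize()

print(dfr.index)
```

The resulting frame can then be passed to `bt.feeds.PandasData(dataname=dfr)` in the same way as the contract months.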

    PS: Is anyone interested in code to load and backtest futures contract months separately?

    Thanks


  • administrators

    Your approach is good, but I have one question.

    • Why don't you use the rollover functionality which is already in backtrader?

    I can imagine that you want to trade overlapping futures simultaneously, but if you don't, you could simply let cerebro do it for you.



  • Thanks.

    My strategy is based on spreads and not outrights so a rollover would not mean much in this instance and I need to trade them separately.
    I am actually considering adding the rollover outright series to replace the dummy reference, so that I have the outright price as a reference.