Confusion about resampledata() and creating new datas[]



  • I think this is one area of the system that has caused me some confusion, so I would like to get some clarity here and hopefully help someone else out in the process.

    I've seen in the samples several different uses of cerebro.resampledata().

    # first creating the data feed
    data0 = ibstore.getdata(dataname='FOO', timeframe=bt.TimeFrame.Seconds, compression=5)
    
    # resample data0 to bt.TimeFrame.Minutes
    cerebro.resampledata(data0, timeframe=bt.TimeFrame.Minutes, compression=1)
    
    # resample data0 to new data feed in bt.TimeFrame.Days timeframe
    data1 = cerebro.resampledata(dataname=data0, timeframe=bt.TimeFrame.Days, compression=1)
    

    Questions are:

    1. Are these both legitimate formats for .resampledata()?
    2. Is the named parameter dataname= optional, then?
    3. Is the original data0 no longer available in the 5-second timeframe? (I am assuming so, since there was no assignment to a new datas name.)
    4. Does each of these assignments now appear in the datas[] array?

    Here is code for .resampledata object:

        def resampledata(self, dataname, name=None, **kwargs):
            '''
            Adds a ``Data Feed`` to be resampled by the system
    
            If ``name`` is not None it will be put into ``data._name`` which is
            meant for decoration/plotting purposes.
    
            Any other kwargs like ``timeframe``, ``compression``, ``todate`` which
            are supported by the resample filter will be passed transparently
            '''
            if any(dataname is x for x in self.datas):
                dataname = dataname.clone()
    
            dataname.resample(**kwargs)
            self.adddata(dataname, name=name)
            self._doreplay = True
    

  • administrators

    # resample data0 to new data feed in bt.TimeFrame.Days timeframe
    data1 = cerebro.resampledata(dataname=data0, timeframe=bt.TimeFrame.Days, compression=1)
    

    This is a typo. resampledata returns nothing (i.e., in Python, None will be assigned). This appeared in one sample and in a couple of docstrings for the PivotPoint-like indicators. In any case, an additional reference to data1 outside of cerebro has no expected use cases.
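    Since the method body ends without a return statement, Python returns None implicitly, so the assignment above binds data1 to None. A quick illustration with a stand-in class (hypothetical names, not the real backtrader API):

```python
class FakeCerebro:
    def resampledata(self, dataname, name=None, **kwargs):
        # ... resampling work would happen here ...
        pass  # no return statement -> implicit None


data1 = FakeCerebro().resampledata(object(), timeframe='Days')
print(data1)  # → None
```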

    Putting that aside.

    1. Are these both legitimate formats for .resampledata()?

    Yes, the uses are legitimate. You should be able to resample data0 as many times as needed.

    One quick caveat, though: this could create a problem with colliding historical downloads.

    2. Is the named parameter dataname= optional, then?

    The signature of the method is: def resampledata(self, dataname, name=None, **kwargs). You may pass the 1st argument unnamed or use its actual name, which is `dataname`.

    3. Is the original data0 no longer available in the 5-second timeframe? (I am assuming so, since there was no assignment to a new datas name.)

    data0 has not been directly inserted into the system. The object will not die, because references to it are kept by the internal objects created via resampledata, but it has no direct use in the system. The timeframe and compression parameters are hints for the historical download, because IB does not deliver complete bars in real time. If you don't resample the data, the ticks received will be handed to you directly.

    This should be explained in the reference for IB (https://www.backtrader.com/docu/live/ib/ib.html) specifically in this paragraph:

    Note
    Take into account that the final timeframe/compression combination taken into account may not be the one specified during data feed creation but during insertion in the system. See the following example:

    data = ibstore.getdata(dataname='EUR.USD-CASH-IDEALPRO',
                           timeframe=bt.TimeFrame.Seconds, compression=5)
    
    cerebro.resampledata(data, timeframe=bt.TimeFrame.Minutes, compression=2)
    

    As should now be clear, the final timeframe/compression combination taken into account is Minutes/2.

    And the final question

    4. Does each of these assignments now appear in the datas[] array?

    Each resampledata call has added a data feed to self.datas[], so you should have 2.
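    The identity check in the resampledata source quoted above is what makes this work: the second call detects that data0 is already in datas and resamples a clone instead. A minimal pure-Python sketch of that bookkeeping (fake stand-in classes, not the real backtrader objects):

```python
# Stand-in classes mimicking only the datas[] bookkeeping of
# Cerebro.resampledata(); not the real backtrader implementation.
class FakeFeed:
    def clone(self):
        return FakeFeed()

    def resample(self, **kwargs):
        pass  # the real feed attaches a resampling filter here


class FakeCerebro:
    def __init__(self):
        self.datas = []

    def adddata(self, data, name=None):
        self.datas.append(data)

    def resampledata(self, dataname, name=None, **kwargs):
        # same identity check as in the quoted source
        if any(dataname is x for x in self.datas):
            dataname = dataname.clone()
        dataname.resample(**kwargs)
        self.adddata(dataname, name=name)


cerebro = FakeCerebro()
data0 = FakeFeed()
cerebro.resampledata(data0, timeframe='Minutes', compression=1)
cerebro.resampledata(data0, timeframe='Days', compression=1)
print(len(cerebro.datas))  # → 2
```

    The second entry is a clone, so the minute-resampled stream is not disturbed by the daily resampling.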


  • administrators

    @backtrader said in Confusion about resampledata() and creating new datas[]:

    1. Are these both legitimate formats for .resampledata()?

    Yes, the uses are legitimate. You should be able to resample data0 as many times as needed.
    One quick caveat, though: this could create a problem with colliding historical downloads.

    The quick consideration concludes that even if the use case presented above is legitimate, given how backfilling works with IB, the following would be more appropriate:

    data0 = ibstore.getdata('FOO')  # data feed for 1-minute resampling
    cerebro.resampledata(data0, timeframe=bt.TimeFrame.Minutes, compression=1)
    
    data1 = ibstore.getdata('FOO')  # data feed for 1-day resampling
    cerebro.resampledata(data1, timeframe=bt.TimeFrame.Days, compression=1)
    

    With this use case (untested; it is just a quick snippet), the associated historical downloads for each of the timeframes in the resampled data feeds should have the proper amount of backfilling.



  • One question this raises is what data timeframe the ibstore would request from IB. Having some control over that would be nice, but if it implicitly requests the same frequency as the fastest resample request on that same data0, that would be good default behavior.

    I think the thing that causes me some problems here is that I don't like the implicit naming of these resampled data sources. I would much prefer a parameter that allows me to specify the object name, or to be able to do an assignment to the resampled object.
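    Regarding explicit naming: per the signature quoted earlier, resampledata() does accept a name= argument, which the docstring says is stored in data._name for decoration/plotting. A rough sketch of that mechanism (fake classes, not the real backtrader API):

```python
class FakeFeed:
    _name = None


class FakeCerebro:
    def __init__(self):
        self.datas = []

    def adddata(self, data, name=None):
        if name is not None:
            data._name = name  # used for plot labels, per the docstring
        self.datas.append(data)

    def resampledata(self, dataname, name=None, **kwargs):
        self.adddata(dataname, name=name)


cerebro = FakeCerebro()
cerebro.resampledata(FakeFeed(), name='FOO_daily', timeframe='Days')
print(cerebro.datas[0]._name)  # → FOO_daily
```

    In real backtrader this would look like cerebro.resampledata(data0, name='FOO_daily', timeframe=bt.TimeFrame.Days), which at least labels the feed instead of relying purely on its position in datas[].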


  • administrators

    In your initial scenario, in which data0 is resampled twice, the historical data would be requested in the timeframe passed to the latest of the resampledata calls (in that case, TimeFrame.Days).

    In the latter scenario proposed, in which two data feeds are created for ticker FOO and each is then resampled independently, each resampledata would have its own historical data in the expected timeframe.



  • Another approach that may be a bit cleaner is to remove the requirement to call .resampledata() on these data feeds and instead handle that behind the scenes, only requiring a .resampledata() if you really need a previously created feed in another timeframe, and then allowing assignment of the data object from .resampledata().

    Like so:

    data_FOO_min = ibstore.getdata('FOO', timeframe=bt.TimeFrame.Minutes, compression=1) 
    
    data_FOO_day = cerebro.resampledata(data_FOO_min, timeframe=bt.TimeFrame.Days, compression=1)
    

    I find that to be more intuitive and I can be more explicit about the data object that I am creating. It is clear what the source of the data is for data_FOO_day and I can even use more descriptive names for the data objects.


  • administrators

    This is implemented where possible (see, for example, the live feed for VisualChart), but the peculiarities of IB wouldn't allow that working model.

    IB clearly states it is not a data provider and that's the reason for the many limitations in historical data download.

    A commercial solution like SierraChart tries to overcome the limitations with several historical requests using 1-second bar sizes for intraday resolutions (which can take ages for some backfills and requires splitting the requests) and 1-day bar sizes for daily and greater resolutions (again splitting requests).

    But at least 2 IB feeds are created for the same ticker: one for intraday and one for daily.

    The SierraChart engineers/developers strongly recommend using some other data provider, given the many limitations.



  • Yes, don't get me started on how awful the IB service is. :zipper_mouth:

    I've struggled to find another solution but would jump on one if I did... data seems to be a continuous challenge in this pursuit.

    On a somewhat related note, have you seen this? https://github.com/ranaroussi/qtpylib/blob/master/qtpylib/blotter.py

    Found this guy through some of the links you put on the backtrader README. The blotter works pretty well. An interesting feature would perhaps be an ibstore that could pull from a locally persisted database, with blotter or something like it constantly running to clean and maintain the data store. I'd prefer storage on something like InfluxDB over his choice of MySQL, but it all seems doable.


  • administrators

    The idea of capturing your own ticks is sound in theory, but it always has practical problems. That's why companies that collect and curate data make money doing it.

    Collecting ticks or minute bars or anything else with backtrader is just a Python statement away (see, for example, the trace of a run with 2 instruments collecting 1-minute bars for approximately 35 minutes).

    The data points could have been stored somewhere (hierarchically or not) instead of simply being printed out.
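    As a sketch of "storing the data points somewhere instead of printing", the bar values a strategy receives could be appended to a CSV file; here with made-up bar tuples standing in for what backtrader would deliver on each next() call:

```python
import csv
import io

# Hypothetical 1-minute bars: (datetime, open, high, low, close, volume)
bars = [
    ('2017-03-01 10:00', 1.0532, 1.0535, 1.0530, 1.0534, 120),
    ('2017-03-01 10:01', 1.0534, 1.0538, 1.0533, 1.0537, 98),
]

buf = io.StringIO()  # swap for open('bars.csv', 'a', newline='') to persist
writer = csv.writer(buf)
for bar in bars:
    # in a live strategy, one row would be written per completed bar
    writer.writerow(bar)

print(buf.getvalue())
```

    The same loop body would go in a strategy callback, and the CSV could just as well be a database insert.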

