Confusion about resampledata() and creating new datas[]
-
I think this is one area of the system that has caused me some confusion, so I would like to get some clarity here and hopefully help someone else out in the process.
I've seen in the samples several different uses of `cerebro.resampledata()`:

```python
# first creating the data feed
data0 = ibstore.getdata(dataname='FOO',
                        timeframe=bt.TimeFrame.Seconds,
                        compression=5)

# resample data0 to bt.TimeFrame.Minutes
cerebro.resampledata(data0,
                     timeframe=bt.TimeFrame.Minutes,
                     compression=1)

# resample data0 to new data feed in bt.TimeFrame.Days timeframe
data1 = cerebro.resampledata(dataname=data0,
                             timeframe=bt.TimeFrame.Days,
                             compression=1)
```
Questions are:

- Are these both legitimate formats for `.resampledata()`?
- Is the named parameter `dataname=` optional then?
- Is the original `data0` no longer available in the 5-second timeframe? (I am assuming so, since there was no assignment to a new data name.)
- Does each of these assignments now appear in the `datas[]` array?
Here is the source for the `.resampledata` method:

```python
def resampledata(self, dataname, name=None, **kwargs):
    '''
    Adds a ``Data Feed`` to be resample by the system

    If ``name`` is not None it will be put into ``data._name`` which is
    meant for decoration/plotting purposes.

    Any other kwargs like ``timeframe``, ``compression``, ``todate`` which
    are supported by the resample filter will be passed transparently
    '''
    if any(dataname is x for x in self.datas):
        dataname = dataname.clone()

    dataname.resample(**kwargs)
    self.adddata(dataname, name=name)
    self._doreplay = True
```
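Reading that source against the snippet above, this is what I think happens (part of why I'm asking, so please correct me if I'm wrong):

```python
# Tracing the two calls from the first snippet (my reading of the source):
cerebro.resampledata(data0, timeframe=bt.TimeFrame.Minutes, compression=1)
# -> data0 is not yet in self.datas: no clone is made; data0 itself gets
#    the Minutes/1 resample filter and is added to the system

cerebro.resampledata(dataname=data0, timeframe=bt.TimeFrame.Days, compression=1)
# -> data0 is now in self.datas: a clone is made; the clone gets the
#    Days/1 resample filter and is added alongside the first feed
```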
-
```python
# resample data0 to new data feed in bt.TimeFrame.Days timeframe
data1 = cerebro.resampledata(dataname=data0,
                             timeframe=bt.TimeFrame.Days,
                             compression=1)
```
This is a typo. `resampledata` returns nothing (i.e.: in Python a `None` will be assigned). I found this in one sample and in a couple of docstrings for the `PivotPoint`-like indicators. In any case, an additional reference to `data1` outside of `cerebro` has no expected use cases.
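A quick way to confirm the `None` return, based on the source quoted above (it has no `return` statement):

```python
ret = cerebro.resampledata(data0,
                           timeframe=bt.TimeFrame.Days,
                           compression=1)
print(ret)  # None: the feed is added to cerebro, but nothing is returned
```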
Putting that aside.

- Are these both legitimate formats for `.resampledata()`?
Yes, the uses are legitimate. You should be able to resample `data0` as many times as needed.

A very quick consideration could see a problem with colliding historical downloads.
- Is the named parameter `dataname=` optional then?
The signature of the method is `def resampledata(self, dataname, name=None, **kwargs)`. You may pass the 1st argument unnamed or use its actual name, which is `dataname`.
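Both call forms from the first snippet are therefore equivalent; a minimal illustration reusing the names from above:

```python
# 1st argument passed unnamed (positionally)
cerebro.resampledata(data0, timeframe=bt.TimeFrame.Minutes, compression=1)

# the same call with the 1st argument passed by its name
cerebro.resampledata(dataname=data0, timeframe=bt.TimeFrame.Minutes, compression=1)
```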
- Is the original `data0` no longer available in the 5-second timeframe? (I am assuming so, since there was no assignment to a new data name.)
`data0` has not been directly inserted in the system. The object will not die, because references to it are kept in the internal objects created via `resampledata`, but it has no direct use in the system. The `timeframe` and `compression` parameters are indications for the historical download, because `IB` does not deliver complete bars in real-time. If you don't resample the data, the ticks received will be given to you directly.

This is explained in the reference for `IB` (https://www.backtrader.com/docu/live/ib/ib.html), specifically in this paragraph:

Note
Take into account that the final timeframe/compression combination taken into account may not be the one specified during data feed creation but during insertion in the system. See the following example:

```python
data = ibstore.getdata(dataname='EUR.USD-CASH-IDEALPRO',
                       timeframe=bt.TimeFrame.Seconds,
                       compression=5)

cerebro.resampledata(data,
                     timeframe=bt.TimeFrame.Minutes,
                     compression=2)
```
As should now be clear, the final timeframe/compression combination taken into account is `Minutes/2`.
And the final question:
- Does each of these assignments now appear in the `datas[]` array?
Each `resampledata` call has added a data feed to `self.datas[]`, so you should have 2.
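A quick way to see this from inside a strategy; a sketch where the `_timeframe`/`_compression` introspection relies on internal attributes of the feeds, so treat it as illustrative:

```python
class FeedCheck(bt.Strategy):
    def __init__(self):
        # one entry per resampledata() call above -> expect 2
        print('feeds:', len(self.datas))
        for i, d in enumerate(self.datas):
            print(i, d._name, d._timeframe, d._compression)
```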
-

@backtrader said in Confusion about resampledata() and creating new datas[]:
- Are these both legitimate formats for `.resampledata()`?
Yes, the uses are legitimate. You should be able to resample data0 as many times as needed.
A very quick consideration could see a problem with colliding historical downloads.

The quick consideration concludes that even if the use case presented above is legitimate, given things like how backfilling works with IB, the following would be more appropriate:
```python
data0 = ibstore.getdata('FOO')  # data feed for 1-minute resampling
cerebro.resampledata(data0, timeframe=bt.TimeFrame.Minutes, compression=1)

data1 = ibstore.getdata('FOO')  # data feed for 1-day resampling
cerebro.resampledata(data1, timeframe=bt.TimeFrame.Days, compression=1)
```
With this use case (untested, it is just a quick snippet), the associated historical downloads for each of the timeframes in the resampled data feeds should have the proper amount of backfilling.
-
One question this raises is: what data timeframe would the ibstore request from IB? Having some control over that would be nice, but if it implicitly requests the same frequency as the fastest resample request on that same `data0`, that would be good default behavior.

I think the thing that causes me some problems here is that I don't like the implicit naming of these resampled data sources. I would much prefer a parameter that allows me to specify the object name, or to be able to do an assignment to the resampled object.
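For the naming part: the signature quoted earlier does take a `name=` parameter, which per the docstring ends up in `data._name` for decoration/plotting. A sketch of how that could be used, assuming `getdatabyname` (the by-name lookup I believe the strategy exposes):

```python
cerebro.resampledata(data0,
                     name='foo_minutes',  # stored in data._name
                     timeframe=bt.TimeFrame.Minutes,
                     compression=1)

class Named(bt.Strategy):
    def next(self):
        d = self.getdatabyname('foo_minutes')  # fetch the feed by its name
        print(d._name, d.datetime.datetime(0), d.close[0])
```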
-
In your initial scenario, in which `data0` is resampled twice, the historical data would be requested in the `timeframe` passed to the latest of the `resampledata` actions (in that case it would be `TimeFrame.Days`).

In the latest scenario proposed, in which two data feeds are created for ticker `FOO` and then each data feed is resampled independently, each `resampledata` would have its own historical data in the expected `timeframe`.
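A compact restatement of the two scenarios, with the download behavior described above added as comments (these are the earlier snippets, only annotated):

```python
# Scenario 1: one feed resampled twice
data0 = ibstore.getdata('FOO')
cerebro.resampledata(data0, timeframe=bt.TimeFrame.Minutes, compression=1)
cerebro.resampledata(data0, timeframe=bt.TimeFrame.Days, compression=1)
# -> the historical download is requested in the latest timeframe: Days

# Scenario 2: one feed per target timeframe
data0 = ibstore.getdata('FOO')
cerebro.resampledata(data0, timeframe=bt.TimeFrame.Minutes, compression=1)
data1 = ibstore.getdata('FOO')
cerebro.resampledata(data1, timeframe=bt.TimeFrame.Days, compression=1)
# -> each feed backfills independently in its own timeframe
```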
-
Another approach that may be a bit cleaner is to remove the requirement to call `.resampledata()` on these data feeds and instead handle that behind the scenes, only requiring a `.resampledata()` if you really need that previously created feed in another timeframe, and then allowing the assignment of the data object from the `.resampledata()` call.

Like so:
```python
data_FOO_min = ibstore.getdata('FOO',
                               timeframe=bt.TimeFrame.Minutes,
                               compression=1)
data_FOO_day = cerebro.resampledata(data_FOO_min,
                                    timeframe=bt.TimeFrame.Days,
                                    compression=1)
```
I find that to be more intuitive, and I can be more explicit about the data object that I am creating. It is clear what the source of the data is for `data_FOO_day`, and I can even use more descriptive names for the data objects.

-
This is implemented when possible (see for example the live feed for `VisualChart`), but the peculiarities of IB wouldn't allow that working model. IB clearly states it is not a data provider, and that's the reason for the many limitations in historical data download.
A commercial solution like `SierraChart` tries to overcome the limitations with several historical requests: `1-second` bar sizes for intraday resolutions (which can take ages for some backfillings and requires splitting the requests) and `1-day` bar sizes for daily and greater resolutions (again splitting requests). But at least 2 IB feeds are created for the same ticker: one for intraday and one for daily.

The `SierraChart` engineers/developers strongly recommend using some other data provider, given the many limitations.

-
Yes, don't get me started on how awful the IB service is. :zipper_mouth:
I've struggled to find another solution but would jump if I did... data seems to be a continuous challenge in this pursuit.
On a somewhat related note, have you seen this? https://github.com/ranaroussi/qtpylib/blob/master/qtpylib/blotter.py
Found this guy from some of the links you put in the backtrader README. The blotter works pretty well. An interesting feature would perhaps be an `ibstore` that could pull from a locally persisted database, with the blotter or something like it constantly running to clean and maintain the data store. I'd prefer storage on something like InfluxDB over his choice of MySQL, but it all seems doable.

-
The idea of capturing your own ticks is sound in theory, but it always has practical problems. And that's why companies that collect and curate data make money out of it.
Collecting ticks or minute bars or anything else with backtrader is just a Python statement away (see for example the trace of a run with 2 instruments collecting 1-minute bars for approximately 35 minutes).
The data points could have been stored somewhere (hierarchically or not) instead of simply being printed out.
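A minimal sketch of what that could look like, assuming a live resampled feed is already attached to `cerebro` (the file name and field selection are just illustrative choices, not part of any backtrader API):

```python
import csv

import backtrader as bt

class BarCollector(bt.Strategy):
    '''Append each completed bar of every attached feed to a CSV file.'''

    def start(self):
        self._fh = open('bars.csv', 'a', newline='')
        self._writer = csv.writer(self._fh)

    def next(self):
        # one row per feed per completed bar
        for d in self.datas:
            self._writer.writerow([
                d._name,
                d.datetime.datetime(0).isoformat(),
                d.open[0], d.high[0], d.low[0], d.close[0], d.volume[0],
            ])

    def stop(self):
        self._fh.close()
```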