Multi-asset Datafeed Standardization
Hi, I'm using the Bloomberg API to pull in data and struggling to integrate multi-asset datafeeds.
I am pulling several data feeds individually through an API. They have different counts of dates, and this is causing the backtest to crash (e.g. currencies trade 24/7 whilst stocks trade only 5 days a week). I know zipline automatically solves this problem by backfilling data per day (i.e. if you pass monthly and daily data, it will automatically resample the monthly data into a daily series for all available days). I have looked at the documentation regarding the 'filler', which is quite easy to do in a pandas DataFrame, but how would you do this for multiple feeds (e.g. the 1st with 200 days, the 2nd with 190 days, the 3rd with 210 days)?
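Not knowing the exact feeds, here is a minimal pandas sketch of the alignment problem being described: three made-up price series with different date counts, placed on the union of all their dates and forward-filled so non-trading days carry the last known price (all names, ranges and values below are illustrative, not from the post):

```python
import numpy as np
import pandas as pd

# Three hypothetical price series with different numbers of dates
idx_a = pd.bdate_range("2014-01-01", periods=200)  # 200 business days
idx_b = pd.bdate_range("2014-01-15", periods=190)  # 190 business days, later start
idx_c = pd.date_range("2014-01-01", periods=210)   # 210 calendar days (e.g. FX, 24/7)

a = pd.Series(np.linspace(100.0, 120.0, 200), index=idx_a, name="A")
b = pd.Series(np.linspace(50.0, 60.0, 190), index=idx_b, name="B")
c = pd.Series(np.linspace(1.30, 1.40, 210), index=idx_c, name="C")

# Align everything on the union of all dates, then forward-fill the gaps:
# on a Saturday the stocks keep Friday's price whilst the FX series ticks on.
combined = pd.concat([a, b, c], axis=1).ffill()
```

Leading NaNs remain for any series that starts later than the others (here `B`), which is usually what a backtest wants: there was genuinely no price yet.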
With regards to a table of data where the first column is a date and the other 3 columns are equity prices, how do you extract each individual price series for each security whilst still referencing each value to the date in the first column?
date          IBM US EQUITY  CSCO US EQUITY  MSFT US EQUITY
2014-01-02    185.53         22.000          37.16
2014-01-03    186.64         21.980          36.91
2014-01-06    186.00         22.010          36.13
2014-01-07    189.71         22.310          36.41
2014-01-08    187.97         22.293          35.76
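For what it's worth, a pandas sketch of that extraction: rebuild the table above with the date column as the index, then take each equity's column — each slice is a Series that keeps the dates attached (the DataFrame construction here just mirrors the sample rows):

```python
import pandas as pd

# Recreate the sample table with the dates as the index, so every price
# stays referenced to its date.
df = pd.DataFrame(
    {
        "IBM US EQUITY": [185.53, 186.64, 186.00, 189.71, 187.97],
        "CSCO US EQUITY": [22.000, 21.980, 22.010, 22.310, 22.293],
        "MSFT US EQUITY": [37.16, 36.91, 36.13, 36.41, 35.76],
    },
    index=pd.to_datetime(
        ["2014-01-02", "2014-01-03", "2014-01-06", "2014-01-07", "2014-01-08"]
    ),
)
df.index.name = "date"

# Each column slice is a Series indexed by the same dates.
ibm = df["IBM US EQUITY"]
price_on_jan_6 = ibm.loc["2014-01-06"]
```

Looking prices up by date then works per security without losing the link back to the date column.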
P.S. I am trying to integrate with the BBG library below and, in particular, the format in line 9, the 'Multi-security accessor'. Maybe someone might find this useful, as it would be an institutional-level integration.
Let me address the 1st question with a counter-question:
You mention different counts of dates and then that currencies trade 24/7 whilst stocks only trade 5 days a week. A quick, possibly wrong, interpretation is that the data for the currencies has intraday resolution whilst that for the stocks has daily resolution.
- Is this so?
If yes, and all feeds have the same resolution: starting with the 1.9.x releases, data sets with different lengths are automatically synchronized. During the weekends the data points delivered for the stocks would be those from Friday, whilst the currencies would keep on ticking.
If no, which resolutions would be in play for which assets?
The 2nd question
- With regards to a table of data where the first column is a date and the other 3 columns are equity prices, how do you extract each individual price series for each security whilst still referencing each value to the date in the first column?
That seems to be a pandas.DataFrame. If that is the case, a sensible approach would be to:
- Slice it 3 times, each time taking the index together with column 0, 1 or 2 for each slice
- Create 3 bt.feeds.PandasData feeds (with the proper field configuration) and simply add them with cerebro.adddata
Thanks. I upgraded the version and it magically seems to be running now using method 1. My point about currencies vs equities was an exaggeration; in reality I had futures prices with different numbers of trading days between two dates.
The issue I am having now is that it seems to cut out some areas in the indicators, such as moving averages, which I assume is related to where the data gaps/discrepancies are. Is there a way to get the indicators to 'autofill' or backfill during init?
I also seem to be getting an error with pyfolio now running the exact same code as before but with the new updated backtrader. Is this happening for anyone else?
```
TypeError                                 Traceback (most recent call last)
<ipython-input-6-bfb2e7e8f103> in <module>()
      2 strat = results
      3 pyfoliozer = strat.analyzers.getbyname('pyfolio')
----> 4 returns, positions, transactions, gross_lev = pyfoliozer.get_pf_items(float)

TypeError: get_pf_items() takes exactly 1 argument (2 given)
```
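For what it's worth, judging only by that traceback, `get_pf_items` is being handed a stray `float` argument while the method takes no arguments beyond `self`. A toy reproduction of the same TypeError (the class below is a stand-in for illustration, not pyfolio's real analyzer):

```python
class FakeAnalyzer:
    """Stand-in with the same arity as the failing call, not the real analyzer."""

    def get_pf_items(self):  # takes exactly 1 argument: self
        return "returns", "positions", "transactions", "gross_lev"


analyzer = FakeAnalyzer()

got_type_error = False
try:
    analyzer.get_pf_items(float)  # 2 given: self + float -> TypeError
except TypeError:
    got_type_error = True

items = analyzer.get_pf_items()  # the no-argument call succeeds
```

So dropping the `float` from the call would presumably remove this particular error, independently of any version change.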
@revelator101 The default is runonce=True, which is faster (tight dict lookups) but cannot stretch the buffers of things to keep them synchronized, because each indicator is fully calculated before going on to the next.
Unless something is wrong, what you see above shouldn't happen with run(runonce=False). Here not only the next method of your strategy is called on a step-by-step basis: the next method of the indicators is called too.
On the other hand, plotting usually breaks if running with run(runonce=True) and the buffers are of different lengths. Your plot is there, so the question is how you are actually executing.
Synchronization was covered last in September. See:
To further test it, the data sets used by the sample in that post have undergone larger edits, removing almost two months of trading from each at different points in time.
The effects should be obvious below, because the data source suddenly has an almost straight line connecting the last point before the gap with the point at which both data sets re-synchronize. Notice how the moving average on the lower data source also shows the stretch effect rather than a gap.
This run uses runonce=False (via the provided command line switch):

```
./multidata-strategy-unaligned.py --runnext --plot
```
@revelator101 With regards to pyfolio, the sample that is in the sources runs seamlessly against
A screenshot of today's run: