Multi-asset Datafeed Standardization
-
Hi, I'm using the Bloomberg API to pull in data & struggling to integrate multi-asset datafeeds;
-
I am pulling several datas individually through an API. They have different counts of dates & this is causing the backtest to crash (e.g. currencies trade 24/7 whilst stocks trade only 5 days a week). I know zipline automatically solves this problem by backfilling data per day (e.g. if you pass a monthly & a daily data, it will auto-reconfigure the monthly into a daily sample for all available days). I have looked at the documentation regarding the 'filler', which is quite easy to do in a pandas dataframe, but how would you do this for multiple datas? (e.g. the 1st with 200 days, the 2nd with 190 days, the 3rd with 210 days).
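On the pandas side, one sketch of the 'filler' idea for multiple feeds of different lengths is to reindex every feed onto the union of all the calendars and forward-fill the gaps. The symbols, dates and prices below are invented purely for illustration:

```python
import pandas as pd

# Hypothetical feeds of different lengths: a currency that also has
# weekend quotes, and a stock that only trades on business days
fx = pd.Series([1.10, 1.11, 1.12, 1.13, 1.14],
               index=pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-04",
                                     "2014-01-05", "2014-01-06"]))
stk = pd.Series([185.53, 186.64, 186.00],
                index=pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06"]))

# Union of all calendars, then forward-fill the shorter feed:
# on the missing dates the stock keeps its last known price
calendar = fx.index.union(stk.index)
aligned = pd.DataFrame({"EURUSD": fx, "IBM": stk}).reindex(calendar).ffill()
```

The same pattern extends to any number of feeds (200, 190 and 210 rows alike): build the union of all indices once, then `reindex(...).ffill()` each frame.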
-
With regards to a table of data where the first column is a date & the other 3 columns are equity prices, how do you extract each individual price for each security whilst still keeping each price referenced to the date in the first column?
date        IBM US EQUITY  CSCO US EQUITY  MSFT US EQUITY
2014-01-02  185.53         22.000          37.16
2014-01-03  186.64         21.980          36.91
2014-01-06  186.00         22.010          36.13
2014-01-07  189.71         22.310          36.41
2014-01-08  187.97         22.293          35.76
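Rebuilding that table as a DataFrame with the date column as the index (the values below are copied from the table above), selecting a single column yields a Series that keeps the DatetimeIndex, so each price remains tied to its date:

```python
import pandas as pd

# The posted table as a DataFrame, indexed by the date column
df = pd.DataFrame(
    {"IBM US EQUITY":  [185.53, 186.64, 186.00, 189.71, 187.97],
     "CSCO US EQUITY": [22.000, 21.980, 22.010, 22.310, 22.293],
     "MSFT US EQUITY": [37.16, 36.91, 36.13, 36.41, 35.76]},
    index=pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06",
                          "2014-01-07", "2014-01-08"]))
df.index.name = "date"

# One column selection keeps the DatetimeIndex, so lookups by date still work
ibm = df["IBM US EQUITY"]
print(ibm.loc["2014-01-07"])  # 189.71
```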
Regards
-
-
P.S. I am trying to integrate with the BBG library below, and in particular the format in line 9, the 'Multi-security accessor'. Maybe someone might find this useful, as it would be an institutional-level integration.
https://github.com/bpsmith/tia/blob/master/examples/datamgr.ipynb
-
Let me address the 1st question with a counter-question:
- I am pulling several datas individually through an API. They have different counts of dates & this is causing the backtest to crash (e.g. currencies trade 24/7 whilst stocks trade only 5 days a week). I know zipline automatically solves this problem by backfilling data per day (e.g. if you pass a monthly & a daily data, it will auto-reconfigure the monthly into a daily sample for all available days). I have looked at the documentation regarding the 'filler', which is quite easy to do in a pandas dataframe, but how would you do this for multiple datas? (e.g. the 1st with 200 days, the 2nd with 190 days, the 3rd with 210 days).
You say "counts of dates" and then that "currencies trade 24x7 whilst stocks only 5 days a week". A quick, and possibly wrong, interpretation is that the data for currencies has intraday resolution whilst that for stocks has daily resolution. Is this so?
If yes, and all datas actually have the same resolution: starting with the 1.9.x releases, data sets with different lengths are automatically synchronized. During the weekends, the data points delivered for the stocks would be those from Friday, whilst the currencies would keep on ticking.
If no: which resolutions would be in play for which assets?
-
The 2nd question
- With regards to a table of data where the first column is a date & the other 3 columns are equity prices, how do you extract each individual price for each security whilst still keeping each price referenced to the date in the first column?
That seems to be a pandas.DataFrame. If that is the case, a sensible approach would be:
- Slice it 3 times, each time taking the index together with column 0, 1 or 2 for each slice.
- Create 3 bt.feeds.PandasData feeds (with the proper field configuration) and simply add them with cerebro.adddata.
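A minimal sketch of the slicing step, assuming the column names and dates from the table in the question. The backtrader wiring is shown as comments only, since the exact `PandasData` field configuration (which column maps to close, etc.) depends on the data:

```python
import pandas as pd

# Prices copied from the table in the question
df = pd.DataFrame(
    {"IBM US EQUITY":  [185.53, 186.64, 186.00, 189.71, 187.97],
     "CSCO US EQUITY": [22.000, 21.980, 22.010, 22.310, 22.293],
     "MSFT US EQUITY": [37.16, 36.91, 36.13, 36.41, 35.76]},
    index=pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06",
                          "2014-01-07", "2014-01-08"]))

# Slice it 3 times: one single-column DataFrame per security,
# each keeping the shared date index
slices = [df[[col]] for col in df.columns]

# Each slice would then become one feed (field mapping is data-dependent):
# import backtrader as bt
# cerebro = bt.Cerebro()
# for frame in slices:
#     cerebro.adddata(bt.feeds.PandasData(dataname=frame))
```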
-
Thanks. I upgraded the version & it magically seems to be running now using method 1. My post about currencies vs equities was an exaggeration; in reality I had futures prices with different numbers of trading days between two dates.
The issue I am having now is that some areas seem to be cut out of the indicators, such as the moving averages, which I assume is related to where the potential data gaps/discrepancies are. Is there a way to get the indicators to 'autofill' or backfill during init?
-
I also seem to be getting an error with pyfolio now, running the exact same code as before but with the newly updated backtrader. Is this happening for anyone else?
TypeError                                 Traceback (most recent call last)
<ipython-input-6-bfb2e7e8f103> in <module>()
      2 strat = results[0]
      3 pyfoliozer = strat.analyzers.getbyname('pyfolio')
----> 4 returns, positions, transactions, gross_lev = pyfoliozer.get_pf_items(float)

TypeError: get_pf_items() takes exactly 1 argument (2 given)
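The traceback above already points at the likely cause: `get_pf_items` is being called with an extra positional argument (`float`), while the message says it accepts nothing beyond `self`. A minimal stand-in class (hypothetical, not the real pyfolio analyzer) reproduces the same kind of TypeError:

```python
class FakePyFolioAnalyzer:
    """Hypothetical stand-in, only to illustrate the signature issue."""
    def get_pf_items(self):
        # stand-in values; the real analyzer returns actual pyfolio items
        return "returns", "positions", "transactions", "gross_lev"

analyzer = FakePyFolioAnalyzer()

# Passing an argument raises, because the bound method already receives `self`:
try:
    analyzer.get_pf_items(float)
except TypeError as exc:
    print(exc)  # e.g. "takes 1 positional argument but 2 were given"

# Calling without arguments unpacks cleanly:
returns, positions, transactions, gross_lev = analyzer.get_pf_items()
```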
-
@revelator101 The default run mode uses runonce=True, which is faster (tight for loops without dict lookups) but cannot stretch the buffers of things to keep them synchronized, because each indicator is fully calculated before going to the next.
Unless something is wrong, what you see above shouldn't happen with run(runonce=False). Here not only the next method of your strategy is called on a step-by-step basis: the next method of the indicators is too.
On the other hand, plotting usually breaks if running with run(runonce=True) and the buffers are of different lengths. Your plot is there, so the question is how you are actually executing.
Synchronization was covered last in September. See:
To further test it, the data sets used by the sample in that post have been heavily edited, removing almost two months of trading from each at different points in time.
The effects should be obvious below, because each data source suddenly has an almost straight line connecting the last point before the gap with the point at which both data sets re-synchronize.
Notice how the moving average on the lower data source also shows the stretch effect and not a gap.
Run with runonce=False (via the provided command-line switch --runnext):
./multidata-strategy-unaligned.py --runnext --plot
The chart
-
@revelator101 With regards to pyfolio, the sample that is in the sources runs seamlessly against pyfolio.
A screenshot of today's run:
-