Downselect from a large universe, trade ~6 tickers
-
Hello, I've been working with backtrader for a couple of weeks now (great work by the way!), and I have some difficulties that I haven't been able to figure out on my own so far.
I am attempting to use backtrader to filter and select a few securities to trade from a large universe of about 200 ETFs. Ideally, this code would be able to handle an arbitrarily large universe. That would be easy enough to do externally if I just wanted to stick with the original selections, but I want it to be able to rebalance at some given frequency (monthly, perhaps), going back to check the universe again for better candidates.
As far as I've found, there is no convenient way to remove data streams once they've been added and the strategy has been initialized. So the only way I have been able to do this is to pass the entire universe into self.datas via cerebro and then let the strategy do all the magic. This seems like a really inefficient way to do it, since I'm only using about 6 of those data streams at any given time. I've probably missed a glaringly simple way to do this. I am really horrible at programming to begin with, but I did try to figure this out and googled until my eyes bled.
If you're interested, this is how I am building self.datas:
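```python
import backtrader as bt

data = {}  # ticker -> feed; keyed by ticker so feeds from different sectors don't overwrite each other
for sector in updDict.values():
    for ticker in sector:
        data[ticker] = bt.feeds.YahooFinanceCSVData(
            dataname='./data/' + ticker + '.csv',
            reverse=True, nullvalue=0, plot=False,
            **kwargs)  # kwargs (e.g. fromdate/todate) are defined elsewhere
        cerebro.adddata(data[ticker], name=ticker)
```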
Right now updDict is a dictionary with 6 sectors as keys and a list of ETFs as each key's value. In the strategy, I rank the ETFs in each sector and pick the top one to trade, which leaves me with 6 ETFs.
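Here is a minimal pure-Python sketch of the "rank each sector, keep the top one" step. It assumes a score per ticker has already been computed, for example from a momentum indicator. The function name `pick_top_per_sector` and the sample sectors, tickers and scores are hypothetical.

```python
def pick_top_per_sector(upd_dict, scores):
    """upd_dict: {sector: [tickers]}; scores: {ticker: float}, higher is better.
    Returns {sector: best_ticker}; tickers without a score are skipped."""
    shortlist = {}
    for sector, tickers in upd_dict.items():
        scored = [t for t in tickers if t in scores]
        if scored:
            # max() keeps the first ticker on ties
            shortlist[sector] = max(scored, key=lambda t: scores[t])
    return shortlist


updDict = {'tech': ['XLK', 'VGT'], 'energy': ['XLE', 'VDE']}
scores = {'XLK': 0.12, 'VGT': 0.15, 'XLE': -0.02, 'VDE': 0.01}
print(pick_top_per_sector(updDict, scores))  # {'tech': 'VGT', 'energy': 'VDE'}
```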
Aside from the obvious inefficiency of having so many data feeds, plotting becomes impossible because cerebro wants to plot every data feed. I'm not really sure plotting a set of potentially rotating data streams would be any easier, though.
This turned into a bit of a book, so I will leave it at that for now. I would appreciate any suggestions.
-
No one? I'm really kind of stuck on this.
-
@Ninjaneer said in Downselect from a large universe, trade ~6 tickers:
So far as I've found, there is not a convenient way to remove datastreams once they've been added and the strategy has been initialized.
You want to work with 6 streams from a 200-stream universe and, at some point in time, remove some or all of the 6 and replace them with others from the 194 (200 - 6) remaining streams.
The point here:
- There was some initial criterion to select the 6
- There is some criterion to decide which to remove
- There is some criterion to decide which will replace the removed ones
What calculates the criteria continuously for the 200 streams, and how?
The standard use case is to use something indicator-like, but if the other 194 streams are not added to the system, what calculates the indicators?
Furthermore:
- When you replace some streams, the new ones have to be automatically synchronized in datetime terms with the existing ones.
If the 200 streams are in the platform, they are constantly moved forward and kept synchronized. Otherwise, they would have to be loaded and reloaded each time they are inserted/reinserted into the platform, just to find the right synchronization point.
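For a feed inserted late, finding the synchronization point amounts to locating the first bar at or after the current datetime within that feed's own sorted bar dates. The sketch below does this with the standard library's `bisect`. The helper `sync_index` is hypothetical and not part of backtrader; backtrader already handles this alignment for feeds that are added before the run starts.

```python
import bisect
import datetime as dt


def sync_index(bar_dates, current):
    """Index of the first bar dated >= current in a sorted list of dates
    (len(bar_dates) if the stream has no bar that late)."""
    return bisect.bisect_left(bar_dates, current)


dates = [dt.date(2017, 1, 3), dt.date(2017, 1, 4), dt.date(2017, 1, 6)]
print(sync_index(dates, dt.date(2017, 1, 5)))  # 2 -> resume at 2017-01-06
```

The catch the reply points out still applies: the newly inserted stream's history must be loaded before this lookup can run, and that load is repeated every time the stream is swapped in.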
With all this in mind, adding the 200 streams seems the way to go (actually, there is no other way). A few things that can improve your management:
- Keep the created data feeds in a dictionary with the same structure as the one you have for your sectors.
- Choosing the data feed you want to replace/insert should then be easier, according to your logic.
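A sketch of that dictionary layout, assuming the sector dictionary `updDict` from the first post. `build_feed_tree` and the `make_feed` factory are hypothetical names. In a real run, the factory would create the backtrader data feed, as shown in the commented lines.

```python
def build_feed_tree(upd_dict, make_feed):
    """Mirror {sector: [tickers]} as {sector: {ticker: feed}}.
    make_feed(ticker) is a caller-supplied factory (hypothetical)."""
    return {sector: {ticker: make_feed(ticker) for ticker in tickers}
            for sector, tickers in upd_dict.items()}


# With backtrader, the factory would create the real feed, e.g.:
#   feeds = build_feed_tree(updDict, lambda t: bt.feeds.YahooFinanceCSVData(
#       dataname='./data/' + t + '.csv', reverse=True, plot=False))
#   for sector_feeds in feeds.values():
#       for ticker, feed in sector_feeds.items():
#           cerebro.adddata(feed, name=ticker)
feeds = build_feed_tree({'tech': ['XLK', 'VGT']}, lambda t: 'feed-' + t)
print(feeds['tech']['VGT'])  # feed-VGT
```

Replacing a stream then means looking up `feeds[sector][ticker]` directly instead of searching the whole universe.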
@Ninjaneer said in Downselect from a large universe, trade ~6 tickers:
plotting becomes impossible because cerebro wants to plot every data feed. I'm not really sure plotting a set of potentially rotating data streams would be any easier, though.
Plotting rotating data feeds is certainly not something that can be supported. For starters, you need a common scale, and the 200 streams will mostly have different ranges and orders of magnitude.
What could you plot to keep track of changes? You could create an indicator that receives, for example, a list from your strategy. This list always contains the 6 streams (or their indices or names) that are currently active. In the strategy, you quickly create, in a loop, a percentage-change indicator for each of the 200 streams. These are also passed to the indicator, which then always keeps track of the percentage change of the 6 active streams.
And percentage changes are at least all on the same scale, whatever each stream's price level.
But in summary: managing a universe of 200 streams seems to require adding them all to the system, along with indicators on the streams that drive the rebalancing choices.
-
@backtrader said in Downselect from a large universe, trade ~6 tickers:
What calculates the criteria continuously for the 200 streams, and how?
The standard use case is to use something indicator-like, but if the other 194 streams are not added to the system, what calculates the indicators?
Loading all of the streams into self.datas is okay for relatively short date ranges or a small universe. Let's say you built a 2000-stream universe and wanted to test it over 20 years. That could be pretty resource-intensive. My thought was to only hold in memory the streams and timeframes necessary for the calculations of indicators and trades within some reasonable bracket of the current date.
This was the process that I envisioned:
- Have a module that is fed the current datetime and loads the necessary data for all streams, starting from 'today - minperiod' or similar, so that it can calculate the indicators needed for the ranking.
- Call said module in nextstart to build the initial shortlist.
- Call the module again whenever a position is closed, manually or automatically, to backfill the position with a new stream. An additional call could be scheduled at a given interval for rebalancing, in which case you'd also submit orders to sell all current positions, etc.
- Consider an option to write any indicator/trade/analyzer data to a file instead of retaining it in memory until plot time. The file part is easy enough, but I have no idea how easy it would be to dump the 'out of window' memory.
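The first step of that plan could look like the following pure-Python sketch. It reads one CSV file and returns only the closes inside the look-back window ending at a given date, plus a simple score for ranking. `load_window`, `momentum` and the Yahoo-style `Date`/`Close` column names are assumptions. Sorting the rows means newest-first files (the `reverse=True` case from the first post) also work. The function still scans the whole file on every call, which is the extra loading cost the next paragraph worries about.

```python
import csv
import datetime as dt


def load_window(lines, current, lookback):
    """Return the last `lookback` + 1 closes dated on or before `current`.
    `lines` is any iterable of CSV text lines with Date and Close columns."""
    rows = []
    for row in csv.DictReader(lines):
        day = dt.datetime.strptime(row['Date'], '%Y-%m-%d').date()
        if day <= current and row['Close'] not in ('', 'null'):
            rows.append((day, float(row['Close'])))
    rows.sort()  # oldest first, whatever the file order
    return [close for _, close in rows[-(lookback + 1):]]


def momentum(closes):
    """Simple rank score: total return over the window (None if too short)."""
    if len(closes) < 2:
        return None
    return closes[-1] / closes[0] - 1.0


# Usage with a real file (hypothetical path):
#   with open('./data/XLK.csv') as f:
#       score = momentum(load_window(f, today, 60))
sample = [
    "Date,Open,High,Low,Close,Adj Close,Volume",
    "2017-01-05,12,12,12,12,12,0",  # newest first, as in the reversed files
    "2017-01-04,11,11,11,11,11,0",
    "2017-01-03,10,10,10,10,10,0",
]
print(load_window(sample, dt.date(2017, 1, 4), lookback=1))  # [10.0, 11.0]
```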
The point of all this being that you only hold in memory what's necessary for the immediate functioning of the program at any given time. All the extra loading/unloading/file writing may negate any performance benefits of holding less in memory, though. Like I said in my first post, I'm a fairly poor programmer, and I am wading into 'resource management' waters that are far over my head when I can barely swim with floaties. So, if all of this is a bit asinine, I apologize.
- Keep the created data feeds in a dictionary with the same structure as the one you have for your sectors.
- Choosing the data feed you want to replace/insert should then be easier according to your logic
This is exactly how I have it set up now. My 'universe' is a dictionary of lists, and my 'shortlist' is a dictionary with the same keys and only one string per key. The 'shortlist' is used to define another dict, 'datasSL', that contains the self.datas index of each ticker as the value. datasSL is what actually gets fed to the part of the strategy that makes the trade decisions. It works well so far.
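Here is a sketch of how the `datasSL` mapping can be derived. It assumes the feeds were added with `cerebro.adddata(feed, name=ticker)`, which stores the name in each feed's `_name` attribute. `build_datas_sl` is a hypothetical helper. Inside a strategy, `self.getdatabyname(ticker)` returns the feed directly if an index isn't needed.

```python
def build_datas_sl(shortlist, datas):
    """shortlist: {sector: ticker}; datas: the feeds in self.datas order.
    Returns {sector: index into datas}."""
    index_by_name = {d._name: i for i, d in enumerate(datas)}
    return {sector: index_by_name[ticker] for sector, ticker in shortlist.items()}


# Stand-in objects for the feeds; real backtrader feeds carry _name the same way
from types import SimpleNamespace
datas = [SimpleNamespace(_name=n) for n in ('XLK', 'VGT', 'XLE')]
print(build_datas_sl({'tech': 'VGT', 'energy': 'XLE'}, datas))  # {'tech': 1, 'energy': 2}
```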
What could you plot to keep track of changes? You could create an indicator that receives, for example, a list from your strategy. This list always contains the 6 streams (or their indices or names) that are currently active. In the strategy, you quickly create, in a loop, a percentage-change indicator for each of the 200 streams. These are also passed to the indicator, which then always keeps track of the percentage change of the 6 active streams.
And percentage changes are at least all on the same scale, whatever each stream's price level.
This is a great thought. I'll definitely give it a try.
Let's say that, for diagnostic purposes, I wanted to plot a certain feed, but I have obviously turned plotting off for that feed when I added it to cerebro. Is there a way to plot it manually in stop() or similar?
-
@Ninjaneer said in Downselect from a large universe, trade ~6 tickers:
My thought was to only hold in memory the streams and timeframes necessary for the calculations of indicators and trades within some reasonable bracket of the current date.
In that case, try `exactbars=1` in cerebro. This is aimed precisely at telling each and every object to keep the minimum possible buffer during the entire operation.
@Ninjaneer said in Downselect from a large universe, trade ~6 tickers:
Let's say that, for diagnostic purposes, I wanted to plot a certain feed, but I have obviously turned plotting off for that feed when I added it to cerebro. Is there a way to plot it manually in stop() or similar?
Until you invoke `cerebro.plot`, any of the plotting attributes of data feeds and indicators (`plotinfo.plot` or the `plotlines` attributes) can be changed from anywhere.
-
I guess there's a lot to be said for thoroughly reading the manual :)