Repeated days in multi symbol strategy
I implemented a multi-symbol mean reversion strategy, and the results were very different from those of the same strategy with a single symbol. I was suspicious of the results and found that the system was repeating days in the simulation.
The multi-symbol strategy processes multiple feeds in this form:
```python
def next(self):
    ...
    for data in self.datas:
        # Operations
        ...
```
While reviewing the CSV files of the feeds, I discovered two issues:
- Some days were repeated
- Some days were missing from some feeds (for example, among 100 stocks, there were days on which one stock had no trades)
I corrected the feeds with repeated days, but I cannot do the same for the missing data.
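Both problems can be detected before running the backtest. A minimal sketch, assuming each feed's bar dates have already been read into a list of `datetime.date` objects (the `audit_feeds` helper and this input layout are hypothetical, not part of Backtrader): it counts repeated dates per feed and treats any date present in some other feed as expected, so its absence is reported as missing.

```python
from collections import Counter
from datetime import date


def audit_feeds(feeds):
    """Report repeated and missing dates for each feed.

    `feeds` maps a feed name to the list of bar dates read from its CSV
    (hypothetical input format). A date counts as "missing" for a feed
    if at least one other feed has a bar on that date.
    """
    all_dates = set()
    for dates in feeds.values():
        all_dates.update(dates)

    report = {}
    for name, dates in feeds.items():
        counts = Counter(dates)
        repeated = sorted(d for d, n in counts.items() if n > 1)
        missing = sorted(all_dates - set(counts))
        report[name] = {"repeated": repeated, "missing": missing}
    return report


feeds = {
    # AAA has 2020-01-03 twice; BBB has no bar on 2020-01-03
    "AAA": [date(2020, 1, 2), date(2020, 1, 3), date(2020, 1, 3), date(2020, 1, 6)],
    "BBB": [date(2020, 1, 2), date(2020, 1, 6)],
}
report = audit_feeds(feeds)
for name, result in report.items():
    print(name, result)
```

Repeated dates are a data error worth fixing at the source; missing dates are usually just days without trades, so the report is only a way to know which feeds will lag behind the others.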
This lack of date synchronization between the feeds can make a big difference in the results, because in a multi-symbol simulation Backtrader needs all the feeds to be synchronized on every day.
I did not fully understand how Backtrader "repeats" days when the feeds do not all contain the same days, but in cerebro.py it seems that `datas` is synchronized early in `_runonce(self, runstrats)`:
```python
while True:
    # Check next incoming date in the datas
    dts = [d.advance_peek() for d in datas]
    dt0 = min(dts)
```
Note that `dt0` receives the minimum (earliest) of the feeds' next timestamps. It looks like this causes a day that has already been processed to be processed again if any item in `datas` failed to advance.
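For reference, the quoted loop can be mimicked on plain Python lists. This is a simplified sketch, not Backtrader's actual code: the hypothetical `synchronize` helper stands in for `advance_peek()`/`advance()` using sorted lists of timestamps, with `math.inf` for an exhausted feed. Each `dt0` it yields is strictly larger than the previous one, so the clock itself never revisits a day; a feed with no bar at `dt0` simply does not advance and keeps its previous bar.

```python
import math


def synchronize(timelines):
    """Mimic the min-timestamp loop on plain lists.

    `timelines` holds one sorted list of timestamps per feed (a simplified
    stand-in for data feeds). Yields (dt0, advanced), where `advanced`
    lists the indices of the feeds that delivered a new bar at dt0.
    """
    pos = [0] * len(timelines)  # index of each feed's next unread bar
    while True:
        # Peek at each feed's next timestamp; exhausted feeds report +inf
        dts = [tl[p] if p < len(tl) else math.inf
               for tl, p in zip(timelines, pos)]
        dt0 = min(dts)
        if dt0 == math.inf:
            break  # no feed has anything left to deliver
        advanced = []
        for i, dti in enumerate(dts):
            if dti <= dt0:  # only feeds whose next bar is due move forward
                pos[i] += 1
                advanced.append(i)
        yield dt0, advanced


# Feed 1 has no bar at time 2
steps = list(synchronize([[1, 2, 3], [1, 3]]))
print(steps)  # [(1, [0, 1]), (2, [0]), (3, [0, 1])]
```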
I tried adding a test at the beginning of the strategy to check whether days are repeated:
```python
import backtrader as bt


class MultiSymbolStrategy(bt.Strategy):  # hypothetical class name
    def __init__(self):
        ...
        self.last_day = None
        self.current_day = None

    def next(self):
        self.last_day = self.current_day
        self.current_day = self.data.datetime.datetime(0)
        if self.current_day == self.last_day:
            print("Repeated day " + str(self.current_day))
            return
```
But that does not solve the problem.
Please repeat: "backtrader doesn't repeat days". Repeat again: "backtrader doesn't repeat days"
You haven't thought this through
- You have multiple data feeds and they don't deliver data with the same timestamps
- Which means the data are not synchronized and move differently
Consider this scenario and try to answer the question (a lot more useful, imho, than looking into the source code):
- Data feeds `d1` and `d2` give you data for time `t`
- `d1` then delivers data for time `t + 1`
- `d2` then delivers data for time `t + 2`
It should be obvious that:
- data for time `t` can be presented simultaneously for `d1` and `d2`
- data for time `t + 1` can be updated for `d1`
- What is your expectation at time `t + 1` for data `d2`? Because, as presented above, `d2` will first have new data at time `t + 2`
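A sketch of the answer to that question, assuming each feed is reduced to a sorted list of bar timestamps and taking `t = 0` (the `visible_bar` helper is hypothetical, and `d1` is assumed to also deliver at `t + 2`). It uses `bisect_right` to look up the most recent bar each feed has delivered at a given time, which approximates what a strategy sees for a feed that has nothing new.

```python
from bisect import bisect_right


def visible_bar(timeline, now):
    """Return the latest timestamp in `timeline` that is <= `now`, or None.

    For a feed with no new bar at `now`, this is its most recently
    delivered bar, i.e. the one a strategy would still be looking at.
    """
    i = bisect_right(timeline, now)  # number of bars delivered so far
    return timeline[i - 1] if i else None


# Scenario with t = 0: d1 delivers at t, t+1, t+2; d2 only at t and t+2
d1 = [0, 1, 2]
d2 = [0, 2]
for now in (0, 1, 2):
    print(now, visible_bar(d1, now), visible_bar(d2, now))
# 0 0 0
# 1 1 0   <- at t + 1, d2 still shows its bar from t
# 2 2 2
```

The index `bisect_right(timeline, now)` is the number of bars delivered so far; for `d2` it stays at 1 at `t + 1`, which is the kind of unchanged length that comparing `len()` values between calls to `next` can detect.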
- Read Docs - Platform Concepts and consider the section "Lines - …"
After answering the question and reading the docs, the following will probably be obvious:
- Create an array during `__init__` holding a `0` for each of the data feeds, for example:
  `self.dl = [0] * len(self.datas)`
- Create a temporary equivalent on entry to `next` with the actual lengths:
  `dl = [len(d) for d in self.datas]`
- Check which ones have actually changed:
  `dlnew = [x - y for x, y in zip(dl, self.dl)]`
- Get the indices to them:
  `dlidx = [i for i, x in enumerate(dlnew) if x]`
- Keep the actual lengths:
  `self.dl = dl`
`dlidx` contains the indices of the data feeds which have changed since the last call to `next`.
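The steps above can be packaged into a small helper and exercised without Backtrader. A minimal sketch: `ChangeTracker` and `changed()` are hypothetical names, and plain lists that grow by one element per delivered bar stand in for the data feeds, since the recipe only relies on `len()`.

```python
class ChangeTracker:
    """Track which feeds received new bars since the previous call.

    `datas` is any sequence of objects supporting len(); in a strategy
    that would be self.datas. Hypothetical helper, not a Backtrader API.
    """

    def __init__(self, datas):
        self.datas = datas
        self.dl = [0] * len(datas)  # last seen length of each feed

    def changed(self):
        dl = [len(d) for d in self.datas]              # current lengths
        dlnew = [x - y for x, y in zip(dl, self.dl)]   # growth since last call
        dlidx = [i for i, x in enumerate(dlnew) if x]  # feeds with new bars
        self.dl = dl                                   # remember for next call
        return dlidx


# Simulate the d1/d2 scenario: a list grows when its feed delivers a bar
d1, d2 = [], []
tracker = ChangeTracker([d1, d2])
history = []
for new_d1, new_d2 in [(True, True), (True, False), (True, True)]:
    if new_d1:
        d1.append("bar")
    if new_d2:
        d2.append("bar")
    history.append(tracker.changed())
print(history)  # [[0, 1], [0], [0, 1]]
```

Inside a strategy, one would create the tracker in `__init__` with `self.datas` and, in `next`, run the per-symbol operations only on `self.datas[i]` for each `i` returned by `changed()`, instead of on every feed.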
Many thanks for the explanation! It is now clear!
I'm realizing that a multi-symbol strategy can create a complex synchronization scenario. It is not enough to just copy a “single symbol” strategy; you need to pay attention to the many details and situations that can occur.
Thank you very much!