Backtrader with a lot of datafeeds - trick to reduce data loading time
-
Dear All,
I'm currently backtesting a strategy that needs a lot of datafeeds. If I understand correctly how backtrader works, one of the first steps cerebro performs in cerebro.run() is to preload the data (assuming preload=True, which is the default setting). This step takes quite a bit of time in my case (maybe 3-4 minutes), but running through the bars afterwards is relatively fast (as fast as Python can be :)). Is there a way I could cache/store things so that I don't have to wait those 3-4 minutes every time I run a backtest? I was thinking about pickling the cerebro object (I know cerebro can be pickled), but I do not know which cerebro method to call to load the data without launching cerebro.run(). Or would you have any other suggestion to shorten this annoying waiting time?
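To make it concrete, the kind of caching I have in mind is something like the sketch below (just an illustration, file names made up; it would only cache the parsed DataFrames on disk, not backtrader's own internal preload step):

# Rough sketch: pickle the parsed DataFrames so later runs skip the CSV parsing.
import os
import pickle

import pandas as pd
import backtrader as bt

CACHE_FILE = 'datafeeds_cache.pkl'

def load_frames(csv_paths):
    """Return {path: DataFrame}, reading from the cache file if it exists."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, 'rb') as fh:
            return pickle.load(fh)
    frames = {p: pd.read_csv(p, index_col=0, parse_dates=True) for p in csv_paths}
    with open(CACHE_FILE, 'wb') as fh:
        pickle.dump(frames, fh)
    return frames

cerebro = bt.Cerebro()
for name, df in load_frames(['feed1.csv', 'feed2.csv']).items():
    cerebro.adddata(bt.feeds.PandasData(dataname=df), name=name)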
thanks and regards
Lamp' -
Probably not directly related to speeding up multiple data feeds, but the following post discussed some ways of speeding up a feed's loading time (including some caching):
Thanks
Vlad -
Thanks, I'll take a look. Profiling the backtest is in any case probably a good start.
-
Pickling cerebro won't help.
The way to achieve what you want is to develop your own data feed which would use pre-loaded data already residing in RAM.
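For illustration only (not necessarily what a full custom feed would look like, and untested): if the backtests are driven from one long-lived Python process, the parsed DataFrames can simply be kept in that process and handed to a fresh Cerebro on every run, e.g.:

import backtrader as bt
import pandas as pd

# Parse once, keep in RAM for the lifetime of the process (paths are illustrative)
FRAMES = {name: pd.read_csv(path, index_col=0, parse_dates=True)
          for name, path in [('feed1', 'feed1.csv'), ('feed2', 'feed2.csv')]}

def run_backtest(strategy_cls, **strat_kwargs):
    # Fresh Cerebro per run; the heavy file parsing is never repeated
    cerebro = bt.Cerebro()
    for name, df in FRAMES.items():
        cerebro.adddata(bt.feeds.PandasData(dataname=df), name=name)
    cerebro.addstrategy(strategy_cls, **strat_kwargs)
    return cerebro.run()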
-
@vladisld thank you again.
@lampalork I have run into the same question. I load more than 5000 futures contracts and backtest a strategy; it takes nearly 12 minutes, which is a bit long. I would also like to find some way to speed this up. We suspect that pre-loading the data consumes most of the time. So should we develop something to speed it up, as @backtrader said? Is the data feed you mentioned like this code you wrote before?
import pandas as pd
from backtrader import feed
from backtrader.utils import date2num

class PandasDirectData_NumPyLines(feed.DataBase):
    params = (
        ('datetime', 0),
        ('open', 1),
        ('high', 2),
        ('low', 3),
        ('close', 4),
        ('volume', 5),
        ('openinterest', 6),
    )

    datafields = [
        'datetime', 'open', 'high', 'low', 'close', 'volume', 'openinterest'
    ]

    def start(self):
        super(PandasDirectData_NumPyLines, self).start()
        self._df = self.p.dataname

    def preload(self):
        # Set the standard datafields - except for datetime
        for datafield in self.datafields[1:]:
            # get the column index
            colidx = getattr(self.params, datafield)
            if colidx < 0:
                # column not present -- skip
                continue
            # assign the whole DataFrame column directly to the line buffer
            l = getattr(self.lines, datafield)
            l.array = self._df.iloc[:, colidx]

        # datetime comes from the DataFrame index, converted to backtrader floats
        field0 = self.datafields[0]
        dts = pd.to_datetime(self._df.index)
        getattr(self.l, field0).array = dts.map(date2num)

        self._last()
        self.home()
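If that is the feed in question, I would guess it gets used roughly like this (assuming the datetimes sit in the DataFrame index and the remaining columns are addressed by position):

df = pd.read_csv('some_contract.csv', index_col=0, parse_dates=True)
# column positions after the index: open, high, low, close, volume, openinterest
data = PandasDirectData_NumPyLines(dataname=df, open=0, high=1, low=2,
                                   close=3, volume=4, openinterest=5)
cerebro.adddata(data, name='some_contract')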
This is a strategy which I run with optimization!
If we can speed it up, we can give users a better experience!
-
@tianjixuetu said in Backtrader with a lot of datafeeds - trick to reduce data loading time:
We suspect that pre-loading the data consumes most of the time.
Don't preload the data, run it again and compare the times.
-
@backtrader It is very strange! When I use preload, it takes less time than when I don't.
My main code:

# some code unrelated to this test omitted
cerebro.broker.setcash(1000000.0)
cerebro.run(preload=False)
end_time = time.time()
print("preload=False total use time is : {}".format(end_time - begin_time))

What happened?
-
The obvious happened.
preload=True
is saving you time, even if you think you can develop some magical method to speed things up. Your optimization is already using the preloaded data in all processes.
This thread is about keeping the data in memory across backtesting runs, for which you would need a data feed that sources from RAM across instantiations. That means keeping a 2nd process running that holds the data in RAM and gives you a key to access it.
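For what it's worth, a bare-bones sketch of that second-process-plus-key idea, using Python's multiprocessing.shared_memory (the block name acts as the key; shape and dtype would have to be agreed separately, and a data feed reading from the shared array would still have to be written):

import numpy as np
from multiprocessing import shared_memory

# --- long-running holder process: load/parse once, keep the block alive ---
prices = np.random.rand(1_000_000, 4)            # stand-in for parsed OHLC data
shm = shared_memory.SharedMemory(create=True, size=prices.nbytes, name='ohlc_cache')
np.ndarray(prices.shape, dtype=prices.dtype, buffer=shm.buf)[:] = prices

# --- each backtest process: attach by key, nothing is reloaded from disk ---
shm2 = shared_memory.SharedMemory(name='ohlc_cache')
view = np.ndarray((1_000_000, 4), dtype=np.float64, buffer=shm2.buf)
# a custom data feed could now fill its lines from `view`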
-
@lampalork
Which data feed are you using? -
@backtrader @lampalork @vladisld Maybe this is a way to speed things up: How to speed up almost 100 times when add data and preload data?