How to speed up adding and preloading data almost 100 times?
-
If you only use a few data feeds, you can ignore this page.
As we know, preloading data makes a backtest run faster than not preloading it, but the preload itself takes a long time, so perhaps there is a way to speed up the preload step.
When I load 5000+ futures contracts, every preload costs me 62.5 seconds. Terrible!
But if we pickle self.datas after cerebro has preloaded it once, and read it back from the pickle, it takes just 0.66 seconds.
The first time, run it and save the datas:
cerebro.run(save_my_data=True)
After that, you can use the saved file to speed things up:
cerebro.run(load_my_data=True)
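To make the two-phase workflow concrete, here is a minimal sketch (MyStrategy and the feed-adding loop are placeholders; note that with the patch below, the save run deliberately stops with assert 0 right after writing the pickle, so saving and backtesting happen in separate runs):

import backtrader as bt

class MyStrategy(bt.Strategy):  # placeholder strategy
    def next(self):
        pass

# Phase 1: preload normally once and dump self.datas to a pickle.
cerebro = bt.Cerebro()
# ... cerebro.adddata(feed, name=symbol) for all 5000+ contracts ...
cerebro.addstrategy(MyStrategy)
cerebro.run(save_my_data=True)   # writes normal_future_data.pkl, then aborts

# Phase 2 (every later run): read the pickle instead of preloading.
cerebro = bt.Cerebro()
cerebro.addstrategy(MyStrategy)
cerebro.run(load_my_data=True)   # self.datas restored in well under a second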
How is it implemented?
Add a function load_my_data_from_pickle to Cerebro, modify the function runstrategies, and add two params.
- add two params
params = (
    ('preload', True),
    ('runonce', True),
    ('maxcpus', None),
    ('stdstats', True),
    ('oldbuysell', False),
    ('oldtrades', False),
    ('lookahead', 0),
    ('exactbars', False),
    ('optdatas', True),
    ('optreturn', True),
    ('objcache', False),
    ('live', False),
    ('writer', False),
    ('tradehistory', False),
    ('oldsync', False),
    ('tz', None),
    ('cheat_on_open', False),
    ('broker_coo', True),
    ('quicknotify', False),
    ('load_my_data', False),
    ('save_my_data', False),
)
- add the new function and modify runstrategies
# note: the timing code below needs `import time` at the top of cerebro.py

def load_my_data_from_pickle(self, path="normal_future_data.pkl"):
    '''Load the previously pickled self.datas'''
    import pickle
    with open(path, "rb") as f:
        my_data = pickle.load(f)
    return my_data

def runstrategies(self, iterstrat, predata=False):
    '''
    Internal method invoked by ``run`` to run a set of strategies
    '''
    self._init_stcount()

    self.runningstrats = runstrats = list()
    for store in self.stores:
        store.start()

    if self.p.cheat_on_open and self.p.broker_coo:
        # try to activate in broker
        if hasattr(self._broker, 'set_coo'):
            self._broker.set_coo(True)

    if self._fhistory is not None:
        self._broker.set_fund_history(self._fhistory)

    for orders, onotify in self._ohistory:
        self._broker.add_order_history(orders, onotify)

    self._broker.start()

    for feed in self.feeds:
        feed.start()

    if self.writers_csv:
        wheaders = list()
        for data in self.datas:
            if data.csv:
                wheaders.extend(data.getwriterheaders())

        for writer in self.runwriters:
            if writer.p.csv:
                writer.addheaders(wheaders)

    # self._plotfillers = [list() for d in self.datas]
    # self._plotfillers2 = [list() for d in self.datas]

    if not predata:
        if self.p.load_my_data:
            # begin_time = time.time()
            self.datas = self.load_my_data_from_pickle()
            # end_time = time.time()
            # print("every time pre_load consume time :{}".format(end_time - begin_time))
            # assert 0
        elif self.p.save_my_data:
            begin_time = time.time()
            for data in self.datas:
                data.reset()
                if self._exactbars < 1:  # datas can be full length
                    data.extend(size=self.params.lookahead)
                data._start()
                if self._dopreload:
                    data.preload()
            end_time = time.time()
            print("every time pre_load consume time :{}".format(end_time - begin_time))
            import pickle
            with open("normal_future_data.pkl", 'wb') as f:
                pickle.dump(self.datas, f)
            assert 0  # abort on purpose: rerun with load_my_data=True
        else:
            begin_time = time.time()
            for data in self.datas:
                data.reset()
                if self._exactbars < 1:  # datas can be full length
                    data.extend(size=self.params.lookahead)
                data._start()
                if self._dopreload:
                    data.preload()
            end_time = time.time()
            print("every time pre_load consume time :{}".format(end_time - begin_time))

    for stratcls, sargs, skwargs in iterstrat:
        sargs = self.datas + list(sargs)
        try:
            strat = stratcls(*sargs, **skwargs)
        except bt.errors.StrategySkipError:
            continue  # do not add strategy to the mix

        if self.p.oldsync:
            strat._oldsync = True  # tell strategy to use old clock update
        if self.p.tradehistory:
            strat.set_tradehistory()
        runstrats.append(strat)

    tz = self.p.tz
    if isinstance(tz, integer_types):
        tz = self.datas[tz]._tz
    else:
        tz = tzparse(tz)

    if runstrats:
        # loop separated for clarity
        defaultsizer = self.sizers.get(None, (None, None, None))
        for idx, strat in enumerate(runstrats):
            if self.p.stdstats:
                strat._addobserver(False, observers.Broker)
                if self.p.oldbuysell:
                    strat._addobserver(True, observers.BuySell)
                else:
                    strat._addobserver(True, observers.BuySell, barplot=True)

                if self.p.oldtrades or len(self.datas) == 1:
                    strat._addobserver(False, observers.Trades)
                else:
                    strat._addobserver(False, observers.DataTrades)

            for multi, obscls, obsargs, obskwargs in self.observers:
                strat._addobserver(multi, obscls, *obsargs, **obskwargs)

            for indcls, indargs, indkwargs in self.indicators:
                strat._addindicator(indcls, *indargs, **indkwargs)

            for ancls, anargs, ankwargs in self.analyzers:
                strat._addanalyzer(ancls, *anargs, **ankwargs)

            sizer, sargs, skwargs = self.sizers.get(idx, defaultsizer)
            if sizer is not None:
                strat._addsizer(sizer, *sargs, **skwargs)

            strat._settz(tz)
            strat._start()

            for writer in self.runwriters:
                if writer.p.csv:
                    writer.addheaders(strat.getwriterheaders())

        if not predata:
            for strat in runstrats:
                strat.qbuffer(self._exactbars, replaying=self._doreplay)

        for writer in self.runwriters:
            writer.start()

        # Prepare timers
        self._timers = []
        self._timerscheat = []
        for timer in self._pretimers:
            # preprocess tzdata if needed
            timer.start(self.datas[0])

            if timer.params.cheat:
                self._timerscheat.append(timer)
            else:
                self._timers.append(timer)

        if self._dopreload and self._dorunonce:
            if self.p.oldsync:
                self._runonce_old(runstrats)
            else:
                self._runonce(runstrats)
        else:
            if self.p.oldsync:
                self._runnext_old(runstrats)
            else:
                self._runnext(runstrats)

        for strat in runstrats:
            strat._stop()

    self._broker.stop()

    if not predata:
        for data in self.datas:
            data.stop()

    for feed in self.feeds:
        feed.stop()

    for store in self.stores:
        store.stop()

    self.stop_writers(runstrats)

    if self._dooptimize and self.p.optreturn:
        # Results can be optimized
        results = list()
        for strat in runstrats:
            for a in strat.analyzers:
                a.strategy = None
                a._parent = None
                for attrname in dir(a):
                    if attrname.startswith('data'):
                        setattr(a, attrname, None)

            oreturn = OptReturn(strat.params,
                                analyzers=strat.analyzers,
                                strategycls=type(strat))
            results.append(oreturn)

        return results

    return runstrats
very good job!!!
-
@tianjixuetu Thanks for sharing. It helped me save over 140 seconds per run.
-
@vladisld @ab_trader Can we add this feature to the base code?
-
Can you help me with this error?
I get the same error if I use pickle.dump(self.datas, f_point).
-
@tianjixuetu
when I use getdatabyname() to place an order, this error happened. Can you help me? Thanks!
-
@andy Maybe you changed the strategy, so you cannot get the data by name. I ran into this problem too; maybe we can only use it with the same strategy name.
-
@Ibrahim-Chippa I don't know; it seems the pickle cannot work there.
-
Hi, I have not changed the strategy. I solved the problem by adding these lines, thanks again!

with open("data.pkl", "rb") as f:
    self.datas = pickle.load(f)
for data in self.datas:
    self.datasbyname[data._name] = data
-
I'm utterly confused by this thread, sorry. If you have your datasets stashed away somewhere (loaded via a broker API, then saved as CSV, pickle, etc.), why not do this:

for symbol in symbols:
    df = broker_api.cached_price_history_1day(symbol)
    data = bt.feeds.PandasData(dataname=df, plot=False, **dkwargs)
    cerebro.adddata(data, name=symbol)

(...where my broker_api.cached_price_history_1day() is smart enough to load disk-cached data if no update is needed)?
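For readers without such a helper, here is a rough sketch of what a cached_price_history_1day() could look like; CACHE_DIR and fetch_1day_history() are stand-ins for your own storage location and broker call:

import os
import pandas as pd

CACHE_DIR = "price_cache"  # hypothetical cache location

def fetch_1day_history(symbol):
    # stand-in for the real broker API call; replace with your own
    idx = pd.date_range("2020-01-01", periods=5, freq="D")
    return pd.DataFrame({"open": 1.0, "high": 1.0, "low": 1.0,
                         "close": 1.0, "volume": 0}, index=idx)

def cached_price_history_1day(symbol):
    """Return daily bars for symbol, reading a local pickle when present."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, "{}.pkl".format(symbol))
    if os.path.exists(path):
        return pd.read_pickle(path)  # cache hit: no network round-trip
    df = fetch_1day_history(symbol)
    df.to_pickle(path)  # plain DataFrames pickle cleanly
    return df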
-
Hi,
This is extremely helpful. The issue I am having is that the pickle file saves only when my data classes are derived from bt.feeds.PandasData; when the classes are derived from bt.feeds.PandasDirectData (to speed up first-time loading) I get the following error. Any insights?
_pickle.PicklingError: Can't pickle <class 'pandas.core.frame.Pandas'>: attribute lookup Pandas on pandas.core.frame failed
Thanks,
AP
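(A hedged guess at the cause, for anyone hitting the same error: PandasDirectData iterates the frame with itertuples(), whose rows are instances of a namedtuple class created on the fly and named "Pandas", and pickle cannot find that class by attribute lookup on pandas.core.frame. A minimal reproduction, independent of backtrader:)

import pickle
import pandas as pd

df = pd.DataFrame({"close": [1.0, 2.0]})
row = next(df.itertuples())  # instance of a dynamically created namedtuple "Pandas"

try:
    pickle.dumps(row)
except pickle.PicklingError as e:
    print(e)  # Can't pickle <class 'pandas.core.frame.Pandas'>: attribute lookup ...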
-
When I use optstrategy, it throws errors.
-
@the-world I solved this error by modifying the code that uses the pickle in cerebro.py:
if self.p.optdatas and self._dopreload and self._dorunonce:
    if self.p.load_my_data:
        begin_time = time.time()
        self.datas = self.load_my_data_from_pickle()
        end_time = time.time()
        print("every time pre_load from pkl consume time :{}".format(end_time - begin_time))
    else:
        begin_time = time.time()
        for data in self.datas:
            data.reset()
            if self._exactbars < 1:  # datas can be full length
                data.extend(size=self.params.lookahead)
            data._start()
            if self._dopreload:
                data.preload()
        end_time = time.time()
        print("every time pre_load from raw consume time :{}".format(end_time - begin_time))
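With that change, an optimization run can reuse the pickled feeds; a hedged usage sketch (MyStrategy and the period range are placeholders, and it assumes the patched cerebro.py):

import backtrader as bt

class MyStrategy(bt.Strategy):  # placeholder strategy
    params = (('period', 20),)
    def next(self):
        pass

cerebro = bt.Cerebro(load_my_data=True)  # patched Cerebro from this thread
cerebro.optstrategy(MyStrategy, period=range(10, 31, 5))
results = cerebro.run()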
-
@dehati_paul said in How to speed up adding and preloading data almost 100 times?:

    The issue I am having is that the pickle file saves only when my data classes are derived from bt.feeds.PandasData; when the classes are derived from bt.feeds.PandasDirectData (to speed up first-time loading) I get the following error. Any insights?
    _pickle.PicklingError: Can't pickle <class 'pandas.core.frame.Pandas'>: attribute lookup Pandas on pandas.core.frame failed

I got the same error. Have you fixed it?
-
@tianjixuetu I have given up on this way of speeding things up, regardless of whether it can be validated. I will look for a new way to speed up the backtest, maybe using numpy lines; in Python, when dealing with huge amounts of data, numpy may be a good choice.
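As a toy illustration of the kind of gain numpy offers over pure-Python loops on large arrays (numbers vary by machine; this is not a backtrader benchmark):

import time
import numpy as np

prices = np.random.random(5_000_000)

t0 = time.time()
total = 0.0
for p in prices:  # pure-Python loop over 5 million floats
    total += p
t1 = time.time()

mean_np = prices.mean()  # one vectorized C loop
t2 = time.time()

print("python loop: {:.3f}s, numpy: {:.4f}s".format(t1 - t0, t2 - t1))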
-
@tianjixuetu did you find any better way to speed up the backtests?
-
@tianjixuetu Hi, can we have a little discussion about how to speed up the backtest? Recently I have been trying to find a new way to solve it. My WeChat ID is 13247198760