Backtrader with a lot of datafeed - trick to reduce data loading time



  • Dear All,

    I'm currently backtesting a strategy that needs a lot of data feeds. If I understand correctly how backtrader works, one of the first steps cerebro performs (when doing cerebro.run()) is to pre-load the data (assuming preload=True, which is the default setting). This step takes quite a bit of time in my case (maybe 3-4 minutes), but running through the bars afterwards is relatively fast (as fast as Python can be :)). Is there a way I could cache/store things so that I don't have to wait that long (3-4 minutes) every time I run a backtest? I was thinking about pickling the cerebro object (I know cerebro can be pickled), but I do not know which of cerebro's methods to launch to load the data without launching cerebro.run(). Or would you have any other suggestion to shorten this annoying waiting time?

    thanks and regards
    Lamp'



  • Probably not directly related to speeding up multiple data feeds, but the following post discusses some ways of reducing a feed's loading time (including some caching):

    How to speed up backtest

    Thanks
    Vlad
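
    For illustration, one simple form of feed-level caching is to parse each source file once and reuse a binary copy on later runs; a minimal sketch, assuming CSV sources (the cache directory, paths and helper below are placeholders, not backtrader API):

    import os

    import backtrader as bt
    import pandas as pd

    CACHE_DIR = 'feed_cache'  # hypothetical cache location

    def load_cached(csv_path):
        """Parse csv_path once; later runs load the pickled DataFrame."""
        os.makedirs(CACHE_DIR, exist_ok=True)
        cache_path = os.path.join(CACHE_DIR, os.path.basename(csv_path) + '.pkl')
        if os.path.exists(cache_path):
            return pd.read_pickle(cache_path)  # fast binary load
        df = pd.read_csv(csv_path, index_col=0, parse_dates=True)
        df.to_pickle(cache_path)
        return df

    cerebro = bt.Cerebro()
    for path in ['data/contract1.csv', 'data/contract2.csv']:  # placeholder paths
        cerebro.adddata(bt.feeds.PandasData(dataname=load_cached(path)))

    This only helps if text parsing dominates; backtrader's own preload pass over the bars still runs on every backtest.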



  • Thanks, I'll take a look. Profiling the backtest is in any case probably a good start.
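
    A minimal profiling sketch using the standard library's cProfile (run_backtest is a placeholder for your own setup-and-run function):

    import cProfile
    import pstats

    def run_backtest():
        ...  # build cerebro, add the feeds and the strategy, call cerebro.run()

    cProfile.run('run_backtest()', 'backtest.prof')
    stats = pstats.Stats('backtest.prof')
    stats.sort_stats('cumulative').print_stats(20)  # show the 20 biggest hotspots

    This should make it clear whether the time goes into loading/parsing the feeds or into iterating the bars.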


  • administrators

    Pickling cerebro won't help.

    The way to achieve what you want is to develop your own data feed which would use pre-loaded data already residing in RAM.
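
    As an illustration of that suggestion (the names here are made up, not backtrader API): within a long-lived process such as an IPython session, a module-level cache lets every new Cerebro instance reuse DataFrames that are already in RAM. This avoids re-parsing the files, though each run still pays backtrader's own preload pass:

    import backtrader as bt
    import pandas as pd

    _FRAMES = {}  # symbol -> DataFrame; survives across cerebro runs

    def get_frame(symbol, csv_path):
        # parse once per process; later backtests hit the RAM copy
        if symbol not in _FRAMES:
            _FRAMES[symbol] = pd.read_csv(csv_path, index_col=0, parse_dates=True)
        return _FRAMES[symbol]

    cerebro = bt.Cerebro()  # a fresh cerebro per run, but the data is cached
    cerebro.adddata(bt.feeds.PandasData(dataname=get_frame('FUT1', 'fut1.csv')))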



  • @vladisld thank you again.
    @lampalork I have also met this problem. I load more than 5,000 futures contracts and backtest a strategy; it consumes nearly 12 minutes, which is a bit too long. I also want to find some way to speed this up. We guess it may be that the pre-loaded data consumes most of the time. So, should we develop something to speed it up, as @backtrader said?

    @backtrader As you said, our own data feed: is it like this code you wrote before?

    import pandas as pd
    from backtrader import feed
    from backtrader.utils import date2num


    class PandasDirectData_NumPyLines(feed.DataBase):
        params = (
            ('datetime', 0),
            ('open', 1),
            ('high', 2),
            ('low', 3),
            ('close', 4),
            ('volume', 5),
            ('openinterest', 6),
        )

        datafields = [
            'datetime', 'open', 'high', 'low', 'close', 'volume', 'openinterest'
        ]

        def start(self):
            super(PandasDirectData_NumPyLines, self).start()
            # 'dataname' is the DataFrame itself
            self._df = self.p.dataname

        def preload(self):
            # Set the standard datafields - except for datetime
            for datafield in self.datafields[1:]:
                # get the column index
                colidx = getattr(self.params, datafield)

                if colidx < 0:
                    # column not present -- skip
                    continue

                # hand the column's underlying NumPy array to the line
                line = getattr(self.lines, datafield)
                line.array = self._df.iloc[:, colidx].values

            # datetime is taken from the DataFrame's index (bug fix: the
            # original referenced 'self.index', which does not exist)
            field0 = self.datafields[0]
            dts = pd.to_datetime(self._df.index)
            getattr(self.l, field0).array = dts.map(date2num).values

            self._last()
            self.home()

    This is a strategy which I run as an optimization!

    (screenshot: backtest_too_long_time.png)

    If we can speed up the backtest, we can give users a better experience!


  • administrators

    @tianjixuetu said in Backtrader with a lot of datafeed - trick to reduce data loading time:

    it may be that the pre-loaded data consumes most of the time.

    Don't preload the data, run it again and compare the times.



  • @backtrader It is very strange! When I use preload=True, it consumes less time than when I don't.
    (screenshots: load_data.png, preload=True.png, preload=False.png)
    my main code:

    # ... setup code omitted that has nothing to do with this test
    # (begin_time = time.time() is set in the omitted part above)
    cerebro.broker.setcash(1000000.0)
    cerebro.run(preload=False)
    end_time = time.time()
    print("preload=False total use time is : {}".format(end_time - begin_time))
    

    What happened???


  • administrators

    The obvious happened. preload=True is saving you time, even if you think you can develop some magical method to speed things up.

    Your optimization is already using the preloaded data in all processes.

    This thread is about keeping the data in memory across backtesting runs, for which a data feed that sources from RAM across instantiations would be needed. That means you have to keep a second process running which holds things in RAM and gives you a key to access that RAM.
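
    A minimal sketch of that idea with the standard library's multiprocessing.shared_memory (Python 3.8+); the block name 'fut1_ohlcv' plays the role of the key, and everything here is illustrative rather than backtrader API:

    import numpy as np
    from multiprocessing import shared_memory

    # --- holder process: started once and kept alive between backtests ---
    data = np.random.rand(1_000_000, 7)  # stand-in for datetime + OHLCV + OI columns
    shm = shared_memory.SharedMemory(name='fut1_ohlcv', create=True, size=data.nbytes)
    held = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    held[:] = data  # copied once; stays in RAM while this process lives

    # --- each backtest run: attach by name instead of re-reading any files ---
    shm2 = shared_memory.SharedMemory(name='fut1_ohlcv')
    arr = np.ndarray((1_000_000, 7), dtype=np.float64, buffer=shm2.buf)
    # arr's columns can now back the line arrays of a custom feed with no disk I/O
    # (shape and dtype must be agreed on out of band, e.g. alongside the key)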



  • @lampalork
    Which data feed are you using?



