Backtrader Community


    How to speed up backtest

    General Discussion
    • backtrader administrators last edited by

      After much consideration, there isn't a good in-code solution for replaydata, because the timestamps have to be checked to reconstruct the bars.

      The alternative is a data feed which reads input formatted to indicate replaying: X ticks are read and the data feed does not increase its own length until a marker is seen, at which point the length is increased. With that approach, no actual replaying is calculated each and every time, because the replay sequence is already in the data feed (frozen on disk).
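      A hypothetical sketch of what such a pre-replayed file could look like (the complete marker column and the exact layout are illustrative assumptions, not the actual implementation): rows repeat the developing bar, and the marker tells the feed the bar is final and the length may grow.

          dt,open,high,low,close,volume,complete
          2017-01-02 09:31,10.00,10.20,9.90,10.10,100,0    <- overwrite the current bar
          2017-01-02 09:31,10.00,10.40,9.90,10.30,250,1    <- marker: bar is final, length += 1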

      • backtrader administrators last edited by

        Some tests have been made with buffers other than the standard library array.array. Summary of the results:

        • Using pandas.Series as a direct replacement is impossible: because of how appending data actually works, the times increase exponentially

        • Using bcolz.carray and its built-in append functionality (touted as much lighter than anything you can do with numpy) is not the disaster that pandas.Series is, but the actual execution times still increase by several orders of magnitude

        • Reading CSV data with array.array, moving it to a bcolz.carray or pandas.Series buffer, pre-allocating all indicator/observer buffers with either bcolz.carray or pandas.Series, and then vectorizing the indicator operations (for example with pandas.Series.rolling(window=x, center=False).mean(), see the sketch below) brings the performance close to that of purely using array.array

          For 2 years of daily data (approximately 2 x 256 = 512 bars) and a single SimpleMovingAverage:

          • Standard array.array: 0.7s
          • pandas.Series approach: 1.4s

        It actually seems that array.array (at least for a small number of bars) fares better than pandas.Series, even with a vectorized operation.
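        For illustration, a minimal sketch of the kind of vectorized SMA used in the pandas.Series tests (the rolling call is the one quoted above; the price series is made up):

            import pandas as pd

            close = pd.Series([10.0, 10.5, 10.3, 10.8, 11.0, 10.9])
            # one vectorized call replaces the bar-by-bar loop; NaN until 3 values exist
            sma3 = close.rolling(window=3, center=False).mean()
            print(sma3)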

        There is one last step which may be still worth trying:

        • Modify the CSV data feed to directly use pandas and avoid the copying of data from array.array to pandas.Series

        A real potential route of optimization (but one requiring a real rework of the entire architecture) could be dask, to have calculations bypass the GIL and use all available cores (the backtrader machinery would create the needed task-flow graph, taking calculation dependencies into account). A long route in any case.
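        A rough sketch of the idea with dask.delayed (purely illustrative: sma and crossover are made-up helpers, not backtrader code, and truly bypassing the GIL would additionally require the process-based scheduler or GIL-releasing numeric code):

            import dask
            import pandas as pd

            @dask.delayed
            def sma(series, period):
                return series.rolling(window=period).mean()

            @dask.delayed
            def crossover(fast, slow):
                # +1 where fast crosses above slow, -1 where it crosses below
                return (fast > slow).astype(int).diff()

            close = pd.Series(range(100), dtype=float)   # stand-in price series
            fast, slow = sma(close, 10), sma(close, 50)  # independent -> parallelizable
            signal = crossover(fast, slow)               # depends on both SMAs
            result = signal.compute()                    # dask resolves the task graph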

        • RandyT last edited by

          @backtrader Very interesting. Out of curiosity, were your tests run with Python3?

          • backtrader administrators last edited by

            Both 2 and 3 ... and 3 had the worst overall results, with a couple of combinations being incredibly slow.

            • backtrader administrators last edited by backtrader

              For the sake of completeness, and to make sure that 500 and something bars is not really too small ... a test with a complete 1-year set made up of 1-minute bars.

              Summarizing what the test does:

              • Preload the data
              • Create a single SMA of period=20
              • Do nothing in the strategy's next
              • 214911 minutes of data

              Test using pd.Series as the holding buffer and with the vectorized version of the SMA

              Overall Start : 2017-02-18 08:22:18.043000
              Cerebro Create: 2017-02-18 08:22:18.045000
              Cerebro Start : 2017-02-18 08:22:18.046000
              Strategy Start: 2017-02-18 08:22:25.986000
              Strategy End  : 2017-02-18 08:23:25.616000
              Strategy Time : 0:00:59.630000
              Cerebro End   : 2017-02-18 08:23:25.641000
              Cerebro Time  : 0:01:07.595000
              

              Test using the standard array.array in the python distribution

              Overall Start : 2017-02-18 08:24:16.984000
              Cerebro Create: 2017-02-18 08:24:16.986000
              Cerebro Start : 2017-02-18 08:24:16.987000
              Strategy Start: 2017-02-18 08:24:24.609000
              Strategy End  : 2017-02-18 08:24:46.384000
              Strategy Time : 0:00:21.775000
              Cerebro End   : 2017-02-18 08:24:46.385000
              Cerebro Time  : 0:00:29.398000
              

              Quick Summary:

              • Data loading takes just under 8 seconds in both cases
              • The pd.Series vectorized approach takes over 100% more time than the standard array.array version (59.6s vs 21.8s in the strategy phase)
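
              For reference, timestamps like the ones above could be produced with a harness along these lines (a sketch; the actual test script is not shown in the thread, and the Strategy Start/End lines would be printed from the strategy's start()/stop() methods):

                  import datetime
                  import backtrader as bt

                  print('Overall Start :', datetime.datetime.now())
                  cerebro = bt.Cerebro()
                  print('Cerebro Create:', datetime.datetime.now())
                  # ... cerebro.adddata(...) and cerebro.addstrategy(...) with an empty next ...
                  print('Cerebro Start :', datetime.datetime.now())
                  cerebro.run()
                  print('Cerebro End   :', datetime.datetime.now())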
              • backtrader administrators last edited by

                As mentioned above, the data was still loaded using array.array and then moved into pandas.Series. The final test comprises:

                • Use of pandas.read_csv
                • Translation of the pandas datetimes to the timestamp format used in backtrader, which is the same as in matplotlib (a sketch of this step follows the list)
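
                A sketch of that translation step, assuming a DataFrame with a datetime-like index (the tiny frame below is made up; date2num produces the float-days encoding matplotlib uses):

                    import pandas as pd
                    from backtrader.utils import date2num

                    df = pd.DataFrame({'close': [1.0, 2.0]},
                                      index=['2017-01-02 09:31', '2017-01-02 09:32'])

                    dts = pd.Series(pd.to_datetime(df.index))  # index -> proper datetimes
                    nums = dts.apply(date2num)                 # matplotlib-style float timestamps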

                The result when loading the 214911 bars:

                • Loading time has been reduced to 3.5 seconds from the previous just-under-8 seconds
                • Backtesting time remains the same with the pd.Series version still well over 1 minute

                The branch has been uploaded to the repository ... (branch - numpylines) (directly using numpy arrays was the 1st approach, which then moved to dynamic buffers with bcolz, pandas ...), should anyone have any interest at all.

                The data loader which directly uses pd.DataFrame and pd.Series is called BacktraderCSVData2

                In any case, a positive point has come out of this: it may be possible to avoid copying some data when binding lines, by simply replacing object references in some cases. This will be attempted in the development branch.
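
                The idea in a nutshell (a trivial sketch, not backtrader code): a binding that replaces the object reference shares the buffer instead of copying it.

                    src = [1.0, 2.0, 3.0]    # stands in for a line's underlying buffer

                    dst = list(src)          # copying: O(n) work and twice the memory
                    dst = src                # rebinding: both names share one object, no copy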

                • backtrader administrators @Ed Bartosh last edited by

                  @Ed-Bartosh said in How to speed up backtest:

                  @backtrader ok, create a new thread for this. I'll be happy to explain. Here it would be off-topic I guess.

                  Thread created here: https://community.backtrader.com/post/923

                  • Ed Bartosh @backtrader last edited by

                    @backtrader This is very interesting indeed! I'll definitely give this loader a try. Thank you very much!

                    • backtrader administrators last edited by

                      The only indicator on that branch which has been changed is the SimpleMovingAverage

                      • Ed Bartosh @backtrader last edited by

                        @backtrader said in How to speed up backtest:

                        The only indicator on that branch which has been changed is the SimpleMovingAverage

                        Thanks for pointing this out. Fortunately this is not a problem for me at all as I'm not using backtrader indicators.

                        • Ed Bartosh @backtrader last edited by

                          @backtrader played with the numpylines branch. Didn't see noticeable improvements. As you've mentioned, an empty backtest still runs for more than one minute :( Any other ideas on how to speed it up?

                          • backtrader administrators last edited by

                            There are for sure some areas where small optimizations may be possible. For example:

                            • datetime conversions

                              It would be sensible to think that if each data feed makes an initial conversion of the current timestamp to a Python datetime.datetime instance, the rest of the platform can reuse it, rather than performing local conversions.

                            But there is for sure an area which was designed with ease of use in mind, to create a clear boundary between the past, the present and the future, and to offer a fixed reference point in time:

                            • 0 indexing

                            References to 0 are translated back to the current position in the array, and accesses to [0] are ubiquitous in the core; this certainly has an impact on performance. A minimal sketch of what this translation means is shown below.
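
                            A minimal sketch (not backtrader's actual code) of the mechanism: every [0] access goes through an index calculation relative to the moving current position.

                                class LineBuffer:
                                    def __init__(self):
                                        self.array = []
                                        self.idx = -1            # position of the current bar

                                    def forward(self, value):
                                        self.array.append(value)
                                        self.idx += 1

                                    def __getitem__(self, ago):
                                        # [0] is "now", [-1] the previous bar: the translation
                                        # happens on every single access, which is the cost
                                        # referred to above
                                        return self.array[self.idx + ago]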

                            • Ed Bartosh @backtrader last edited by

                              @backtrader thank you for the suggestions! I'll try to look at this.

                              My algo, which operates 3 securities with 2 timeframes (minute and daily), runs two times faster after the optimisations! Most of the performance gains came from switching to the PandasDirectData feed, truncating the data before loading it into backtrader, getting rid of replaydata and disabling datetime conversion when calling pd.read_csv. Thank you very much for your suggestions again!

                              I found it quite generic and useful to truncate the data to the fromdate-todate range. My CSV files contain data for 10 years; when I run a backtest for only one year, I simply truncate the rest of the data before loading it into backtrader. This speeds up loading quite a lot.
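
                              A sketch of that truncation, assuming a CSV parsed into a DataFrame with a DatetimeIndex (the file name and dates are placeholders):

                                  import pandas as pd

                                  df = pd.read_csv('data.csv', index_col=0, parse_dates=[0])
                                  df = df.loc['2014-01-01':'2014-12-31']  # keep only the backtested range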

                              I'm going to try the numpylines branch again, as I suspect I did something wrong if it didn't show any performance improvements. Can you suggest how to modify the PandasDirectData loader to utilize your changes?
                              Here is an example of my data:

                                                             2      3      4      5      6
                              2014-11-17 09:31:00-05:00  96.06  96.20  96.05  96.05    955
                              2014-11-17 09:32:00-05:00  96.13  96.32  96.10  96.32   1341
                              2014-11-17 09:33:00-05:00  96.32  96.45  96.29  96.37    522
                              2014-11-17 09:34:00-05:00  96.30  96.33  96.20  96.20    208
                              2014-11-17 09:35:00-05:00  96.21  96.27  96.12  96.12    941
                              2014-11-17 09:36:00-05:00  96.11  96.36  96.05  96.23   2350
                              ...
                              

                              It's indexed by datetime, as you can see. Is it mandatory to index the data, btw? Does it make backtrader faster at loading it?

                              • backtrader administrators last edited by backtrader

                                It would be something like this:

                                import pandas as pd  # needed for pd.to_datetime / pd.Series below
                                from backtrader import feed
                                from backtrader.utils import date2num


                                class PandasDirectData_NumPyLines(feed.DataBase):
                                    params = (
                                        ('datetime', 0),  # unused: datetime is taken from the index
                                        ('open', 1),
                                        ('high', 2),
                                        ('low', 3),
                                        ('close', 4),
                                        ('volume', 5),
                                        ('openinterest', 6),
                                    )

                                    datafields = [
                                        'datetime', 'open', 'high', 'low', 'close', 'volume', 'openinterest'
                                    ]

                                    def start(self):
                                        super(PandasDirectData_NumPyLines, self).start()
                                        self._df = self.p.dataname  # the pd.DataFrame passed as dataname

                                    def preload(self):
                                        # Set the standard datafields - except for datetime
                                        for datafield in self.datafields[1:]:
                                            # get the column index
                                            colidx = getattr(self.params, datafield)

                                            if colidx < 0:
                                                # column not present -- skip
                                                continue

                                            # bind the line buffer directly to the column (no copy)
                                            l = getattr(self.lines, datafield)
                                            l.array = self._df.iloc[:, colidx]

                                        # datetime comes from the DataFrame index; wrapped in a
                                        # pd.Series so that .apply is available (a plain
                                        # DatetimeIndex has no .apply)
                                        field0 = self.datafields[0]
                                        dts = pd.Series(pd.to_datetime(self._df.index))
                                        getattr(self.l, field0).array = dts.apply(date2num)

                                        self._last()   # mark the buffers as fully loaded
                                        self.home()    # rewind to the start for the backtest run


                                Where the datetime is taken directly from the index. The default column offsets for the other fields are probably off by one because of that, but they can luckily be configured.
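
                                Hypothetical usage with the sample data shown earlier, i.e. datetime in the index and open/high/low/close/volume at column positions 0..4 (df and cerebro are assumed to exist):

                                    data = PandasDirectData_NumPyLines(
                                        dataname=df,
                                        open=0, high=1, low=2, close=3, volume=4,
                                        openinterest=-1,  # -1 marks the column as not present
                                    )
                                    cerebro.adddata(data)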
