
Maximum number of data lines?



  • Is there a maximum number of data lines (different equities to trade simultaneously) that Backtrader can reasonably handle?

    Right now I am using hourly data for 215 stocks for 4 years. The completion of the backtest takes several hours. This is all right but I wonder how much data could be added. Would it still work with 500 stocks? 1000 stocks? 2000?

    Is there a formula to predict the performance based on the amount of data?

    I reckon that the answer also depends on the number of indicators & observers and their complexity, but still, if the backtest takes, say, 5 hours with 215 stocks, should it be near 50 hours for 2150 stocks or not?

    I have 16 GB of RAM so that should not be an obstacle.

    Thanks in advance.



  • The number of data feeds that can be added to backtrader is unlimited.

    As you correctly noticed, run time depends on the number of features added. So the best way to get answers is to run several tests with different numbers of data feeds and extrapolate from the results.
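The extrapolation suggested above can be sketched as a least-squares line fit over a few timed runs. This is only a rough sketch: the measurements are hypothetical placeholders (the 215 stocks / 5 hours point echoes the "say, 5 hours" figure from the question), and linear scaling is an assumption, not something backtrader guarantees.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_hours(n_feeds, slope, intercept):
    """Predicted run time in hours for a given number of data feeds."""
    return slope * n_feeds + intercept


# HYPOTHETICAL measurements: (number of data feeds, run time in hours).
measurements = [(50, 1.2), (100, 2.4), (215, 5.0)]
feeds = [m[0] for m in measurements]
hours = [m[1] for m in measurements]

slope, intercept = fit_line(feeds, hours)
for n in (500, 1000, 2000):
    estimate = predict_hours(n, slope, intercept)
    print(f"{n} feeds: ~{estimate:.1f} h (only if scaling stays linear)")
```

If the measured points bend away from a straight line (for example because memory pressure kicks in), the linear estimate is too optimistic and more measurements at larger feed counts are needed.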



  • I would probably measure the run time for adding the data feeds and for strategy processing separately. That might give you more accurate results.
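One way to time the two phases separately is a small context manager around each of them. This is a minimal sketch: `timed`, `load_feeds` and `run_strategy` are hypothetical names, and the two functions are placeholders for the real feed setup and the real backtest run.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def load_feeds():
    # HYPOTHETICAL placeholder: read the files and add the data feeds here.
    return [list(range(1000)) for _ in range(10)]


def run_strategy(feeds):
    # HYPOTHETICAL placeholder: run the backtest here.
    return sum(sum(feed) for feed in feeds)


with timed("add data feeds"):
    feeds = load_feeds()

with timed("run strategy"):
    result = run_strategy(feeds)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

Repeating each measurement a few times and taking the minimum or median makes the numbers less sensitive to whatever else the machine is doing.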


  • administrators

    Let's see how many bars you have:

    • Weeks: 52
    • Max Trading Days per week: 5
    • Max Trading Days Year: 260
    • Trading Holidays (approx): 8
    • Trading Days per Year: 252
    • Hours per Day: 8.5
    • Total Hours per Year: 2142
    • Total Years: 4
    • Total Hourly Bars per Stock: 8568
    • Number of Stocks: 215
    • Total Number of Bars: 1,842,120 (under 2 Million)

    It's unclear what you are doing, but please see this:

    A complete trading run using pypy (the fastest interpreter and JIT) can be done in roughly 2.5 minutes for 2 Million bars (actual execution time is 2 minutes 36 seconds and 94 centiseconds on test equipment which is by no means idle)
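The bar count in the post above can be reproduced step by step. The inputs are the figures from the list; the final time estimate simply scales the quoted pypy run (2 minutes 36.94 seconds for 2 million bars) and assumes a comparable strategy and machine, which may well not hold.

```python
weeks_per_year = 52
max_trading_days_per_week = 5
trading_holidays = 8          # approximate
hours_per_day = 8.5
years = 4
stocks = 215

max_trading_days_per_year = weeks_per_year * max_trading_days_per_week   # 260
trading_days_per_year = max_trading_days_per_year - trading_holidays     # 252
hours_per_year = trading_days_per_year * hours_per_day                   # 2142.0
bars_per_stock = int(hours_per_year * years)                             # 8568
total_bars = bars_per_stock * stocks                                     # 1,842,120

print(f"total bars: {total_bars:,}")

# Rough scaling from the quoted benchmark (an assumption, not a measurement).
benchmark_bars = 2_000_000
benchmark_seconds = 2 * 60 + 36.94
estimated_seconds = total_bars / benchmark_bars * benchmark_seconds
print(f"estimated run time at benchmark speed: {estimated_seconds:.0f} s")
```

The gap between this estimate (minutes) and the reported run time (hours) is what points to a bottleneck outside the core backtesting loop.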



  • In that case, I must be doing something very wrong indeed.

    I suspect that it is my usage of the Indicators & Observers. Right now I am creating them separately for each data feed, similarly to the following:

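    import backtrader as bt

    class MyStrategy(bt.Strategy):

      def __init__(self):
        # One indicator per data feed, keyed by the symbol parsed from the file path.
        # The dict is created per instance so that it is not shared between strategy instances.
        self.my_indicators = {}
        for data in self.datas:
          symbol = data.p.dataname.split("\\")[1].split(".")[0]
          # MyIndicator is the poster's own indicator class (not shown).
          self.my_indicators[symbol] = MyIndicator(data)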
    

    I create an Observer for each feed in essentially the same way (in the main script, ofc).

    It did not look quite right when I wrote it but with a small number of feeds, it did not matter much.

    Is this to blame and if yes, what is the right way to use the Indicators with multiple data feeds?


  • administrators

    @kriku said in Maximum number of data lines?:

    I create an Observer for each feed

    With 215 data feeds it seems really pointless to create Observers, which are meant for plotting.

    @kriku said in Maximum number of data lines?:

    self.my_indicators[symbol] = MyIndicator(data)

    There is no other way to use indicators. But if your indicators calculate the trajectory of rockets from the Earth to Pluto on each iteration ... it's going to take time.

    The article shows that indicators are also created for each data feed (100 in the sample).

    Without code ... no diagnosis is possible.



  • Thanks for the good tip.

    I finally got time to have a serious look at the issue. As I suspected, it had nothing to do with Backtrader. pandas_market_calendars.get_calendar("NASDAQ").valid_days(start_date=start_dt, end_date=end_dt) costs as much as 0.14+ seconds for some odd reason. The script is a little more than 10 times faster now.

    That means it takes about 44 seconds for one stock from 1.1.2010 to 1.1.2019. I reckon that it is still slow by your standards, but I do not think it is BT's fault in any way. I will try to find out more in the near future.
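The post does not say how the slow calendar call was removed. One common approach when an expensive lookup is repeated with the same arguments is to memoize it, sketched below. `slow_valid_days` is a hypothetical stand-in for the `pandas_market_calendars` call, and the sleep only simulates its cost.

```python
import time
from functools import lru_cache

call_count = 0


def slow_valid_days(start, end):
    """HYPOTHETICAL stand-in for an expensive calendar lookup."""
    global call_count
    call_count += 1
    time.sleep(0.01)  # simulated cost; the post reports 0.14+ s for the real call
    return (start, end)


@lru_cache(maxsize=None)
def cached_valid_days(start, end):
    # Arguments must be hashable, e.g. ISO date strings or datetime.date objects.
    return slow_valid_days(start, end)


for _ in range(100):
    days = cached_valid_days("2010-01-01", "2019-01-01")

print(f"slow function executed {call_count} time(s) for 100 lookups")
```

Computing the calendar once before the backtest starts and passing the result in achieves the same effect without a cache.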



  • Data processing cost 4 years 1 stock hourly bars: 15.94 sec
    Without any indicators & instant return from next(): 6.03 sec
    With all indicators present & instant return from next(): 14.02 sec

    Therefore, the usage of 8 indicators costs about 8 secs in my case. Out of these, the most costly one is the StandardDeviation: 2.7 secs.

    NB! These figures are just indicative since I measured the performance only once.

    The code in the article referenced above uses just 2 moving averages as indicators. My script finishes in 6.36 secs when I run it with the following indicators only:

    mvav1 = bt.indicators.MovingAverageSimple(data, period=300)
    mvav2 = bt.indicators.MovingAverageSimple(data, period=600)
    

    The two moving averages together cost about half a sec or 16x less than the 8 indicators in the real code (and more than 5x less than the StandardDeviation).

    This was done with Python 3.7, NOT pypy. As the article says that using pypy could more than double performance, I am beginning to think that the performance of my script is presently close to normal.

    Does all this make more sense to you?



  • @kriku It is preloading the data that consumes too much time. I use about 3 years of 1-minute data for 1000 stocks, and my computer with 12 CPUs and 64 GB of RAM cannot run it, so maybe you need a smarter way to load the data.



  • Right now about 1/2 of the execution time is spent on the indicators and 1/8 on trading (next() and other things). I cannot see how faster data loading could influence that - although I agree with you that I need to load data more efficiently.
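The 1/2 and 1/8 shares quoted above can be checked against the single-stock timings reported earlier in the thread. This sketch assumes that the 15.94 sec figure is the full run and that the differences between runs isolate the indicator and trading costs; the poster notes that each figure was measured only once.

```python
total = 15.94            # full run: 4 years, 1 stock, hourly bars
no_indicators = 6.03     # no indicators, instant return from next()
all_indicators = 14.02   # all indicators, instant return from next()

indicator_cost = all_indicators - no_indicators   # time added by the 8 indicators
trading_cost = total - all_indicators             # time added by next() and the rest

indicator_share = indicator_cost / total
trading_share = trading_cost / total

print(f"indicators: {indicator_cost:.2f} s ({indicator_share:.1%} of the run)")
print(f"trading:    {trading_cost:.2f} s ({trading_share:.1%} of the run)")
```

Under these assumptions the remaining time, roughly 6 seconds, is the baseline that is spent even with no indicators and an empty `next()`.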



  • @kriku Maybe your indicator time includes the data loading time. Calculating the indicators may be very quick, although I have not read the indicator source code carefully, because according to the author and the documentation, the indicators are calculated in vectorized form and may not use the event-driven mode.


  • administrators

    @tianjixuetu said in Maximum number of data lines?:

    and the documentation, the indicators are calculated in vectorized form and may not use the event-driven mode.

    As advised before, you probably want to STOP reading the source code and START reading the documentation. That statement is simply WRONG.

    @kriku said in Maximum number of data lines?:

    The two moving averages together cost about half a sec or 16x less than the 8 indicators in the real code (and more than 5x less than the StandardDeviation).

    @kriku said in Maximum number of data lines?:

    Does all this make more sense to you?

    Your indicators obviously consume a lot more time than 2 moving averages. Since moving averages are the simplest indicators, it does of course make sense.



  • @backtrader I have to say that reading the documentation helps a lot; I read it often and have read it many times. But I need to solve some questions that may be beyond the documentation.

    About this question, my answer may be right. As you write in the documentation:

    Event and Vectorized
    The trading logic and the broker are always run on an event by event basis
    
    The calculation for indicators is vectorized if possible (source data can be preloaded)
    
    Everything can be run in event-only mode with no data preloaded, just like if things were live.
    

  • administrators

    @tianjixuetu said in Maximum number of data lines?:

    because according to the author and the documentation, the indicators are calculated in vectorized form and may not use the event-driven mode.

    This is what you said, that indicators cannot be calculated in event-driven mode.

    @tianjixuetu said in Maximum number of data lines?:

    The calculation for indicators is vectorized if possible (source data can be preloaded)

    Everything can be run in event-only mode with no data preloaded, just like if things were live.

    And the documentation clearly says that the default mode is "vectorized" if possible. If not, the indicators will be run in event-driven mode (as if things were live).
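The difference between the two modes discussed above can be illustrated with a simple moving average computed both ways. This is a conceptual sketch in plain Python, not backtrader's implementation; `sma_batch` and `SmaEventDriven` are hypothetical names.

```python
def sma_batch(values, period):
    """Compute the whole SMA series in one pass over preloaded data."""
    out = [None] * len(values)
    window_sum = 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= period:
            window_sum -= values[i - period]   # drop the bar that left the window
        if i >= period - 1:
            out[i] = window_sum / period
    return out


class SmaEventDriven:
    """Compute the SMA one bar at a time, as if bars arrived live."""

    def __init__(self, period):
        self.period = period
        self.window = []

    def next(self, value):
        self.window.append(value)
        if len(self.window) > self.period:
            self.window.pop(0)
        if len(self.window) < self.period:
            return None                        # not enough bars yet
        return sum(self.window) / self.period


closes = [10.0, 11.0, 12.0, 13.0, 14.0]
batch = sma_batch(closes, period=3)

sma = SmaEventDriven(period=3)
event = [sma.next(c) for c in closes]

print(batch)
print(batch == event)
```

Both paths produce the same values; the batch path just needs all the data up front. In backtrader the related switches are the `preload` and `runonce` parameters of `bt.Cerebro`, which are worth checking in the Cerebro reference before relying on them.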



  • @backtrader Yes, you are right!

