Cerebro run spends 20 seconds without any analyzer



  • Hi. I want to create a screener that filters more than 2000 stocks based on dynamic parameters entered by the user. I've created two classes:

    • ScreenerStrategy. The init method loads all the data and creates the indicators.
    • ScreenerAnalyzer. The stop method applies the filters to build the results.

    A run takes more than 20 seconds. I already had a working solution using pandas with a response time of about 2 seconds, but I want a better architecture, so I am trying to migrate it to backtrader.

    I've noticed something curious that maybe you can help me with.

    If I comment out the 'addanalyzer' line, cerebro.run takes just as long, so I think cerebro is doing a lot of work outside the analysis that I don't need. I've tried all the Cerebro parameters I could find, but nothing changed. Can you help me?

    import backtrader as bt
    
    class ScreenerStrategy(bt.Strategy):
        def __init__(self):
            self.inds = dict()
            self.inds['RSI'] = dict()
            self.inds['SMA'] = dict()
    
            for i, d in enumerate(self.datas):
    
                # For each indicator we want to track its value and whether it is
                # bullish or bearish. We can do this by creating a new line that returns
                # true or false.
    
                # RSI
                self.inds['RSI'][d._name] = dict()
                self.inds['RSI'][d._name]['value']  = bt.indicators.RSI(d, period=14, safediv=True)
                self.inds['RSI'][d._name]['bullish'] = self.inds['RSI'][d._name]['value']  > 50
                self.inds['RSI'][d._name]['bearish'] = self.inds['RSI'][d._name]['value']  < 50
    
                # SMA
                self.inds['SMA'][d._name] = dict()
                self.inds['SMA'][d._name]['value']  = bt.indicators.SMA(d, period=20)
                self.inds['SMA'][d._name]['bullish'] = d.close > self.inds['SMA'][d._name]['value']
                self.inds['SMA'][d._name]['bearish'] = d.close < self.inds['SMA'][d._name]['value']
    
    class ScreenerAnalyzer(bt.Analyzer):
        params = dict(period=10)
        
        def stop(self):
            print('-'*80)
            results = dict()
            for key, value in self.strategy.inds.items():
                results[key] = list()
    
                for nested_key, nested_value in value.items():
                    ...
    
    

    Execution:

    cerebro = bt.Cerebro(runonce=True,
                         stdstats=False,  # Remove observers
                         writer=False,
                         optdatas=False,
                         optreturn=False)
    
    cerebro.addstrategy(ScreenerStrategy)
    
    for i in range(len(self.data_list)):
        data = PandasData(
            dataname=self.data_list[i][0],  # Pandas DataFrame
            name=self.data_list[i][1]  # The symbol
        )
        cerebro.adddata(data)
    
    #cerebro.addanalyzer(ScreenerAnalyzer)
    cerebro.run()
    


  • A wild guess may be that most of the time is wasted loading the data. I would measure it with timeit:

    import timeit
    
    # Time the data loading separately from the run
    start_time = timeit.default_timer()
    for i in range(len(self.data_list)):
        data = PandasData(
            dataname=self.data_list[i][0],  # Pandas DataFrame
            name=self.data_list[i][1]  # The symbol
        )
        cerebro.adddata(data)
    elapsed = timeit.default_timer() - start_time
    print(f'loading data: {elapsed}')
    
    #cerebro.addanalyzer(ScreenerAnalyzer)
    start_time = timeit.default_timer()
    cerebro.run()
    elapsed = timeit.default_timer() - start_time
    print(f'run: {elapsed}')
    


  • @vladisld 20 seconds is the time printed for run, not for the loading of the data



  • @vladisld I've checked, and 16 seconds of that time is spent in init, creating the indicators for all the stocks (more than 1400).

    My understanding of Python's init was that it is only called once, and that the same object could then be reused for different analyses, but that isn't really how it works here. Is there any way to load the data feeds and create the indicators outside the strategy/analyzer, and then run the analyzers/strategy against that data?
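A breakdown like the one above (how much of the run is spent in init) can also be obtained with the standard-library profiler. The following is only a sketch: `profile_call` and `build_indicators` are hypothetical names, and `build_indicators` is a stand-in for the real indicator creation, not backtrader code.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def build_indicators(n):
    # Hypothetical stand-in for the per-stock indicator creation in __init__.
    return [sum(range(200)) for _ in range(n)]


result, report = profile_call(build_indicators, 1400)
print(report)
```

In the thread's setup the same helper could presumably be pointed at the real call, e.g. `profile_call(cerebro.run)`, to see which backtrader internals dominate.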



  • @Xavier-Escudero said in Cerebro run spends 20 seconds without any analyzer:

    A run takes more than 20 seconds. I already had a working solution using pandas with a response time of about 2 seconds, but I want a better architecture, so I am trying to migrate it to backtrader.

    My inclination would be to do the heavy lifting in pandas before you load each data feed into cerebro. You can use bta-lib for the calculations, and create a line in your data for each of your RSI and SMA lines, including the value/bullish/bearish indicators.

    Pandas with bta-lib should solve the time problem.
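The precomputation suggested above could look roughly like the sketch below. To keep it self-contained it uses plain pandas instead of bta-lib, which is a swap made for illustration only. The function name `add_screener_columns` and the output column names are assumptions; the `'c'` close column follows the PandasData mapping used in this thread. The RSI here uses Wilder-style exponential smoothing, so its values may differ from `bt.indicators.RSI`, especially in the first bars.

```python
import pandas as pd


def add_screener_columns(df, sma_period=20, rsi_period=14):
    """Return a copy of df with SMA/RSI values and bullish/bearish flags."""
    out = df.copy()
    close = out["c"]

    # Simple moving average of the close, plus flags stored as floats
    # so they can later be fed to backtrader as ordinary lines.
    out["sma"] = close.rolling(sma_period).mean()
    out["sma_bullish"] = (close > out["sma"]).astype(float)
    out["sma_bearish"] = (close < out["sma"]).astype(float)

    # RSI from exponentially smoothed gains and losses (alpha = 1/period).
    delta = close.diff()
    gain = delta.clip(lower=0.0)
    loss = -delta.clip(upper=0.0)
    alpha = 1.0 / rsi_period
    avg_gain = gain.ewm(alpha=alpha, min_periods=rsi_period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=alpha, min_periods=rsi_period, adjust=False).mean()
    rs = avg_gain / avg_loss
    out["rsi"] = 100.0 - 100.0 / (1.0 + rs)
    out["rsi_bullish"] = (out["rsi"] > 50).astype(float)
    out["rsi_bearish"] = (out["rsi"] < 50).astype(float)
    return out


# Tiny synthetic frame with the 'c' (close) column used in the thread.
prices = pd.DataFrame({"c": [float(x) for x in range(1, 31)]})
enriched = add_screener_columns(prices)
print(enriched.tail(3))
```

Each extra column would still need a matching line and parameter in the PandasData subclass before the strategy or analyzer can read it.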



  • @run-out Great! Do you have an example I could use?

    I've seen that bta-lib generates lines, but you can also get the result as a pandas DataFrame.

    sma = btalib.sma(df, period=20)
    df = sma.df
    

    I am feeding cerebro's 'adddata' with a pandas DataFrame that has 'ohlcv' columns.

    1. Do I need to join the ohlcv data and the new sma column into a single pandas DataFrame? Or can/should the lines be passed separately?

    2. I am not sure whether I also need to adapt my PandasData class (see below).

    Thanks in advance for your help!

    cerebro = bt.Cerebro()
    for i in range(len(self.data_list)):
        data = PandasData(
            dataname=self.data_list[i][0],  # Pandas DataFrame
            name=self.data_list[i][1]  # The symbol
        )
        cerebro.adddata(data)
    
    import backtrader.feeds as btfeed
    
    class PandasData(btfeed.PandasData):
        '''
        The ``dataname`` parameter inherited from ``feed.DataBase`` is the pandas
        DataFrame
        '''
    
        params = (
            ('nullvalue', 0.0),
            # Possible values for datetime (must always be present)
            #  None : datetime is the "index" in the Pandas Dataframe
            #  -1 : autodetect position or case-wise equal name
            #  >= 0 : numeric index to the column in the pandas dataframe
            #  string : column name (as index) in the pandas dataframe
            ('datetime', None),
    
            # Possible values below:
            #  None : column not present
            #  -1 : autodetect position or case-wise equal name
            #  >= 0 : numeric index to the column in the pandas dataframe
            #  string : column name (as index) in the pandas dataframe
            ('open', 'o'),
            ('high', 'h'),
            ('low', 'l'),
            ('close', 'c'),
            ('volume', 'v'),
            ('openinterest', None),
        )
    


  • @Xavier-Escudero said in Cerebro run spends 20 seconds without any analyzer:

    1. Do I need to join the ohlcv data and the new sma column into a single pandas DataFrame? Or can/should the lines be passed separately?

    Yes, I would put the bta-lib lines beside the ohlcv in the same dataframe. See here for some examples:

    https://community.backtrader.com/topic/2971/multiple-assets-with-custom-pandas-dataframe/2

    https://community.backtrader.com/topic/2428/create-indicator-line-from-dataframe-not-from-data-in-cerebros/5



  • @run-out Thanks. I've tried, but I get the following error:

    clslines = baselines + lines
    TypeError: can only concatenate tuple (not "str") to tuple
    

    It seems the error appears when I declare the lines field:

    class PandasData(btfeed.PandasData):
        '''
        The ``dataname`` parameter inherited from ``feed.DataBase`` is the pandas
        DataFrame
        '''    
        lines = ('sma')
        
        params = (
            ('nullvalue', 0.0),
            # Possible values for datetime (must always be present)
            #  None : datetime is the "index" in the Pandas Dataframe
            #  -1 : autodetect position or case-wise equal name
            #  >= 0 : numeric index to the column in the pandas dataframe
            #  string : column name (as index) in the pandas dataframe
            ('datetime', None),
    
            # Possible values below:
            #  None : column not present
            #  -1 : autodetect position or case-wise equal name
            #  >= 0 : numeric index to the column in the pandas dataframe
            #  string : column name (as index) in the pandas dataframe
            ('open', 'o'),
            ('high', 'h'),
            ('low', 'l'),
            ('close', 'c'),
            ('volume', 'v'),
            ('openinterest', None),
            ('sma', -1)        
        )
        
        datafields = btfeed.PandasData.datafields + (
            [
                'sma'
            ]
        )    
    

    And I am creating the data as:

      df = pd.DataFrame.from_records(ticks) 
      df = df.join(btalib.sma(df, period=20).df)
    

    The data looks fine: every value is a float or NaN (for example, the first SMA values), so I don't understand what the error means.

    Thanks again.



  • @Xavier-Escudero said in Cerebro run spends 20 seconds without any analyzer:

    Update: it works after adding a trailing ',' inside the parentheses of lines:

    lines = ('sma',)
    


  • @Xavier-Escudero said in Cerebro run spends 20 seconds without any analyzer:

    Update: it works after adding a trailing ',' inside the parentheses of lines:
    lines = ('sma',)

    Adding the comma turns the right side into a tuple.
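The difference is easy to check in plain Python. In this minimal sketch, `base_lines` is a hypothetical stand-in for the lines inherited from the base feed class, not backtrader's actual internals:

```python
# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ('sma')
a_tuple = ('sma',)

print(type(not_a_tuple).__name__)
print(type(a_tuple).__name__)

# Hypothetical stand-in for the lines inherited from the base class.
base_lines = ('close', 'volume')

# Concatenating a tuple with a str reproduces the error from the thread.
try:
    base_lines + not_a_tuple
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# With a real one-element tuple the concatenation works.
combined = base_lines + a_tuple
print(combined)
```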

