Analyzer process time increases as more data is loaded, even when using only stop
-
Hi. I've developed an analyzer that uses a data list with 1700 stocks. Each stock has a pandas dataframe with its latest OHLC ticks and some indicators. All this information is created outside of cerebro.
In the stop method I check different conditions, for example whether today's close is above the close of three days ago.
As the length of each stock's dataframe increases, cerebro.run takes longer (by seconds). I don't understand why this happens, since in stop I iterate only over the stocks, not over each row of their dataframes.
How can I improve the processing time? (I've already tried defining 'next' as just pass, ...).
The process is:
- For every symbol I have in a database, I load its ticks and compute its indicators:
    ...
    df['rsi'] = btalib.rsi(df, period=14).df
    df['atr'] = btalib.atr(df).df
    df = df.tail(5)
    data_list.append((df, symbol))
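For anyone who wants to reproduce this step in isolation, a self-contained version with synthetic data might look like the sketch below (the column names assume bta-lib's default OHLCV layout; in the original post the ticks come from a database instead):

    import numpy as np
    import pandas as pd
    import btalib

    # Synthetic OHLCV frame standing in for the ticks loaded from the database.
    idx = pd.date_range('2020-01-01', periods=100, freq='D')
    close = 100 + np.cumsum(np.random.randn(100))
    df = pd.DataFrame({'open': close, 'high': close + 1, 'low': close - 1,
                       'close': close, 'volume': 1000.0}, index=idx)

    df['rsi'] = btalib.rsi(df, period=14).df  # .df exposes the indicator's result frame
    df['atr'] = btalib.atr(df).df
    df = df.tail(5)  # keep only the rows the analyzer will actually inspect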
- These datafeeds are added to cerebro:
    def execute_analysis(data_list, queries):
        start_time = timeit.default_timer()
        cerebro = bt.Cerebro()
        for i in range(len(data_list)):
            data = PandasData(
                dataname=data_list[i][0],  # Pandas DataFrame
                name=data_list[i][1]       # The symbol
            )
            cerebro.adddata(data)
        cerebro.addanalyzer(ScreenerAnalyzer, _name="screener", queries=queries)
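(The posted snippet stops before the actual run; presumably the function continues with something along these lines, a sketch in which the variable names are assumptions:)

        results = cerebro.run()
        elapsed = timeit.default_timer() - start_time  # start_time set at the top of the function
        print('Analysis took {:.2f}s'.format(elapsed))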
- The analyzer only defines the 'stop' method:
    class ScreenerAnalyzer(bt.Analyzer):
        def stop(self):
            print('{}: Results'.format(self.datas[0].datetime.date()))
            print('-' * 80)
            self.rets = list()
            for i, d in enumerate(self.datas):
                ...
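The loop body is elided, but a condition like the one described above (today's close above the close of three days ago) would typically use backtrader's negative line indexing, roughly like this sketch (appending the name to self.rets is an assumption):

    for i, d in enumerate(self.datas):
        # d.close[0] is the latest bar's close, d.close[-3] the close three bars back
        if d.close[0] > d.close[-3]:
            self.rets.append(d._name)  # d._name holds the symbol passed as name=...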
-
Hi again. Sorry, but I am stuck. Does anyone have any suggestions?
-
Not sure I fully understand your confusion - but it is probably my confusion after all.
The basic thing is that Cerebro.run will iterate through all of your datas' dataframes anyway, regardless of whether you implement next or stop in your strategy/analyzer or just leave the default next or stop implementations (which do nothing, by the way) in the corresponding inherited classes. So if the size of a dataframe increases, so will the time it takes to iterate it.
Is that what you've been missing? Or is it me that doesn't understand your question?
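A minimal, self-contained timing sketch (synthetic data, hypothetical setup) illustrates the point: even with an empty next, cerebro.run() still advances bar by bar, so wall time grows with the number of bars:

    import numpy as np
    import pandas as pd
    import timeit
    import backtrader as bt

    def make_feed(n_bars):
        # Synthetic OHLCV frame; only its length matters for this test.
        idx = pd.date_range('2020-01-01', periods=n_bars, freq='D')
        close = 100 + np.cumsum(np.random.randn(n_bars))
        df = pd.DataFrame({'open': close, 'high': close + 1, 'low': close - 1,
                           'close': close, 'volume': 1000.0}, index=idx)
        return bt.feeds.PandasData(dataname=df)

    class NoOp(bt.Strategy):
        def next(self):
            pass  # does nothing, yet run() still iterates every bar

    for n in (100, 1000, 10000):
        cerebro = bt.Cerebro()
        cerebro.adddata(make_feed(n))
        cerebro.addstrategy(NoOp)
        t0 = timeit.default_timer()
        cerebro.run()
        print('{} bars: {:.2f}s'.format(n, timeit.default_timer() - t0))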
-
@vladisld My goal is to find stocks by technical indicators and other conditions (integrated into a React web app). My first implementation used a single cached pandas dataframe with one row per stock. Each row had all the information needed for that stock (ATR, ATR(-1), ...). Each search took between 1 and 2 seconds.
When I discovered Backtrader I really liked its architecture, so I migrated everything to it. Now the query takes 7-8 seconds, and if I add one more day of data to each stock's dataframe, the time increases by roughly one second.
Are you saying, then, that all the dataframes are iterated? I would like to create all the information before the run, so I can read it directly in stop.
Thanks a lot for your support.
-
Backtrader works best for, well... backtesting. So neutralizing its main functionality just to iterate over all symbols once seems suboptimal. I doubt it was designed for this scenario.
IMHO it is better to just use the native pandas/numpy APIs for whatever algo you are using for symbol selection.
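For instance, the condition from the original post becomes a single vectorized comparison in pandas (a sketch, assuming a wide-format frame of closes with one column per symbol; screen is a hypothetical helper):

    import pandas as pd

    def screen(closes: pd.DataFrame) -> list:
        # closes: rows are dates, columns are symbols (assumed layout).
        # Condition: today's close is above the close of three days ago.
        cond = closes.iloc[-1] > closes.iloc[-4]
        return cond[cond].index.tolist()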
-
@vladisld OK, thanks for the explanation. I'll revert to my first solution with pandas.