@vladisld OK, thanks for the explanation. I'll revert to my first solution with pandas.

Posts made by Xavier Escudero
-
RE: Analyzer increases process as more data loaded, using only stop
-
RE: Analyzer increases process as more data loaded, using only stop
@vladisld My goal is to find stocks by technical indicators and other conditions (integrated in a React web app). My first implementation was to use a single cached pandas dataframe with one row for each stock. Each row had all the information needed for each stock (ATR, ATR(-1), ...). Each search lasted between 1 and 2 seconds.
When I discovered Backtrader I liked its architecture a lot, so I migrated everything to it. Now the query takes 7-8 seconds, and every extra day of data added to each stock's dataframe adds roughly one more second.
Are you saying, then, that all the dataframes are iterated? I would like to create all the information before the run, so I can read it directly at stop.
Thanks a lot for your support.
-
RE: Analyzer increases process as more data loaded, using only stop
Hi again. Sorry, but I am stuck. Does anyone have a suggestion?
-
Analyzer increases process as more data loaded, using only stop
Hi. I've developed an analyzer that uses a data list with 1700 stocks. Each stock has a pandas dataframe with its latest OHLC ticks and some indicators. All this information is created outside of cerebro.
In the stop method I check different conditions, for example whether today's close is above the close of three days ago.
As the length of each stock's dataframe increases, the processing time of cerebro.run gets worse (by seconds). I don't understand why this happens if at stop I am only iterating over the stocks, not over each row of their dataframes.
How can I improve the processing time? (I've tried defining 'next' with just pass, ...)
The process is:
- For every symbol I have in a database, I load its ticks and compute its indicators:
...
df['rsi'] = btalib.rsi(df, period=14).df
df['atr'] = btalib.atr(df).df
df = df.tail(5)
data_list.append((df, symbol))
- The data feeds are added to cerebro:
def execute_analysis(data_list, queries):
    start_time = timeit.default_timer()
    cerebro = bt.Cerebro()
    for i in range(len(data_list)):
        data = PandasData(
            dataname=data_list[i][0],  # Pandas DataFrame
            name=data_list[i][1]       # The symbol
        )
        cerebro.adddata(data)
    cerebro.addanalyzer(ScreenerAnalyzer, _name="screener", queries=queries)
- The analyzer only defines the 'stop' method:
class ScreenerAnalyzer(bt.Analyzer):
    def stop(self):
        print('{}: Results'.format(self.datas[0].datetime.date()))
        print('-' * 80)
        self.rets = list()
        for i, d in enumerate(self.datas):
            ...
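For what it's worth, growth with dataframe length is expected: backtrader's event loop still steps through every bar of every feed before stop is reached, even when next does nothing. A minimal plain-Python sketch (no backtrader; run_engine is a made-up stand-in for cerebro.run, under the assumption that the engine dispatches once per bar per feed) of why trimming with df.tail(5) bounds the cost:

```python
# Hypothetical illustration: engine cost scales with bars * feeds,
# even when the per-bar strategy/analyzer body does nothing.
def run_engine(feeds):
    steps = 0
    max_len = max(len(f) for f in feeds)
    for _ in range(max_len):        # one pass per bar, like cerebro.run
        for f in feeds:
            steps += 1              # next() is dispatched for every feed
    return steps

full = [list(range(1000))] * 3      # three feeds, 1000 bars each
trimmed = [f[-5:] for f in full]    # keep only the last 5 bars (df.tail(5))
print(run_engine(full), run_engine(trimmed))  # 3000 15
```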
-
RE: Crossover error operator '-'
@run-out Anyway, thanks for your support, it works!
-
RE: Crossover error operator '-'
@run-out Yes, now that you mention it I remember seeing the pull request, but I don't understand why it hasn't been merged yet (it was committed in June).
-
RE: Crossover error operator '-'
@run-out Here you go (the last columns are sma20 and sma50):
o h l c v d s symbol max52w min52w sma20 sma50
ts
2020-02-18 85.110069 85.527370 83.987318 84.245651 2872500 0.0 0 A NaN NaN NaN NaN
2020-02-19 84.156212 85.288892 82.914237 84.802040 4741300 0.0 0 A NaN NaN NaN NaN
2020-02-20 84.354941 84.444359 82.824826 83.798531 2543600 0.0 0 A NaN NaN NaN NaN
2020-02-21 83.361355 84.543710 82.864565 84.523842 1762500 0.0 0 A NaN NaN NaN NaN
2020-02-24 81.692144 82.059763 79.506263 79.983185 2919200 0.0 0 A NaN NaN NaN NaN
... ... ... ... ... ... ... .. ... ... ... ... ...
2020-11-24 118.959999 118.959999 111.430000 114.680000 5559900 0.0 0 A NaN NaN 108.683499 104.917219
2020-11-25 116.089996 116.959999 113.410004 114.349998 2862500 0.0 0 A NaN NaN 109.317999 105.209415
2020-11-27 113.769997 114.980003 112.940002 114.089996 983700 0.0 0 A NaN NaN 109.921499 105.502801
2020-11-30 113.889999 117.070000 113.669998 116.900002 3778400 0.0 0 A NaN NaN 110.661999 105.856778
2020-12-01 117.580002 118.379997 115.040001 115.360001 1753700 0.0 0 A NaN NaN 111.167999 106.204910
-
Crossover error operator '-'
When using btalib's crossover, a warning is shown:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/computation/expressions.py:200: UserWarning: evaluating in Python space because the '-' operator is not supported by numexpr for the bool dtype, use '^' instead
My code takes a pandas dataframe as the input parameter:
if len(df) > 50:
    sma50 = btalib.sma(df, period=50)
    df['sma50'] = btalib.sma(df, period=50).df
if len(df) > 200:
    sma200 = btalib.sma(df, period=200)
    df['sma200'] = sma200.df
btalib.crossover(sma50, sma200)
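The warning comes from pandas evaluating '-' on boolean series: numexpr does not support subtraction for the bool dtype and falls back to Python-space evaluation (it suggests '^', i.e. xor, instead), so it is a warning rather than an error. As a plain-Python illustration of the underlying crossover logic (the sma50/sma200 values below are made up for the example):

```python
# Hypothetical series: fast and slow moving averages.
sma50 = [1.0, 2.0, 3.0, 4.0]
sma200 = [2.0, 2.0, 2.5, 3.5]

# Is the fast average above the slow one on each bar?
above = [a > b for a, b in zip(sma50, sma200)]

# A crossover happens where this boolean flips from False to True
# (comparing each bar with the previous one).
cross_up = [curr and not prev for prev, curr in zip(above, above[1:])]
print(cross_up)  # [False, True, False]
```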
-
RE: Crossover + MACD error
@SPECULARI-FUND I have the same problem. Have you found any solution?
-
RE: Cerebro run spends 20 seconds without any analyzer
@Xavier-Escudero said in Cerebro run spends 20 seconds without any analyzer:
@run-out Thanks. I've tried it, but the following error is shown:
    clslines = baselines + lines
TypeError: can only concatenate tuple (not "str") to tuple
It seems the error appears when I declare the lines field:
class PandasData(btfeed.PandasData):
    '''
    The ``dataname`` parameter inherited from ``feed.DataBase`` is the pandas
    DataFrame
    '''
    lines = ('sma')
    params = (
        ('nullvalue', 0.0),
        # Possible values for datetime (must always be present)
        #  None   : datetime is the "index" in the Pandas Dataframe
        #  -1     : autodetect position or case-wise equal name
        #  >= 0   : numeric index to the column in the pandas dataframe
        #  string : column name (as index) in the pandas dataframe
        ('datetime', None),
        # Possible values below:
        #  None   : column not present
        #  -1     : autodetect position or case-wise equal name
        #  >= 0   : numeric index to the column in the pandas dataframe
        #  string : column name (as index) in the pandas dataframe
        ('open', 'o'),
        ('high', 'h'),
        ('low', 'l'),
        ('close', 'c'),
        ('volume', 'v'),
        ('openinterest', None),
        ('sma', -1),
    )
    datafields = btfeed.PandasData.datafields + (['sma'])
And I am creating the data as:
df = pd.DataFrame.from_records(ticks)
df = df.join(btalib.sma(df, period=20).df)
The data shows that everything is OK; all the values are floats or NaN (for example the first values of the SMA), so I don't understand the meaning of the error.
Thanks again.
Update: it works after adding a ',' at the end of lines:
lines = ('sma',)
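For reference, the root cause is Python's tuple syntax, not backtrader: parentheses alone don't make a tuple, so ('sma') is just the string 'sma', and the framework's internal concatenation of the base lines tuple with the subclass's lines then fails. A minimal sketch:

```python
not_a_tuple = ('sma')   # this is just the string 'sma'
a_tuple = ('sma',)      # the trailing comma makes it a 1-element tuple

# backtrader concatenates a base tuple with the subclass's `lines`,
# which is where "can only concatenate tuple (not \"str\") to tuple" comes from:
base = ('close', 'open')
try:
    base + not_a_tuple
except TypeError as e:
    print(e)            # can only concatenate tuple (not "str") to tuple
print(base + a_tuple)   # ('close', 'open', 'sma')
```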
-
RE: Cerebro run spends 20 seconds without any analyzer
@run-out Thanks. I've tried it, but the following error is shown:
    clslines = baselines + lines
TypeError: can only concatenate tuple (not "str") to tuple
It seems the error appears when I declare the lines field:
class PandasData(btfeed.PandasData):
    '''
    The ``dataname`` parameter inherited from ``feed.DataBase`` is the pandas
    DataFrame
    '''
    lines = ('sma')
    params = (
        ('nullvalue', 0.0),
        # Possible values for datetime (must always be present)
        #  None   : datetime is the "index" in the Pandas Dataframe
        #  -1     : autodetect position or case-wise equal name
        #  >= 0   : numeric index to the column in the pandas dataframe
        #  string : column name (as index) in the pandas dataframe
        ('datetime', None),
        # Possible values below:
        #  None   : column not present
        #  -1     : autodetect position or case-wise equal name
        #  >= 0   : numeric index to the column in the pandas dataframe
        #  string : column name (as index) in the pandas dataframe
        ('open', 'o'),
        ('high', 'h'),
        ('low', 'l'),
        ('close', 'c'),
        ('volume', 'v'),
        ('openinterest', None),
        ('sma', -1),
    )
    datafields = btfeed.PandasData.datafields + (['sma'])
And I am creating the data as:
df = pd.DataFrame.from_records(ticks)
df = df.join(btalib.sma(df, period=20).df)
The data shows that everything is OK; all the values are floats or NaN (for example the first values of the SMA), so I don't understand the meaning of the error.
Thanks again.
-
RE: Cerebro run spends 20 seconds without any analyzer
@run-out Great! Have you got any example of using it?
I've seen that bta-lib generates lines, but you can also get its pandas data:
sma = btalib.sma(df, period=20)
df = sma.df
I am feeding cerebro's 'adddata' with a pandas dataframe that has 'ohlcv'.
-
Do I need to join the ohlcv information and the new sma column into one pandas dataframe? Or can/must the lines be passed separately?
-
I am not sure whether I also need to adapt my PandasData class (see below).
Thanks in advance for your help!
cerebro = bt.Cerebro()
for i in range(len(self.data_list)):
    data = PandasData(
        dataname=self.data_list[i][0],  # Pandas DataFrame
        name=self.data_list[i][1]       # The symbol
    )
    cerebro.adddata(data)
class PandasData(btfeed.PandasData):
    '''
    The ``dataname`` parameter inherited from ``feed.DataBase`` is the pandas
    DataFrame
    '''
    params = (
        ('nullvalue', 0.0),
        # Possible values for datetime (must always be present)
        #  None   : datetime is the "index" in the Pandas Dataframe
        #  -1     : autodetect position or case-wise equal name
        #  >= 0   : numeric index to the column in the pandas dataframe
        #  string : column name (as index) in the pandas dataframe
        ('datetime', None),
        # Possible values below:
        #  None   : column not present
        #  -1     : autodetect position or case-wise equal name
        #  >= 0   : numeric index to the column in the pandas dataframe
        #  string : column name (as index) in the pandas dataframe
        ('open', 'o'),
        ('high', 'h'),
        ('low', 'l'),
        ('close', 'c'),
        ('volume', 'v'),
        ('openinterest', None),
    )
-
RE: Cerebro run spends 20 seconds without any analyzer
@vladisld I've checked that 16 seconds of the time are spent in init, creating the indicators for all the stocks (more than 1400).
My understanding of Python's init is that it is only called once, after which the same object can be reused for different analyses, but that doesn't seem to be the case here. Is there any way to load the data feeds and create the indicators outside the strategy/analyzer, and then run the analyzers/strategy against that data?
-
Strategy or Analyzer init - Avoid re-execution if used again
Hi. I want to keep in memory the data of several stocks with their computed indicators, so they can be reused across several executions.
I've tried using the init method of an analyzer/strategy, but every time I run cerebro the 'init' method starts over, and it takes a lot of time (15 seconds with two indicators and 1440 stocks). I've read in other posts that cerebro can't be reused.
Is there any way to load the indicators once and reuse them for every execution?
class ScreenerAnalyzer(bt.Analyzer):
    params = dict(period=10)

    def __init__(self):
        print('INIT----------')
        self.rets['over'] = list()
        self.rets['under'] = list()
        self.smas = {data: bt.indicators.SMA(data, period=self.p.period)
                     for data in self.datas}
        self.atrs = {data: bt.indicators.ATR(data, period=14)
                     for data in self.datas}
def execute_analysis(self):
    cerebro = bt.Cerebro()
    for i in range(len(self.data_list)):
        data = PandasData(
            dataname=self.data_list[i][0],  # Pandas DataFrame
            name=self.data_list[i][1]       # The symbol
        )
        cerebro.adddata(data)
    cerebro.addanalyzer(ScreenerAnalyzer, period=10)
    start_time = timeit.default_timer()
    cerebro.run(runonce=False, stdstats=False, writer=False)
    elapsed = timeit.default_timer() - start_time
    print(f'analysis: {elapsed}')
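One commonly suggested workaround is to cache the expensive part (loading ticks and computing the indicator columns, e.g. in pandas) outside cerebro, and only rebuild Cerebro per query. A minimal sketch of the caching pattern; load_symbol and expensive_compute are hypothetical names standing in for the real loading code:

```python
# Module-level cache: survives across queries, so the heavy work runs once.
_cache = {}

calls = []  # only here to demonstrate how often the expensive part runs

def expensive_compute(symbol):
    calls.append(symbol)        # stand-in for loading ticks + indicators
    return {'symbol': symbol}   # stand-in for the prepared dataframe

def load_symbol(symbol):
    if symbol not in _cache:
        _cache[symbol] = expensive_compute(symbol)
    return _cache[symbol]

load_symbol('A'); load_symbol('A'); load_symbol('B')
print(calls)  # ['A', 'B'] - the second lookup of 'A' hit the cache
```

Each query would then build a fresh bt.Cerebro from the cached frames, paying only the engine cost, not the indicator computation.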
-
RE: Cerebro run spends 20 seconds without any analyzer
@vladisld The 20 seconds is the time printed for run, not for loading the data.
-
Cerebro run spends 20 seconds without any analyzer
Hi. I want to create a screener for filtering more than 2000 stocks, based on some dynamic parameters, entered by the user. I've created two classes:
- ScreenerStrategy: its init method loads all the data and creates the indicators
- ScreenerAnalyzer: its stop method applies the filters to produce the results
The execution takes more than 20 seconds. I had an already working solution using pandas with a response time of mostly 2 seconds, but I want a better architecture, so I am trying to migrate it to backtrader.
I've noticed a curious thing that maybe you can help me with.
If I comment out the 'addanalyzer' line, cerebro.run takes the same time, so I think cerebro is doing a lot of things outside of the analysis that I don't need. I've tried all possible cerebro parameters, but nothing changes. Can you help me?
class ScreenerStrategy(bt.Strategy):
    def __init__(self):
        self.inds = dict()
        self.inds['RSI'] = dict()
        self.inds['SMA'] = dict()
        for i, d in enumerate(self.datas):
            # For each indicator we want to track its value and whether it is
            # bullish or bearish. We can do this by creating a new line that
            # returns true or false.
            # RSI
            self.inds['RSI'][d._name] = dict()
            self.inds['RSI'][d._name]['value'] = bt.indicators.RSI(d, period=14, safediv=True)
            self.inds['RSI'][d._name]['bullish'] = self.inds['RSI'][d._name]['value'] > 50
            self.inds['RSI'][d._name]['bearish'] = self.inds['RSI'][d._name]['value'] < 50
            # SMA
            self.inds['SMA'][d._name] = dict()
            self.inds['SMA'][d._name]['value'] = bt.indicators.SMA(d, period=20)
            self.inds['SMA'][d._name]['bullish'] = d.close > self.inds['SMA'][d._name]['value']
            self.inds['SMA'][d._name]['bearish'] = d.close < self.inds['SMA'][d._name]['value']
class ScreenerAnalyzer(bt.Analyzer):
    params = dict(period=10)

    def stop(self):
        print('-' * 80)
        results = dict()
        for key, value in self.strategy.inds.items():
            results[key] = list()
            for nested_key, nested_value in value.items():
                ...
Execution:
cerebro = bt.Cerebro(runonce=True,
                     stdstats=False,  # Remove observers
                     writer=False,
                     optdatas=False,
                     optreturn=False)
cerebro.addstrategy(ScreenerStrategy)
for i in range(len(self.data_list)):
    data = PandasData(
        dataname=self.data_list[i][0],  # Pandas DataFrame
        name=self.data_list[i][1]       # The symbol
    )
    cerebro.adddata(data)
# cerebro.addanalyzer(ScreenerAnalyzer)
cerebro.run()
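To find out where cerebro.run actually spends the 20 seconds, the standard-library profiler can wrap the call. A minimal sketch; workload is a placeholder that should be replaced by the real cerebro.run() call:

```python
import cProfile
import io
import pstats

def workload():
    # Placeholder for cerebro.run(); just burns some CPU.
    total = 0
    for i in range(100000):
        total += i * i
    return total

pr = cProfile.Profile()
pr.enable()
workload()
pr.disable()

# Print the five most time-consuming calls, sorted by cumulative time.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats('cumulative').print_stats(5)
print(s.getvalue())
```

With the real run profiled, it becomes visible whether the time goes into indicator construction in __init__, the per-bar loop, or data-feed preloading.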