Performance benchmark: 10,000 data feeds backtesting
-
I loaded 10,405 data feeds for backtesting with fundamental data. I am trying to access all data feeds at the same time so that I can update/change my stock portfolio in the strategy part of the code.
I filtered the data so that the fundamental columns only hold values on the SEC release dates, not every day, as shown below:
```
selected_daily_columns = ['open', 'high', 'low', 'close', 'volume']
selected_fundamental_columns = ["debt", "cashneq", "equity", "de"]

custom_params = [('datetime', "date"), ('openinterest', None)]
custom_params += [(name, name) for name in selected_daily_columns]
custom_params += [(name, name) for name in selected_fundamental_columns]

class PandasCustomData(bt.feeds.PandasData):
    lines = tuple(selected_fundamental_columns)
    params = tuple(custom_params)
```
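For illustration (not part of the original post), here is a minimal sketch of the kind of per-ticker DataFrame such a feed expects: daily OHLCV columns filled every day, and the fundamental columns left as NaN except on a (hypothetical, made-up) SEC release date.

```python
import numpy as np
import pandas as pd

# Hypothetical daily bars for one ticker (all values are made up)
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "open":   [10.0, 10.2, 10.1, 10.4, 10.3],
    "high":   [10.3, 10.4, 10.5, 10.6, 10.5],
    "low":    [ 9.9, 10.0, 10.0, 10.2, 10.1],
    "close":  [10.2, 10.1, 10.4, 10.3, 10.4],
    "volume": [1000, 1100,  900, 1200, 1000],
})

# Fundamental columns are NaN everywhere except the release date,
# matching the "SEC release date only" filtering described above
for col in ["debt", "cashneq", "equity", "de"]:
    df[col] = np.nan
df.loc[2, ["debt", "cashneq", "equity", "de"]] = [5.0, 1.0, 20.0, 0.25]

print(df[["date", "close", "equity", "de"]])
```

Inside the strategy, `pd.isna(...)` checks on the `equity`/`de` lines then distinguish release days from ordinary days.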
I have a powerful machine (1.5 TB RAM and 56 cores).
I am trying to run an empty strategy:

```
class TestStrategy(bt.Strategy):
    def log(self, txt, dt=None):
        '''Logging function for this strategy'''
        dt = dt or self.datas[0].datetime.date(0)
        # dt = dt or self.datetime.date()
        print('%s, %s' % (dt.isoformat(), txt))

    def __init__(self):
        # Keep a reference to the "close" line in the data[0] dataseries
        pass

    def next(self):
        # Simply log the closing price of the series from the reference
        msg = ""
        for tickeri, ticker in ticker_data_feedi.items():
            msg += f' ;{ticker} Close:{self.datas[tickeri].close[0]:.2f}'
            msg += "" if pd.isna(self.datas[tickeri].equity[0]) else f' Equity:{self.datas[tickeri].equity[0]}'
            msg += "" if pd.isna(self.datas[tickeri].de[0]) else f' DE:{round(self.datas[tickeri].de[0], 6)}'
        # self.log(msg)
```
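The NaN-guard pattern in `next()` can be exercised on its own; a minimal sketch (the helper name and all values are made up for illustration):

```python
import pandas as pd

def format_row(ticker, close, equity, de):
    """Build one log fragment, skipping fundamental fields that are NaN
    (mirrors the guard pattern in the strategy's next())."""
    msg = f' ;{ticker} Close:{close:.2f}'
    msg += "" if pd.isna(equity) else f' Equity:{equity}'
    msg += "" if pd.isna(de) else f' DE:{round(de, 6)}'
    return msg

# Ordinary day: fundamentals are NaN, so only the close is logged
print(format_row("AAPL", 10.25, float("nan"), float("nan")))
# → ' ;AAPL Close:10.25'

# Release day: fundamentals are present and get appended
print(format_row("AAPL", 10.25, 20.0, 0.123456789))
# → ' ;AAPL Close:10.25 Equity:20.0 DE:0.123457'
```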
I set up the benchmark so that cerebro runs on power-of-two data feed sizes.
How can I make this faster? I am using all CPUs:

```
benchmark = pd.DataFrame({'nfeeds': [], "time_taken": []})

for nfeed in [2**i for i in range(15)]:
    bd = {"nfeeds": nfeed}
    start_time = time.time()
    try:
        del cerebro
    except NameError:
        pass
    cerebro = bt.Cerebro(maxcpus=42)
    cerebro.addstrategy(TestStrategy)
    cerebro.addanalyzer(bt.analyzers.SharpeRatio, _name='mysharpe')
    ticker_data_feedi = {}
    for i, (feed_name, pd_data_feed) in enumerate(pd_data_feeds):
        cerebro.adddata(pd_data_feed)
        ticker_data_feedi[i] = feed_name
        if i == nfeed:
            break
    # cerebro.addsizer(ProportionalSizer)
    cerebro.broker.setcash(10000.0)
    cerebro.broker.setcommission(commission=0.001)
    strategies = cerebro.run()
    end_time = time.time()
    bd["time_taken"] = end_time - start_time
    print(nfeed, end_time - start_time)
    # append returns a new DataFrame, so the result must be assigned back
    benchmark = benchmark.append(bd, ignore_index=True)
    del ticker_data_feedi; gc.collect()
```
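As a side note on the bookkeeping (not the cerebro question itself): `DataFrame.append` was removed in pandas 2.0, and appending row by row rebuilds the frame each iteration. A sketch of the same accumulation done with a plain list, with a trivial stand-in for `cerebro.run()`:

```python
import time
import pandas as pd

# Collect benchmark rows in a list and build the DataFrame once at the end;
# this avoids the removed DataFrame.append and the per-row copy cost.
rows = []
for nfeed in [2**i for i in range(3)]:  # small range just for illustration
    start = time.time()
    _ = sum(range(nfeed * 1000))        # stand-in for cerebro.run()
    rows.append({"nfeeds": nfeed, "time_taken": time.time() - start})

benchmark = pd.DataFrame(rows)
print(benchmark)
```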
For 10k tickers with a dataframe of size 471656, it took around 21 minutes.
I cannot see all my cores active when I use the `maxcpus` option as above.
Is there anything that needs to be set to get all cores active?