
Performance benchmark: backtesting 10,000 data feeds



  • I loaded 10,405 data feeds for backtesting with fundamental data. I am trying to have access to all data feeds at the same time so that I can update/change my stock portfolio in the strategy part of the code.

    I filtered the fundamental data so that values appear only on SEC release dates rather than daily, as shown below:

    import backtrader as bt

    selected_daily_columns       = ['open', 'high', 'low', 'close', 'volume']
    selected_fundamental_columns = ['debt', 'cashneq', 'equity', 'de']

    # Map each line to the dataframe column of the same name; the datetime
    # comes from the "date" column and there is no open interest column.
    custom_params = [('datetime', 'date'), ('openinterest', None)]
    custom_params += [(name, name) for name in selected_daily_columns]
    custom_params += [(name, name) for name in selected_fundamental_columns]

    class PandasCustomData(bt.feeds.PandasData):
        lines = tuple(selected_fundamental_columns)
        params = tuple(custom_params)
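
    For reference, a minimal sketch of how one such feed could be built from a per-ticker dataframe. The layout below is just illustrative: a "date" column plus the daily and fundamental columns, with NaN outside release dates.

    import pandas as pd

    nan = float('nan')
    # Hypothetical per-ticker dataframe; the real data comes from the loader.
    df = pd.DataFrame({
        'date':    pd.date_range('2020-01-01', periods=4),
        'open':    [10.0, 10.2, 10.1, 10.4],
        'high':    [10.5, 10.6, 10.4, 10.8],
        'low':     [ 9.8, 10.0,  9.9, 10.2],
        'close':   [10.2, 10.4, 10.0, 10.6],
        'volume':  [1000, 1200,  900, 1100],
        'debt':    [nan, 5e6,  nan, nan],   # fundamentals only on the
        'cashneq': [nan, 1e6,  nan, nan],   # SEC release date, NaN
        'equity':  [nan, 2e7,  nan, nan],   # everywhere else
        'de':      [nan, 0.25, nan, nan],
    })

    feed = PandasCustomData(dataname=df)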
    

    I have a powerful machine (1.5 TB RAM and 56 cores).
    I am running an essentially empty strategy to see how fast cerebro can iterate over the feeds:

    import pandas as pd
    import backtrader as bt

    class TestStrategy(bt.Strategy):

        def log(self, txt, dt=None):
            '''Logging function for this strategy'''
            dt = dt or self.datas[0].datetime.date(0)
            print('%s, %s' % (dt.isoformat(), txt))

        def __init__(self):
            pass

        def next(self):
            # Build a log line with close, equity and debt/equity for every
            # feed; ticker_data_feedi is the global {index: ticker} map
            # filled in by the benchmark loop below.
            msg = ""
            for tickeri, ticker in ticker_data_feedi.items():
                msg += f' ;{ticker} Close:{self.datas[tickeri].close[0]:.2f}'
                msg += "" if pd.isna(self.datas[tickeri].equity[0]) else f' Equity:{self.datas[tickeri].equity[0]}'
                msg += "" if pd.isna(self.datas[tickeri].de[0]) else f' DE:{round(self.datas[tickeri].de[0], 6)}'

            # self.log(msg)
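
    As an aside, the repeated self.datas[tickeri] indexing in next() could be avoided by caching the line references once in __init__; a sketch of that variant (whether it measurably helps here is untested):

    class TestStrategyCached(bt.Strategy):

        def __init__(self):
            # Cache line references once instead of re-indexing
            # self.datas on every bar.
            self.closes = [d.close for d in self.datas]
            self.equities = [d.equity for d in self.datas]

        def next(self):
            for i, close in enumerate(self.closes):
                _ = close[0]             # current close
                _ = self.equities[i][0]  # current equity (NaN off release dates)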
    

    I set up the benchmark so that cerebro runs over power-of-two data feed counts.
    How can I make this faster? I am trying to use all CPUs:

    import gc
    import time

    benchmark = pd.DataFrame({'nfeeds': [], 'time_taken': []})

    for nfeed in [2**i for i in range(15)]:

        bd = {'nfeeds': nfeed}   # key must match the benchmark columns
        start_time = time.time()

        try:
            del cerebro
        except NameError:
            pass

        cerebro = bt.Cerebro(maxcpus=42)

        cerebro.addstrategy(TestStrategy)
        cerebro.addanalyzer(bt.analyzers.SharpeRatio, _name='mysharpe')

        ticker_data_feedi = {}
        for i, (feed_name, pd_data_feed) in enumerate(pd_data_feeds):
            cerebro.adddata(pd_data_feed)
            ticker_data_feedi[i] = feed_name
            if i + 1 == nfeed:   # stop after exactly nfeed feeds
                break

        # cerebro.addsizer(ProportionalSizer)
        cerebro.broker.setcash(10000.0)
        cerebro.broker.setcommission(commission=0.001)
        strategies = cerebro.run()

        end_time = time.time()
        bd['time_taken'] = end_time - start_time
        print(nfeed, end_time - start_time)

        # DataFrame.append returns a new frame, so the result must be kept
        benchmark = benchmark.append(bd, ignore_index=True)
        del ticker_data_feedi; gc.collect()
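
    (Side note: DataFrame.append is deprecated and removed in pandas 2.0, so the rows could instead be collected in a plain list and turned into a DataFrame once at the end; a sketch:)

    rows = []
    for nfeed in [2**i for i in range(15)]:
        # ... run cerebro and time it as above ...
        rows.append({'nfeeds': nfeed, 'time_taken': end_time - start_time})
    benchmark = pd.DataFrame(rows)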
    

    For 10k tickers with a dataframe of size 471,656, it took around 21 minutes.
    I cannot see all my cores active when I use the maxcpus option as above.
    Is there anything that needs to be set to get all cores active?
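
    To see whether the cores are actually busy, per-core utilisation can be sampled from a second terminal; a sketch, assuming the third-party psutil package is installed:

    import psutil

    # Per-core utilisation in percent, sampled over one second.
    for core, pct in enumerate(psutil.cpu_percent(interval=1, percpu=True)):
        print(f'core {core}: {pct:.0f}%')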

