Multicore optimization not work for parallel multistrategy scenario
-
After studying /backtrader/samples/optimization/optimization.py, I realized that the strategies in the sample are all independent: each strategy runs to the end of the time series before the next strategy starts. Here is the code example:
    class UserStockStrategy(bt.Strategy):
        params = (('instance', 0),)

        def next(self):
            # timestamp of the current bar
            date = self.data.datetime.datetime(0)
            print("in next:", self.p.instance, os.getpid(), date)

    cerebro = bt.Cerebro(maxcpus=1,
                         runonce=not args.no_runonce,
                         preload=True,
                         optdatas=not args.no_optdatas,
                         optreturn=not args.no_optreturn)
    cerebro.optstrategy(
        UserStockStrategy,
        instance=range(0, 10),  # iterstrats
    )

    ########### output ##################
    in next: 0 24170 1996-05-30 23:59:59.999989
    in next: 0 24170 1996-05-31 23:59:59.999989
    in next: 0 24170 1996-06-03 23:59:59.999989
    in next: 0 24170 1996-06-04 23:59:59.999989
    in next: 0 24170 1996-06-05 23:59:59.999989
    in next: 0 24170 1996-06-06 23:59:59.999989
    in next: 1 24170 1996-05-30 23:59:59.999989
    in next: 2 24170 1996-05-31 23:59:59.999989
    in next: 3 24170 1996-06-03 23:59:59.999989
    in next: 4 24170 1996-06-04 23:59:59.999989
    in next: 5 24170 1996-06-05 23:59:59.999989
    in next: 6 24170 1996-06-06 23:59:59.999989
If I code it the multi-strategy way, all strategies execute at the same time, and every strategy's next function is processed one by one in the same process, so multicore optimization no longer works. Here is the code example:
    class UserStockStrategy(bt.Strategy):
        params = (('instance', 0),)

        def next(self):
            # timestamp of the current bar
            date = self.data.datetime.datetime(0)
            print("in next:", self.p.instance, os.getpid(), date)

    cerebro = bt.Cerebro(maxcpus=10)
    for i in range(10):
        cerebro.optstrategy(UserStockStrategy, instance=i)

    ########### output ##################
    in next: 0 24119 1996-05-30 23:59:59.999989
    in next: 1 24119 1996-05-30 23:59:59.999989
    in next: 2 24119 1996-05-30 23:59:59.999989
    in next: 3 24119 1996-05-30 23:59:59.999989
    in next: 4 24119 1996-05-30 23:59:59.999989
    in next: 5 24119 1996-05-30 23:59:59.999989
    in next: 6 24119 1996-05-30 23:59:59.999989
    in next: 7 24119 1996-05-30 23:59:59.999989
    in next: 8 24119 1996-05-30 23:59:59.999989
    in next: 9 24119 1996-05-30 23:59:59.999989
    in next: 0 24119 1996-05-31 23:59:59.999989
    in next: 1 24119 1996-05-31 23:59:59.999989
    in next: 2 24119 1996-05-31 23:59:59.999989
    in next: 3 24119 1996-05-31 23:59:59.999989
    in next: 4 24119 1996-05-31 23:59:59.999989
    in next: 5 24119 1996-05-31 23:59:59.999989
    in next: 6 24119 1996-05-31 23:59:59.999989
    in next: 7 24119 1996-05-31 23:59:59.999989
    in next: 8 24119 1996-05-31 23:59:59.999989
    in next: 9 24119 1996-05-31 23:59:59.999989
    in next: 0 24119 1996-06-03 23:59:59.999989
    in next: 1 24119 1996-06-03 23:59:59.999989
    in next: 2 24119 1996-06-03 23:59:59.999989
We can see that all strategies share the same process.
My questions are:
- Is my understanding of multicore optimization correct?
- Is there any way to run multiple strategies in multiple processes? For example, each process executes only one strategy, but the processes work in a synchronous and parallel way.
-
@yacc2000 I think you want the strategy selection/fetcher pattern. I've been using a modified version of it for the same reason and it is working well.
Link to the updated pattern https://www.backtrader.com/blog/posts/2017-05-16-stsel-revisited/stsel-revisited/
The original post is good to read for details https://www.backtrader.com/blog/posts/2016-10-29-strategy-selection/strategy-selection/
-
@crazy25000 Sorry, I might not have clearly expressed what I mean.
My use case is like this one:
https://community.backtrader.com/topic/1337/running-multiple-strategies-combining-the-output
In my use case, there are several strategies in one cerebro, and each strategy trades one asset at the same time.
Could this use case be run with multiple processes to speed it up?
-
@yacc2000 can you provide a full minimal, reproducible example that I can run? It would be helpful to understand it better - I think I know what you're asking, but would be helpful to reproduce: multi-strategies + multiple assets and what you expect vs results.
-
@crazy25000
Here is the code:

    from __future__ import (absolute_import, division, print_function,
                            unicode_literals)

    import argparse
    import datetime
    import time
    import os

    from backtrader.utils.py3 import range

    import backtrader as bt
    import backtrader.indicators as btind
    import backtrader.feeds as btfeeds


    class OptimizeStrategy(bt.Strategy):
        params = (('smaperiod', 15),
                  ('macdperiod1', 12),
                  ('macdperiod2', 26),
                  ('macdperiod3', 9),
                  ('instance', 0),
                  )

        def __init__(self):
            # Add indicators to add load
            print("instance:", self.p.instance, os.getppid())
            btind.SMA(period=self.p.smaperiod)
            self.ind = btind.MACD(self.datas[self.p.instance],
                                  period_me1=self.p.macdperiod1,
                                  period_me2=self.p.macdperiod2,
                                  period_signal=self.p.macdperiod3)

        def next(self):
            if self.ind > 1:
                self.buy(self.datas[self.p.instance])
            else:
                self.sell(self.datas[self.p.instance])


    def runstrat():
        # Create a cerebro entity
        cerebro = bt.Cerebro(maxcpus=10)  # not args.no_optreturn)

        # Add a strategy
        for i in range(10):
            cerebro.optstrategy(
                OptimizeStrategy,
                instance=i
            )

        # Create the 1st data
        for i in range(10):
            # could be different csv file per data
            data = btfeeds.BacktraderCSVData(dataname='../../datas/2006-day-001.txt')
            # Add the Data Feed to Cerebro
            cerebro.adddata(data)

        # clock the start of the process
        # (time.clock() was removed in Python 3.8; perf_counter() replaces it)
        tstart = time.perf_counter()

        # Run over everything
        stratruns = cerebro.run()

        # clock the end of the process
        tend = time.perf_counter()

        print('==================================================')
        for stratrun in stratruns:
            print('**************************************************')
            for strat in stratrun:
                print('--------------------------------------------------')
                print(strat.p._getkwargs())
        print('==================================================')

        # print out the result
        print('Time used:', str(tend - tstart))


    if __name__ == '__main__':
        runstrat()
Thanks
-
@yacc2000 multiprocessing would speed up that pipeline as long as the strategy is complex enough to need a lot of calculations. For a really simple strategy like the example you posted, multiprocessing would not speed it up, since there is little computation to spread across processes.
Here's an example you can test and verify:
    import os
    from datetime import datetime

    import backtrader as bt


    class OptimizeStrategy(bt.Strategy):
        params = (
            ('p1', 15),
            ('p2', 12),
            ('instance', 0),
        )

        def __init__(self):
            self.ema1 = bt.talib.EMA(self.data, timeperiod=self.p.p1, plotname='EMA1')
            self.ema2 = bt.talib.EMA(self.data, timeperiod=self.p.p2, plotname='EMA2')
            self.crossover = bt.indicators.CrossOver(self.ema1, self.ema2)

        def next(self):
            if self.crossover > 0:
                if self.position:
                    self.close()
                self.buy(self.datas[self.p.instance])
            elif self.crossover < 0:
                if self.position:
                    self.close()
                self.sell(self.datas[self.p.instance])


    def bt_opt_callback(cb):
        # NOTE: pbar is not defined in this snippet, and the callback is never
        # registered with cerebro.optcallback(), so it is not invoked here.
        pbar.update()


    def runstrat():
        cerebro = bt.Cerebro(maxcpus=15)

        for i in range(5):
            cerebro.optstrategy(OptimizeStrategy, instance=i, p1=range(9, 55, 10), p2=200)

        for i in range(5):
            data = bt.feeds.YahooFinanceData(dataname='AAPL',
                                             fromdate=datetime(2016, 1, 1),
                                             todate=datetime(2017, 1, 1))
            cerebro.adddata(data)

        stratruns = cerebro.run()


    if __name__ == '__main__':
        runstrat()
I updated what you posted so that it can run calculations with indicators and buy, sell, close:
- Test 1 range=5, maxcpus=5: 90 seconds
- Test 2 range=5, maxcpus=15: 47 seconds
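To put those two timings in context, here is a small stdlib-only sketch that counts how many backtests the example would launch and works out the speedup. It assumes Cerebro runs one backtest per element of the Cartesian product of the optstrategy calls; that is my reading of the library, not something measured here. The variable names are made up for illustration.

```python
import itertools

# Values swept by each optstrategy call in the example above
# (instance and p2 each contribute a single value per call).
p1_values = list(range(9, 55, 10))  # [9, 19, 29, 39, 49]
calls = 5                           # the loop calls optstrategy five times

# Assumption: one backtest per element of the product of the five calls.
runs = sum(1 for _ in itertools.product(*[p1_values] * calls))

# Wall-clock times reported above, in seconds.
t_5cpus, t_15cpus = 90.0, 47.0
speedup = t_5cpus / t_15cpus

print(runs)               # 3125
print(round(speedup, 2))  # 1.91
```

Under that assumption each of the 3125 runs holds five strategies, and tripling maxcpus gave a bit under a 2x speedup in this test, so the scaling was well short of linear.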
-
@crazy25000 Thank you very much.
Yes, you are right.
But if you set maxcpus=1, you will see that the strategies are executed one by one, which means the strategies are all independent. The output looks like this:

    in next: 0 24170 1996-05-30 23:59:59.999989
    in next: 0 24170 1996-05-31 23:59:59.999989
    in next: 0 24170 1996-06-03 23:59:59.999989
    in next: 0 24170 1996-06-04 23:59:59.999989
    in next: 0 24170 1996-06-05 23:59:59.999989
    in next: 0 24170 1996-06-06 23:59:59.999989
    in next: 1 24170 1996-05-30 23:59:59.999989  # strategy 1 is executed after strategy 0 finished

In my case, all strategies work in parallel, trading different assets, but using the same account, with the same broker, at the same time. The output looks like this:

    in next: 0 24119 1996-05-30 23:59:59.999989
    in next: 1 24119 1996-05-30 23:59:59.999989
    in next: 2 24119 1996-05-30 23:59:59.999989
    in next: 3 24119 1996-05-30 23:59:59.999989
    in next: 4 24119 1996-05-30 23:59:59.999989
    in next: 5 24119 1996-05-30 23:59:59.999989
    in next: 6 24119 1996-05-30 23:59:59.999989
    in next: 7 24119 1996-05-30 23:59:59.999989
    in next: 8 24119 1996-05-30 23:59:59.999989
    in next: 9 24119 1996-05-30 23:59:59.999989
    in next: 0 24119 1996-05-31 23:59:59.999989
    in next: 1 24119 1996-05-31 23:59:59.999989  # all strategies finish 1996-05-30, then carry on with the next day
Could my use case be sped up with multiple cores?
Thanks
-
@yacc2000 said in Multicore optimization not work for parallel multistrategy scenario:
in my case, all strategy are working in parallel, trading on different asset, but using same account, with same broker, at same time
This is not what is happening in the code you've provided. When optimizing (it doesn't matter whether multiprocessing on multiple cores or a single process on a single core is used), a separate Cerebro instance is used for each permutation of the optimized strategies. In the multiprocess case the Cerebro instance is cloned (pickled) to each worker process, whereas in the single-core case the same Cerebro instance is reset before running each strategy.
In either case a separate broker will be used for each run.
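A stdlib-only sketch of how the runs appear to be assembled may help here. It assumes Cerebro takes the Cartesian product of the optstrategy calls, which is my reading of cerebro.py rather than something stated in this thread; plan_runs and the dict layout are hypothetical and only mimic that behaviour.

```python
import itertools


def plan_runs(optstrategy_calls):
    """Return all runs; each run is a tuple with one parameter set per call."""
    return list(itertools.product(*optstrategy_calls))


# One optstrategy call sweeping instance over range(10).
single_call = [[{"instance": i} for i in range(10)]]

# Ten optstrategy calls, each with a single instance value.
ten_calls = [[{"instance": i}] for i in range(10)]

sweep = plan_runs(single_call)
bundle = plan_runs(ten_calls)

print(len(sweep), len(sweep[0]))    # 10 1  -> ten runs, one strategy each
print(len(bundle), len(bundle[0]))  # 1 10  -> one run holding ten strategies
```

Under that assumption the ten-call version yields a single run, which would be consistent with only one PID showing up in the output regardless of maxcpus.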
Printing to the screen is probably misleading in a multiprocessing environment, since it is synchronized internally by Python (there is a mutex for the system stdout). Try either using the callback mechanism:
cerebro.optcallback(optimization_callback)
or just write the log to separate files.
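For the separate-files option, a minimal sketch could look like the following. It uses only the standard library; log_line, the file naming scheme and the temporary directory are assumptions for illustration, not part of backtrader.

```python
import os
import tempfile


def log_line(log_dir, message):
    """Append message to a log file named after the current process ID."""
    path = os.path.join(log_dir, "strategy-{}.log".format(os.getpid()))
    with open(path, "a") as handle:
        handle.write(message + "\n")
    return path


# Hypothetical log directory; in a real run every worker would share it.
log_dir = tempfile.mkdtemp(prefix="bt-logs-")

path = log_line(log_dir, "in next: 0 1996-05-30")
log_line(log_dir, "in next: 0 1996-05-31")

with open(path) as handle:
    lines = handle.read().splitlines()
```

Calling log_line from a strategy's next() instead of print would give one file per worker process, so the per-process ordering can be inspected without stdout interleaving.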
-
@vladisld thanks for catching that! Sounds like I wrongly assumed. I assumed he would pass the results of the optimizations to a central broker, another instance that decides what to do.
-
@vladisld Thanks, perhaps I misunderstood the usage of optstrategy; maybe I should use addstrategy instead.
What I really want is multiple strategies working in parallel, trading different assets, but using the same account, with the same broker, at the same time. Maybe I should code it this way, using addstrategy instead:
    from __future__ import (absolute_import, division, print_function,
                            unicode_literals)

    import argparse
    import datetime
    import time
    import os

    from backtrader.utils.py3 import range

    import backtrader as bt
    import backtrader.indicators as btind
    import backtrader.feeds as btfeeds


    class OptimizeStrategy(bt.Strategy):
        params = (('smaperiod', 15),
                  ('macdperiod1', 12),
                  ('macdperiod2', 26),
                  ('macdperiod3', 9),
                  ('instance', 0),
                  )

        def __init__(self):
            # Add indicators to add load
            print("instance:", self.p.instance, os.getppid())
            btind.SMA(period=self.p.smaperiod)
            self.ind = btind.MACD(self.datas[self.p.instance],
                                  period_me1=self.p.macdperiod1,
                                  period_me2=self.p.macdperiod2,
                                  period_signal=self.p.macdperiod3)

        def next(self):
            if self.ind > 1:
                self.buy(self.datas[self.p.instance])
            else:
                self.sell(self.datas[self.p.instance])


    def runstrat():
        # Create a cerebro entity
        # (maxcpus only affects optimization runs started via optstrategy)
        cerebro = bt.Cerebro(maxcpus=10)  # not args.no_optreturn)

        # Add a strategy
        for i in range(10):
            cerebro.addstrategy(
                OptimizeStrategy,
                instance=i
            )  # !!! addstrategy instead of optstrategy

        # Create the 1st data
        for i in range(10):
            # could be different csv file per data
            data = btfeeds.BacktraderCSVData(dataname='../../datas/2006-day-001.txt')
            # Add the Data Feed to Cerebro
            cerebro.adddata(data)

        # clock the start of the process
        # (time.clock() was removed in Python 3.8; perf_counter() replaces it)
        tstart = time.perf_counter()

        # Run over everything
        stratruns = cerebro.run()

        # clock the end of the process
        tend = time.perf_counter()

        print('==================================================')
        # without optimization, run() returns a flat list of strategies
        for strat in stratruns:
            print('--------------------------------------------------')
            print(strat.p._getkwargs())
        print('==================================================')

        # print out the result
        print('Time used:', str(tend - tstart))


    if __name__ == '__main__':
        runstrat()
BTW, why hasn't numpylines been merged back into the master branch? Does numpylines help improve backtrader performance?
Thanks
-
@yacc2000 In my system I'm also using multiple strategies where each one is running on its own data feed - although I'm using this setup only for live/paper runs.
For optimizations, I usually run each data feed separately. That doesn't mean it is impossible to do otherwise; it is just more practical this way, since there is no relation between the different data feeds (no pair trading, no correlation trading, nothing of the sort in my case).
If you still insist on optimizing multiple data feeds against the same account, one way is to have a single backtrader strategy handle multiple data feeds, where a separate "logic" strategy object (not necessarily inherited from backtrader's Strategy class) is allocated for each data feed. Here is the (very simplified) sample code from my system doing it:

    class TestStrategy:
        def __init__(self, parent, data):
            # assumption is that there will be only a single data feed per TestStrategy
            self.parent = parent
            self.data_feed = data

        def start(self):
            pass

        def stop(self):
            pass

        def notify_store(self, msg, *args, **kwargs):
            pass

        def notify_order(self, order):
            pass

        def notify_trade(self, trade):
            pass

        def notify_timer(self, timer, when, *args, **kwargs):
            pass

        def next(self):
            pass

        def prenext(self):
            pass


    class MultiStrategyWrapper(bt.Strategy):
        params = (
            # <whatever params you need>
        )

        def __init__(self):
            # one logic object per data feed
            self.strategies = []
            for data in self.datas:
                strategy = TestStrategy(self, data)
                self.strategies.append(strategy)

        def start(self):
            for strat in self.strategies:
                strat.start()

        def stop(self):
            for strat in self.strategies:
                strat.stop()

        def notify_timer(self, timer, when, *args, **kwargs):
            # forward the timer only to the logic object owning that feed
            for strat in self.strategies:
                if timer.p.tzdata is strat.data_feed:
                    strat.notify_timer(timer, when, *args, **kwargs)

        def notify_order(self, order):
            # forward the order only to the logic object owning that feed
            for strat in self.strategies:
                if order.data is strat.data_feed:
                    strat.notify_order(order)

        def notify_trade(self, trade):
            for strat in self.strategies:
                strat.notify_trade(trade)

        def prenext(self):
            for strat in self.strategies:
                strat.prenext()

        def next(self):
            for strat in self.strategies:
                strat.next()
So from the Cerebro side it just looks like a single strategy with multiple datas case.
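In the wrapper above, notify_trade forwards every trade to every logic object, while notify_order filters by feed. If each logic object should only see events for its own feed, the same identity check could be reused. The sketch below shows that routing with stand-in classes only; FakeFeed, FakeEvent, RecordingLogic and route are hypothetical, and it assumes the event exposes its feed as `.data`, as the order does in the wrapper.

```python
class FakeFeed:
    """Stand-in for a data feed; only identity matters here."""
    def __init__(self, name):
        self.name = name


class FakeEvent:
    """Stand-in for an order or trade that carries its feed as .data."""
    def __init__(self, data):
        self.data = data


class RecordingLogic:
    """Stand-in for a per-feed logic object that records what it receives."""
    def __init__(self, data):
        self.data_feed = data
        self.seen = []

    def notify_trade(self, trade):
        self.seen.append(trade)


def route(logics, event):
    """Deliver the event only to the logic object owning the event's feed."""
    for logic in logics:
        if event.data is logic.data_feed:
            logic.notify_trade(event)


feeds = [FakeFeed("AAA"), FakeFeed("BBB")]
logics = [RecordingLogic(feed) for feed in feeds]

route(logics, FakeEvent(feeds[1]))
counts = [len(logic.seen) for logic in logics]
print(counts)  # [0, 1]
```

Whether trades should be filtered this way depends on the logic objects; forwarding everything, as the original does, is also a valid choice if each object ignores foreign trades itself.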
-
@vladisld Thank you. It's a great idea!
BTW, why hasn't numpylines been merged back into the master branch? Does numpylines help improve backtrader performance?
-
@yacc2000 said in Multicore optimization not work for parallel multistrategy scenario:
why doesn't numpylines merge back to master branch
don't know - I'm not the author of the framework. What benefit will this bring, other than adding more communication complexity?
@yacc2000 said in Multicore optimization not work for parallel multistrategy scenario:
Does numpylines helps to improve backtrader performance?
I'm not sure the numpy library is even used in Backtrader - it is used in TALib indicators though. IIRC there was an attempt to utilize the numpy arrays in Backtrader in some feature branch, probably abandoned now.
-
@vladisld Thanks!