Multiple data feed with different start and end time
-
Hi:
So I tried to feed the strategy with multiple data(stocks) with different start and end date(which is a nature of stock market I believe as on each date some went IPO and some got delisted), then:
1\Run the strategy from earliest possible date(the earliest start date of all data)
2\Select n stocks based on an extra line (e.g. p/e) on any date
3\rebalance the portfolio with each selected stock consist of 1/n of the portfolioSo as a first attempt I try to run my strategy with Strategy.Next(with the knowledge this only run over overlapping periods) . For analogy purpose, two Pandas Dataframe with different start and end date were feed to the cerebro:
(data0 is from 2017/03 to 2018/01
data1 is from 2017/06 to 2018/05)data0 = bt.feeds.PandasData(dataname=df0) data1 = bt.feeds.PandasData(dataname=df1) cerebro.adddata(data0) cerebro.adddata(data1) class St(bt.Strategy): params = {"printout":True} def __init__(self): def notify_order(self, order): if order.status in [order.Submitted, order.Accepted]: if order.status in [order.Completed]: if order.isbuy(): self.log( 'BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' % (order.executed.price, order.executed.value, order.executed.comm)) self.buyprice = order.executed.price self.buycomm = order.executed.comm else: # Sell self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' % (order.executed.price, order.executed.value, order.executed.comm)) self.bar_executed = len(self) def log(self, txt, dt=None): if self.p.printout: dt = dt or self.data.datetime[0] dt = bt.num2date(dt) print('%s, %s' % (dt.isoformat(), txt)) def next(self): if self.p.printout: print('Data0 dt:', self.data0.datetime.datetime()) print('Data1 dt:', self.data1.datetime.datetime()) for d in self.getdatanames(): pos = self.getpositionbyname(d).size or 0 if pos == 0: # No position self.log('BUY CREATE, %.2f' % self.getdatabyname(d).close[0]) self.order = self.buy(data=self.getdatabyname(d)) else: # Open position self.log('SELL CREATE, %.2f' % self.getdatabyname(d).close[0]) self.order = self.sell(data=self.getdatabyname(d)) cerebro.run(oldsync=False) # Plot the result cerebro.plot(iplot=True, volume=False)
Result seems really weird to me:
All the buy and sell marker for data1 shifts to the right by an amount (I believe) = start date of data1 - startdate of data0
Part of the log:
Starting Portfolio Value: 1000000000.00 2017-06-19T00:00:00, BUY CREATE, 956.44 2017-06-19T00:00:00, BUY CREATE, 9.82 2017-06-20T00:00:00, BUY EXECUTED, Price: 955.40, Cost: 955.40, Comm 0.96 2017-06-20T00:00:00, BUY EXECUTED, Price: 10.80, Cost: 10.80, Comm 0.01 2017-06-20T00:00:00, SELL CREATE, 955.40 2017-06-20T00:00:00, SELL CREATE, 10.80 2017-06-21T00:00:00, SELL EXECUTED, Price: 960.63, Cost: 955.40, Comm 0.96 2017-06-21T00:00:00, SELL EXECUTED, Price: 11.88, Cost: 10.80, Comm 0.01 ...... 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 12.67 2018-01-05T00:00:00, BUY EXECUTED, Price: 12.65, Cost: 12.65, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 12.73 2018-01-05T00:00:00, SELL EXECUTED, Price: 12.75, Cost: 12.65, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 12.55 2018-01-05T00:00:00, BUY EXECUTED, Price: 12.51, Cost: 12.51, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 12.62 2018-01-05T00:00:00, SELL EXECUTED, Price: 12.51, Cost: 12.51, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 12.20 2018-01-05T00:00:00, BUY EXECUTED, Price: 12.26, Cost: 12.26, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 12.58 2018-01-05T00:00:00, SELL EXECUTED, Price: 12.58, Cost: 12.26, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 12.77 2018-01-05T00:00:00, BUY EXECUTED, Price: 12.61, Cost: 12.61, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 12.69 2018-01-05T00:00:00, SELL EXECUTED, Price: 12.61, Cost: 12.61, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 12.58 2018-01-05T00:00:00, BUY EXECUTED, Price: 12.50, Cost: 12.50, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 12.53 2018-01-05T00:00:00, SELL EXECUTED, Price: 12.14, Cost: 12.50, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 11.29 2018-01-05T00:00:00, BUY EXECUTED, Price: 11.25, Cost: 11.25, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 11.53 2018-01-05T00:00:00, SELL EXECUTED, Price: 11.69, Cost: 11.25, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 11.77 2018-01-05T00:00:00, BUY EXECUTED, Price: 11.60, Cost: 11.60, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 11.89 2018-01-05T00:00:00, SELL EXECUTED, Price: 11.99, Cost: 11.60, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 12.26 2018-01-05T00:00:00, BUY EXECUTED, Price: 12.36, Cost: 12.36, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 12.48 2018-01-05T00:00:00, SELL EXECUTED, Price: 12.55, Cost: 12.36, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 12.89 2018-01-05T00:00:00, BUY EXECUTED, Price: 12.63, Cost: 12.63, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 12.67 2018-01-05T00:00:00, SELL EXECUTED, Price: 12.68, Cost: 12.63, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 12.45 2018-01-05T00:00:00, BUY EXECUTED, Price: 12.32, Cost: 12.32, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 12.86 2018-01-05T00:00:00, SELL EXECUTED, Price: 12.86, Cost: 12.32, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 13.06 2018-01-05T00:00:00, BUY EXECUTED, Price: 12.99, Cost: 12.99, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 13.54 2018-01-05T00:00:00, SELL EXECUTED, Price: 13.41, Cost: 12.99, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, BUY CREATE, 13.22 2018-01-05T00:00:00, BUY EXECUTED, Price: 13.34, Cost: 13.34, Comm 0.01 2018-01-05T00:00:00, BUY CREATE, 1413.91 2018-01-05T00:00:00, SELL CREATE, 13.06
It seems a lot of trade were executed on last date of data0
Even if I set both datafeed with same end date, the "buy-sell marker shifting" still exists.
Any hints why this happend, is this an issue of plotting(broker value stays the same after overlapping period) or actual trading?
Further Questions:
To achieve "run the strategy from earliest possible date to latest possible date", I plan to use prenext( while checking the length of each data with len(self.data) ) for period before overlapping, followed by Next for overlapping periods, but it doesn't seem there is "postnext" method for period after overlapping, is there a better approaches?
-
@yi-fang said in Multiple data feed with different start and end time:
All the buy and sell marker for data1 shifts to the right by an amount (I believe) = start date of data1 - startdate of data0
No. But you have redacted your code and removed anything you were doing during
__init__
and with it you have removed any useful information to give you a diagnostic.@yi-fang said in Multiple data feed with different start and end time:
It seems a lot of trade were executed on last date of data0
Check again how you call the
log
method and you will see why.@yi-fang said in Multiple data feed with different start and end time:
is there a better approaches?
No.
-
Thank you for the timely reply, I will analyze the code accordingly! @backtrader
-
Following is my full script for diagnose, nothing much in
__init__
@backtradermysql_cleaner
is a function to retrieve data from my personal mysql server, it returns apandas.Dataframe
following the rules set bybt.feeds.PandasData
with an extra column named 'factor',PandasData_Factor
is just a subclass which added the 'factor' line. This extra line is only for future use and I dont think it caused my problem in the result plot.import backtrader as bt import datetime import pandas as pd from dateutil.parser import parse from sqlalchemy import create_engine # Create a Stratey class St(bt.Strategy): def __init__(self): self.order = None pass def next(self): for d in self.getdatanames(): # Simple strategy: if there is no position for a given data, balance the position to 20%, # else balance to 10% # So every asset should be bought to 20% on day one and remain near 10% for the rest of bars pos = self.getpositionbyname(d).size or 0 if pos == 0: # No position self.order = self.order_target_percent(target=0.2,data=self.getdatabyname(d)) else: # We have an open position self.order = self.order_target_percent(target=0.1,data=self.getdatabyname(d)) class PandasData_Factor(bt.feeds.PandasData): # Add a 'factor' line to the inherited ones from the base class lines = ('factor',) # add the parameter to the parameters inherited from the base class params = (('factor', -1),) datafields = bt.feeds.PandasData.datafields + (['factor']) def mysql_cleaner(stkcode,startdate,enddate,extrafactor='amount'): startdateord = parse(str(startdate)).toordinal() enddateord = parse(str(enddate)).toordinal() df = pd.read_sql('SELECT tradedt,open,high,low,close,volume,adjfactor,' + extrafactor + ' FROM dailytrading ' + 'WHERE stkcode = "' + stkcode + '"' 'AND tradedt <= "' + str((enddateord)) + '" AND tradedt >= "' + str((startdateord)) + '"', conn) df['open'] = df['open'] * df['adjfactor'] df['high'] = df['high'] * df['adjfactor'] df['low'] = df['low'] * df['adjfactor'] df['close'] = df['close'] * df['adjfactor'] df.drop('adjfactor', inplace=True, axis=1) df.sort_values(by=['tradedt'], inplace=True) df['tradedt'] = df['tradedt'].apply(lambda x: datetime.datetime.fromordinal(x)) df.set_index('tradedt', inplace=True) df.index.name = 'Date' df.insert(5, 'openinterest', 0) df.columns.values[-1] = 'factor' return df if __name__ == '__main__': startcash = 1000000 extrafactor = 'amount' # Create a connection to Mysql server conn = create_engine('mysql+pymysql://***:***@***:***/***?charset=utf8') # Create a cerebro entity cerebro = bt.Cerebro() # Defaut plot objects are not needed data0 = mysql_cleaner('000001.SZ',20170206,20180105) data0 = PandasData_Factor(dataname=data0) cerebro.adddata(data0,name = 'data0') data1 = mysql_cleaner('603316.SH',20170619,20180505) data1 = PandasData_Factor(dataname=data1) cerebro.adddata(data1,name = 'data1') # Set our desired cash start cerebro.broker.setcash(startcash) cerebro.broker.setcommission(commission=0.001) # Add a strategy cerebro.addstrategy(St) # Print out the starting conditions print('Starting Portfolio Value: %.2f' % cerebro.broker.getvalue()) # Run over everything cerebro.run() # Get final portfolio Value portvalue = cerebro.broker.getvalue() pnl = portvalue - startcash print('Final Portfolio Value: ${}'.format(portvalue)) print('P/L: ${}'.format(round(pnl,2))) # Plot the result cerebro.plot(iplot=True, volume=False)
The resulting buy and sell markers are still "shifted":
Two questions:
1\Why are my buy and sell markers shifted for data1?
2\Current data length islen(self.getdatabyname(ticker))
, is there a way to access previous data length, just like we access previous close value by using[-1]
[-2]
etc? If not, is there a way to access total data length?Thanks in advance for any reply!
-
yi-fang said in Multiple data feed with different start and end time:
1\Why are my buy and sell markers shifted for data1?
The real problem as in the previous post: the code you have posted has NOT produced the posted chart.
It is therefore in your hand to answer the question as to why your charts are produced by some other code.
@yi-fang said in Multiple data feed with different start and end time:
2\Current data length is len(self.getdatabyname(ticker)), is there a way to access previous data length, just like we access previous close value by using [-1] [-2] etc? If not, is there a way to access total data length?
Keep the length in a dictionary. You cannot also access the previous length of a list in Python.
-
Thank you for your reply!
@backtrader said in Multiple data feed with different start and end time:
The real problem as in the previous post: the code you have posted has NOT produced the posted chart.
I copy and paste my code directly from pycharm with only one modification: hide mysql credentials with
***
, could it be the problem of my data? Attached are the pandas.Dataframe from pycharm for data0 and data1
-
@yi-fang said in Multiple data feed with different start and end time:
I copy and paste my code directly from pycharm with only one modification: hide mysql credentials with *** , could it be the problem of my data?
No you don't. This is not because I think you lie. Keeping track of code, open windows and this and that and that other is sometimes difficult. But see:
-
In your first post you removed everything from the
Strategy.__init__
and the code was not syntactically valid. -
In your second post you show a chart which has two volume plots and the code contains the following:
cerebro.plot(iplot=True, volume=False)
It seems obvious that the posted code is not the code producing your problem, or else it wouldn't contain volume plots. Even better ... a chart plotted inside
iPython
(iplot=True
) will not have the Windows title bar and Windows maximize, minimize and close buttons.One of the major problems when reporting something is that we all try to work the least minimum possible and you try small modifications of your complex code to try to show where the problem may be.
But one of the most important things when reporting something is to reproduce it with a short, clean and legible sample, that has actually been used to reproduce what you want to report.
-
-
@backtrader Thank you again for your reply! I tried to reproduce my problem with a short, clean and legible sample this time(made sure several times that this is indeed the code, I am running windows + anaconda + pycharm,
bt.__version__
=='1.9.64.122'
,matplotlib.__version__
=='2.1.2'
if this matters).Original code:
import backtrader as bt import datetime as dt import pandas as pd from dateutil.parser import parse from sqlalchemy import create_engine class St(bt.Strategy): def next(self): for d in self.getdatanames(): if self.dnames[d].close[0] > 40: self.buy(data = self.dnames[d],size = 100) print(d,self.getpositionbyname(d)) def mysql_cleaner(stkcode,startdate,enddate): startdateord = parse(str(startdate)).toordinal() enddateord = parse(str(enddate)).toordinal() df = pd.read_sql('SELECT tradedt,open,high,low,close,volume,adjfactor' ' FROM dailytrading ' + 'WHERE stkcode = "' + stkcode + '"' 'AND tradedt <= "' + str((enddateord)) + '" AND tradedt >= "' + str((startdateord)) + '"', conn) df['open'] = df['open'] * df['adjfactor'] df['high'] = df['high'] * df['adjfactor'] df['low'] = df['low'] * df['adjfactor'] df['close'] = df['close'] * df['adjfactor'] df['volume'] = df['volume'] / df['adjfactor'] df.drop('adjfactor', inplace=True, axis=1) df.sort_values(by=['tradedt'], inplace=True) df['tradedt'] = df['tradedt'].apply(lambda x: dt.datetime.fromordinal(x)) df.set_index('tradedt', inplace=True) df.index.name = 'Date' df.insert(5, 'openinterest', 0) return df if __name__ == '__main__': conn = create_engine('mysql+pymysql://***:***@***:***/***?charset=utf8') cerebro = bt.Cerebro() data0 = mysql_cleaner('000001.SZ',startdate = 20170101,enddate = 20171231) data0 = bt.feeds.PandasData(dataname=data0) cerebro.adddata(data0,name = 'data0') data1 = mysql_cleaner('000002.SZ',startdate = 20170530,enddate = 20180515) data1 = bt.feeds.PandasData(dataname=data1) cerebro.adddata(data1,name = 'data1') cerebro.broker.setcash(100000000) cerebro.addstrategy(St) cerebro.run() cerebro.plot()
Once again I hide mysql credential with ***.
mysql_cleaner
returns a dataframe with a shape that satisfiesbt.feeds.PandasData
.As the code and plot suggested:
data0: startdate = 20170101, enddate = 20171231
data1: startdate = 20170530, enddate = 20180515The above code produces this:
If I swap the the data feed order with:
data1 = mysql_cleaner('000002.SZ',startdate = 20170530,enddate = 20180515) data1 = bt.feeds.PandasData(dataname=data1) cerebro.adddata(data1,name = 'data1') data0 = mysql_cleaner('000001.SZ',startdate = 20170101,enddate = 20171231) data0 = bt.feeds.PandasData(dataname=data0) cerebro.adddata(data0,name = 'data0')
The plot became:
The printed outcome (by the line
print(d,self.getpositionbyname(d))
in original code) for first two days are(data feed not swapped):data0 --- Position Begin - Size: 0 - Price: 0.0 - Price orig: 0.0 - Closed: 0 - Opened: 0 - Adjbase: None --- Position End data1 --- Position Begin - Size: 0 - Price: 0.0 - Price orig: 0.0 - Closed: 0 - Opened: 0 - Adjbase: None --- Position End data0 --- Position Begin - Size: 100 - Price: 963.7759275999999 - Price orig: 0.0 - Closed: 0 - Opened: 100 - Adjbase: 962.7283450699999 --- Position End data1 --- Position Begin - Size: 100 - Price: 2801.7101380999998 - Price orig: 0.0 - Closed: 0 - Opened: 100 - Adjbase: 2800.3823133899996 --- Position End
In day 2 the price for both data feed seems correct, so I suspect only the markers are shifted, but actual trade wasn's wrong.
One more question which is not so related, is it possible to 'extend' a data feed by removing a line, say the openinterest line, I read Extending a Datafeed but doesnt seem it fits my question. The reason I am asking this is that I never used this line, but I am feeding the cerebro with 3000ish datas, the initialization is really slow, removing a line may speed things up.
-
@backtrader any suggestions?
-
@Yi-Fang I have the same problem as you , have you gotten the solution?