For code/output blocks: Use ``` (aka backtick or grave accent) in a single line before and after the block. See: http://commonmark.org/help/

Multiple data feed with different start and end time



  • Hi:

    So I tried to feed the strategy with multiple data(stocks) with different start and end date(which is a nature of stock market I believe as on each date some went IPO and some got delisted), then:

    1\Run the strategy from earliest possible date(the earliest start date of all data)
    2\Select n stocks based on an extra line (e.g. p/e) on any date
    3\rebalance the portfolio with each selected stock consist of 1/n of the portfolio

    So as a first attempt I try to run my strategy with Strategy.Next(with the knowledge this only run over overlapping periods) . For analogy purpose, two Pandas Dataframe with different start and end date were feed to the cerebro:

    (data0 is from 2017/03 to 2018/01
    data1 is from 2017/06 to 2018/05)

    data0 = bt.feeds.PandasData(dataname=df0)
    data1 = bt.feeds.PandasData(dataname=df1)
    
    cerebro.adddata(data0)
    cerebro.adddata(data1) 
    
    class St(bt.Strategy):
        params = {"printout":True}
    
        def __init__(self):
    
        def notify_order(self, order):
            if order.status in [order.Submitted, order.Accepted]:
    
            if order.status in [order.Completed]:
                if order.isbuy():
                    self.log(
                        'BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                        (order.executed.price,
                         order.executed.value,
                         order.executed.comm))
    
                    self.buyprice = order.executed.price
                    self.buycomm = order.executed.comm
                else:  # Sell
                    self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                             (order.executed.price,
                              order.executed.value,
                              order.executed.comm))
    
                self.bar_executed = len(self)
        def log(self, txt, dt=None):
            if self.p.printout:
                dt = dt or self.data.datetime[0]
                dt = bt.num2date(dt)
                print('%s, %s' % (dt.isoformat(), txt))
    
        def next(self):
            if self.p.printout:
                print('Data0 dt:', self.data0.datetime.datetime())
                print('Data1 dt:', self.data1.datetime.datetime())
            for d in self.getdatanames():
                pos = self.getpositionbyname(d).size or 0
                if pos == 0: # No position
                    self.log('BUY CREATE, %.2f' % self.getdatabyname(d).close[0])
                    self.order = self.buy(data=self.getdatabyname(d))
    
                else:  # Open position
                    self.log('SELL CREATE, %.2f' % self.getdatabyname(d).close[0])
                    self.order = self.sell(data=self.getdatabyname(d))
    
    cerebro.run(oldsync=False)
    
    # Plot the result
    cerebro.plot(iplot=True, volume=False)
    
    

    Result seems really weird to me:

    0_1526618978749_Figure_0.png

    All the buy and sell marker for data1 shifts to the right by an amount (I believe) = start date of data1 - startdate of data0

    Part of the log:

    Starting Portfolio Value: 1000000000.00
    2017-06-19T00:00:00, BUY CREATE, 956.44
    2017-06-19T00:00:00, BUY CREATE, 9.82
    2017-06-20T00:00:00, BUY EXECUTED, Price: 955.40, Cost: 955.40, Comm 0.96
    2017-06-20T00:00:00, BUY EXECUTED, Price: 10.80, Cost: 10.80, Comm 0.01
    2017-06-20T00:00:00, SELL CREATE, 955.40
    2017-06-20T00:00:00, SELL CREATE, 10.80
    2017-06-21T00:00:00, SELL EXECUTED, Price: 960.63, Cost: 955.40, Comm 0.96
    2017-06-21T00:00:00, SELL EXECUTED, Price: 11.88, Cost: 10.80, Comm 0.01
    ......
    
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 12.67
    2018-01-05T00:00:00, BUY EXECUTED, Price: 12.65, Cost: 12.65, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 12.73
    2018-01-05T00:00:00, SELL EXECUTED, Price: 12.75, Cost: 12.65, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 12.55
    2018-01-05T00:00:00, BUY EXECUTED, Price: 12.51, Cost: 12.51, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 12.62
    2018-01-05T00:00:00, SELL EXECUTED, Price: 12.51, Cost: 12.51, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 12.20
    2018-01-05T00:00:00, BUY EXECUTED, Price: 12.26, Cost: 12.26, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 12.58
    2018-01-05T00:00:00, SELL EXECUTED, Price: 12.58, Cost: 12.26, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 12.77
    2018-01-05T00:00:00, BUY EXECUTED, Price: 12.61, Cost: 12.61, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 12.69
    2018-01-05T00:00:00, SELL EXECUTED, Price: 12.61, Cost: 12.61, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 12.58
    2018-01-05T00:00:00, BUY EXECUTED, Price: 12.50, Cost: 12.50, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 12.53
    2018-01-05T00:00:00, SELL EXECUTED, Price: 12.14, Cost: 12.50, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 11.29
    2018-01-05T00:00:00, BUY EXECUTED, Price: 11.25, Cost: 11.25, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 11.53
    2018-01-05T00:00:00, SELL EXECUTED, Price: 11.69, Cost: 11.25, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 11.77
    2018-01-05T00:00:00, BUY EXECUTED, Price: 11.60, Cost: 11.60, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 11.89
    2018-01-05T00:00:00, SELL EXECUTED, Price: 11.99, Cost: 11.60, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 12.26
    2018-01-05T00:00:00, BUY EXECUTED, Price: 12.36, Cost: 12.36, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 12.48
    2018-01-05T00:00:00, SELL EXECUTED, Price: 12.55, Cost: 12.36, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 12.89
    2018-01-05T00:00:00, BUY EXECUTED, Price: 12.63, Cost: 12.63, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 12.67
    2018-01-05T00:00:00, SELL EXECUTED, Price: 12.68, Cost: 12.63, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 12.45
    2018-01-05T00:00:00, BUY EXECUTED, Price: 12.32, Cost: 12.32, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 12.86
    2018-01-05T00:00:00, SELL EXECUTED, Price: 12.86, Cost: 12.32, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 13.06
    2018-01-05T00:00:00, BUY EXECUTED, Price: 12.99, Cost: 12.99, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 13.54
    2018-01-05T00:00:00, SELL EXECUTED, Price: 13.41, Cost: 12.99, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, BUY CREATE, 13.22
    2018-01-05T00:00:00, BUY EXECUTED, Price: 13.34, Cost: 13.34, Comm 0.01
    2018-01-05T00:00:00, BUY CREATE, 1413.91
    2018-01-05T00:00:00, SELL CREATE, 13.06
    

    It seems a lot of trade were executed on last date of data0

    Even if I set both datafeed with same end date, the "buy-sell marker shifting" still exists.

    Any hints why this happend, is this an issue of plotting(broker value stays the same after overlapping period) or actual trading?

    Further Questions:

    To achieve "run the strategy from earliest possible date to latest possible date", I plan to use prenext( while checking the length of each data with len(self.data) ) for period before overlapping, followed by Next for overlapping periods, but it doesn't seem there is "postnext" method for period after overlapping, is there a better approaches?


  • administrators

    @yi-fang said in Multiple data feed with different start and end time:

    All the buy and sell marker for data1 shifts to the right by an amount (I believe) = start date of data1 - startdate of data0

    No. But you have redacted your code and removed anything you were doing during __init__ and with it you have removed any useful information to give you a diagnostic.

    @yi-fang said in Multiple data feed with different start and end time:

    It seems a lot of trade were executed on last date of data0

    Check again how you call the log method and you will see why.

    @yi-fang said in Multiple data feed with different start and end time:

    is there a better approaches?

    No.



  • Thank you for the timely reply, I will analyze the code accordingly! @backtrader



  • Following is my full script for diagnose, nothing much in __init__ @backtrader

    mysql_cleaner is a function to retrieve data from my personal mysql server, it returns a pandas.Dataframe following the rules set by bt.feeds.PandasData with an extra column named 'factor', PandasData_Factor is just a subclass which added the 'factor' line. This extra line is only for future use and I dont think it caused my problem in the result plot.

    import backtrader as bt
    import datetime
    import pandas as pd
    from dateutil.parser import parse
    from sqlalchemy import create_engine
    
    # Create a Stratey
    class St(bt.Strategy):
    
        def __init__(self):
            self.order = None
            pass
    
        def next(self):
            for d in self.getdatanames():
                # Simple strategy: if there is no position for a given data, balance the position to 20%,
                # else balance to 10%
                # So every asset should be bought to 20% on day one and remain near 10% for the rest of bars
                pos = self.getpositionbyname(d).size or 0
                if pos == 0: # No position
                    self.order = self.order_target_percent(target=0.2,data=self.getdatabyname(d))
    
                else:  # We have an open position
                    self.order = self.order_target_percent(target=0.1,data=self.getdatabyname(d))
    
    class PandasData_Factor(bt.feeds.PandasData):
    
        # Add a 'factor' line to the inherited ones from the base class
        lines = ('factor',)
    
        # add the parameter to the parameters inherited from the base class
        params = (('factor', -1),)
    
        datafields = bt.feeds.PandasData.datafields + (['factor'])
    def mysql_cleaner(stkcode,startdate,enddate,extrafactor='amount'):
    
        startdateord = parse(str(startdate)).toordinal()
        enddateord = parse(str(enddate)).toordinal()
    
        df = pd.read_sql('SELECT tradedt,open,high,low,close,volume,adjfactor,' + extrafactor +
                         ' FROM dailytrading ' +
                         'WHERE stkcode = "' + stkcode + '"'
                          'AND tradedt <= "' + str((enddateord)) +
                         '" AND tradedt >= "' + str((startdateord)) +
                         '"', conn)
    
        df['open'] = df['open'] * df['adjfactor']
        df['high'] = df['high'] * df['adjfactor']
        df['low'] = df['low'] * df['adjfactor']
        df['close'] = df['close'] * df['adjfactor']
        df.drop('adjfactor', inplace=True, axis=1)
    
        df.sort_values(by=['tradedt'], inplace=True)
        df['tradedt'] = df['tradedt'].apply(lambda x: datetime.datetime.fromordinal(x))
    
        df.set_index('tradedt', inplace=True)
        df.index.name = 'Date'
        df.insert(5, 'openinterest', 0)
        df.columns.values[-1] = 'factor'
    
        return df
    
    
    if __name__ == '__main__':
    
        startcash = 1000000
        extrafactor = 'amount'
    
        # Create a connection to Mysql server
        conn = create_engine('mysql+pymysql://***:***@***:***/***?charset=utf8')
    
        # Create a cerebro entity
        cerebro = bt.Cerebro()  # Defaut plot objects are not needed
    
        data0 = mysql_cleaner('000001.SZ',20170206,20180105)
        data0 = PandasData_Factor(dataname=data0)
        cerebro.adddata(data0,name = 'data0')
    
        data1 = mysql_cleaner('603316.SH',20170619,20180505)
        data1 = PandasData_Factor(dataname=data1)
        cerebro.adddata(data1,name = 'data1')
    
        # Set our desired cash start
        cerebro.broker.setcash(startcash)
        cerebro.broker.setcommission(commission=0.001)
    
        # Add a strategy
        cerebro.addstrategy(St)
    
        # Print out the starting conditions
        print('Starting Portfolio Value: %.2f' % cerebro.broker.getvalue())
    
        # Run over everything
        cerebro.run()
    
        # Get final portfolio Value
        portvalue = cerebro.broker.getvalue()
        pnl = portvalue - startcash
    
        print('Final Portfolio Value: ${}'.format(portvalue))
        print('P/L: ${}'.format(round(pnl,2)))
    
        # Plot the result
        cerebro.plot(iplot=True, volume=False)
    

    The resulting buy and sell markers are still "shifted":
    0_1526648928168_1526648916(1).jpg

    Two questions:
    1\Why are my buy and sell markers shifted for data1?
    2\Current data length is len(self.getdatabyname(ticker)), is there a way to access previous data length, just like we access previous close value by using [-1] [-2] etc? If not, is there a way to access total data length?

    Thanks in advance for any reply!


  • administrators

    yi-fang said in Multiple data feed with different start and end time:

    1\Why are my buy and sell markers shifted for data1?

    The real problem as in the previous post: the code you have posted has NOT produced the posted chart.

    It is therefore in your hand to answer the question as to why your charts are produced by some other code.

    @yi-fang said in Multiple data feed with different start and end time:

    2\Current data length is len(self.getdatabyname(ticker)), is there a way to access previous data length, just like we access previous close value by using [-1] [-2] etc? If not, is there a way to access total data length?

    Keep the length in a dictionary. You cannot also access the previous length of a list in Python.



  • Thank you for your reply!

    @backtrader said in Multiple data feed with different start and end time:

    The real problem as in the previous post: the code you have posted has NOT produced the posted chart.

    I copy and paste my code directly from pycharm with only one modification: hide mysql credentials with *** , could it be the problem of my data? Attached are the pandas.Dataframe from pycharm for data0 and data1
    0_1526894694588_1526894543(1).jpg

    0_1526894700200_1526894672(1).jpg


  • administrators

    @yi-fang said in Multiple data feed with different start and end time:

    I copy and paste my code directly from pycharm with only one modification: hide mysql credentials with *** , could it be the problem of my data?

    No you don't. This is not because I think you lie. Keeping track of code, open windows and this and that and that other is sometimes difficult. But see:

    • In your first post you removed everything from the Strategy.__init__ and the code was not syntactically valid.

    • In your second post you show a chart which has two volume plots and the code contains the following:

      cerebro.plot(iplot=True, volume=False)
      

    It seems obvious that the posted code is not the code producing your problem, or else it wouldn't contain volume plots. Even better ... a chart plotted inside iPython (iplot=True) will not have the Windows title bar and Windows maximize, minimize and close buttons.

    One of the major problems when reporting something is that we all try to work the least minimum possible and you try small modifications of your complex code to try to show where the problem may be.

    But one of the most important things when reporting something is to reproduce it with a short, clean and legible sample, that has actually been used to reproduce what you want to report.



  • @backtrader Thank you again for your reply! I tried to reproduce my problem with a short, clean and legible sample this time(made sure several times that this is indeed the code, I am running windows + anaconda + pycharm, bt.__version__ == '1.9.64.122', matplotlib.__version__ == '2.1.2' if this matters).

    Original code:

    import backtrader as bt
    import datetime as dt
    import pandas as pd
    from dateutil.parser import parse
    from sqlalchemy import create_engine
    
    class St(bt.Strategy):
        def next(self):
            for d in self.getdatanames():
                if self.dnames[d].close[0] > 40:
                    self.buy(data = self.dnames[d],size = 100)
                    print(d,self.getpositionbyname(d))
    
    def mysql_cleaner(stkcode,startdate,enddate):
    
        startdateord = parse(str(startdate)).toordinal()
        enddateord = parse(str(enddate)).toordinal()
    
    
        df = pd.read_sql('SELECT tradedt,open,high,low,close,volume,adjfactor'
                             ' FROM dailytrading ' +
                             'WHERE stkcode = "' + stkcode + '"'
                              'AND tradedt <= "' + str((enddateord)) +
                             '" AND tradedt >= "' + str((startdateord)) +
                             '"', conn)
    
        df['open'] = df['open'] * df['adjfactor']
        df['high'] = df['high'] * df['adjfactor']
        df['low'] = df['low'] * df['adjfactor']
        df['close'] = df['close'] * df['adjfactor']
        df['volume'] = df['volume'] / df['adjfactor']
        df.drop('adjfactor', inplace=True, axis=1)
        df.sort_values(by=['tradedt'], inplace=True)
        df['tradedt'] = df['tradedt'].apply(lambda x: dt.datetime.fromordinal(x))
        df.set_index('tradedt', inplace=True)
        df.index.name = 'Date'
        df.insert(5, 'openinterest', 0)
    
        return df
    
    if __name__ == '__main__':
    
        conn = create_engine('mysql+pymysql://***:***@***:***/***?charset=utf8')
        cerebro = bt.Cerebro()
    
        data0 = mysql_cleaner('000001.SZ',startdate = 20170101,enddate = 20171231)
        data0 = bt.feeds.PandasData(dataname=data0)
        cerebro.adddata(data0,name = 'data0')
    
        data1 = mysql_cleaner('000002.SZ',startdate = 20170530,enddate = 20180515)
        data1 = bt.feeds.PandasData(dataname=data1)
        cerebro.adddata(data1,name = 'data1')
    
    
    
        cerebro.broker.setcash(100000000)
    
        cerebro.addstrategy(St)
    
        cerebro.run()
        cerebro.plot()
    

    Once again I hide mysql credential with ***. mysql_cleaner returns a dataframe with a shape that satisfies bt.feeds.PandasData.

    As the code and plot suggested:
    data0: startdate = 20170101, enddate = 20171231
    data1: startdate = 20170530, enddate = 20180515

    The above code produces this:
    0_1526959128976_Figure_0.png

    If I swap the the data feed order with:

        data1 = mysql_cleaner('000002.SZ',startdate = 20170530,enddate = 20180515)
        data1 = bt.feeds.PandasData(dataname=data1)
        cerebro.adddata(data1,name = 'data1')
    
        data0 = mysql_cleaner('000001.SZ',startdate = 20170101,enddate = 20171231)
        data0 = bt.feeds.PandasData(dataname=data0)
        cerebro.adddata(data0,name = 'data0')
    

    The plot became:
    0_1526959341939_1.png

    The printed outcome (by the line print(d,self.getpositionbyname(d)) in original code) for first two days are(data feed not swapped):

    data0 --- Position Begin
    - Size: 0
    - Price: 0.0
    - Price orig: 0.0
    - Closed: 0
    - Opened: 0
    - Adjbase: None
    --- Position End
    data1 --- Position Begin
    - Size: 0
    - Price: 0.0
    - Price orig: 0.0
    - Closed: 0
    - Opened: 0
    - Adjbase: None
    --- Position End
    data0 --- Position Begin
    - Size: 100
    - Price: 963.7759275999999
    - Price orig: 0.0
    - Closed: 0
    - Opened: 100
    - Adjbase: 962.7283450699999
    --- Position End
    data1 --- Position Begin
    - Size: 100
    - Price: 2801.7101380999998
    - Price orig: 0.0
    - Closed: 0
    - Opened: 100
    - Adjbase: 2800.3823133899996
    --- Position End
    

    In day 2 the price for both data feed seems correct, so I suspect only the markers are shifted, but actual trade wasn's wrong.

    One more question which is not so related, is it possible to 'extend' a data feed by removing a line, say the openinterest line, I read Extending a Datafeed but doesnt seem it fits my question. The reason I am asking this is that I never used this line, but I am feeding the cerebro with 3000ish datas, the initialization is really slow, removing a line may speed things up.



  • @backtrader any suggestions?