
cerebro.resampledata() vs pandas .resample()



  • I'm trying to figure out the proper way to aggregate data.
    I have some minute data from the market and I want to do some daily TA. I've tried two approaches that give me slightly different final output.

    First attempt using the resampledata() method:

    exampleDataWithoutPreResample = coinBaseUSD.loc['2015-02-01':'2015-03-14'][['Open','High','Low','Close','Volume']]
    data = bt.feeds.PandasData(dataname=exampleDataWithoutPreResample, timeframe=bt.TimeFrame.Minutes, openinterest=None)
    foo = cerebro.resampledata(data, timeframe=bt.TimeFrame.Days)
    

    Logs that I print:

    2015-03-13, Sell created, 287.57
    2015-03-14, SELL EXECUTED, Price: 295.74, Cost: 4350.45, Comm 44.36 << ??
    2015-03-14, OPERATION PROFIT, GROSS 85.65, NET -2.22
    2015-03-14, (MA Period  6) Ending Value 99773.58
    

    The second approach, with pre-aggregation in pandas:

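    # Per-column aggregation rules for building one daily bar from minute bars
    ohlc_dict = {
        'Open':'first',
        'High':'max',
        'Low':'min',
        'Close': 'last',
        'Volume': 'sum'
    }
    # Originally written as .resample('D', how=ohlc_dict); later pandas releases dropped how= in favor of .agg()
    exampleData = coinBaseUSD.loc['2015-02-01':'2015-03-14'][['Open','High','Low','Close','Volume']].resample('D').agg(ohlc_dict)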
    data = bt.feeds.PandasData(dataname=exampleData)
    foo2 = cerebro.adddata(data)
    

    Everything matches up to the last sell execution:

    2015-03-13, Sell created, 287.57
    2015-03-14, SELL EXECUTED, Price: 287.65, Cost: 4350.45, Comm 43.15 << ?
    2015-03-14, OPERATION PROFIT, GROSS -35.70, NET -122.35
    2015-03-14, (MA Period  6) Ending Value 99653.44
    
    >> print(foo.getwritervalues(), '\n',foo2.getwritervalues())
    ['', 42, datetime.datetime(2015, 3, 14, 23, 59, 59, 999989), 287.65, 289.0, 280.86, 284.83, 7898.626718840009, nan] 
     ['', 42, datetime.datetime(2015, 3, 14, 0, 0), 287.65, 289.0, 280.86, 284.83, 7898.626718840009, nan]
    
    exampleData.tail(5)
    Timestamp     Open    High     Low   Close        Volume
    2015-03-12  296.83  298.90  291.87  295.91   9698.128688
    2015-03-13  295.74  296.52  284.86  287.57  11889.603667
    2015-03-14  287.65  289.00  280.86  284.83   7898.626719
    

    The difference is in the SELL EXECUTED price.
    The first example takes the open value from 2015-03-13 and the second example takes the open value from 2015-03-14.
    Also, why is resampledata() so much slower when running?
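
The two fill prices can be checked against the daily table above. This is a minimal sketch, assuming a market order created while processing one daily bar is expected to fill at the open of the next bar. The names `daily_open` and `next_bar_open` are illustrative and not part of backtrader.

```python
from datetime import date

# Daily opens copied from the exampleData.tail() output in this post.
daily_open = {
    date(2015, 3, 12): 296.83,
    date(2015, 3, 13): 295.74,
    date(2015, 3, 14): 287.65,
}


def next_bar_open(opens, signal_day):
    """Return the open of the first bar after signal_day (hypothetical helper)."""
    later = sorted(d for d in opens if d > signal_day)
    return opens[later[0]] if later else None


signal_day = date(2015, 3, 13)          # "Sell created" is logged on this bar
expected_fill = next_bar_open(daily_open, signal_day)
same_bar_open = daily_open[signal_day]  # open of the signal bar itself

print("next bar open:", expected_fill)
print("signal bar open:", same_bar_open)
```

Under that assumption, the fill from the pandas run (287.65) equals the next bar's open, while the fill from the resampledata() run (295.74) equals the open of the bar on which the sell was created.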

    Logic behind the sell execution:

    def next(self):
        # ....
        if self.dataclose[0] < self.sma[0]:
            self.log("Sell created, %.2f" % self.dataclose[0])
    
    def notify_order(self, order):
        # ....
        if order.issell():
            self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                        (order.executed.price,
                        order.executed.value,
                        order.executed.comm))
    

  • administrators

    The following would actually help:

    • Print the actual datetime and the open and close prices in your next().

    Additionally

    @qwert666 said in cerebro.resampledata() vs pandas .resample():

    def notify_order(self, order):
        # ....
        if order.issell():
            self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                        (order.executed.price,
                        order.executed.value,
                        order.executed.comm))
    

    Should one assume that you are checking the actual notification status before printing? (Accepted, Completed)

    It is of course also assumed that both runs use the same parameters, rather than one running with cheat-on-close (or a similar approach) and the other without it.
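
The status check asked about above could look roughly like the following. This is a sketch only: `FakeOrder` is a hypothetical stand-in written for this example, not backtrader's order class, and the `Submitted` status is an assumption added next to the `Accepted` and `Completed` states named in the post. The point is simply that the execution line is logged only once the order has completed.

```python
from types import SimpleNamespace


class FakeOrder:
    """Hypothetical stand-in for an order object (NOT backtrader's own class)."""

    # Status names follow the ones discussed in the thread; the numeric
    # values are arbitrary placeholders chosen for this sketch.
    Submitted, Accepted, Completed = 1, 2, 3

    def __init__(self, status, price, value, comm, sell=True):
        self.status = status
        self.executed = SimpleNamespace(price=price, value=value, comm=comm)
        self._sell = sell

    def issell(self):
        return self._sell


def notify_order(order, log):
    """Log an execution only after the order has actually completed."""
    if order.status in (order.Submitted, order.Accepted):
        return  # still pending: nothing has been executed yet
    if order.status == order.Completed and order.issell():
        log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
            (order.executed.price, order.executed.value, order.executed.comm))


messages = []
notify_order(FakeOrder(FakeOrder.Accepted, 0.0, 0.0, 0.0), messages.append)
notify_order(FakeOrder(FakeOrder.Completed, 287.65, 4350.45, 43.15), messages.append)
print(messages)
```

In a real strategy the same two checks would sit at the top of the strategy's own notify_order(self, order) method, with self.log in place of the `log` callback.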



  • @backtrader said in cerebro.resampledata() vs pandas .resample():

    datetime and open, close

    Yes, both are running with the same parameters. I've pasted only the part that seemed relevant to the problem.

    if self.dataclose[0] < self.sma[0]:
        self.log("DEBUG %s, %.2f" % (bt.num2date(self.datetime[0]).isoformat(), self.dataopen[0]))
        self.log("Sell created, %.2f" % self.dataclose[0])
    
    2015-03-13, DEBUG 2015-03-13T23:59:59.999989, 295.74
    2015-03-13, Sell created, 287.57
    2015-03-14, SELL EXECUTED, Price: 295.74, Cost: 4350.45, Comm 44.36
    2015-03-14, OPERATION PROFIT, GROSS 85.65, NET -2.22
    2015-03-14, (MA Period  6) Ending Value 99773.58
    

    And the second approach, with pandas:

    2015-03-13, DEBUG 2015-03-13T00:00:00, 295.74
    2015-03-13, Sell created, 287.57
    2015-03-14, SELL EXECUTED, Price: 287.65, Cost: 4350.45, Comm 43.15
    2015-03-14, OPERATION PROFIT, GROSS -35.70, NET -122.35
    2015-03-14, (MA Period  6) Ending Value 99653.44
    

    The pandas .resample('D') call truncates the time of day, while resampledata() stamps the bar at 23:59:59.999989. This is also visible in the values returned by getwritervalues(). Could that be the reason why it behaves this way?


  • administrators

    Seeing how the data is actually resampled (what pandas gives you and what backtrader gives you) would seem relevant, and not only at the points where you buy and sell.

    In addition, since the differences you see seem to be in the transition from 2015-03-13 to 2015-03-14, it would be ideal to see what the 1-minute bars around the transition look like.
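
One way to cross-check both resamplers around the transition is to aggregate the relevant 1-minute bars by hand. The sketch below uses only the standard library and applies the same first/max/min/last/sum rules as the ohlc_dict in the first post. The minute bars are made-up placeholder values, not the poster's data, and `to_daily` is a hypothetical helper.

```python
from datetime import date, datetime

# Hypothetical 1-minute bars: (timestamp, open, high, low, close, volume).
minute_bars = [
    (datetime(2015, 3, 13, 23, 58), 288.00, 288.50, 287.50, 288.10, 1.0),
    (datetime(2015, 3, 13, 23, 59), 288.10, 288.20, 287.40, 287.57, 2.0),
    (datetime(2015, 3, 14, 0, 0), 287.65, 287.90, 287.00, 287.20, 1.5),
    (datetime(2015, 3, 14, 0, 1), 287.20, 289.00, 287.10, 288.40, 0.5),
]


def to_daily(bars):
    """Aggregate time-ordered minute bars into daily OHLCV dicts keyed by date."""
    daily = {}
    for ts, o, h, l, c, v in bars:
        day = ts.date()
        if day not in daily:
            # First bar of the day sets the open and seeds the other fields.
            daily[day] = {'Open': o, 'High': h, 'Low': l, 'Close': c, 'Volume': v}
        else:
            bar = daily[day]
            bar['High'] = max(bar['High'], h)
            bar['Low'] = min(bar['Low'], l)
            bar['Close'] = c          # last bar seen so far sets the close
            bar['Volume'] += v
    return daily


daily = to_daily(minute_bars)
for day, bar in daily.items():
    print(day, bar)
```

Printing the real minute bars for 2015-03-13 23:5x and 2015-03-14 00:0x next to such a hand-built daily bar makes it easier to see which open each resampler assigns to which day.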

    backtrader gives you a resampled 1-day bar which is stamped at the end of the day. The rationale:

    • If you mix it with a smaller timeframe, the 1-day bar comes after the bars of the smaller timeframe that it covers.

    pandas gives you resampled data stamped at the beginning of the day. If you now mix 2 timeframes:

    • 2015-03-13 10:01:00 (1-minute timeframe) actually happens later than 2015-03-13 00:00:00 (1-day resampled by pandas)

    Which wouldn't really make sense.
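
The ordering problem described above can be illustrated with plain timestamps. This is a sketch with hypothetical labels; `time.max` (23:59:59.999999) is used here as an approximation of the 23:59:59.999989 stamp seen in the logs, and `delivery_order` is an illustrative helper, not a backtrader function.

```python
from datetime import date, datetime, time

day = date(2015, 3, 13)
minute_bar = ('1-minute', datetime(2015, 3, 13, 10, 1))

# Daily bar stamped at the start of the day (what the pandas resample produced).
daily_start = ('1-day (start stamp)', datetime.combine(day, time.min))
# Daily bar stamped at the end of the day (the convention described for backtrader).
daily_end = ('1-day (end stamp)', datetime.combine(day, time.max))


def delivery_order(*bars):
    """Return bar labels sorted by timestamp, i.e. the order they would be seen in."""
    return [label for label, ts in sorted(bars, key=lambda b: b[1])]


start_order = delivery_order(minute_bar, daily_start)
end_order = delivery_order(minute_bar, daily_end)
print(start_order)  # ['1-day (start stamp)', '1-minute']
print(end_order)    # ['1-minute', '1-day (end stamp)']
```

With the start-of-day stamp, the daily bar sorts ahead of a minute bar whose data it already contains. With the end-of-day stamp, it arrives only after every minute bar of that day.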