cerebro.resampledata() vs pandas .resample()
-
I'm trying to figure out the proper way of aggregating the data.
I have some minute data from the market and I want to do some daily TA. I've tried two approaches that give me slightly different final output. First attempt, using the resampledata() method:
    exampleDataWithoutPreResample = coinBaseUSD.loc['2015-02-01':'2015-03-14'][['Open','High','Low','Close','Volume']]
    data = bt.feeds.PandasData(dataname=exampleDataWithoutPreResample, timeframe=bt.TimeFrame.Minutes, openinterest=None)
    foo = cerebro.resampledata(data, timeframe=bt.TimeFrame.Days)
Logs that I print:
    2015-03-13, Sell created, 287.57
    2015-03-14, SELL EXECUTED, Price: 295.74, Cost: 4350.45, Comm 44.36 << ??
    2015-03-14, OPERATION PROFIT, GROSS 85.65, NET -2.22
    2015-03-14, (MA Period 6) Ending Value 99773.58
The second approach with pre aggregation in Pandas:
    ohlc_dict = {
        'Open': 'first',
        'High': 'max',
        'Low': 'min',
        'Close': 'last',
        'Volume': 'sum'
    }
    exampleData = coinBaseUSD.loc['2015-02-01':'2015-03-14'][['Open','High','Low','Close','Volume']].resample('D', how=ohlc_dict)
    data = bt.feeds.PandasData(dataname=exampleData)
    foo2 = cerebro.adddata(data)
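A note for readers on a recent pandas release: the how= keyword of resample() was deprecated and later removed, and the same aggregation is normally written with .agg(). A minimal sketch of that spelling follows. The minute bars are synthetic and only stand in for coinBaseUSD, so the numbers are not the ones from this thread.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute bars covering two full days (illustrative data only,
# a stand-in for the coinBaseUSD frame used in the post).
index = pd.date_range("2015-03-13 00:00", "2015-03-14 23:59", freq="min")
prices = np.linspace(100.0, 200.0, len(index))
minute_bars = pd.DataFrame(
    {
        "Open": prices,
        "High": prices + 1.0,
        "Low": prices - 1.0,
        "Close": prices,
        "Volume": 1.0,
    },
    index=index,
)

ohlc_dict = {
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
    "Volume": "sum",
}

# Same aggregation as resample('D', how=ohlc_dict), written with .agg().
daily_bars = minute_bars.resample("D").agg(ohlc_dict)

# Each daily bar is labelled with the start of its day (00:00:00).
print(daily_bars)
```

The printed index shows the start-of-day labels that pandas assigns to daily bins by default.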
Everything matches up to the last sell execution:
    2015-03-13, Sell created, 287.57
    2015-03-14, SELL EXECUTED, Price: 287.65, Cost: 4350.45, Comm 43.15 << ?
    2015-03-14, OPERATION PROFIT, GROSS -35.70, NET -122.35
    2015-03-14, (MA Period 6) Ending Value 99653.44
    >> print(foo.getwritervalues(), '\n',foo2.getwritervalues())
    ['', 42, datetime.datetime(2015, 3, 14, 23, 59, 59, 999989), 287.65, 289.0, 280.86, 284.83, 7898.626718840009, nan]
     ['', 42, datetime.datetime(2015, 3, 14, 0, 0), 287.65, 289.0, 280.86, 284.83, 7898.626718840009, nan]
    exampleData.tail(5)
    Timestamp     Open    High     Low   Close        Volume
    2015-03-12  296.83  298.90  291.87  295.91   9698.128688
    2015-03-13  295.74  296.52  284.86  287.57  11889.603667
    2015-03-14  287.65  289.00  280.86  284.83   7898.626719
The difference is in the SELL EXECUTED, Price: xxx line.
The first example takes the open value from 2015-03-13 and the second example takes the open value from 2015-03-14.
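As a quick sanity check of the two prices, here is a small sketch that uses only the daily bars printed by exampleData.tail(). It assumes the sell signalled on 2015-03-13 is a market order expected to fill at the open of the next daily bar. The dict and helper names are illustrative and not part of backtrader.

```python
from datetime import date

# Daily bars copied from the exampleData.tail() output: date -> (open, close)
daily = {
    date(2015, 3, 12): (296.83, 295.91),
    date(2015, 3, 13): (295.74, 287.57),
    date(2015, 3, 14): (287.65, 284.83),
}


def open_of(day):
    """Return the open price of the given daily bar."""
    return daily[day][0]


signal_day = date(2015, 3, 13)

# The bar that follows the signal bar in chronological order.
days = sorted(daily)
next_day = days[days.index(signal_day) + 1]

print("open of signal bar:", open_of(signal_day))
print("open of next bar:  ", open_of(next_day))
```

The pre-aggregated pandas run reports the open of the next bar, while the resampledata() run reports the open of the signal bar itself.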
Ah, and why is resampledata() so much slower when running?
Logic behind the sell execution:
    def next(self):
        # ....
        if self.dataclose[0] < self.sma[0]:
            self.log("Sell created, %.2f" % self.dataclose[0])

    def notify_order(self, order):
        # ....
        if order.issell():
            self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                     (order.executed.price, order.executed.value, order.executed.comm))
-
The following would actually help:
- Print the actual datetime and the open and close prices in your next.
Additionally
@qwert666 said in cerebro.resampledata() vs pandas .resample():
    def notify_order(self, order):
        # ....
        if order.issell():
            self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                     (order.executed.price, order.executed.value, order.executed.comm))
Should one assume that you are checking the actual notification status (Accepted, Completed) before printing?
Of course it is also assumed that both runs use the same parameters, and not one with cheat-on-close (or a similar approach) and the other without.
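To make the status question concrete, here is a hedged sketch of the kind of check meant. So that it runs without backtrader installed, it uses a hypothetical StandInOrder class that mimics only the attributes touched here (status, issell(), executed.price/value/comm). The helper execution_message is also made up for illustration, and the numeric status values are assumptions of the stand-in.

```python
from types import SimpleNamespace


class StandInOrder:
    """Hypothetical stand-in for an order object; NOT backtrader's class."""

    # Status constants (numeric values are an assumption of this sketch).
    Submitted, Accepted, Partial, Completed = 1, 2, 3, 4

    def __init__(self, status, price, value, comm, sell=True):
        self.status = status
        self.executed = SimpleNamespace(price=price, value=value, comm=comm)
        self._sell = sell

    def issell(self):
        return self._sell


def execution_message(order):
    """Return a log line only for a completed sell order, otherwise None."""
    if order.status in (order.Submitted, order.Accepted):
        # Order is known to the broker but not executed yet: nothing to report.
        return None
    if order.status == order.Completed and order.issell():
        return 'SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' % (
            order.executed.price, order.executed.value, order.executed.comm)
    return None


accepted = StandInOrder(StandInOrder.Accepted, 287.65, 4350.45, 43.15)
completed = StandInOrder(StandInOrder.Completed, 287.65, 4350.45, 43.15)
print(execution_message(accepted))
print(execution_message(completed))
```

In a real strategy the same two checks would sit at the top of notify_order, so that the execution line is only logged once the order has actually completed.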
-
@backtrader said in cerebro.resampledata() vs pandas .resample():
datetime and open, close
Yes, both are running with the same parameters. I've pasted only the part that seemed relevant to solving the problem.
    if self.dataclose[0] < self.sma[0]:
        self.log("DEBUG %s, %.2f" % (bt.num2date(self.datetime[0]).isoformat(), self.dataopen[0]))
        self.log("Sell created, %.2f" % self.dataclose[0])
    2015-03-13, DEBUG 2015-03-13T23:59:59.999989, 295.74
    2015-03-13, Sell created, 287.57
    2015-03-14, SELL EXECUTED, Price: 295.74, Cost: 4350.45, Comm 44.36
    2015-03-14, OPERATION PROFIT, GROSS 85.65, NET -2.22
    2015-03-14, (MA Period 6) Ending Value 99773.58
And the second approach, with pandas:
    2015-03-13, DEBUG 2015-03-13T00:00:00, 295.74
    2015-03-13, Sell created, 287.57
    2015-03-14, SELL EXECUTED, Price: 287.65, Cost: 4350.45, Comm 43.15
    2015-03-14, OPERATION PROFIT, GROSS -35.70, NET -122.35
    2015-03-14, (MA Period 6) Ending Value 99653.44
.resample('D', how=ohlc_dict) cuts off the time of day, while resampledata() leaves it at 23:59. This is also visible in the values returned by getwritervalues. Could that be the reason why it behaves this way?
-
Seeing how the data is actually resampled (what pandas gives you and what backtrader gives you) would seem relevant. Not only when buying and selling.
In addition to that, and since the differences you see seem to be in the transition from 2015-03-13 to 2015-03-14, it would be ideal to see what the 1-minute bars around the transition look like.
backtrader gives you a resampled 1-day bar which is at the end of the day. The rationale behind it:
- If you mix that with a smaller timeframe, the 1-day bar is after the smaller timeframes.
pandas is giving you resampled data at the beginning of the day. Now you mix 2 timeframes and:
- 2015-03-13 10:01:00 (1-minute timeframe) actually happens later than 2015-03-13 00:00:00 (1-day resampled by pandas)
Which wouldn't really make sense.
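The ordering argument can be sketched with plain datetime objects, without pandas or backtrader. The end-of-day stamp below is the one printed by getwritervalues earlier in the thread. The minute stamps are generated for illustration, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta

day = datetime(2015, 3, 13)

# Illustrative 1-minute bar timestamps for that day: 00:01 through 23:59.
minute_stamps = [day + timedelta(minutes=m) for m in range(1, 24 * 60)]

# Timestamp of the daily bar under each convention.
start_of_day = day                                       # pandas-style label
end_of_day = datetime(2015, 3, 13, 23, 59, 59, 999989)   # as printed by backtrader


def minute_bars_already_seen(daily_stamp):
    """Count the 1-minute bars that are not later than the daily bar's stamp."""
    return sum(1 for stamp in minute_stamps if stamp <= daily_stamp)


print("start-of-day label:", minute_bars_already_seen(start_of_day))
print("end-of-day label:  ", minute_bars_already_seen(end_of_day))
```

With the start-of-day label, the daily bar sorts ahead of every minute bar it summarises. With the end-of-day label, it sorts after all of them, which is the ordering described above.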