When I resample 5M data to 1H, the result seems to be wrong: the bars are offset incorrectly.
I download historical data from Dukascopy for backtesting. Here is a piece of the 5M data file:
    20.10.2020 15:55:00.000 GMT-0400,0.70532,0.70535,0.70522,0.70525,0.000500640000000002
    20.10.2020 16:00:00.000 GMT-0400,0.70524,0.70529,0.70514,0.70528,0.0001319599999999999
    20.10.2020 16:05:00.000 GMT-0400,0.70528,0.70534,0.70526,0.70530,0.00006616999999999997
    20.10.2020 16:10:00.000 GMT-0400,0.70530,0.70535,0.70529,0.70530,0.00006858
    20.10.2020 16:15:00.000 GMT-0400,0.70531,0.70535,0.70529,0.70530,0.00004368999999999999
    20.10.2020 16:20:00.000 GMT-0400,0.70529,0.70530,0.70528,0.70530,0.000018000000000000004
    20.10.2020 16:25:00.000 GMT-0400,0.70529,0.70531,0.70523,0.70524,0.00005646999999999998
    20.10.2020 16:30:00.000 GMT-0400,0.70524,0.70530,0.70496,0.70498,0.00014451999999999995
    20.10.2020 16:35:00.000 GMT-0400,0.70497,0.70497,0.70445,0.70449,0.0002016899999999998
    20.10.2020 16:40:00.000 GMT-0400,0.70452,0.70457,0.70440,0.70451,0.0001535199999999999
    20.10.2020 16:45:00.000 GMT-0400,0.70452,0.70478,0.70452,0.70465,0.00011174999999999995
    20.10.2020 16:50:00.000 GMT-0400,0.70463,0.70477,0.70459,0.70477,0.00010162999999999984
    20.10.2020 16:55:00.000 GMT-0400,0.70475,0.70480,0.70466,0.70468,0.00017255000000000005
    20.10.2020 17:00:00.000 GMT-0400,0.70469,0.70510,0.70462,0.70491,0.000017390000000000008
    20.10.2020 17:05:00.000 GMT-0400,0.70493,0.70510,0.70488,0.70488,0.000017259999999999997
That's date-time, open, high, low, close, volume. That 16:05 row represents the 5M period from 16:05 to 16:10. I verified that by examining the raw 1M and 5M data.
When I resample to 1H, I expect the 16:00 bar to represent actual times of 16:00-17:00, i.e. the 5M bars 16:00-16:55. Instead I get this:
2020-10-20T17:00:00-04:00, O:0.70528 H:0.70535 L:0.7044 C:0.70491 ADX:23.74952207714273
The 17:00 bar represents the 5M bars 16:05-17:00 which represents actual times 16:05-17:05. So it's wrong in two ways:
- it's off by 5 minutes, it should start at the :00 minute instead of :05
- it's also off by an hour: this bar should be labelled 16:00
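To make the expectation concrete, here is a minimal sketch in plain Python (no backtrader) that groups the 5M rows quoted above by the hour in which each bar starts. The names `ROWS` and `hourly_bars` are made up for this illustration, and the sketch assumes each timestamp marks the start of its 5M bar, as described above.

```python
# (start time, open, high, low, close) for the 5M bars quoted above.
# Each timestamp is assumed to mark the START of its 5M period.
ROWS = [
    ("16:00", 0.70524, 0.70529, 0.70514, 0.70528),
    ("16:05", 0.70528, 0.70534, 0.70526, 0.70530),
    ("16:10", 0.70530, 0.70535, 0.70529, 0.70530),
    ("16:15", 0.70531, 0.70535, 0.70529, 0.70530),
    ("16:20", 0.70529, 0.70530, 0.70528, 0.70530),
    ("16:25", 0.70529, 0.70531, 0.70523, 0.70524),
    ("16:30", 0.70524, 0.70530, 0.70496, 0.70498),
    ("16:35", 0.70497, 0.70497, 0.70445, 0.70449),
    ("16:40", 0.70452, 0.70457, 0.70440, 0.70451),
    ("16:45", 0.70452, 0.70478, 0.70452, 0.70465),
    ("16:50", 0.70463, 0.70477, 0.70459, 0.70477),
    ("16:55", 0.70475, 0.70480, 0.70466, 0.70468),
    ("17:00", 0.70469, 0.70510, 0.70462, 0.70491),
]


def hourly_bars(rows):
    """Aggregate start-stamped 5M bars into bars keyed by their starting hour."""
    out = {}
    for hhmm, o, h, l, c in rows:
        hour = hhmm[:2] + ":00"
        if hour not in out:
            # First 5M bar of the hour supplies the open.
            out[hour] = {"open": o, "high": h, "low": l, "close": c}
        else:
            bar = out[hour]
            bar["high"] = max(bar["high"], h)
            bar["low"] = min(bar["low"], l)
            bar["close"] = c  # last 5M bar seen so far supplies the close
    return out


expected = hourly_bars(ROWS)["16:00"]
print(expected)
```

Under that assumption the 16:00 bar comes out as O:0.70524 H:0.70535 L:0.70440 C:0.70468. The bar I actually get has O:0.70528 and C:0.70491, which are the open of the 16:05 row and the close of the 17:00 row.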
Do you agree? Can this be fixed, or is there some way I can work around it?
Here is the relevant code:
    # Use my 'data feed', the pre-loaded bars
    data = MyData(timeframe=bt.TimeFrame.Minutes, compression=5)
    cerebro.adddata(data)

    # and resample to get 1H data
    cerebro.resampledata(data, timeframe=bt.TimeFrame.Minutes, compression=60)
I am optimizing, so I do a one-time load of a CSV file into a list bars, and then MyData reads bars into the dataframe:
    class MyData(bt.DataBase):
        params = (
            ('timeframe', bt.TimeFrame.Minutes),
            ('compression', 5),
        )

        def __init__(self):
            global bars
            self.idx = 0
            self.size = len(bars)

        def start(self):
            global bars
            self.idx = 0

        def stop(self):
            pass

        def _load(self):
            global bars
            if self.idx >= self.size:
                # at end of data
                return False
            dt, open, high, low, close = bars[self.idx]
            # assign into the current slot ([0]) of each line
            self.lines.datetime[0] = dt
            self.lines.open[0] = open
            self.lines.high[0] = high
            self.lines.low[0] = low
            self.lines.close[0] = close
            self.idx += 1
            return True
I know the above code is good because my 'next' function prints each 5M bar and they match up with the CSV file.
bt uses the time of the end of the bar to designate the bar, so the resampling is correct in terms of bt. You can try to use the parameters as described here to match your expectations. Also search the forum; this has been discussed several times already.
That's exactly what I needed, thank you.
In case anyone from The Future reads this: I specified rightedge=False in the resample code, and that took care of the hour issue. But it did not address the 5M problem: the hourly bar still represented e.g. 14:05-15:05 instead of 14:00-15:00. So I made an ugly hack: I add 5M to the datetime of each 5M bar.
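A rough sketch of the timestamp hack, using only the standard library. The helper names `parse_stamp` and `shift_to_bar_end` are invented for this example, and the tuple layout `(dt, open, high, low, close)` is assumed from the `_load` code earlier in the thread. The real `bars` list may well hold backtrader's float datetimes rather than `datetime` objects, in which case the shift would have to happen before that conversion.

```python
from datetime import datetime, timedelta

# Timestamp layout of the Dukascopy rows quoted earlier in the thread.
DUKASCOPY_FMT = "%d.%m.%Y %H:%M:%S.%f GMT%z"


def parse_stamp(text):
    """Parse e.g. '20.10.2020 16:00:00.000 GMT-0400' into an aware datetime."""
    return datetime.strptime(text, DUKASCOPY_FMT)


def shift_to_bar_end(bars, minutes=5):
    """Return bars re-stamped with the END of each period instead of its start."""
    step = timedelta(minutes=minutes)
    return [(dt + step, o, h, l, c) for dt, o, h, l, c in bars]


raw = [
    (parse_stamp("20.10.2020 16:00:00.000 GMT-0400"), 0.70524, 0.70529, 0.70514, 0.70528),
    (parse_stamp("20.10.2020 16:55:00.000 GMT-0400"), 0.70475, 0.70480, 0.70466, 0.70468),
]
shifted = shift_to_bar_end(raw)
print(shifted[0][0].isoformat())  # 2020-10-20T16:05:00-04:00
```

After the shift, the 5M bars covering 16:00-17:00 carry the stamps 16:05 through 17:00, which is the end-of-bar convention bt expects. Only the timestamps move; the prices are untouched.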