When I resample 5M data to 1H, the result seems to be wrong: the bars are offset incorrectly.
I download historical data from Dukascopy for backtesting. Here is a piece of the 5M data file:
20.10.2020 15:55:00.000 GMT-0400,0.70532,0.70535,0.70522,0.70525,0.000500640000000002
20.10.2020 16:00:00.000 GMT-0400,0.70524,0.70529,0.70514,0.70528,0.0001319599999999999
20.10.2020 16:05:00.000 GMT-0400,0.70528,0.70534,0.70526,0.70530,0.00006616999999999997
20.10.2020 16:10:00.000 GMT-0400,0.70530,0.70535,0.70529,0.70530,0.00006858
20.10.2020 16:15:00.000 GMT-0400,0.70531,0.70535,0.70529,0.70530,0.00004368999999999999
20.10.2020 16:20:00.000 GMT-0400,0.70529,0.70530,0.70528,0.70530,0.000018000000000000004
20.10.2020 16:25:00.000 GMT-0400,0.70529,0.70531,0.70523,0.70524,0.00005646999999999998
20.10.2020 16:30:00.000 GMT-0400,0.70524,0.70530,0.70496,0.70498,0.00014451999999999995
20.10.2020 16:35:00.000 GMT-0400,0.70497,0.70497,0.70445,0.70449,0.0002016899999999998
20.10.2020 16:40:00.000 GMT-0400,0.70452,0.70457,0.70440,0.70451,0.0001535199999999999
20.10.2020 16:45:00.000 GMT-0400,0.70452,0.70478,0.70452,0.70465,0.00011174999999999995
20.10.2020 16:50:00.000 GMT-0400,0.70463,0.70477,0.70459,0.70477,0.00010162999999999984
20.10.2020 16:55:00.000 GMT-0400,0.70475,0.70480,0.70466,0.70468,0.00017255000000000005
20.10.2020 17:00:00.000 GMT-0400,0.70469,0.70510,0.70462,0.70491,0.000017390000000000008
20.10.2020 17:05:00.000 GMT-0400,0.70493,0.70510,0.70488,0.70488,0.000017259999999999997
That's date-time, open, high, low, close, volume. That 16:05 row represents the 5M period from 16:05 to 16:10. I verified that by examining the raw 1M and 5M data.
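For reference, here is a minimal sketch of how one of these timestamps can be parsed with the standard library, and which interval I take the row to cover. The format string and the helper name `bar_interval` are my own for illustration; they are not something Dukascopy or backtrader provides.

```python
from datetime import datetime, timedelta

# Format of the export shown above, e.g.
# "20.10.2020 16:05:00.000 GMT-0400" (day.month.year, UTC offset at the end).
DUKASCOPY_FMT = "%d.%m.%Y %H:%M:%S.%f GMT%z"

def bar_interval(stamp, minutes=5):
    """Return (start, end), assuming the stamp is the bar's OPEN time."""
    start = datetime.strptime(stamp, DUKASCOPY_FMT)
    return start, start + timedelta(minutes=minutes)

start, end = bar_interval("20.10.2020 16:05:00.000 GMT-0400")
print(start.isoformat(), "->", end.isoformat())
```

Under that open-time assumption the 16:05 row covers 16:05 to 16:10, which is what I see in the raw data.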
When I resample to 1H, I expect the 16:00 bar to represent actual times of 16:00-17:00, that is, the 5M bars 16:00-16:55. Instead I get this:
2020-10-20T17:00:00-04:00, O:0.70528 H:0.70535 L:0.7044 C:0.70491 ADX:23.74952207714273
The 17:00 bar covers the 5M bars 16:05-17:00, which represent actual times 16:05-17:05. So it's wrong in two ways:
- it's off by 5 minutes, it should start at the :00 minute instead of :05
- it's also off by an hour: this bar should be labelled 16:00
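To make the expectation concrete, here is a small sketch in plain Python (no backtrader involved) that builds both bars by hand from the 5M rows above. The helper name `aggregate` is made up for this example.

```python
# (time, open, high, low, close) copied from the 5M rows above
rows = [
    ("16:00", 0.70524, 0.70529, 0.70514, 0.70528),
    ("16:05", 0.70528, 0.70534, 0.70526, 0.70530),
    ("16:10", 0.70530, 0.70535, 0.70529, 0.70530),
    ("16:15", 0.70531, 0.70535, 0.70529, 0.70530),
    ("16:20", 0.70529, 0.70530, 0.70528, 0.70530),
    ("16:25", 0.70529, 0.70531, 0.70523, 0.70524),
    ("16:30", 0.70524, 0.70530, 0.70496, 0.70498),
    ("16:35", 0.70497, 0.70497, 0.70445, 0.70449),
    ("16:40", 0.70452, 0.70457, 0.70440, 0.70451),
    ("16:45", 0.70452, 0.70478, 0.70452, 0.70465),
    ("16:50", 0.70463, 0.70477, 0.70459, 0.70477),
    ("16:55", 0.70475, 0.70480, 0.70466, 0.70468),
    ("17:00", 0.70469, 0.70510, 0.70462, 0.70491),
]

def aggregate(bars):
    """Combine consecutive OHLC bars: first open, max high, min low, last close."""
    return (
        bars[0][1],
        max(b[2] for b in bars),
        min(b[3] for b in bars),
        bars[-1][4],
    )

expected = aggregate(rows[0:12])  # 5M bars 16:00..16:55
observed = aggregate(rows[1:13])  # 5M bars 16:05..17:00

print("bar I expect, labelled 16:00:", expected)
print("same window shifted by 5M:   ", observed)
```

The shifted window gives exactly the O/H/L/C values of the 17:00 bar I actually get, which is why I believe the resampled bar is built from the 16:05-17:00 rows.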
Do you agree? Can this be fixed, or is there some way I can work around it?
Here is the relevant code:
# Use my 'data feed', the pre-loaded bars[]
data = MyData(timeframe=bt.TimeFrame.Minutes, compression=5)
cerebro.adddata(data)
# and resample to get 1H data
cerebro.resampledata(data, timeframe=bt.TimeFrame.Minutes, compression=60)
I am optimizing, so I do a one-time load of the CSV file into a list bars[], and then MyData reads bars[] into the data feed:
class MyData(bt.DataBase):
    params = (
        ('timeframe', bt.TimeFrame.Minutes),
        ('compression', 5))

    def __init__(self):
        global bars
        self.idx = 0
        self.size = len(bars)

    def start(self):
        global bars
        self.idx = 0

    def stop(self):
        pass

    def _load(self):
        global bars
        if self.idx >= self.size:
            # at end of data
            return False
        dt, open, high, low, close = bars[ self.idx ]
        self.lines.datetime[0] = dt
        self.lines.open[0] = open
        self.lines.high[0] = high
        self.lines.low[0] = low
        self.lines.close[0] = close
        self.idx += 1
        return True
I know the above code is good because my 'next' function prints each 5M bar and they match up with the CSV file.