Resampling from daily to monthly - lagging issue
-
I am facing a lagging issue when resampling my daily data feed to a monthly time frame.
I added a regular daily feed to my cerebro instance. I then used cerebro.resampledata to add a monthly data feed to it. I would like to use the monthly data feed to compute an SMA indicator. Trades will be executed on the daily data feed.
I realized that the resampled monthly data feed only becomes available on the first trading day of a new month. That looks a bit odd to me. I would expect it to be available at the close of any given month.
Take a look at the screenshot. I created a writer to analyze the pricing. My daily data starts at 2019-12-20. The first monthly closing data point can be seen in line 5. However, this is already 2020-01-02 for the daily data feed.
Am I missing something?
-
@andi The monthly data feed is calculated after the close of the month and is then displayed on each day of the following month for use in daily calculations. Your chart looks correct: each day in January is displaying December's monthly data.
-
@run-out I am totally with you. What bothers me is the fact that the very first value is not available at line 4. At the close of 2019-12-30, I am able to resample the daily data into the monthly time frame.
Let me elaborate a little bit on this issue. I need to compute trading signals as per month end (based on the monthly data feed); execution will take place at the open of the first trading day of the next month (based on the daily data feed). Now, when computing an SMA on a monthly basis, I do know the new value as soon as I do have a closing price for the month (in contrast to get that value one day later).
Let's say, I want to generate a long signal, if my monthly closing price is above the monthly SMA. If I were to use monthly data exclusively, backtrader would generate this long signal as per close of the last trading day in the month (i.e. closing price of the monthly candle). And, equally important, this closing price will be part of the latest SMA value.
However, this is not the case when resampling the data, as can be seen in the screenshot. As a result, on the last trading day, I am comparing the correct closing value with an outdated SMA value.
As a general rule, the monthly OHLC values on a daily basis should be the same for the whole month with the exception of the last trading day of the month. Here, we get the next/new OHLC value.
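To make the stale-SMA concern concrete, here is a minimal pure-Python sketch. The monthly closes and the period of 3 are made-up numbers for illustration, not values from the screenshot. It compares the month-end close against an SMA that already includes that close and against one that is still one bar behind.

```python
def sma(values, period):
    """Simple moving average over the last `period` values."""
    window = values[-period:]
    return sum(window) / len(window)


# Hypothetical monthly closes; the last entry is the month that just ended.
monthly_closes = [100.0, 110.0, 120.0, 112.0]
period = 3

latest_close = monthly_closes[-1]
sma_updated = sma(monthly_closes, period)      # includes the month that just closed
sma_stale = sma(monthly_closes[:-1], period)   # what a one-bar-late monthly feed still shows

signal_updated = latest_close > sma_updated
signal_stale = latest_close > sma_stale

print(sma_updated, signal_updated)  # 114.0 False
print(sma_stale, signal_stale)      # 110.0 True
```

With these numbers the two variants disagree, which is exactly the situation described above: the same closing price yields a long signal against the outdated SMA and no signal against the current one.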
Here's backtrader's result with more lines:
All the columns that I marked yellow should be moved upward by one row. Here's what I expect:
Are you with me?
-
@andi I get what you are saying, and this is not an uncommon misunderstanding when starting to use backtrader. If you come from a world of pandas or spreadsheets, what you are saying is right. All of the OHLCV data gets used to calculate anything in that day, and shows up in the same line as the close.
However, when using backtrader, once the close of the bar is past, that period is essentially over, and any indicators will be available for the next day. Otherwise the indicator would be available in the same bar as the close from which it is calculated, which of course is impossible.
Both ways are correct, the data is accurate, and there is no error. It's simply a difference of when you use the date. I like to say:
When using pandas or spreadsheets, you are in the morning, when using backtrader, you are at night.
Hope that makes some sense.
-
@run-out Let me try to get hold of this topic from a different perspective.
Let's assume I am operating on monthly data exclusively, i.e. only one data feed/no resampling. Let's say, I want to issue a buy order right after the monthly close if the closing value is greater than 100. It is my understanding that backtrader will issue this order right after the close and it will get executed with the opening price of the next bar, i.e. the very first price of the following trading day (which is then actually the opening price of the next monthly candle). Is this understanding correct? That would be a real-life scenario.
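The understanding described above (decide at the close, fill at the next bar's open) can be sketched without backtrader as follows. The `Bar` class, the `run` helper and all prices are hypothetical, and the loop is only a simplified model of that rule, not backtrader's broker logic.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    label: str
    open: float
    close: float


def run(bars, threshold):
    """Return (signal_bar, fill_bar, fill_price) for a 'close > threshold' rule.

    The decision is taken after a bar has closed; the order is filled at the
    open of the following bar.
    """
    fills = []
    for i, bar in enumerate(bars[:-1]):  # the last bar has no next bar to fill on
        if bar.close > threshold:
            nxt = bars[i + 1]
            fills.append((bar.label, nxt.label, nxt.open))
    return fills


# Hypothetical monthly bars.
bars = [
    Bar("2019-11", 95.0, 98.0),
    Bar("2019-12", 98.5, 101.0),
    Bar("2020-01", 101.5, 103.0),
]
fills = run(bars, 100.0)
print(fills)  # [('2019-12', '2020-01', 101.5)]
```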
Now let's compare this to a situation where I use a daily feed as well as a resampled monthly feed. I again want to get my long order executed with the opening price of the first trading day of the new month. However, if I base my trading decision on monthly data (closing > 100), I can't take the decision after the close on the last trading day, because the resampled monthly data is not updated yet. It simply doesn't reflect the actual monthly closing price.
So my question is: how can there be a difference between outright monthly data and resampled monthly data? Shouldn't they be equal?
-
@andi I see your point.
I tried getting this to work using boundoff and rightedge but to no avail. This should have solved the problem, but my data was unchanged. I'm sure I'm missing something simple?
    import time

    import backtrader as bt

    cerebro = bt.Cerebro()
    start = time.time()

    # Daily source feed
    data = bt.feeds.GenericCSVData(
        dataname="data/2006-day-001.txt",
        dtformat=("%Y-%m-%d"),
        timeframe=bt.TimeFrame.Days,
        compression=1,
    )
    print(f"time {(time.time() - start)}")
    cerebro.adddata(data)

    # Monthly feed resampled from the daily feed
    cerebro.resampledata(data, name="monthly", timeframe=bt.TimeFrame.Months, rightedge=False)

You could also pursue cheat-on-open to access the next day's data lines.
Let us know how you do.
-
@run-out I don't have access to my machine right now. As you know, I am just starting with backtrader, i.e. I am not familiar with rightedge and boundaries. However, if rightedge=False didn't do the trick, you could try boundoff=1 (just guessing on my side, trial and error).
Anyway, I think it's a bit weird that we have to scratch our heads about resampling daily data to monthly. In my view, all the standard settings should lead to the result that I laid out previously. I am wondering whether the resampling method might have a bug. On the other hand, backtrader seems to be a very mature framework, and I would be surprised if I were the first one to stumble upon this issue.
Using cheat-on-open doesn't sound right either. I don't know if this would solve the issue at hand. However, it may lead to other "issues" down the road. I don't want to cheat, but I would like to have a realistic setup.
What would you think would happen if I don't resample the daily feed but regularly add the monthly data as a second feed?
-
@andi This is interesting. I am testing boundoff with minute data and it's working fine. However, with daily data it is not doing what is expected. @dasch, I noticed you had some dealings with this. Can you shed some light on why boundoff doesn't work with daily data?
    import datetime

    import backtrader as bt


    class Strategy(bt.Strategy):
        def __init__(self):
            self.mn_ind = bt.If(self.datas[1] > self.datas[1](-1), 1, 0)
            self.traded = False

        def next(self):
            if self.mn_ind[0] == 1 and not self.traded:
                self.buy(self.datas[0])
                self.traded = True

            print(
                f"{self.datas[0].datetime.datetime()}, "
                f"daily: {self.datas[0].close[0]}, "
                f"month: {self.datas[1].close[0]}, "
                f"ind: {self.mn_ind[0]}"
            )


    if __name__ == "__main__":

        cerebro = bt.Cerebro()

        data = bt.feeds.GenericCSVData(
            dataname="data/dev.csv",
            dtformat=("%Y-%m-%d %H:%M:%S"),
            timeframe=bt.TimeFrame.Minutes,
            compression=1,
            date=0,
            high=1,
            low=2,
            open=3,
            close=4,
            volume=6,
        )
        cerebro.adddata(data)
        cerebro.resampledata(
            data,
            name="minutely",
            timeframe=bt.TimeFrame.Minutes,
            compression=10,
            boundoff=3,
            rightedge=True,
        )

        # data = bt.feeds.GenericCSVData(
        #     dataname="data/2006-day-001.txt",
        #     dtformat=("%Y-%m-%d"),
        #     timeframe=bt.TimeFrame.Days,
        #     compression=1,
        # )
        # print(f"time {(time.time() - start)}")
        # cerebro.adddata(data)
        # cerebro.resampledata(
        #     data,
        #     name="monthly",
        #     timeframe=bt.TimeFrame.Months,
        #     compression=1,
        #     boundoff=2,
        #     rightedge=True,
        # )

        cerebro.addstrategy(Strategy)

        # Execute
        cerebro.run()
-
@run-out So, for the initial question: the feed moves forward as soon as a date appears that is past the boundary of the current period. So only when 2020 starts does the data move forward. Possibilities to overcome this: add some kind of filter to the data feeds, or use replay for the data.
For boundoff and rightedge I will try to elaborate in a further message.
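The behaviour described in this post can be illustrated with a small pure-Python sketch. This is not backtrader's resampler, just a simplified model of a streaming resampler that sees one daily bar at a time; the function name and the prices are made up. It can only declare a month finished once a bar from a later month arrives.

```python
from datetime import date


def stream_monthly(daily_bars):
    """Yield (delivery_date, month, monthly_close) while bars arrive one by one.

    A month is only treated as complete when a bar from a later month shows up,
    so the monthly close is delivered together with that later bar.
    """
    current_month = None
    last_close = None
    for day, close in daily_bars:
        month = (day.year, day.month)
        if current_month is not None and month != current_month:
            yield day, current_month, last_close
        current_month = month
        last_close = close


# Hypothetical daily closes around the turn of the year.
daily = [
    (date(2019, 12, 20), 10.0),
    (date(2019, 12, 23), 11.0),
    (date(2019, 12, 27), 12.0),
    (date(2019, 12, 30), 13.0),
    (date(2020, 1, 2), 14.0),
    (date(2020, 1, 3), 15.0),
]
delivered = list(stream_monthly(daily))

for delivery_date, month, close in delivered:
    print(delivery_date.isoformat(), month, close)  # 2020-01-02 (2019, 12) 13.0
```

The December close of 2019-12-30 is only emitted together with the 2020-01-02 bar, because nothing in the stream before that bar says that December is over.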
-
@dasch But treat anything I write with caution, since I do not fully understand the inner workings of backtrader at some points.
-
@andi said in Resampling from daily to monthly - lagging issue:
What would you think would happen if I don't resample the daily feed but regularly add the monthly data as a second feed?
If I add another regular monthly data feed (instead of resampling my daily feed), all problems are gone. All prices behave as I previously laid out.
One is basically free to download the data with monthly periodicity or write your own resample function. I came up with something like this:
    import pandas as pd


    def datafeed_to_monthly(
        df: pd.DataFrame,
    ):
        """
        Resamples a daily data feed to a monthly time frame. The monthly
        datetime index will exactly reflect the dates of the original
        datetime index. For example, if on the daily datetime index,
        the last trading day is 26th of May, this will be reflected in the
        resampled data.

        Parameters
        ----------
        df
            Daily `pandas.DataFrame` comprising OHLC/OHLCV data. The dates
            must be the index of type `datetime`.

        Returns
        -------
        pandas.DataFrame
            Resampled monthly OHLC/OHLCV data.
        """
        df = df.copy()  # avoid mutating the caller's DataFrame
        # Keep the actual trading dates so the last one per month can be recovered.
        df["date"] = df.index
        # Five columns means OHLC plus the helper "date" column, i.e. no volume.
        if len(df.columns) == 5:
            mapping = dict(
                date="last",
                open="first",
                high="max",
                low="min",
                close="last",
            )
        else:
            mapping = dict(
                date="last",
                open="first",
                high="max",
                low="min",
                close="last",
                volume="sum",
            )
        # "BM" is the business-month-end alias (named "BME" in newer pandas versions).
        return df.resample("BM").agg(mapping).set_index("date")
As a result, I come to the conclusion that the implementation in the resample method is not what I would expect. I would probably consider it to be incorrect, at least with regards to resample daily to monthly. However, I am happy to discuss this interpretation.
-
@andi said in Resampling from daily to monthly - lagging issue:
As a result, I come to the conclusion that the implementation in the resample method is not what I would expect. I would probably consider it to be incorrect, at least with regards to resample daily to monthly. However, I am happy to discuss this interpretation.
@andi said in Resampling from daily to monthly - lagging issue:
I do know the new value as soon as I do have a closing price for the month (in contrast to get that value one day later),
The datetime 2019-12-30 23:59:59 does not say there is no more data for this period, but the timestamp 2020-01-02 23:59:59 does. Between these two timestamps the period switches at 2020-01-01 00:00:00 and 2020-01-02 00:00:00.
So backtrader will know that a date is the last one of a period as soon as a date from the new period appears. In your image with expected values, you have all values shifted by -1, so id 4 will actually be your id 5.
You may fill some data with a datafiller filter.
See the docs: https://www.backtrader.com/docu/filters/
-
@dasch said in Resampling from daily to monthly - lagging issue:
The datetime 2019-12-30 23:59:59 does not say there is no more data for this period, but the timestamp 2020-01-02 23:59:59 does. Between these two timestamps the period switches at 2020-01-01 00:00:00 and 2020-01-02 00:00:00.
I am probably not fully aware of how the resampling works in backtrader. Your description implies (to me) that the resampling takes place at runtime, maybe in the next method? If this is the case, I can follow your description.
However, I thought the resampling takes place before I added this feed to cerebro. If backtrader resamples before running the strategy and adds the resampled feed, the program knows in advance when it is the last day of the month, because it knows the complete daily feed. Am I correct with that assumption?
Anyway, in real life we all know if the 30th of a month will be the last trading day or not. And if this is the case, we know the monthly closing price as soon as the closing bell rings. At this stage I can use that price for any computation and issue an order, which will then be executed at the opening of the next bar.
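For contrast, the assumption in this post (the complete daily history is known before the strategy runs) can be sketched as a batch computation. This is plain Python with made-up prices and a hypothetical helper name; it does not reflect how backtrader actually processes feeds, and it assumes the bars are sorted by date.

```python
from datetime import date
from itertools import groupby


def monthly_closes(daily_bars):
    """Map each (year, month) to (last_trading_day, close) using the full history.

    Assumes `daily_bars` is sorted by date, since groupby only groups
    consecutive items.
    """
    result = {}
    by_month = groupby(daily_bars, key=lambda bar: (bar[0].year, bar[0].month))
    for month, group in by_month:
        last_day, last_close = list(group)[-1]
        result[month] = (last_day, last_close)
    return result


# Hypothetical daily closes around the turn of the year.
daily = [
    (date(2019, 12, 20), 10.0),
    (date(2019, 12, 23), 11.0),
    (date(2019, 12, 27), 12.0),
    (date(2019, 12, 30), 13.0),
    (date(2020, 1, 2), 14.0),
    (date(2020, 1, 3), 15.0),
]
closes = monthly_closes(daily)

# December's close is attached to its real last trading day, 2019-12-30.
# The final month in the data (January here) may still be incomplete.
print(closes[(2019, 12)][0].isoformat(), closes[(2019, 12)][1])  # 2019-12-30 13.0
```

With the whole series in hand, the month-end bar can be dated on the last trading day itself. A resampler that receives bars one at a time does not have that information until the next month's first bar arrives.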
So again, in my view, the implementation of the resample method is economically incorrect.