Help Understanding Replay on Intraday Data/Compression
-
I see others are having some issues getting replaydata to work correctly, and I haven't seen a reproducible code example of what I'm trying to accomplish, so I figured I'd ask the gods here.

Question 1 - How exactly does replaydata work? I.e., how does it simulate the bars from the lower timeframe? As an example using the 1 minute timeframe, TradingView's replay feature will start with the open, then go to the next nearest price (whether it be high or low) on the 2nd tick, then go to the opposite H/L price on the 3rd tick, before finally going to the close price on the 4th. Does BackTrader do this, or does it just simulate the entire 1 minute bar in one shot?

Question 2 - For argument's sake, let's say I have a 1 minute dataset in CSV format, and I'd like to "replay" it at the 5 minute timeframe. Would I be correct in assuming that the proper syntax would be:
cerebro.replaydata(data, timeframe=bt.TimeFrame.Minutes, compression=5)
... or would there need to be extra options added to ensure the timeframes line up properly (as I read someone else was having issues here)? When I run the example strategy:

def next(self):
    # Simply log the closing price of the series from the reference
    # self.log('Close, %.2f' % self.dataclose[0])

    # Check if an order is pending ... if yes, we cannot send a 2nd one
    if self.order:
        return

    # Check if we are in the market
    if not self.position:
        # Not yet ... we MIGHT BUY if ...
        if self.dataclose[0] > self.sma[0]:
            # BUY, BUY, BUY!!! (with all possible default parameters)
            self.log('BUY CREATE, %.2f' % self.dataclose[0])
            # Keep track of the created order to avoid a 2nd order
            self.order = self.buy()
    else:
        if self.dataclose[0] < self.sma[0]:
            # SELL, SELL, SELL!!! (with all possible default parameters)
            self.log('SELL CREATE, %.2f' % self.dataclose[0])
            # Keep track of the created order to avoid a 2nd order
            self.order = self.sell()
...my output looks like this, which seems really off:
Starting Portfolio Value: 10000.00
2019-12-10 00:00:01.209599, BUY CREATE, 304.18
2019-12-10 00:00:01.295999, BUY EXECUTED, Price: 304.12, Cost: 3041.23, Comm 3.04
2019-12-10 00:00:02.851211, SELL CREATE, 304.22
2019-12-10 00:00:02.937612, SELL EXECUTED, Price: 304.26, Cost: 3041.23, Comm 3.04
2019-12-10 00:00:02.937612, OPERATION PROFIT, GROSS 1.35, NET -4.73
2019-12-10 00:00:04.060820, BUY CREATE, 304.17
2019-12-10 00:00:04.147221, BUY EXECUTED, Price: 304.19, Cost: 3041.90, Comm 3.04
2019-12-10 00:00:05.356830, SELL CREATE, 304.25
2019-12-10 00:00:05.443231, SELL EXECUTED, Price: 304.30, Cost: 3041.90, Comm 3.04
...
2019-12-10 00:00:59.789245, BUY CREATE, 303.31
2019-12-10 00:00:59.875645, BUY EXECUTED, Price: 303.32, Cost: 3033.20, Comm 3.03
2019-12-10 00:01:00.221248, SELL CREATE, 303.30
2019-12-10 23:59:59.999989, SELL EXECUTED, Price: 302.45, Cost: 3033.20, Comm 3.02
2019-12-10 23:59:59.999989, OPERATION PROFIT, GROSS -8.70, NET -14.76
2019-12-11 00:00:01.814403, BUY CREATE, 302.26
2019-12-11 00:00:01.900804, BUY EXECUTED, Price: 302.25, Cost: 3022.47, Comm 3.02
2019-12-11 00:00:03.024012, SELL CREATE, 302.50
2019-12-11 00:00:03.110413, SELL EXECUTED, Price: 302.47, Cost: 3022.47, Comm 3.02
2019-12-11 00:00:03.110413, OPERATION PROFIT, GROSS 2.22, NET -3.82
...and it goes on and on like this: trades get closed at the end of the day, and way too many trades are placed in the meantime for a simple price-crossing-SMA strategy. (I can post full code if desired, but without my dataset it might seem like pointless clutter, as I'm just following the documentation strategy.)
Question 3 - With the above two things in mind, let's say I've created my own "tick" dataset as described in Q1 above that looks like this (notice the time stamps):
...how could I "replay" this data on the 5 minute timeframe, ensuring the times line up as described in some other posts re: replaydata? Before sending me to the documentation page: I've read it a few times, but there only seems to be direction on the daily/weekly timeframes, not tick/intraday, so I would love some clarification on it.

Thanks!!! I hope this post can serve as a guide to anyone else looking to work with tick data (unless you can steer me to similar posts that I haven't seen already). Would like to see all 3 questions get answered here.
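For anyone wanting to build a "tick" dataset by hand along the lines of Question 1, here is a minimal sketch in plain Python of the four-tick path (open, nearer extreme, farther extreme, close). It only illustrates the TradingView-style ordering described above and says nothing about what backtrader does internally; the names Bar and four_tick_path and the sample prices are made up for illustration.

```python
from typing import List, NamedTuple


class Bar(NamedTuple):
    """One OHLC bar of the lower timeframe (hypothetical helper type)."""
    open: float
    high: float
    low: float
    close: float


def four_tick_path(bar: Bar) -> List[float]:
    """Expand a bar into four prices: open, nearer extreme, farther extreme, close."""
    # Visit whichever of high/low is closer to the open first (ties go to the high).
    if abs(bar.high - bar.open) <= abs(bar.open - bar.low):
        return [bar.open, bar.high, bar.low, bar.close]
    return [bar.open, bar.low, bar.high, bar.close]


# Made-up bar: the low (9.5) is closer to the open (10.0) than the high (12.0),
# so the low is visited before the high.
ticks = four_tick_path(Bar(open=10.0, high=12.0, low=9.5, close=11.0))
print(ticks)
```

Each of the four prices could then be written out as its own row with its own timestamp, which is one way to end up with a dataset like the one Question 3 refers to.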
-
Hi @Matt-Wilson
Can you please share the code of how you import your data to backtrader. For example, how does your data feed look?
Similar to this?
datapath = os.path.join('../../filename.csv')
data = bt.feeds.GenericCSVData(
    dataname=datapath,
    separator=";",
    fromdate=fromdate,
    todate=todate,
    dtformat=('%Y%m%d'),
    tmformat=('%H:%M:%S:%f'),
    timeframe=bt.TimeFrame.MicroSeconds,
    compression=1,
    date=0, time=1,
    open=2, high=3, low=2, close=3,
    volume=4, openinterest=-1
)
-
Certainly, this is what that looks like:
datapath = './data/SPY1min.csv'
data = bt.feeds.GenericCSVData(
    dataname=datapath,
    # fromdate=datetime.datetime(2000, 1, 1),
    # todate=datetime.datetime(2000, 12, 31),
    # nullvalue=0.0,
    reverse=False,
    dtformat=('%Y-%m-%d %H:%M:%S'),
    tmformat=('%H:%M:%S'),
    datetime=0,
    open=1, high=2, low=3, close=4,
    volume=5, openinterest=-1
)

# Add the Data Feed to Cerebro
cerebro.replaydata(data, timeframe=bt.TimeFrame.Minutes, compression=5)
-
Cool thanks.
So as I have it, looking at your datetime format, it seems like your data contains seconds (I'm assuming it is in 1-second intervals). Note that you need to TELL backtrader what frequency your data is in, in the bt.feeds call. Therefore you should always (for safety) add the timeframe and compression lines to your bt.feeds call.

As I have used replay in the past, I would suggest you try the following (if your data is in 1-second intervals). Side note: if it is in 15-second intervals (as in your previous dataset), then you would still use the same timeframe parameter but change to compression=15:

datapath = './data/SPY1min.csv'
data = bt.feeds.GenericCSVData(
    dataname=datapath,
    reverse=False,
    dtformat=('%Y-%m-%d %H:%M:%S'),
    # tmformat=('%H:%M:%S'),  # this line might not be necessary?
    timeframe=bt.TimeFrame.Seconds,
    compression=1,
    datetime=0,
    open=1, high=2, low=3, close=4,
    volume=5, openinterest=-1
)

# Add the Data Feed to Cerebro
# This will then be stored in self.data/self.data0 in 5-minute intervals
cerebro.replaydata(data, timeframe=bt.TimeFrame.Minutes, compression=5)
A way to test this is to keep two data sources to compare when running your strategies. For instance, keep both the 1-second interval data and your 5-minute resampled data. If you want to test this, keep the bt.feeds call the same (like above) but replace your replaydata line with:

# Stored in self.data OR self.data0, which is the 1-second interval data
cerebro.replaydata(data, timeframe=bt.TimeFrame.Seconds)
# Ignore: this just ensures that it plots both intervals on the same plot
data.plotinfo.plotmaster = data
# Stored in self.data1, which is the 5-minute interval data
cerebro.replaydata(data, timeframe=bt.TimeFrame.Minutes, compression=5)

Therefore, in your def next() you can add:

print("1-second interval:", self.data0.close[0], "5-minute interval:", self.data1.close[0])

which will print out the close price of your 1-second interval and your 5-minute interval. The result you will be looking for is that self.data0.close[0] changes each second but self.data1.close[0] stays the same for 5 minutes before changing.
-
Thank you for the suggestions, however let me back up a second, as there seems to be something wrong with the way my datetimes are showing up in the BT log.
I'm using a simple 1 minute dataset, taken from AlphaVantage, which looks like this:
Simple enough, the times shown are in NY time, and show extended trading hours (i.e. 4 AM).
NOT using replaydata, just simply using adddata, my output for the log looks like this:

Notice the times are all at midnight, even on the same day, which isn't accurate. Shouldn't this be showing the current datetime, down to the minute, of when each trade was placed?
I'll show my full code below for reproducibility, but I'm using a 500 period moving average and the simple "price crossing MA" strategy, so there aren't a lot of trades happening. I just need to get this datetime issue sorted out, and then I can accurately implement your suggestions and go from there.
The dataset that I'm using for these tests can be accessed from my Google Drive link here; run the code below against it to see the same output. Could this be because of how I'm defining the datetime format, even though it looks right? Or are orders really only fulfilled at the end of each day by default?
Thanks! Once I get this fixed, I will start on your suggestions.
Code here:
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import datetime  # For datetime objects
import os.path  # To manage paths
import sys  # To find out the script name (in argv[0])

# Import the backtrader platform
import backtrader as bt


# Create a Strategy
class TestStrategy(bt.Strategy):
    params = (('maperiod', 500),)

    def log(self, txt, dt=None):
        ''' Logging function for this strategy'''
        dt = dt or self.datas[0].datetime.datetime(0)
        # Attempting two different print methods here for the current
        # datetime.
        print('%s, %s' % (dt.isoformat(), txt))
        # print('%s, %s' % (dt, txt))

    def __init__(self):
        # Keep a reference to the "close" line in the data[0] dataseries
        self.dataclose = self.datas[0].close

        # To keep track of pending orders and buy price/commission
        self.order = None
        self.buyprice = None
        self.buycomm = None

        # Add a MovingAverageSimple indicator
        self.sma = bt.indicators.SimpleMovingAverage(
            self.datas[0], period=self.params.maperiod)

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # Buy/Sell order submitted/accepted to/by broker - Nothing to do
            return

        # Check if an order has been completed
        # Attention: broker could reject order if not enough cash
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log(
                    'BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                    (order.executed.price,
                     order.executed.value,
                     order.executed.comm))

                self.buyprice = order.executed.price
                self.buycomm = order.executed.comm
            else:  # Sell
                self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                         (order.executed.price,
                          order.executed.value,
                          order.executed.comm))

            self.bar_executed = len(self)

        elif order.status in [order.Canceled, order.Margin, order.Rejected]:
            self.log('Order Canceled/Margin/Rejected')

        # Write down: no pending order
        self.order = None

    def notify_trade(self, trade):
        if not trade.isclosed:
            return

        self.log('OPERATION PROFIT, GROSS %.2f, NET %.2f' %
                 (trade.pnl, trade.pnlcomm))

    def next(self):
        # Simply log the closing price of the series from the reference
        # self.log('Close, %.2f' % self.dataclose[0])

        # Check if an order is pending ... if yes, we cannot send a 2nd one
        if self.order:
            return

        # Check if we are in the market
        if not self.position:
            # Not yet ... we MIGHT BUY if ...
            if self.dataclose[0] > self.sma[0]:
                # BUY, BUY, BUY!!! (with all possible default parameters)
                self.log('BUY CREATE, %.2f' % self.dataclose[0])
                # Keep track of the created order to avoid a 2nd order
                self.order = self.buy()
        else:
            if self.dataclose[0] < self.sma[0]:
                # SELL, SELL, SELL!!! (with all possible default parameters)
                self.log('SELL CREATE, %.2f' % self.dataclose[0])
                # Keep track of the created order to avoid a 2nd order
                self.order = self.sell()


if __name__ == '__main__':
    # Create a cerebro entity
    cerebro = bt.Cerebro()

    # Add a strategy
    cerebro.addstrategy(TestStrategy)

    datapath = './data/SPY1min.csv'
    data = bt.feeds.GenericCSVData(
        dataname=datapath,
        reverse=False,
        dtformat=('%Y-%m-%d %H:%M'),
        datetime=0,
        open=1, high=2, low=3, close=4,
        volume=5, openinterest=-1
    )

    # Add the Data Feed to Cerebro
    cerebro.adddata(data)
    # cerebro.replaydata(data, timeframe=bt.TimeFrame.Minutes, compression=15)

    # Set our desired cash start
    cerebro.broker.setcash(10000.0)

    # Add a FixedSize sizer according to the stake
    cerebro.addsizer(bt.sizers.FixedSize, stake=1)

    # Set the commission
    cerebro.broker.setcommission(commission=0.0001)

    # Print out the starting conditions
    print('Starting Portfolio Value: %.2f' % cerebro.broker.getvalue())

    # Run over everything
    cerebro.run()

    # Print out the final result
    print('Final Portfolio Value: %.2f' % cerebro.broker.getvalue())

    # Plot the result
    cerebro.plot()
-
I should also point out the plot looks off, the green and red arrows are not showing on the price the trade was placed at, but way below/above them, and when I try using my 15 second dataset, it's even worse. Not sure if it's relevant or not, but just thought I'd mention it:
-
Quick update,
With the help of this post re: the 1 minute issue, and this post re: the plotting issue, I've been able to fix these minor problems. I will be fairly busy until Thursday this week, but I will post an update with @Pierre-Cilliers-0 's suggestions after that.
-
Ok last update until Thursday haha
So using @Pierre-Cilliers-0 's suggestions, and using my 15 second dataset again, I've added this print line to my next() function:

print(self.datas[0].datetime.datetime(0).isoformat(), "15-second close:", self.data0.close[0], "5-minute close:", self.data1.close[0])

as well as these lines when defining my replaydata calls/cerebro re: the plotting issue (which still isn't perfect, but it's close enough):

# Create a cerebro entity
cerebro = bt.Cerebro(stdstats=False)

# Add a strategy
cerebro.addstrategy(TestStrategy)

datapath = 'SPY_5_1_5_days_backtest_dataset.csv'
data = bt.feeds.GenericCSVData(
    dataname=datapath,
    reverse=False,
    dtformat=('%Y-%m-%d %H:%M:%S'),
    timeframe=bt.TimeFrame.Seconds,
    compression=15,
    datetime=0,
    open=1, high=2, low=3, close=4,
    volume=5, openinterest=-1
)

# Add the Data Feed to Cerebro
# cerebro.adddata(data)
cerebro.replaydata(data, timeframe=bt.TimeFrame.Seconds)
data.plotinfo.plotmaster = data  # ignore
cerebro.replaydata(data, timeframe=bt.TimeFrame.Minutes, compression=5)

# Buy / sell arrows
cerebro.addobserver(bt.observers.BuySell, barplot=True, bardist=0)
Running this code, my 15 second close and my 5 minute close are identical: the 5 minute close == the 15 second close on every bar. I would imagine the 5 minute close should stay at the last fully formed 5 minute bar's close price until the 15 second dataset reaches the next % 5 minute spot, yes? Or is this a normal output? Output is below:
So will need to figure this out, however:

ANSWER TO QUESTION 1 in OP
The lower timeframe used in replaydata uses the close column's "ticks" to simulate the replays.

ANSWER TO QUESTION 2 in OP
The correct syntax when replaying 1 minute data up to a 5 minute interval would look like this:

data = bt.feeds.GenericCSVData(
    dataname=datapath,
    reverse=False,
    dtformat=('%Y-%m-%d %H:%M:%S'),
    timeframe=bt.TimeFrame.Minutes,
    compression=1,
    datetime=0,
    open=1, high=2, low=3, close=4,
    volume=5, openinterest=-1
)

# Add the Data Feed to Cerebro
cerebro.replaydata(data, timeframe=bt.TimeFrame.Minutes, compression=5)
Once we get this resampled 5 minute close price thing lining up properly, I can answer question 3. Talk Thursday! Thanks again.
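To make the "next % 5 minute spot" idea concrete, here is a rough sketch in plain Python of which 5-minute window a lower-timeframe bar falls into. It assumes bars are stamped with their closing time and that windows are counted from midnight; whether backtrader labels its bars the same way depends on its replay options, so treat this as an illustration only. The function name bar_end and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def bar_end(ts: datetime, minutes: int = 5) -> datetime:
    """Return the end of the N-minute window that a bar stamped `ts` belongs to.

    Assumption: bars carry their closing time, so a bar stamped exactly on a
    boundary (e.g. 09:35:00) closes that window instead of opening the next.
    """
    window = timedelta(minutes=minutes)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    remainder = (ts - midnight) % window
    if remainder == timedelta(0):
        return ts
    return ts + (window - remainder)


# Made-up 15-second stamps around a 5-minute boundary.
for stamp in ("2019-12-10 09:34:45", "2019-12-10 09:35:00", "2019-12-10 09:35:15"):
    ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    print(stamp, "->", bar_end(ts).strftime("%H:%M:%S"))
```

All 15-second bars that map to the same end time would feed the same developing 5-minute bar.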
-
Hi @Matt-Wilson
Just a few points I want to bring up. Sorry if I don't respond to everything. Will try my best.
First things first, I assume you are using the 1-minute interval data for your first questions.
@matt-wilson said in Help Understanding Replay on Intraday Data/Compression:
I'm using a simple 1 minute dataset, taken from AlphaVantage, which looks like this:
Note that if your trades are being made at 23:59:59.99999, it means that backtrader is executing everything on a daily interval. This is confirmed if you look at your plot: next to the SPY1min label it says (1 Day), which means that backtrader interprets the data on a daily interval.
I have two suggestions for this.
First, add the following lines to your bt.feeds call:

...,
dtformat=('%Y-%m-%d %H:%M'),
timeframe=bt.TimeFrame.Minutes,
compression=1,
...,

This will ensure that Backtrader knows that the input data is in 1-minute intervals (compression=1, timeframe=bt.TimeFrame.Minutes).
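If you are unsure which timeframe and compression to declare for a CSV, one option is to look at the most common gap between consecutive timestamps. The sketch below does that in plain Python; it is only a helper idea, not part of backtrader. The name infer_interval and the sample timestamps are made up, and it assumes rows are in ascending order. The returned unit names are meant to match bt.TimeFrame.Days, bt.TimeFrame.Minutes and bt.TimeFrame.Seconds.

```python
from collections import Counter
from datetime import datetime
from typing import List, Tuple


def infer_interval(stamps: List[str], fmt: str = "%Y-%m-%d %H:%M:%S") -> Tuple[str, int]:
    """Guess (unit, compression) from the most common gap between consecutive rows."""
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    times = [datetime.strptime(s, fmt) for s in stamps]
    # Count how often each gap (in whole seconds) occurs; overnight gaps are outvoted.
    gaps = Counter(int((b - a).total_seconds()) for a, b in zip(times, times[1:]))
    seconds = gaps.most_common(1)[0][0]
    if seconds <= 0:
        raise ValueError("timestamps must be strictly ascending")
    if seconds % 86400 == 0:
        return ("Days", seconds // 86400)
    if seconds % 60 == 0:
        return ("Minutes", seconds // 60)
    return ("Seconds", seconds)


# Made-up rows in the '%Y-%m-%d %H:%M' layout used for the 1-minute file.
one_minute = ["2019-12-10 04:00", "2019-12-10 04:01", "2019-12-10 04:02"]
print(infer_interval(one_minute, fmt="%Y-%m-%d %H:%M"))
```

The result can then be translated by hand into the timeframe and compression arguments of the feed.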
Secondly, I see you are calling self.datas when assigning your moving average and closing price. Try replacing this with self.data/self.data0 instead. The variable self.datas is a collective object of all data sources in the backtrader backend, whereas self.data/self.data0 points to the first input data source, which is your 1-minute interval data. Therefore try self.dataclose = self.data0.close and self.sma = bt.indicators.SimpleMovingAverage(self.data0, period=self.params.maperiod).
-
@matt-wilson Now touching on your final post.
@matt-wilson said in Help Understanding Replay on Intraday Data/Compression:
Running this code, my 15 second close and my 5 minute close are identical, the 5 minute close == the 15 close on every bar. I would imagine the 5 minute close should stay as the last fully formed 5 minute bar's close price, until the 15 second dataset reaches the next % 5 minute spot, yes? Or is this a normal output? Output is below:
I will take the hit for this, sorry.
Due to a time interval's close price being dynamic (meaning that it updates continuously until the timeframe is closed), close[0] of the 15-second and 5-minute intervals will always be the same.
To explain this better: at any given point in time (let's say 10 seconds into your execution), the current ruling price will be assigned to the close price of both your 15-second and 5-minute intervals.
But note that the open price will indicate what we are looking for, as these values are static, not dynamic.

So please replace your line with the following to check whether replaydata works properly:
print(self.datas[0].datetime.datetime(0).isoformat(),"15-second open:", self.data0.open[0], "5-minute open:", self.data1.open[0])
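The dynamic-close/static-open behaviour can be imitated in a few lines of plain Python. This is a simplified sketch, not backtrader's implementation: it groups bars by count rather than by clock boundaries, and the name replay and the sample bars are made up for illustration.

```python
from typing import Dict, Iterator, List

Bar = Dict[str, float]


def replay(bars: List[Bar], compression: int) -> Iterator[Bar]:
    """Yield the developing higher-timeframe bar after each lower-timeframe bar."""
    current: Bar = {}
    for i, bar in enumerate(bars):
        if i % compression == 0:
            # First small bar of a new window: start a fresh large bar.
            current = dict(bar)
        else:
            # Later small bars: the open stays frozen, the rest keeps updating.
            current["high"] = max(current["high"], bar["high"])
            current["low"] = min(current["low"], bar["low"])
            current["close"] = bar["close"]
        yield dict(current)


# Four made-up small bars replayed into large bars of two small bars each.
small = [
    {"open": 10.0, "high": 11.0, "low": 9.0, "close": 10.5},
    {"open": 10.5, "high": 12.0, "low": 10.0, "close": 11.5},
    {"open": 11.5, "high": 11.8, "low": 11.0, "close": 11.2},
    {"open": 11.2, "high": 13.0, "low": 11.1, "close": 12.9},
]
steps = list(replay(small, compression=2))
for s, big in zip(small, steps):
    print("small close:", s["close"], "big close:", big["close"], "big open:", big["open"])
```

On every step the large bar's close matches the small bar's close, while the large bar's open only changes when a new window starts, which is why comparing the opens is the more telling check.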
Chat Thursday mate.
-
Not a problem! Most of that was me working through it aloud :P
Ahhhh yes that makes sense, duh, the close is always moving in real time, of course. Which is good! I like knowing that's what's going on during the backtest anyway. So I think I'm all good now!
Your suggestions really helped me understand what's going on with replaydata, and I so appreciate it! So to answer my final question...

ANSWER TO QUESTION 3 in OP

data = bt.feeds.GenericCSVData(
    dataname=datapath,
    reverse=False,
    dtformat=('%Y-%m-%d %H:%M:%S'),
    timeframe=bt.TimeFrame.Seconds,
    compression=15,
    datetime=0,
    open=1, high=2, low=3, close=4,
    volume=5, openinterest=-1
)

# Add the Data Feed to Cerebro
# cerebro.adddata(data)
# Stored in self.data OR self.data0, which is the 15-second interval data
cerebro.replaydata(data, timeframe=bt.TimeFrame.Seconds)
# Ignore: this just ensures that it plots both intervals on the same plot
data.plotinfo.plotmaster = data
# Stored in self.data1, which is the 5-minute interval data
cerebro.replaydata(data, timeframe=bt.TimeFrame.Minutes, compression=5)
Be sure to add the compression and timeframe arguments in the bt.feeds.WhateverCSVData() call to tell BT what kind of data it's originally looking at, then you can add your higher timeframe(s) when invoking replaydata. Then you can access each dataset individually with self.data0 or self.data1 in the strategy class.

Thanks again!! Have a great end to the week!
-
@matt-wilson Cool man.
Yeah, it took me a while to resample/replay my data correctly, but now that it is clear, you can cruise through your backtesting.