Escape from OHLC Land
-
Use the link below to go to the original post
Click here to see the full blog post
-
Thanks for this example
I am trying to adapt it to a dataset with the same column structure for simplicity; the only difference is that the datetime field also has milliseconds, i.e. 3 digits after the dot (like so: 02/03/2010 16:53:50.158).
As such I have made a slight modification to the code as per below. The problem I have is that I then get 3 extra random digits at the back of those milliseconds (e.g. instead of 02/03/2010 16:53:50.158 I get 02/03/2010 16:53:50.158003):
parser.add_argument('--dtformat', '-dt',
                    required=False, default='%Y%m%d %H:%M:%S.%f',
                    help='Format of datetime in input')
I have tried to replace the above code with "default='%Y%m%d %H:%M:%S.%fff'" but that doesn't work. Any help would be greatly appreciated.
-
The code responsible for the parsing is the standard Python datetime module. The %-based format definition string can be made following this: Python Docs - datetime. The direct link for strptime: Python Docs - datetime - 8.1.8. strftime() and strptime() Behavior.
In any case: %fff doesn't seem right. That would expect something like this: 0123456ff
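As a quick illustration with the standard library (which, as said above, is what does the parsing): %f accepts 1 to 6 fractional digits, and whatever it reads is stored as microseconds:

import datetime

# '.158' parses fine with %f, but it ends up as 158000 microseconds
dt = datetime.datetime.strptime('20100302 16:53:50.158', '%Y%m%d %H:%M:%S.%f')
print(dt.microsecond)  # 158000
print(dt)              # 2010-03-02 16:53:50.158000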
-
Thanks so much! That was as simple as modifying this line:
dtstr = self.data.datetime.datetime().isoformat(timespec='milliseconds')
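As a side note, timespec='milliseconds' only changes how the string is rendered, and it truncates rather than rounds; a quick illustration:

import datetime

dt = datetime.datetime(2010, 3, 2, 16, 53, 50, 158999)
print(dt.isoformat())                         # 2010-03-02T16:53:50.158999
print(dt.isoformat(timespec='milliseconds'))  # 2010-03-02T16:53:50.158 (truncated, not rounded to .159)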
Would it be fair to say that this fix is only cosmetic and that the 3 extra random digits that were added at the back of the microseconds are still occurring in the background?
Just wondering if I would get consistent results should we process those microsecond ticks (for instance in terms of count within 5ms OHLC bars).
-
I did shout "Victory" too early indeed. Below is the tail end of my dataset:
Original csv input (last 5 lines):

1.2192,1.21923,20180228 23:59:47.750
1.21918,1.21922,20180228 23:59:51.024
1.21917,1.2192,20180228 23:59:51.029
1.21916,1.2192,20180228 23:59:51.087
1.21915,1.21919,20180228 23:59:58.725
Indeed we can spot a few discrepancies when I look at the output of the backtrader script:
3916973: 2018-02-28T23:59:47.749 - Bid 1.21920 - 1.21924 Ask
3916974: 2018-02-28T23:59:51.023 - Bid 1.21918 - 1.21922 Ask
3916975: 2018-02-28T23:59:51.028 - Bid 1.21917 - 1.21920 Ask
3916976: 2018-02-28T23:59:51.087 - Bid 1.21916 - 1.21920 Ask
3916977: 2018-02-28T23:59:58.725 - Bid 1.21915 - 1.21919 Ask
I have tried the date2num function, which I noticed you use a few times across the backtrader files, but it isn't working either:

dtstr = self.data.datetime.datetime().isoformat(timespec='milliseconds')
txt = '%4d: %s - Bid %.5f - %.5f Ask' % (
    (len(self), date2num(dtstr)/1000, self.data.bid[0], self.data.ask[0]))

At this stage, I am at a loss as to where it would make sense to apply any modification so as not to corrupt the input (tick data from TrueFX).
-
The datetimes are coded to a float (using the matplotlib definition). The coding (as any coding trying to fit microseconds into 8 bytes) loses precision.
In any case I can't really understand what the real problem is here. It seemed above you couldn't parse the timestamps, but it seems you are concerned about display issues and that the timestamps have been parsed right all along.
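To make the precision point concrete, here is a small self-contained sketch (not backtrader's actual code, just the same idea of packing a timestamp into a float counting days since 0001-01-01 plus one) showing that a 64-bit float cannot hold timestamps around 2018 to exact microsecond precision:

import datetime

def to_float_days(dt):
    # days (plus the day fraction) since 0001-01-01 00:00:00, plus one (matplotlib-style)
    delta = dt - datetime.datetime(1, 1, 1)
    return delta.days + (delta.seconds + delta.microseconds / 1e6) / 86400.0 + 1

def from_float_days(x):
    # inverse of the above, losing whatever the float could not store
    ix = int(x)
    dt = datetime.datetime.fromordinal(ix)
    return dt + datetime.timedelta(seconds=(x - ix) * 86400.0)

dt = datetime.datetime(2018, 2, 28, 23, 59, 47, 750000)
roundtrip = from_float_days(to_float_days(dt))
print(roundtrip)                         # usually not exactly ...47.750000
print((roundtrip - dt).total_seconds())  # typically a few microseconds off
# around the year 2018 the float day count has roughly 1e-10 days (several
# microseconds) of resolution left, which is the wobble seen in the output above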
-
@backtrader - This is really helpful. I looked into dateintern.py to better grasp where the rounding is occurring. I now understand why adding (timespec='milliseconds') brought more confusion, so I have removed it, as it was truncating as opposed to rounding to the nearest .001s. However, to your point, this was only a display issue - in the background the actual rounding error is insignificant and clearly not a source of concern. See the relevant code here for those with an interest:
def num2date(x, tz=None, naive=True):
    # Same as matplotlib except if tz is None a naive datetime object
    # will be returned.
    """
    *x* is a float value which gives the number of days
    (fraction part represents hours, minutes, seconds) since
    0001-01-01 00:00:00 UTC *plus* *one*.
    The addition of one here is a historical artifact. Also, note
    that the Gregorian calendar is assumed; this is not universal
    practice. For details, see the module docstring.
    Return value is a :class:`datetime` instance in timezone *tz* (default to
    rcparams TZ value).
    If *x* is a sequence, a sequence of :class:`datetime` objects will
    be returned.
    """
    ix = int(x)
    dt = datetime.datetime.fromordinal(ix)
    remainder = float(x) - ix
    hour, remainder = divmod(HOURS_PER_DAY * remainder, 1)
    minute, remainder = divmod(MINUTES_PER_HOUR * remainder, 1)
    second, remainder = divmod(SECONDS_PER_MINUTE * remainder, 1)
    microsecond = int(MUSECONDS_PER_SECOND * remainder)
    if microsecond < 10:
        microsecond = 0  # compensate for rounding errors

    if True and tz is not None:
        dt = datetime.datetime(
            dt.year, dt.month, dt.day, int(hour), int(minute), int(second),
            microsecond, tzinfo=UTC)
        dt = dt.astimezone(tz)
        if naive:
            dt = dt.replace(tzinfo=None)
    else:
        # If not tz has been passed return a non-timezoned dt
        dt = datetime.datetime(
            dt.year, dt.month, dt.day, int(hour), int(minute), int(second),
            microsecond)

    if microsecond > 999990:  # compensate for rounding errors
        dt += datetime.timedelta(microseconds=1e6 - microsecond)

    return dt
Going deep into your code really helped me appreciate how much work was put into this (and I know I am only scratching the surface). This is humbling - I wish I could code like you do :-)
For newbies like myself, it is probably worth mentioning the F7 key in the PyCharm debugging facility to execute line by line. This is a real life saver. Hope that helps some people.
-
One last question on this. As mentioned, I did use a csv file with the exact same column structure as in your example (columns=[bid, ask, datetime in last position]). The original file however comes with the following structure [symbol, datetime, bid, ask] as shown here:
EUR/USD,20180201 00:00:00.125,1.24171,1.24173
EUR/USD,20180201 00:00:00.262,1.24172,1.24173
EUR/USD,20180201 00:00:00.695,1.24172,1.24175
EUR/USD,20180201 00:00:00.838,1.24173,1.24178
EUR/USD,20180201 00:00:00.848,1.24174,1.24177
Thus modifying the BidAskCSV class:
class BidAskCSV(btfeeds.GenericCSVData):
    linesoverride = True  # discard usual OHLC structure
    # datetime must be present and last
    lines = ('PAIR', 'datetime', 'bid', 'ask')
    # datetime (always 1st) and then the desired order for
    params = (
        ('PAIR', 0),      # inherited from parent class
        ('datetime', 1),  # inherited from parent class
        ('bid', 2),       # default field pos 1
        ('ask', 3)        # default field pos 2
    )
The above won't work because of lines = ('PAIR','datetime','bid', 'ask'); however, replacing that with the line below instead does not return an error message and so seems to be working fine:
lines = ('datetime','bid', 'ask')
I understand that the field 'PAIR' being a string is what is causing the error (code section below from _loadline), thus my questions:
- What is the underlying rationale? Is it because "lines" should only refer to preset categories defined in BT (datetime, OHLC, volume, etc.)?
- What is the impact of simply not mentioning 'PAIR' in the "lines" list (if any)?
for linefield in (x for x in self.getlinealiases() if x != 'datetime'):
    # Get the index created from the passed params
    csvidx = getattr(self.params, linefield)

    if csvidx is None or csvidx < 0:
        # the field will not be present, assignt the "nullvalue"
        csvfield = self.p.nullvalue
    else:
        # get it from the token
        csvfield = linetokens[csvidx]

    if csvfield == '':
        # if empty ... assign the "nullvalue"
        csvfield = self.p.nullvalue

    # get the corresponding line reference and set the value
    line = getattr(self.lines, linefield)
    # Why is the expectation to ALWAYS have a float - can it be changed for
    # more flexibility? (cf. the 'PAIR' field in the TrueFX import, which
    # triggers the error)
    line[0] = float(float(csvfield))

return True
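For readers following along, here is a minimal sketch of the feed adapted to the TrueFX column order, based on the working lines override quoted above (the dtformat with .%f still has to be passed when instantiating the feed):

import backtrader.feeds as btfeeds

class BidAskCSV(btfeeds.GenericCSVData):
    linesoverride = True  # discard the usual OHLC structure
    lines = ('datetime', 'bid', 'ask')  # no 'PAIR' line: lines can only hold floats

    params = (
        ('datetime', 1),  # csv column 1: timestamp, e.g. 20180201 00:00:00.125
        ('bid', 2),       # csv column 2: bid price
        ('ask', 3),       # csv column 3: ask price
        # csv column 0 ('EUR/USD') is not mapped to any line and is simply skipped
    )

Judging from the _loadline excerpt above, only fields declared as lines are read from the csv tokens, so a column that is never mapped is never touched and the string symbol never reaches the float conversion.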