
Escape from OHLC Land


  • administrators

    Use the link below to go to the original post

    Click here to see the full blog post



  • Thanks for this example
    I am trying to adapt it to a dataset with the same column structure for simplicity; the only difference is that the datetime field also has milliseconds, i.e. 3 digits after the dot (like so: 02/03/2010 16:53:50.158).
    As such I have made a slight modification to the code as per below. The problem I have is that I then get 3 extra random digits at the back of those milliseconds (e.g. instead of 02/03/2010 16:53:50.158 I get 02/03/2010 16:53:50.158003):
    parser.add_argument('--dtformat', '-dt',
                        required=False, default='%Y%m%d %H:%M:%S.%f',
                        help='Format of datetime in input')
    I have tried to replace the above code with "default='%Y%m%d %H:%M:%S.%fff'", but that doesn't work.

    Any help would be greatly appreciated.


  • administrators

    The code responsible for the parsing is the standard Python datetime module. The %-based format definition string can be made following this: Python Docs - datetime

    The direct link for the strptime: Python Docs - datetime - 8.1.8. strftime() and strptime() Behavior

    In any case: %fff doesn't seem right. That would expect something like this: 0123456ff
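
    A quick check with the plain standard library (nothing backtrader-specific here): `%f` already accepts 1 to 6 fractional digits and zero-pads on the right, so a 3-digit millisecond field parses fine and lands in the microsecond slot:

```python
from datetime import datetime

# %f accepts 1-6 fractional digits; '.024' is zero-padded on the right,
# so it is read as 24000 microseconds
dt = datetime.strptime('20180228 23:59:51.024', '%Y%m%d %H:%M:%S.%f')
print(dt.microsecond)  # 24000
```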



  • Thanks so much! That was as simple as modifying this line:

    dtstr = self.data.datetime.datetime().isoformat(timespec='milliseconds')
    

    Would it be fair to say that this fix is only cosmetic and that the 3 extra random digits that were added at the back of the microseconds are still occurring in the background?
    Just wondering if I would get consistent results should we process those microsecond ticks (for instance in terms of count within 5ms OHLC bars)
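
    For reference, `isoformat(timespec='milliseconds')` only truncates the displayed string; the microseconds stored in the object are untouched, which is why the fix is indeed cosmetic:

```python
from datetime import datetime

# the trailing 003 microseconds survive in the object; only the string is cut
dt = datetime(2010, 3, 2, 16, 53, 50, 158003)
print(dt.isoformat(timespec='milliseconds'))  # 2010-03-02T16:53:50.158
print(dt.microsecond)                         # 158003
```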



  • I did shout "Victory" too early indeed. Below is the tail end of my dataset

    Original csv input (last 5 lines):
    1.2192,1.21923,20180228 23:59:47.750
    1.21918,1.21922,20180228 23:59:51.024
    1.21917,1.2192,20180228 23:59:51.029
    1.21916,1.2192,20180228 23:59:51.087
    1.21915,1.21919,20180228 23:59:58.725
    

    Indeed we can spot a few discrepancies when I look at the output of the backtrader script:

    3916973: 2018-02-28T23:59:47.749 - Bid 1.21920 - 1.21924 Ask
    3916974: 2018-02-28T23:59:51.023 - Bid 1.21918 - 1.21922 Ask
    3916975: 2018-02-28T23:59:51.028 - Bid 1.21917 - 1.21920 Ask
    3916976: 2018-02-28T23:59:51.087 - Bid 1.21916 - 1.21920 Ask
    3916977: 2018-02-28T23:59:58.725 - Bid 1.21915 - 1.21919 Ask
    

    I have tried the date2num function, which I noticed you use a few times across the backtrader source files, but it isn't working either.
    At this stage, I am at a loss as to where it would make sense to apply any modification so as not to corrupt the input (tick data from TrueFX):

            dtstr = self.data.datetime.datetime().isoformat(timespec='milliseconds')
            txt = '%4d: %s - Bid %.5f - %.5f Ask' % ((len(self),
                                                      date2num(dtstr)/1000,
                                                      self.data.bid[0], self.data.ask[0]))

  • administrators

    The datetimes are coded to a float (using matplotlib definition). The coding (as any coding trying to fit microseconds into 8 bytes) loses precision.

    In any case I can't really understand what the real problem is here. It seemed above that you couldn't parse the timestamps, but apparently the timestamps have been parsed right all along and your concern is a display issue.
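
    To make the precision loss concrete, here is a minimal sketch of the days-as-float encoding (the helper names `date2num`/`num2date` mirror the real ones, but this is a simplified stand-in, not backtrader's actual code). A double keeps roughly 15-16 significant digits; with ~7.4e5 days eaten by the integer part, only on the order of 10 microseconds of resolution remain in the fraction:

```python
import datetime

def date2num(dt):
    """Encode a datetime as a float number of days (simplified sketch)."""
    base = dt.toordinal()
    frac = (dt - datetime.datetime.fromordinal(base)).total_seconds() / 86400.0
    return base + frac

def num2date(x):
    """Decode the float back into a datetime (simplified sketch)."""
    ix = int(x)
    return datetime.datetime.fromordinal(ix) + datetime.timedelta(days=x - ix)

dt = datetime.datetime(2018, 2, 28, 23, 59, 51, 24000)
roundtrip = num2date(date2num(dt))
drift_us = abs((roundtrip - dt).total_seconds()) * 1e6
# drift is a few microseconds: the double's fractional part only resolves
# ~1e-10 days (~10 microseconds) once the ordinal occupies the high digits
```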



    @backtrader - This is really helpful. I looked into dateintern.py to better grasp where the rounding occurs. I now understand why adding (timespec='milliseconds') brought more confusion, so I have removed it, as it was truncating rather than rounding to the nearest .001s. However, to your point, this was only a display issue - in the background the actual rounding error is insignificant and clearly not a source of concern. See the relevant code here for those with an interest:

    def num2date(x, tz=None, naive=True):
        # Same as matplotlib except if tz is None a naive datetime object
        # will be returned.
        """
        *x* is a float value which gives the number of days
        (fraction part represents hours, minutes, seconds) since
        0001-01-01 00:00:00 UTC *plus* *one*.
        The addition of one here is a historical artifact.  Also, note
        that the Gregorian calendar is assumed; this is not universal
        practice.  For details, see the module docstring.
        Return value is a :class:`datetime` instance in timezone *tz* (default to
        rcparams TZ value).
        If *x* is a sequence, a sequence of :class:`datetime` objects will
        be returned.
        """
        ix = int(x)
        dt = datetime.datetime.fromordinal(ix)
        remainder = float(x) - ix
        hour, remainder = divmod(HOURS_PER_DAY * remainder, 1)
        minute, remainder = divmod(MINUTES_PER_HOUR * remainder, 1)
        second, remainder = divmod(SECONDS_PER_MINUTE * remainder, 1)
        microsecond = int(MUSECONDS_PER_SECOND * remainder)
        if microsecond < 10:
            microsecond = 0  # compensate for rounding errors
    
        if True and tz is not None:
            dt = datetime.datetime(
                dt.year, dt.month, dt.day, int(hour), int(minute), int(second),
                microsecond, tzinfo=UTC)
            dt = dt.astimezone(tz)
            if naive:
                dt = dt.replace(tzinfo=None)
        else:
            # If not tz has been passed return a non-timezoned dt
            dt = datetime.datetime(
                dt.year, dt.month, dt.day, int(hour), int(minute), int(second),
                microsecond)
    
        if microsecond > 999990:  # compensate for rounding errors
            dt += datetime.timedelta(microseconds=1e6 - microsecond)
    
        return dt
    

    Going deep into your code really helped me appreciate how much work was put into this (and I know I am only scratching the surface). This is humbling - I wish I could code like you do :-)
    For newbies like myself, it is probably worth mentioning the F7 button in PyCharm's debugging facility to execute line by line. This is a real life saver. Hope that helps some people.



    One last question on this. As mentioned, I did use a csv file with the exact same column structure as in your example (columns=[bid, ask, datetime in last position]). The original file however comes with the following structure [symbol, datetime, bid, ask], as shown here:

    EUR/USD,20180201 00:00:00.125,1.24171,1.24173
    EUR/USD,20180201 00:00:00.262,1.24172,1.24173
    EUR/USD,20180201 00:00:00.695,1.24172,1.24175
    EUR/USD,20180201 00:00:00.838,1.24173,1.24178
    EUR/USD,20180201 00:00:00.848,1.24174,1.24177
    

    Thus modifying the BidAskCSV class:

    class BidAskCSV(btfeeds.GenericCSVData):
        linesoverride = True  # discard usual OHLC structure
        # datetime must be present and last
        lines = ('PAIR','datetime','bid', 'ask')
        # datetime (always 1st) and then the desired order for the rest
        params = (
            ('PAIR', 0),  # inherited from parent class
            ('datetime', 1),  # inherited from parent class
            ('bid', 2),  # default field pos 1
            ('ask', 3)  # default field pos 2
        )
    

    The above won't work due to lines = ('PAIR','datetime','bid', 'ask'); however, replacing that section with the line below instead returns no error message and so seems to work fine:

    lines = ('datetime','bid', 'ask')
    

    I understand that the field 'PAIR' being a string is what is causing the error (code section below from _loadline), thus my questions:

    1. What is the underlying rationale? Is it because "lines" should only refer to preset numeric categories defined in backtrader (datetime, OHLC, volume, etc.)?
    2. What is the impact of simply not mentioning "PAIR" in the "lines" list (if any)?
            for linefield in (x for x in self.getlinealiases() if x != 'datetime'):
                # Get the index created from the passed params
                csvidx = getattr(self.params, linefield)
    
                if csvidx is None or csvidx < 0:
                # the field will not be present, assign the "nullvalue"
                    csvfield = self.p.nullvalue
                else:
                    # get it from the token
                    csvfield = linetokens[csvidx]
    
                if csvfield == '':
                    # if empty ... assign the "nullvalue"
                    csvfield = self.p.nullvalue
    
                # get the corresponding line reference and set the value
                line = getattr(self.lines, linefield)
            line[0] = float(csvfield)  # Why is the expectation to ALWAYS have a float - can it be changed for more flexibility? (e.g. the 'PAIR' field in the TrueFX import raises an error)
    
            return True
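
    To illustrate the rationale behind question 1, here is a plain-Python sketch (a hypothetical mini-parser, not backtrader's actual machinery): line buffers are numeric arrays, so every declared line value is pushed through float(), and a string column such as 'PAIR' has no place to go. Leaving it out of the lines/params mapping simply skips that column:

```python
# Hypothetical mini-parser showing why a string column cannot feed a line:
# every declared line value goes through float() before being stored.
row = 'EUR/USD,20180201 00:00:00.125,1.24171,1.24173'.split(',')

# Map each numeric line name to its CSV column index, skipping the string
# 'PAIR' column (index 0) instead of declaring it as a line.
colmap = {'bid': 2, 'ask': 3}
values = {name: float(row[idx]) for name, idx in colmap.items()}

try:
    float(row[0])  # 'EUR/USD' - this is what fails when PAIR is a line
except ValueError as exc:
    print('PAIR cannot be a line:', exc)
```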