
Indicator using multi datafeed: operate only on overlapping bar



  • I've created an indicator that takes 2 data feeds and returns a line with the ratio between them.
    The issue I'm struggling with is adding a condition that calculates the ratio only if the data comes from the same time. By "same time" I mean overlapping bars.

    I'm trying to study correlations between 2 sets of data,
    one of which has very low volume, so a lot of its 1-minute bars are empty.

    If my understanding is correct, the system keeps a reference to the last known value and uses it.
    But in my case, if the data doesn't overlap in time I would like to skip it, as it's not really relevant (I would get a ratio that never actually happened).

    [side note: because the code operates on lines and is only called once in __init__, I can't see the actual calculation while debugging... thoughts on this are also welcome]

    this is my relevant code so far from the indicator init:

    def __init__(self, *args, **kwargs):

        if 'datas_name' not in kwargs:
            raise bt.errors.BacktraderError()

        self.datasByName = kwargs['datas_name']
        self.firstData = self.datasByName['1']
        self.secondData = self.datasByName['2']

        # need a condition that is valid only if the bars overlap in time,
        # something like: if self.firstData.lines.datetime == self.secondData.lines.datetime:
        self.l.ratio = self.secondData.lines.close / self.firstData.lines.close

  • administrators

    When a data feed doesn't move forward its len doesn't change. That's the indication you need.
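
The length check described above can be sketched in plain Python. This is only an illustration under assumptions: `OverlapTracker` and `both_advanced` are hypothetical names, not part of backtrader, and the lengths are simulated rather than read from real data feeds.

```python
class OverlapTracker:
    """Track whether two feeds both delivered a new bar since the last call.

    Hypothetical helper, not part of backtrader: it only compares lengths.
    """

    def __init__(self):
        self.prev_first = 0
        self.prev_second = 0

    def both_advanced(self, len_first, len_second):
        # A feed that did not move forward reports the same length as before.
        advanced = len_first > self.prev_first and len_second > self.prev_second
        self.prev_first = len_first
        self.prev_second = len_second
        return advanced


# Simulated lengths seen on five consecutive calls: the second feed
# stalls on calls 3 and 4 (its length stays at 2).
lengths = [(1, 1), (2, 2), (3, 2), (4, 2), (5, 3)]
tracker = OverlapTracker()
flags = [tracker.both_advanced(a, b) for a, b in lengths]
print(flags)
```

Inside an indicator's `next`, the two arguments would presumably be the results of `len()` applied to each data feed, and the ratio would only be computed when the helper returns `True`.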



  • @backtrader I experimented a bit using '...lines.datetime.lencount' (not sure if that's what you meant),
    and I have some questions about the behavior I'm seeing.

    I use 2 sets of data; the first gave a count of about 180,000 and the other around 60,000 for the same period of time. I still need to check that the bars are indeed in sync with each other.

    What I found interesting, and what left me a bit confused, is what happened when I used 'resample' on the data instead of the raw data. Now both data sets returned a count of about 1600 (not exactly the same), regardless of the timeframe or compression params I tried.

    cerebro.resampledata(data, name='main', timeframe=bt.TimeFrame.Seconds, compression=180)

    Not sure what this says about the bars being synced. Why does the count remain the same, and why is it so low?


  • administrators

    @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    @backtrader I experimented a bit using '...lines.datetime.lencount' (not sure if that's what you meant),
    and I have some questions about the behavior I'm seeing.

    No, len is the built-in function, len. As in: len_of_my_data = len(mydata), which can of course be applied to an individual line also, but all lines in a data feed have the same len.

    @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    Not sure what this says about the bars being synced. Why does the count remain the same, and why is it so low?

    It seems a scenario which would need some code to see how it works and make some assumptions about your case



  • As I'm still learning the system, the code I used is based on the Quickstart example and the modifications seem very trivial.

    I stripped away as much as I could to focus on the issue:
    when using adddata => count 180000
    when using resample => count 1600

    in main:

    if __name__ == '__main__':
        file = 'some/file.csv'

        cerebro = bt.Cerebro()

        # Add a strategy
        cerebro.addstrategy(TestStrategy)

        # add data from custom csv file
        data = MyHLOC(dataname=file)

        # Add the resampled data instead of the original
        # cerebro.adddata(data, name='main')
        cerebro.resampledata(data, name='main', timeframe=bt.TimeFrame.Minutes, compression=60)

        cerebro.run()
    

    in strategy:

    class TestStrategy(bt.Strategy):
        ...
        ...
        def __init__(self):
            ...
            ...
            self.testIndicator = IndicatorTest(datas=self.dnames)
    

    in Indicator:

    class IndicatorTest(bt.Indicator):

        def __init__(self, *args, **kwargs):
            if 'datas' not in kwargs:
                raise bt.errors.BacktraderError()

            self.datasByName = kwargs['datas']
            self.mainData = self.datasByName['main']

        def next(self, *args, **kwargs):
            print(self.mainData.lines.datetime.lencount)

  • administrators

    @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    def next(self, *args, **kwargs):
        print(self.mainData.lines.datetime.lencount)
    

    You should use the builtin len: len(self.mainData)

    @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    when using adddata => count 180000
    when using resample => count 1600

    It seemed a lot more complex with loads of text and it now seems a simple issue, so the truth lies probably in the middle, but:

    • When you resample something you are bound to get fewer bars, because you combine several bars into one, e.g. 60 1-minute bars into one 60-minute bar.

      As such, the len cannot be the same.
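
The reduction in bar count can be illustrated with a minimal standalone sketch. It does not use backtrader and is not a description of how backtrader implements resampling. The function name `resample_to_hours`, the bucketing by the hour in which a bar starts, and the synthetic prices are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def resample_to_hours(bars):
    """Combine (timestamp, open, high, low, close) minute bars into hourly bars."""
    hourly = {}
    for ts, o, h, l, c in bars:
        # Bucket each bar by the hour it falls into (an assumed convention).
        key = ts.replace(minute=0, second=0, microsecond=0)
        if key not in hourly:
            hourly[key] = [o, h, l, c]   # first bar of the hour sets the open
        else:
            bar = hourly[key]
            bar[1] = max(bar[1], h)      # highest high
            bar[2] = min(bar[2], l)      # lowest low
            bar[3] = c                   # last close
    return [(k, *v) for k, v in sorted(hourly.items())]


# 180 synthetic 1-minute bars starting at 09:00.
start = datetime(2018, 1, 1, 9, 0)
minute_bars = [
    (start + timedelta(minutes=i), 100.0 + i, 100.5 + i, 99.5 + i, 100.2 + i)
    for i in range(180)
]
hourly_bars = resample_to_hours(minute_bars)
print(len(minute_bars), len(hourly_bars))
```

With 180 one-minute bars the sketch yields 3 hourly bars, so the length of the resampled series depends on the target timeframe and compression.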



  • Of course, I understand the idea of timeframes and compression.
    But as I wrote in the original post:
    "Now both data sets returned a count of about 1600 (not exactly the same), regardless of the timeframe or compression params I tried."

    I have tried different timeframes with different compressions, but the count remains the same.
    This is the part that I don't understand.



  • can it be a bug in the system?



  • @Jacob : did you get this to work? I have the exact same question.

    For example (see picture) I have 1 data feed that has hourly bars for hour 7-8 and hour 8-9 and another data feed that has data for hour 8-9 and hours 9-...-14
    [image: picture of the two data feeds]

    I resampled everything to Minutes (with compression 1) - I have to be honest, I still don't fully understand the resampling, which is why I am looking at the output.

    The ratio gets calculated from hour 8 till hour 14...

    So my questions are

    • what is the logic/rules to pick ? I assume it sits there and waits for both feeds to have a bar before it does anything. And then after the second feed drops off, it seems the system thinks it is still present. I can understand the first bit of the logic, but not the second...
    • Can you force it to know that if there is no bar, there is no data: I was outputting for every bar, the value of both legs and it gives me this:
    2018-05-31T08:00:00, Close Leg1, 19.09
    2018-05-31T08:00:00, Close Leg2, 20.49
    2018-05-31T09:00:00, Close Leg1, 18.96
    2018-05-31T09:00:00, Close Leg2, 20.49
    2018-05-31T10:00:00, Close Leg1, 18.96
    2018-05-31T10:00:00, Close Leg2, 20.49
    2018-05-31T11:00:00, Close Leg1, 18.94
    2018-05-31T11:00:00, Close Leg2, 20.49
    2018-05-31T12:00:00, Close Leg1, 18.96
    2018-05-31T12:00:00, Close Leg2, 20.49
    2018-05-31T13:00:00, Close Leg1, 18.87
    2018-05-31T13:00:00, Close Leg2, 20.49
    2018-05-31T14:00:00, Close Leg1, 18.77
    2018-05-31T14:00:00, Close Leg2, 20.49
    


  • @crystalet said in Indicator using multi datafeed: operate only on overlapping bar:

    @Jacob : did you get this to work? I have the exact same question.

    The issue is still unresolved when you use resample.

    For example (see picture) I have 1 data feed that has hourly bars for hour 7-8 and hour 8-9 and another data feed that has data for hour 8-9 and hours 9-...-14
    I resampled everything to Minutes (with compression 1) - I have to be honest, I still don't fully understand the resampling, which is why I am looking at the output.

    Resampling can take high-resolution data (like minutes) and construct lower-resolution data (like hours), because all the data is present. But you can't do it the other way around.

    So my questions are

    what is the logic/rules to pick ? I assume it sits there and waits for both feeds to have a bar before it does anything. And then after the second feed drops off, it seems the system thinks it is still present. I can understand the first bit of the logic, but not the second...

    Can you force it to know that if there is no bar, there is no data: I was outputting for every bar

    The default behavior is to use the last known data, which makes sense for most cases, simply because 2 different data feeds will not be in sync all the time (if you think of low volatility markets, you will not have a trade every minute).

    For the case where it does matter, @backtrader already answered that you should use the 'len' function to determine whether the length of the current view of the data has changed. In simple words: if you are still looking at the old bar, the length of the data will not change, because no new data was added.

    The problem still exists, because this solution works well on data that is 'added' but for some reason not on 'resampled' data.
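
The "last known value" behavior and the length check can be simulated without backtrader. The sketch below is an assumption-laden model, not backtrader code: the dictionaries `leg1` and `leg2` reuse a few closes from the output posted earlier in this thread, and the counters `seen1` and `seen2` stand in for what `len()` would report on each feed.

```python
import math

# Hypothetical hourly closes keyed by hour of day; leg 2 stops after 08:00.
leg1 = {8: 19.09, 9: 18.96, 10: 18.96, 11: 18.94}
leg2 = {8: 20.49}

seen1 = seen2 = 0          # how many bars each leg has delivered so far
last1 = last2 = math.nan   # last known close of each leg
rows = []
for hour in sorted(set(leg1) | set(leg2)):
    new1 = hour in leg1
    new2 = hour in leg2
    if new1:
        seen1 += 1
        last1 = leg1[hour]
    if new2:
        seen2 += 1
        last2 = leg2[hour]
    # Only a bar where both counters moved is a true overlap.
    ratio = last2 / last1 if (new1 and new2) else math.nan
    rows.append((hour, last1, last2, seen1, seen2, ratio))

for row in rows:
    print(row)
```

After 08:00 the second leg keeps showing 20.49 while its counter stays at 1, which is the signal that the value is stale; the ratio is left as NaN for those hours.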


  • administrators

    @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    Of course, I understand the idea of timeframes and compression.
    But as I wrote in the original post:
    "Now both data sets returned a count of about 1600 (not exactly the same), regardless of the timeframe or compression params I tried."
    I have tried different timeframes with different compressions, but the count remains the same.
    This is the part that I don't understand.

    It is still unclear what you are doing. You posted a code snippet in which you simply resample, and added:

    I stripped away as much as I could to focus on the issue:

    when using adddata => count 180000
    when using resample => count 1600

    Which obviously shows that resampling is working, because it reduces the number of bars.

    It would be helpful if you:

    • Show an actual code sample in which you use the 2 data feeds simultaneously
    • Stop using the attribute lencount, which is an internal structure, and use len as stated above
    • Print the actual values from your snippet run which apparently are a problem for you
    • Indicate, for example, the actual initial timeframe of the data

    Those things could help.


  • administrators

    @crystalet said in Indicator using multi datafeed: operate only on overlapping bar:

    For example (see picture) I have 1 data feed that has hourly bars for hour 7-8 and hour 8-9 and another data feed that has data for hour 8-9 and hours 9-...-14

    @crystalet said in Indicator using multi datafeed: operate only on overlapping bar:

    I resampled everything to Minutes (with compression 1) - I have to be honest, I still don't fully understand the resampling, which is why I am looking at the output.

    That won't work, as explained by @Jacob.

    Resampling, as the concept is known here, is actually upsampling (going to a larger timeframe). Downsampling (going to a smaller one) is simply impossible.



  • @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    Resampling can take high-resolution data (like minutes) and construct lower-resolution data (like hours), because all the data is present. But you can't do it the other way around.

    @backtrader said in Indicator using multi datafeed: operate only on overlapping bar:

    Resampling, as the concept is known here, is actually upsampling (going to a larger timeframe). Downsampling (going to a smaller one) is simply impossible.

    Am I missing something? Aren't we saying the same thing?

    @backtrader said in Indicator using multi datafeed: operate only on overlapping bar:

    It would be helpful if you:

    Show an actual code sample in which you use the 2 data feeds simultaneously
    Stop using the attribute lencount, which is an internal structure, and use len as stated above
    Print the actual values from your snippet run which apparently are a problem for you
    Indicate, for example, the actual initial timeframe of the data

    Those things could help.

    OK, I've changed the code according to your points and retested everything.
    I used my 2 data sets with a 1-minute timeframe (6 months of data). The 'first' one has a lot of data and the 'second' has a lot of gaps (low volume).
    These are my test results for checking 'len' (in the next post I will put the code I ran and a few screenshots):

    • normal added data: 'first' ~240000, 'second' 0, plot works as expected
    • resampled 1 week: 'first' ~25, 'second' 25, plot works as expected
    • resampled 1 day: 'first' ~240000, 'second' ~20000, plot has too many details
    • resampled 10 day: same as 1 day
    • resampled 1 min: 'first' 170, 'second' 170, plot has too few details
    • resampled 60 min: same as 1 min
    • resampled 180 second: same as 1 min

    From the results we can see a few things that don't add up:

    • on added data the 'len' of the second data is 0
    • I would expect the results from 'day' and 'min' to be switched
    • I would expect that changing the compression would change the 'len'


  • import backtrader as bt
    import datetime

    class Ratio(bt.Indicator):
        lines = ('ratio', )
        params = dict(period = 20)

        def __init__(self, *args, **kwargs):
            if 'datas' not in kwargs:
                raise bt.errors.BacktraderError()

            self.addminperiod(self.p.period)

            self.datasByName = kwargs['datas']

            self.first = self.datasByName['first']
            self.second = self.datasByName['second']

        def next(self, *args, **kwargs):
            ratio = self.second.lines.close / self.first.lines.close

            print('First: {}'.format(len(self.first)))
            print('Second: {}'.format(len(self.second)))

            self.l.ratio[0] = ratio

    class MyHLOC(bt.feeds.GenericCSVData):
        params = (
            ('fromdate', datetime.datetime(2018, 1, 1)),
            ('todate', datetime.datetime(3000, 1, 1)),
            ('nullvalue', 0.0),
            ('dtformat', 1),  # 1 gives you the option to work directly with timestamps

            ('datetime', 0),
            ('open', 1),
            ('high', 2),
            ('low', 3),
            ('close', 4),
            ('volume', 5)
        )


    # Create a Strategy
    class TestStrategy(bt.Strategy):
        def __init__(self):
            self.testIndicator = Ratio(datas = self.dnames)

        def next(self):
            pass


    if __name__ == '__main__':
        csv_file = ['first.csv', 'second.csv']

        cerebro = bt.Cerebro()

        # Add a strategy
        cerebro.addstrategy(TestStrategy)

        dataFirst = MyHLOC(dataname = csv_file[0])
        dataSecond = MyHLOC(dataname = csv_file[1])

        # add data
        #cerebro.adddata(dataFirst, name='first')
        #cerebro.adddata(dataSecond, name='second')

        cerebro.resampledata(dataFirst, name = 'first', timeframe = bt.TimeFrame.Weeks, compression = 1)
        cerebro.resampledata(dataSecond, name = 'second', timeframe = bt.TimeFrame.Weeks, compression = 1)

        # Set our desired cash start
        cerebro.broker.setcash(100000.0)

        cerebro.run()

        # Plot the result
        cerebro.plot()


  • added data and resample day look the same:
    [image: plot of added data / resampled day]

    resample week:
    [image: plot of resampled week]

    resample min and resample sec look the same:
    [image: plot of resampled min / resampled sec]


  • administrators

    There is not much information in the post except for the timeframe/compression between the parentheses, which is as follows in the 3 charts:

    • Day/1 and Day/1 (written as 1 Day in the chart which is obviously better suited to humans)
    • Week/1 and Week/1
    • Minute/1 and Minute/1

    Why should things look different?

    There is still no indication as to what you try to achieve or where your actual problem is.



  • @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    OK, I've changed the code according to your points and retested everything.
    I used my 2 data sets with a 1-minute timeframe (6 months of data). The 'first' one has a lot of data and the 'second' has a lot of gaps (low volume).
    These are my test results for checking 'len' (in the next post I will put the code I ran and a few screenshots):

    • normal added data: 'first' ~240000, 'second' 0, plot works as expected
    • resampled 1 week: 'first' ~25, 'second' 25, plot works as expected
    • resampled 1 day: 'first' ~240000, 'second' ~20000, plot has too many details
    • resampled 10 day: same as 1 day
    • resampled 1 min: 'first' 170, 'second' 170, plot has too few details
    • resampled 60 min: same as 1 min
    • resampled 180 second: same as 1 min

    From the results we can see a few things that don't add up:

    • on added data the 'len' of the second data is 0
    • I would expect the results from 'day' and 'min' to be switched
    • I would expect that changing the compression would change the 'len'

    I gave you all the analysis above, which clearly shows that there is a bug or, at the very least, unclear behavior.

    @backtrader said in Indicator using multi datafeed: operate only on overlapping bar:

    There is not much information in the post except for the timeframe/compression between the parentheses, which is as follows in the 3 charts:

    • Day/1 and Day/1 (written as 1 Day in the chart which is obviously better suited to humans)
    • Week/1 and Week/1
    • Minute/1 and Minute/1

    Why should things look different?

    There is still no indication as to what you try to achieve or where your actual problem is.

    Even if you looked only at the charts without reading my findings, please tell me the logic by which the 1 min chart shows a lot less info than the 1 day chart??? It makes absolutely no sense.



  • @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    dataFirst = MyHLOC(dataname = csv_file[0])
    dataSecond = MyHLOC(dataname = csv_file[1])

    You may want to specify timeframe and compression for your original data. Looks like the system thinks that these are daily bars (default timeframe). Also it might be useful to specify sessionstart and sessionend parameters.

    From the FAQ

    data = bt.feeds.MyChosenFeed(
        dataname='myname',
        timeframe=bt.TimeFrame.Minutes, compression=5,
        sessionstart=datetime.time(9, 0), sessionend=datetime.time(16, 0)
    )
    


  • @ab_trader thanks!!! this is a great workaround for this issue/bug.

    @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    From the results we can see a few things that don't add up:

    • on added data the 'len' of the second data is 0
    • I would expect the results from 'day' and 'min' to be switched
    • I would expect that changing the compression would change the 'len'

    Resampling now works great!
    The data len is correct for both the timeframe and the compression.

    On normal 'added data' the issue still exists, but I don't mind just using 'resampling'.

    This is 10 minute resampled:
    [image: plot of 10 minute resampled data]

    This is 10 minute added data:
    [image: plot of 10 minute added data]


  • administrators

    @jacob said in Indicator using multi datafeed: operate only on overlapping bar:

    this is a great workaround for this issue/bug.

    This is not a workaround. It is clearly stated in the FAQ: Community - FAQ

    The system cannot know which timeframe/compression your data feed has unless you tell the system.
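
Since the timeframe and compression have to be declared by hand, it may help to check the spacing of the raw timestamps first. The following is a small standalone sketch, not a backtrader feature: `most_common_spacing` is a hypothetical helper and the timestamps are made up.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_common_spacing(timestamps):
    """Return the most frequent gap between consecutive timestamps."""
    gaps = Counter(b - a for a, b in zip(timestamps, timestamps[1:]))
    return gaps.most_common(1)[0][0]


# Hypothetical 1-minute feed with one missing bar (09:03).
stamps = [datetime(2018, 1, 1, 9, m) for m in (0, 1, 2, 4, 5, 6)]
spacing = most_common_spacing(stamps)
print(spacing)
```

A spacing of one minute would correspond to declaring `timeframe=bt.TimeFrame.Minutes` with `compression=1` when creating the feed, in the same way as the FAQ snippet quoted earlier in the thread.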