Resampling issue, duplicated records.



  • Re: Resampling Fixes for 1.9.67.122

    Hello all,

    I have BTC/USD minutely data. I'm trying to resample it as shown below, but I'm getting duplicated records added. This is skewing my strategy. In my strategy's next() I only added a print statement, and this is what I'm getting:

    2019-07-27T23:40:00, O: 9440.25 H: 9446.31 L: 9415.72 C: 9415.72
    2019-07-27T23:50:00, O: 9415.72 H: 9440.71 L: 9412.5 C: 9437.9
    2019-07-27T23:50:00, O: 9415.72 H: 9440.71 L: 9412.5 C: 9437.9

    2019-07-28T00:00:00, O: 9437.9 H: 9540.0 L: 9422.55 C: 9468.92
    ...
    ...
    2019-08-04T23:40:00 - O: 10975.0 H: 10995.0 L: 10970.0 C: 10990.0
    2019-08-04T23:50:00 - O: 10990.0 H: 11018.36 L: 10988.91 C: 11013.83
    2019-08-04T23:50:00 - O: 10990.0 H: 11018.36 L: 10988.91 C: 11013.83

    2019-08-05T00:00:00 - O: 11013.83 H: 11018.36 L: 10977.5 C: 10997.82

    This is how I'm resampling and adding the data to cerebro.

        self.data0 = SQLiteFeed( 
            timeframe=bt.TimeFrame.Minutes,  # This feed contains minutely data that I want to resample as 10-minute data.
            compression=1,
            dtformat=self.dtformat,
            fromdate=datetime.datetime(year=2019, month=5, day=1, hour=0, minute=0, tzinfo=pytz.utc),
            dataname=self.exchange.pair,
            interval="Minutely",
            exchange=self.exchange,
        )
    
        self.data1 = SQLiteFeed(
            timeframe=bt.TimeFrame.Minutes,  # This feed contains hourly data that I want to resample as weekly data.
            compression=60,
            dtformat=self.dtformat,
            fromdate=datetime.datetime(year=2019, month=5, day=1, hour=0, minute=0, tzinfo=pytz.utc),
            dataname=self.exchange.pair,
            interval="Hourly",
            exchange=self.exchange,
        )
    
        resample1 = self.cerebro.resampledata(self.data0, timeframe=bt.TimeFrame.Minutes, compression=10)
        resample2 = self.cerebro.resampledata(self.data1, timeframe=bt.TimeFrame.Days, compression=8)
    

    Yes, I created my own SQLiteFeed and have been using it for a while now without any issues. I verified the data and it does not seem to have any issues.


  • administrators



  • Hi,

    I read that post. I'm not sure I understand how I can fix the issue I'm experiencing. That post explicitly mentions that there's an issue when using data feeds that have a date discrepancy between them.

    I tried using the same data source, so I updated my resample2 to use the same feed as resample1 (self.data0).

    But I still got the same problem, though not at the same datetime. I'm wondering what could cause this issue.

    2019-07-24T23:40:00, O: 9783.57 H: 9809.99 L: 9775.0 C: 9809.99
    2019-07-24T23:50:00, O: 9809.99 H: 9838.49 L: 9806.01 C: 9811.47
    2019-07-24T23:50:00, O: 9809.99 H: 9838.49 L: 9806.01 C: 9811.47
    2019-07-25T00:00:00, O: 9811.47 H: 9811.47 L: 9749.3 C: 9751.24


  • administrators

    1. It's not an issue and the article doesn't say so. It says that people tend to believe it is an issue, but it is a lack of understanding.

    2. You are NOT getting duplicate records. But you fail to show what you print, so it is virtually impossible to say what you are printing.

    3. When one data feed moves past another and the other has no new records, only the existing price can be shown.

    4. Data feeds (and almost everything else) have a len that indicates whether something new has been added. Check the len.
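
Points 3 and 4 can be illustrated with a toy model. This is only a sketch and not backtrader's implementation: `replay` and the integer timestamps are made up for illustration. It assumes that each step advances whichever feed has a bar at that time, and it reports both lengths at every step, roughly the way `next()` would see them.

```python
def replay(feed_a, feed_b):
    """Step through two sorted lists of bar timestamps in time order.

    Returns one (time, len_a, len_b) tuple per step, where len_a and len_b
    are the number of bars each feed has delivered so far.
    """
    steps = []
    len_a = len_b = 0
    for t in sorted(set(feed_a) | set(feed_b)):
        # A feed only grows when it has a bar stamped exactly at this time.
        if len_a < len(feed_a) and feed_a[len_a] == t:
            len_a += 1
        if len_b < len(feed_b) and feed_b[len_b] == t:
            len_b += 1
        steps.append((t, len_a, len_b))
    return steps


# Hypothetical timestamps (minutes): feed A has a bar every 10 minutes,
# feed B delivers a single bar in between, at minute 25.
for t, len_a, len_b in replay([10, 20, 30], [25]):
    print(t, len_a, len_b)
```

At minute 25 only feed B grows. Feed A's length stays at 2, so printing feed A's current bar at that step repeats the minute-20 values without any record having been duplicated.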



  • Just to bring clarity, here's what I'm printing in my strategy.

    def log(self, txt, dt=None):
        """Logging function for this strategy."""
        # Timestamp of the current bar on the first data feed
        dt1 = self.data0.datetime.datetime(0)
        print("%s, %s" % (dt1.isoformat(), txt))

    def next(self):
        self.log(
            "O: " + str(self.dataOpenT0[0]) +
            " H: " + str(self.dataHighT0[0]) +
            " L: " + str(self.dataLowT0[0]) +
            " C: " + str(self.dataCloseT0[0])
        )
    

    Do you mean that there's an issue with my data and that's why I'm seeing what I'm seeing? Even if I resample the same data with another compression value?
    How can I spot where the data issue is?



  • @Mexflubber said in Resampling issue, duplicated records.:

    How can I spot where the data issue is?

    Check the length of the data line. For datas[0], for example, insert...

    len(self.datas[0])
    

    into the next function, on the same line where you print your information. By checking the length of the data line you can tell whether the same data is simply being displayed again or whether you have new data. If the length is the same for the repeated lines, you are just seeing the same bar twice and have no duplicated data.
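
The length check can also be used to suppress the repeated lines. Below is a minimal sketch of that idea in plain Python. `NewBarTracker` is a hypothetical helper, not part of backtrader, and the list of lengths is invented to stand in for the values of `len(self.datas[0])` seen in successive `next()` calls.

```python
class NewBarTracker:
    """Remembers the last seen length of a data feed (hypothetical helper)."""

    def __init__(self):
        self._last_len = 0

    def is_new(self, current_len):
        """Return True only when the feed has grown since the previous call."""
        grew = current_len > self._last_len
        self._last_len = current_len
        return grew


# Invented sequence of feed lengths across five next() calls; the repeated 3
# stands for a call in which only the other feed delivered a bar.
tracker = NewBarTracker()
seen_lengths = [1, 2, 3, 3, 4]
flags = [tracker.is_new(n) for n in seen_lengths]
print(flags)  # [True, True, True, False, True]
```

In a strategy, one could keep such a tracker per feed, call `tracker.is_new(len(self.datas[0]))` at the top of `next()`, and log only when it returns True.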

