
Adding InfluxDB Datafeeds



  • Hi All,

    Recently started using the InfluxDB database instead of CSV files due to memory limitations.

    While those limitations do still exist, they are not as severe.

    I've been adding data feeds as shown below:

        import datetime

        import backtrader as bt

        cerebro = bt.Cerebro()

        # get_sp500_tickers() is my own helper returning the ticker symbols
        symbols = get_sp500_tickers().head(150)

        for symbol in symbols:
            data = bt.feeds.InfluxDB(host='localhost', port='8086',
                                     username='root',
                                     password='root',
                                     database='test_datat',
                                     dataname=symbol,
                                     timeframe=bt.TimeFrame.Minutes,
                                     compression=1,
                                     startdate=datetime.datetime(2020, 1, 2),
                                     todate=datetime.datetime(2020, 5, 22),
                                     high='high',
                                     low='low',
                                     open='open',
                                     close='close',
                                     volume='volume',
                                     ointerest='openinterest')

            cerebro.adddata(data)
            print(symbol)

        cerebro.run(runonce=True, stdstats=False)
    

    Just wondering:

    1. Is this the optimal way of adding these kinds of feeds?
    2. Is there anything I can do to reduce the memory consumption?

    Thanks

    Alfred Sisley



  • You may try to play with exactbars, of course - but this will only save the memory held by the data stored in the LineBuffer objects. See here for more info.

    As for the data stored in the result set inside the InfluxDB feed objects - there it may become more involved. See my previous post about it here.
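
    For reference, a minimal sketch of enabling it - nothing changes except the run parameter:

        import backtrader as bt

        cerebro = bt.Cerebro()
        # ... addstrategy / adddata as usual ...

        # exactbars=1 keeps only the minimum number of bars in memory;
        # note it also deactivates preloading/runonce and plotting
        cerebro.run(exactbars=1)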



  • @vladisld Great post!

    • Very much enjoyed seeing the graphical representation of memory usage. How would one do something similar? Was this done with a particular utility program?

    • Regarding your fix on GitHub, how would I go about incorporating it into the code I'm currently using for InfluxDB? Is it as simple as adding the missing functions?
      def _preload(self):
      def preload(self):

    • You mentioned "Using Pool.imap chunksize=1 illustrate this (specifying chunksize causes the Pool to use each worker process only for chunksize work items)". Would this look like cerebro.run(chunksize=1)?

    As always, immensely appreciate your insights.

    Alfred Sisley



  • @AlfredSisley said in Adding InfluxDB Datafeeds:

    How would one do something similar? Was this done with a particular utility program?

    see here

    @AlfredSisley said in Adding InfluxDB Datafeeds:

    Is it as simple as adding the missing functions?

    You may just replace the influxfeed.py.

    @AlfredSisley said in Adding InfluxDB Datafeeds:

    Would this look like cerebro.run(chunksize=1)?

    No, it is not a run method parameter. It is the multiprocessing.Pool.imap parameter.

    This is only relevant if you're using optstrategy. It has no impact on plain backtesting.

    For this to work you need to modify the Cerebro engine. There you'll find pretty intense code that requires a good understanding of the pickling mechanism and the multiprocessing package.

    Besides the change to the chunksize, the other problem is the way the Cerebro engine instance is pickled over to the worker processes. The fix there is much more involved. I have some dirty changes in my fork - but they are definitely not at production quality level. See here - you may take them if you're brave enough :-)
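
    To illustrate the chunksize behavior itself, here is a minimal standalone sketch using plain multiprocessing - nothing Cerebro-specific, the squaring function is just a placeholder work item:

        from multiprocessing import Pool

        def work(item):
            # stands in for a single optimization run
            return item * item

        if __name__ == '__main__':
            with Pool(processes=4) as pool:
                # chunksize=1 hands each worker process a single work item
                # per dispatch instead of a batch of items
                for result in pool.imap(work, range(16), chunksize=1):
                    print(result)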



  • Thanks @vladisld!

    • I've made the changes to influxfeed.py; I believe there was an improvement in run time.

    • Memory Usage @ 5 tickers:
      memory-profile - 5 tickers.png

    • Memory Usage @ 10 tickers:
      memory-profile - 10 tickers.png

    Is there anything noticeable in the charts above that would suggest issues with the code?

    Is there any way of unloading the historical data from memory at any point in the process and just maintaining the buy/sell decisions?

    At this point, is my only alternative to upgrade my machine?

    Much Appreciated!

    Alfred Sisley



  • Have you experimented with exactbars, as I suggested earlier?



  • @vladisld I added cerebro.run(exactbars=1) and the program managed to run through 100 tickers, which is good news!

    However, as you will see below, while it capped memory usage it also increased the time required.

    It took around 16 seconds per ticker (100 tickers).

    Below is a before/after example of memory consumption using 5 tickers.

    For 500 tickers, I would be looking at roughly 8,000 seconds (about 2.2 hours) to run a backtest.

    At this point, what else can I do to materially speed things up?

    5 Tickers Before:

    memory-profile - 5 tickers (Before).png

    5 Tickers After:

    memory-profile - 5 tickers (After).png

    Thanks

    Alfred Sisley



  • Did you try backtesting each ticker in parallel (if it makes sense, of course)?
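
    Something along the lines of the sketch below, assuming a per-ticker strategy makes sense for you - the feed parameters are copied from your earlier post, the strategy is omitted, and the whole thing is an untested illustration:

        import datetime
        from multiprocessing import Pool

        import backtrader as bt

        def run_backtest(symbol):
            # a fully independent Cerebro instance per ticker
            cerebro = bt.Cerebro()
            data = bt.feeds.InfluxDB(host='localhost', port='8086',
                                     username='root', password='root',
                                     database='test_datat', dataname=symbol,
                                     timeframe=bt.TimeFrame.Minutes,
                                     compression=1,
                                     startdate=datetime.datetime(2020, 1, 2),
                                     todate=datetime.datetime(2020, 5, 22))
            cerebro.adddata(data)
            # cerebro.addstrategy(...)  # your strategy here
            cerebro.run(exactbars=1)
            return symbol

        if __name__ == '__main__':
            symbols = get_sp500_tickers().head(150)  # your own helper
            with Pool(processes=4) as pool:
                for done in pool.imap_unordered(run_backtest, symbols):
                    print(done, 'finished')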



  • @vladisld said in Adding InfluxDB Datafeeds:

    if it makes sense, of course

    @vladisld If you mean running 1 ticker at a time, I did not - I was hoping to get results at a portfolio level.

    Given the current limitations, I will try this.

    At this point, short of changing the guts of cerebro, is there anything I can do on the software side to improve things? If not, would a better machine solve the problem?

    Current laptop:

    [screenshot of laptop specs: i7-6500U, 12 GB RAM]

    Thanks

    Alfred Sisley



  • Talking about HW specs makes it a completely different discussion, of course; however, with all that said:

    An i7-6500U (4 hyperthreaded cores) + 12GB probably buys you a decent machine for development and basic backtesting. However, you'll hit its limits pretty fast (if not already) once you start optimizing.

    In my case, a similar machine is used for development, but for optimizing a more capable server is used (an old dual Xeon E5-2690 v2 - 48 cores + 256 GB + a small NVMe drive). Usually I'm running optimizations with up to 4096 configs - so something that would take months to run on my dev machine may take a day or less on this server.

    So the answer is yes - a more capable, dedicated machine may improve things for heavy loads. And it is actually not that expensive - I got mine for less than $1,500 on eBay.



  • @vladisld said in Adding InfluxDB Datafeeds:

    Talking about HW specs makes it a completely different discussion, of course; however, with all that said:

    @vladisld

    I'll look into upgrading my machine.

    Are you surprised by how long it takes me to run my simple skeleton backtests given my hardware?

    Does it make you think there's something very wrong with my code?

    It sounds like you have no problem backtesting with similar hardware specs, but I can't seem to get anywhere.

    Appreciate your thoughts.

    Alfred Sisley



  • @AlfredSisley said in Adding InfluxDB Datafeeds:

    Are you surprised by how long it takes me to run my simple skeleton backtests given my hardware?

    Hard to tell - I can't directly compare with my local results. Usually I'm backtesting a single ticker at a time, so my mileage may vary.

    What's your timing for backtesting a single ticker over, say, 5-10 years of 1-minute or 5-minute bars?

