Backtrader Community
    Adding InfluxDB Datafeeds

    General Code/Help
AlfredSisley:

      Hi All,

Recently started using an InfluxDB database instead of CSV files due to memory limitations.

      While those limitations do still exist, they are not as severe.

I've been adding data feeds as shown below (imports and the Cerebro instance included for completeness; `get_sp500_tickers` is my own helper returning a pandas Series of symbols):

```
import datetime

import backtrader as bt

cerebro = bt.Cerebro()

symbols = get_sp500_tickers().head(150)

for symbol in symbols:
    data = bt.feeds.InfluxDB(host='localhost', port='8086',
                             username='root',
                             password='root',
                             database='test_datat',
                             dataname=symbol,
                             timeframe=bt.TimeFrame.Minutes,
                             compression=1,
                             startdate=datetime.datetime(2020, 1, 2),
                             todate=datetime.datetime(2020, 5, 22),
                             high='high',
                             low='low',
                             open='open',
                             close='close',
                             volume='volume',
                             ointerest='openinterest')
    cerebro.adddata(data)
    print(symbol)

cerebro.run(runonce=True, stdstats=False)
```

      Just wondering:

      1. Is this the optimal way of adding these kinds of feeds?
      2. Is there anything I can do to reduce the memory consumption?

      Thanks

      Alfred Sisley

vladisld:

You may try to play with exactbars, of course - but this will only save memory for the data stored in the LineBuffer objects. See here for more info.

As for the data stored in the result set inside the InfluxDB feed objects - there it may become more involved. See my previous post about it here.
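As a rough mental model of what exactbars buys you (this is a plain-Python toy, not backtrader code): with exactbars enabled, the line buffers keep only the minimum window the indicators actually need, much like a fixed-size deque, instead of the full history.

```python
from collections import deque

# Toy illustration of the exactbars idea: keep only the window the
# strategy needs - here, enough closes for a 20-bar moving average.
LOOKBACK = 20
closes = deque(maxlen=LOOKBACK)  # older bars are dropped automatically

def on_bar(close):
    closes.append(close)
    if len(closes) == LOOKBACK:
        return sum(closes) / LOOKBACK  # simple moving average
    return None  # not enough history yet

# Feed in 1000 synthetic bars; memory stays bounded at LOOKBACK items.
sma = None
for i in range(1000):
    sma = on_bar(float(i))

print(len(closes))  # -> 20, no matter how many bars were processed
```

The trade-off is the same one exactbars makes: older bars are simply no longer addressable once they fall out of the window.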

AlfredSisley:

          @vladisld Great post!

• Very much enjoyed seeing the graphic representation of memory usage. How would one do something similar? Was this done with a particular utility program?

• Regarding your fix on GitHub, how would I go about incorporating it into the code I'm currently using for InfluxDB? Is it as simple as adding the missing functions?
  def _preload(self):
  def preload(self):

          • You mentioned "Using Pool.imap chunksize=1 illustrate this ( specifying chunksize causes the Pool to use each worker process only for chunksize work items)". Would this look like cerebro.run(chunksize=1)?

          As always, immensely appreciate your insights.

          Alfred Sisley

vladisld (replying to AlfredSisley):

            @AlfredSisley said in Adding InfluxDB Datafeeds:

            How would one do something similar? Was this done with a particular utility program?

            see here

            @AlfredSisley said in Adding InfluxDB Datafeeds:

            Is it as simple as adding the missing functions?

            You may just replace the influxfeed.py.

            @AlfredSisley said in Adding InfluxDB Datafeeds:

            Would this look like cerebro.run(chunksize=1)?

No, it is not a run method parameter. It is a multiprocessing.Pool.imap parameter.

This is only relevant if you're using optstrategy; it has no impact on plain backtesting.

For this to work you need to modify the Cerebro engine. There you'll find pretty intense code that requires a good understanding of the pickling mechanism and the multiprocessing package.

Besides the change to the chunksize, the other problem is the way the Cerebro engine instance is pickled and sent to the worker processes. The fix there is much more involved. I have a dirty change in my fork - but it is definitely not at production quality level. See here - you may take it if you're brave enough :-)
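To make the chunksize point concrete: the knob lives on the pool's imap call, not on cerebro.run. A minimal stdlib sketch (a thread pool is used here for brevity; the process Pool that Cerebro's optimization path uses has the same imap signature, and run_config is a stand-in for one optimization run):

```python
from multiprocessing.pool import ThreadPool

def run_config(cfg):
    # Stand-in for one optimization run; a real worker would build
    # and run its own Cerebro instance for this parameter set.
    return cfg * cfg

with ThreadPool(2) as pool:
    # chunksize=1 hands each worker exactly one work item at a time,
    # instead of batching several items per worker.
    results = list(pool.imap(run_config, range(6), chunksize=1))

print(results)  # [0, 1, 4, 9, 16, 25]
```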

AlfredSisley:

              Thanks @vladisld!

• I've made the changes to influxfeed.py; I believe there was an improvement in run time.

              • Memory Usage @ 5 tickers:
                memory-profile - 5 tickers.png

              • Memory Usage of 10 tickers:
                memory-profile - 10 tickers.png

              Is there anything noticeable in the charts above that would suggest issues with the code?

              Is there any way of unloading the historical data from memory at any point in the process and just maintaining the buy/sell decisions?
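On that question, one pattern that works in plain Python (independent of backtrader internals): once a run has finished, extract only the small results you care about and drop the references to the heavy data so the interpreter can reclaim it. A schematic sketch - load_bars and backtest below are hypothetical stand-ins, not backtrader APIs:

```python
import gc

def load_bars(n):
    # Stand-in for loading a large minute-bar history from InfluxDB.
    return [{'close': float(i)} for i in range(n)]

def backtest(bars):
    # Stand-in for cerebro.run(): returns only the (small) decisions.
    return [('BUY', i) for i, b in enumerate(bars) if i % 250 == 0]

bars = load_bars(10_000)      # large
decisions = backtest(bars)    # small: just the buy/sell record
del bars                      # drop the only reference to the history
gc.collect()                  # the big list can now be reclaimed

print(len(decisions))  # 40
```

The catch is that backtrader keeps its own references to the feeds while a run is in progress, so within a single run exactbars is still the main lever.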

              At this point, is my only alternative to upgrade my machine?

              Much Appreciated!

              Alfred Sisley

vladisld:

Have you experimented with exactbars, as I suggested earlier?

AlfredSisley:

                  @vladisld I added cerebro.run(exactbars=1) and the program managed to run through 100 tickers, which is good news!

However, as you will see below, while it capped my memory it also increased the time required.

                  It took around 16 seconds per ticker (100 tickers).

                  Below is a before/after example of memory consumption using 5 tickers.

                  For 500 tickers, I would be looking at 8,000 seconds (or roughly 2 hours) to run a backtest.

                  At this point, what else can I do to materially speed things up?

                  5 Tickers Before:

                  memory-profile - 5 tickers (Before).png

                  5 Tickers After:

                  memory-profile - 5 tickers (After).png

                  Thanks

                  Alfred Sisley

vladisld:

Did you try backtesting each ticker in parallel (if it makes sense, of course)?
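The per-ticker-in-parallel idea can be sketched as: one worker per symbol, each building its own independent Cerebro and returning only a small summary. In this sketch run_one_ticker is a hypothetical stand-in (build a Cerebro, add the single InfluxDB feed for the symbol, run, return stats); for a real CPU-bound backtest you would swap the thread pool for concurrent.futures.ProcessPoolExecutor:

```python
from concurrent.futures import ThreadPoolExecutor

def run_one_ticker(symbol):
    # Hypothetical stand-in: build a fresh Cerebro, add one InfluxDB
    # feed for `symbol`, run it, and return a small summary tuple.
    return symbol, len(symbol)  # e.g. (symbol, final_portfolio_value)

symbols = ['AAPL', 'MSFT', 'GOOG']

# Use ProcessPoolExecutor instead for real (CPU-bound) backtests.
with ThreadPoolExecutor(max_workers=3) as ex:
    results = dict(ex.map(run_one_ticker, symbols))

print(results)
```

Note this only applies when the strategy does not need cross-ticker (portfolio-level) interaction, since each worker sees a single feed.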

AlfredSisley:

@vladisld If you mean running one ticker at a time, I did not - I was hoping to get results at a portfolio level.

                      Given the current limitations, I will try this.

                      At this point, short of changing the guts of cerebro, is there anything I can do on the software side to improve things? If not, would a better machine solve the problem?

                      Current laptop:

                      c2832ee4-7ca0-403c-a5e1-ed9f12e699b1-image.png

                      Thanks

                      Alfred Sisley

vladisld:

Talking about HW specs makes it a completely different discussion, of course. With all that said:

An i7-6500U (2 cores / 4 threads) + 12GB buys you a decent machine for development and basic backtesting. However, you'll hit its limits pretty fast (if not already) once you start optimizing.

In my case, a similar machine is used for development, but for optimizing a more capable server is used (old dual Xeon E5-2690 v2 + 256 GB + small NVMe drive). Usually I'm running optimizations with up to 4096 configs - so something that would take months to run on my dev machine may take a day or less on this server.

So the answer is yes - a more capable, dedicated machine may improve things for heavy loads. And it is actually not that expensive - I got mine for less than $1,500 on eBay.

AlfredSisley:


                          @vladisld

                          I'll look into upgrading my machine.

                          Are you surprised by how long it takes me to run my simple skeleton backtests given my hardware?

                          Does it make you think there's something very wrong with my code?

It sounds like you have no problem backtesting with similar hardware specs, but I can't seem to get anywhere.

                          Appreciate your thoughts.

                          Alfred Sisley

vladisld:

                            @AlfredSisley said in Adding InfluxDB Datafeeds:

                            Are you surprised by how long it takes me to run my simple skeleton backtests given my hardware?

Hard to tell - I can't directly compare with my local results. Usually I'm backtesting a single ticker at a time, so my mileage may vary.

What's your timing for backtesting a single ticker over, say, 5 to 10 years of 1- or 5-minute bars?
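Getting that number is a one-liner with the stdlib: wrap the run in time.perf_counter. The backtest body below is a stand-in for building and running a single-ticker Cerebro:

```python
import time

def run_backtest():
    # Stand-in for: cerebro = bt.Cerebro(); add one feed; cerebro.run()
    total = 0
    for i in range(100_000):
        total += i
    return total

start = time.perf_counter()
run_backtest()
elapsed = time.perf_counter() - start
print(f'backtest took {elapsed:.2f} s')
```

Measuring one ticker in isolation also separates feed-loading time from strategy-execution time, which helps pinpoint where the 16 seconds per ticker is going.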

luzzalgos:

                              Hi @vladisld,
I have to say "thank you" - because of your posts I started using InfluxDB to store my data. I spent some time figuring out how to load filtered data from S3 AWS buckets into Influx, but finally solved that. What is still not clear to me, and what relates to this topic, is how to feed data correctly into backtrader. I tried the built-in influxfeed.py, but there are a few mistakes, e.g. the open price is calculated from the "open price mean".
Would you be so kind as to share good practices for working with influxfeed, and the influxfeed code you are using? That would help me a lot.
Thank you in advance
                              Thank you in advance

vladisld (replying to luzzalgos):

                                @luzzalgos said in Adding InfluxDB Datafeeds:

                                open price calculated from “open price mean”

I'm not sure it is a problem as long as the requested timeframe/compression matches the timeframe and compression stored in the database.

I do have a slightly modified version of influxfeed.py in my fork that doesn't use 'mean' for calculating the open price (purely from a performance perspective, although it still uses grouping) and includes more fixes to support proper 'preload' functionality and a reduced memory footprint:
https://github.com/vladisld/backtrader/blob/lazytrader/backtrader/feeds/influxfeed.py

As for good practices - I don't have many; it works pretty well as it is. As with all databases, there are many ways of tuning performance in InfluxDB itself - starting with engine parameters, indexing, proper tag selection, hardware upgrades (in case you are serious about your data) and so on. However, this is a far bigger topic and not suitable for a single post (just google 'influxdb performance tuning' - there are a lot of resources and tutorials).
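For reference, the 'mean' concern comes down to which InfluxQL aggregation function the feed's query applies per field when grouping bars: the open of a grouped interval should be its first value, the high its max, and so on. A hedged sketch of building such a query string (the measurement name, field names, and date are assumptions, not the feed's actual query):

```python
# Per-field aggregations for OHLCV when grouping into time buckets,
# instead of applying mean() uniformly to every field.
fields = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
}
select = ', '.join(f'{fn}("{field}") AS "{field}"'
                   for field, fn in fields.items())
query = (f'SELECT {select} FROM "AAPL" '
         f"WHERE time >= '2020-01-02' GROUP BY time(1m)")

print(query)
```

This is only an illustration of the aggregation choice; the actual query construction lives inside influxfeed.py (see the fork linked above).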
