Real-time data processing and storage:



  • Hi all,

    I'm asking this as a general question since I can't find a satisfactory answer on Google. I am getting real-time tick data (last trade price) for the markets I trade. What's the best way to process and store this data? I want to capture the data, resample it into 1-minute bars, calculate signals on the data and store it. Are there any modules that I can use along with pandas to do this easily, or is this a much bigger project than I think? I did find one answer to this on Stack Overflow, but I don't think it's entirely correct.

    thanks,

    https://stackoverflow.com/questions/31141292/real-time-data-processing-with-pandas#


  • administrators

    pandas can help you with the resampling, but DataFrames are not meant to be extended, and doing so is a costly operation (it basically involves allocating memory and copying the previous data). The best approach would be to use a list or an oversized, pre-allocated DataFrame, keeping track of how many ticks have been stored for the current period.
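
    A minimal sketch of the list-based approach, using only the standard library. The `TickBuffer` class and its method names are hypothetical and chosen for illustration; the idea is simply to append ticks to a list during the period and build the bar once, when the period ends. The same list could be handed to a `pandas.DataFrame` constructor at that point instead.

```python
from datetime import datetime


class TickBuffer:
    """Collects (timestamp, price) ticks for the current period in a plain list."""

    def __init__(self):
        # Appending to a list is cheap; nothing is copied per tick.
        self.ticks = []

    def add(self, ts, price):
        self.ticks.append((ts, price))

    def flush(self):
        """Return an OHLC bar for the buffered ticks and reset the buffer.

        Returns None if no ticks were buffered.
        """
        if not self.ticks:
            return None
        prices = [price for _, price in self.ticks]
        bar = {
            # Bar is labelled with the start of the minute of the first tick.
            "start": self.ticks[0][0].replace(second=0, microsecond=0),
            "open": prices[0],
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],
            "ticks": len(prices),
        }
        self.ticks = []
        return bar


buf = TickBuffer()
buf.add(datetime(2024, 1, 2, 0, 0, 1, 1000), 100.0)
buf.add(datetime(2024, 1, 2, 0, 0, 1, 500000), 100.5)
buf.add(datetime(2024, 1, 2, 0, 0, 1, 798000), 99.75)
bar = buf.flush()
```

    Deciding when to call `flush()` is the hard part in real time, and this sketch deliberately leaves it to the caller.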

    For storage you can look into Arctic: https://github.com/manahl/arctic

    But the real problem is not whether pandas can help you or if it's the best tool: real-time is the problem.

    Example:

    • You get ticks starting with a timestamp of 00:00:01.001 until 00:00:01.798
    • There is now no trading until 00:05:00.001, i.e. almost 5 minutes later

    The questions:

    • How do you know that 00:00:01.798 is the last tick for the 1st minute?
    • Are you prepared to find that out only almost 5 minutes later? You want to generate signals based on the 1-minute resampled bar, but the signal would reach you almost 5 minutes too late.

    In backtrader, when receiving ticks, the real-time clock is used to decide if a resampled bar has to be delivered for the chosen timeframe (1-minute in your case) even if no new tick has been seen. The logic:

    • You get ticks starting with a timestamp of 00:00:01.001 until 00:00:01.798
    • When the real-time clock signals 00:01:00.000 (plus some time to allow for late ticks) and even in the absence of ticks, the resampled bar is delivered.
    • A new resampling starts when the next tick is seen at 00:05:00.001

    You may of course consider the example to be far-fetched, but it does actually happen, and it tends to happen more often when working with the smaller timeframes.

    Should you code it yourself, take it into account.
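
    A rough sketch of the clock-driven logic above, for anyone coding it themselves. This is not backtrader's implementation; `ClockResampler`, `on_tick`, `on_clock` and the `grace` parameter are hypothetical names used for illustration. It assumes ticks arrive in timestamp order and that the caller invokes `on_clock` periodically (e.g. from a timer) with the current wall-clock time.

```python
from datetime import datetime, timedelta


class ClockResampler:
    """Builds fixed-period bars and delivers them on the clock, not on the next tick."""

    def __init__(self, period=timedelta(minutes=1), grace=timedelta(seconds=1)):
        self.period = period
        self.grace = grace  # extra wait to allow for late ticks
        self.bar = None     # bar currently under construction, if any

    def _bar_start(self, ts):
        # Floor the timestamp to the period boundary, counted from midnight.
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight + ((ts - midnight) // self.period) * self.period

    def on_tick(self, ts, price):
        """Feed one tick. Returns a finished bar if the tick opens a later period."""
        done = None
        start = self._bar_start(ts)
        if self.bar is not None and start > self.bar["start"]:
            done, self.bar = self.bar, None
        if self.bar is None:
            self.bar = {"start": start, "open": price, "high": price,
                        "low": price, "close": price}
        else:
            self.bar["high"] = max(self.bar["high"], price)
            self.bar["low"] = min(self.bar["low"], price)
            self.bar["close"] = price
        return done

    def on_clock(self, now):
        """Feed the wall-clock time. Returns the bar once period end + grace has passed."""
        if self.bar is not None and now >= self.bar["start"] + self.period + self.grace:
            done, self.bar = self.bar, None
            return done
        return None


r = ClockResampler()
day = datetime(2024, 1, 2)
r.on_tick(day + timedelta(seconds=1, milliseconds=1), 100.0)
r.on_tick(day + timedelta(seconds=1, milliseconds=798), 100.5)
early = r.on_clock(day + timedelta(seconds=59))           # period not over yet
bar = r.on_clock(day + timedelta(minutes=1, seconds=1))   # delivered without a new tick
```

    With this arrangement the first bar is available about one second after the minute ends, instead of when the next tick shows up at 00:05:00.001. The grace period is a trade-off: longer catches more late ticks but delays the signal.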



  • @backtrader Thanks for the summary. As I suspected, there is much more to it, so I'll have to think about it. There are just too many "unknown unknowns".

