Real Time data processing and storage:
-
Hi all,
I am asking this question in general since I can't find a satisfactory answer on Google. I am getting real-time tick data (last trade price) for the markets I trade. What's the best way to process and store this data? I want to capture the data, resample it into 1-minute bars, calculate signals on the data and store it. Are there any modules that I can use along with pandas to do this easily, or is this a much bigger project than I think? I did find one answer to this on Stack Overflow, but I don't think it's entirely correct.
thanks,
https://stackoverflow.com/questions/31141292/real-time-data-processing-with-pandas#
-
pandas can help you with the resampling, but DataFrames are not meant to be extended, and doing so is a costly operation (it basically involves allocating memory and copying the previous data). The best approach would be to use a list or an over-dimensioned, pre-allocated DataFrame, keeping track of how many ticks have been stored for the current period.
For storage you can look into Arctic: https://github.com/manahl/arctic
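As a minimal sketch of the list-based approach (the names on_tick and ticks_to_bars are made up for illustration and are not part of pandas), ticks can be appended to a plain Python list and only turned into a DataFrame when bars are needed:

```python
import pandas as pd

# Plain Python list: appending is cheap, unlike growing a DataFrame row by row.
ticks = []


def on_tick(timestamp, price):
    """Hypothetical callback: store one (timestamp, last price) tick."""
    ticks.append((pd.Timestamp(timestamp), price))


def ticks_to_bars(tick_list, rule="1min"):
    """Build a DataFrame once from the collected ticks and resample to OHLC bars."""
    df = pd.DataFrame(tick_list, columns=["timestamp", "price"])
    df = df.set_index("timestamp")
    bars = df["price"].resample(rule).ohlc()
    # Periods without any tick come back as all-NaN rows; drop them here.
    return bars.dropna()
```

Calling ticks_to_bars(ticks) after a few on_tick(...) calls returns one row per minute that contained at least one tick. This only covers the resampling step; it does not decide when a bar is complete.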
But the real problem is not whether pandas can help you or whether it's the best tool: real-time is the problem. Example:
- You get ticks starting with a timestamp of 00:00:01.001 until 00:00:01.798
- There are now no trades until 00:05:00.001, i.e. almost 5 minutes later
The questions:
- How do you know that 00:00:01.798 is the last tick for the 1st minute?
- Are you prepared to recognize that almost 5 minutes later? Because you want to generate signals based on the 1-minute resampled bar, and the signal would be given to you 5 minutes too late.
In the case of backtrader and when receiving ticks, the real-time clock is used to decide if a resampled bar has to be delivered for the chosen timeframe (1-minute in your case) even if no new tick has been seen. The logic:
- You get ticks starting with a timestamp of 00:00:01.001 until 00:00:01.798
- When the real-time clock signals 00:01:00.000 (plus some time to allow for late ticks), and even in the absence of ticks, the resampled bar is delivered.
- A new resampling starts when the next tick is seen at 00:05:00.001
You may of course consider the example to be far-fetched, but it does actually happen, and it tends to happen more often when working with the smaller timeframes.
Should you code it yourself, take it into account.
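A rough sketch of that clock-driven logic could look like the following. This is not backtrader's actual code: the class ClockedBarBuilder, its methods and the 0.5 second grace period are assumptions made for illustration, and it assumes the tick timestamps and the local clock are synchronized.

```python
from datetime import datetime, timedelta


class ClockedBarBuilder:
    """Hypothetical helper: builds bars from ticks, closed by the clock."""

    def __init__(self, period=timedelta(minutes=1),
                 grace=timedelta(seconds=0.5)):
        self.period = period
        self.grace = grace        # extra wait to allow for late ticks
        self.bar = None           # bar under construction, if any
        self.bar_end = None       # end of the period of that bar
        self.last_end = None      # end of the last delivered bar

    def _period_start(self, ts):
        # Floor the timestamp to a period boundary counted from midnight.
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight + ((ts - midnight) // self.period) * self.period

    def _close(self):
        bar, self.last_end = self.bar, self.bar_end
        self.bar = self.bar_end = None
        return bar

    def on_tick(self, ts, price):
        """Feed one tick; returns a finished bar if the tick opens a new period."""
        if self.last_end is not None and ts < self.last_end:
            return None           # too late: that bar was already delivered
        finished = None
        if self.bar is not None and ts >= self.bar_end:
            finished = self._close()
        if self.bar is None:
            start = self._period_start(ts)
            self.bar_end = start + self.period
            self.bar = {"start": start, "open": price, "high": price,
                        "low": price, "close": price}
        else:
            self.bar["high"] = max(self.bar["high"], price)
            self.bar["low"] = min(self.bar["low"], price)
            self.bar["close"] = price
        return finished

    def on_clock(self, now):
        """Call periodically with the wall-clock time (e.g. datetime.now()).

        Delivers the pending bar once its period plus the grace time has
        elapsed, even if no further tick has arrived.
        """
        if self.bar is not None and now >= self.bar_end + self.grace:
            return self._close()
        return None
```

In live use, on_clock would be called from a timer or from the main loop several times per second. With the example above, the first bar is delivered shortly after 00:01:00.000 instead of when the tick at 00:05:00.001 arrives.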
-
@backtrader Thanks for the summary. As I suspected, there is much more to it. I'll have to think about it. There are just too many "unknown unknowns".