Bug using Pandas hdf?
-
Hi, I'm using the standard Pandas example, but with HDF files. I'm getting an error and I'm not sure if it's due to a problem in the data or a bug in Backtrader itself:

      File "b2.py", line 69, in <module>
        runstrat()
      File "b2.py", line 46, in runstrat
        cerebro.run()
      File "/Library/Python/2.7/site-packages/backtrader/cerebro.py", line 1127, in run
        runstrat = self.runstrategies(iterstrat)
      File "/Library/Python/2.7/site-packages/backtrader/cerebro.py", line 1209, in runstrategies
        data.preload()
      File "/Library/Python/2.7/site-packages/backtrader/feed.py", line 435, in preload
        while self.load():
      File "/Library/Python/2.7/site-packages/backtrader/feed.py", line 476, in load
        _loadret = self._load()
      File "/Library/Python/2.7/site-packages/backtrader/feeds/pandafeed.py", line 262, in _load
        dt = tstamp.to_pydatetime()
    AttributeError: 'numpy.int64' object has no attribute 'to_pydatetime'

It crashes in Cerebro.run()
This is the code (basically copied from the example, but reading HDF instead of CSV). Here is the pandas data as well, in case the problem is there: https://drive.google.com/open?id=0ByTv1orrPKKPYkR2d1NYRHVhSUU

    from __future__ import (absolute_import, division, print_function,
                            unicode_literals)

    import argparse

    import backtrader as bt
    import backtrader.feeds as btfeeds

    import pandas


    def runstrat():
        args = parse_args()

        # Create a cerebro entity
        cerebro = bt.Cerebro(stdstats=False)

        # Add a strategy
        cerebro.addstrategy(bt.Strategy)

        # Get a pandas dataframe
        datapath = ('datos1.pand')

        # Simulate the header row isn't there if noheaders requested
        skiprows = 1 if args.noheaders else 0
        header = None if args.noheaders else 0

        dataframe = pandas.read_hdf(datapath, "table",
                                    skiprows=skiprows,
                                    header=header,
                                    parse_dates=True,
                                    index_col=0)

        if not args.noprint:
            print('--------------------------------------------------')
            print(dataframe)
            print('--------------------------------------------------')

        # Pass it to the backtrader datafeed and add it to the cerebro
        data = bt.feeds.PandasData(dataname=dataframe)
        cerebro.adddata(data)

        # Run over everything
        cerebro.run()

        # Plot the result
        cerebro.plot(style='bar')


    def parse_args():
        parser = argparse.ArgumentParser(
            description='Pandas test script')

        parser.add_argument('--noheaders', action='store_true', default=False,
                            required=False,
                            help='Do not use header rows')

        parser.add_argument('--noprint', action='store_true', default=False,
                            help='Print the dataframe')

        return parser.parse_args()


    if __name__ == '__main__':
        runstrat()

EDIT: I think the data is correct, because it prints fine when I do print(dataframe):

    --------------------------------------------------
           Timestamp      Open      High       Low     Close     Volume
    0     1483228800  0.008335  0.008377  0.008332  0.008368  22.678130
    1     1483229100  0.008379  0.008412  0.008373  0.008400  23.101541
    2     1483229400  0.008400  0.008463  0.008388  0.008450  42.334102
    ...          ...       ...       ...       ...       ...        ...
    8927  1485906900  0.011050  0.011067  0.011035  0.011046   4.529571
    8928  1485907200  0.011046  0.011091  0.011036  0.011090   9.032207

    [8929 rows x 6 columns]
    --------------------------------------------------

EDIT2: OK, I found a fix. I don't fully understand why it is needed, but it works now: the Unix timestamp has to be converted into a pandas Timestamp object:

    import datetime  # needed by to_datetimeindex below


    def to_datetimeindex(unixtimestamp):
        # Unix seconds -> "YYYY-MM-DD HH:MM:SS" string (local time)
        t = datetime.datetime.fromtimestamp(unixtimestamp)
        return t.strftime("%Y-%m-%d %H:%M:%S")


    def to_Timestamp(stringtime):
        # Date string -> pandas Timestamp
        return pandas.Timestamp(stringtime)


    def runstrat():
        args = parse_args()

        # Create a cerebro entity
        cerebro = bt.Cerebro(stdstats=False)

        # Add a strategy
        cerebro.addstrategy(bt.Strategy)

        # Get a pandas dataframe
        datapath = ('datos1.pand')

        # Simulate the header row isn't there if noheaders requested
        skiprows = 1 if args.noheaders else 0
        header = None if args.noheaders else 0

        dataframe = pandas.read_hdf(datapath, "table", openinterest=-1)
        dataframe["Timestamp"] = dataframe["Timestamp"].apply(to_datetimeindex)
        dataframe["Timestamp"] = dataframe["Timestamp"].apply(to_Timestamp)

-
As far as I am aware, BT data feeds require all timestamps / dates to be datetime objects. So that is why you need to convert the TS before adding it.
In one of my scripts I do this by:
df.index = pd.to_datetime(df.index, format='%Y-%m-%dT%H:%M:%S.%fZ')
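That format string is for ISO-style date strings. For integer Unix timestamps like the ones in the dataframe printed above, something along these lines should do the same job. This is only a sketch: it assumes the values are seconds since the epoch, the three sample rows are copied from the printed output, and the result is in UTC (whereas datetime.fromtimestamp gives local time).

```python
import pandas as pd

# Three sample rows in the same shape as the dataframe printed in the question.
df = pd.DataFrame({
    "Timestamp": [1483228800, 1483229100, 1483229400],
    "Open": [0.008335, 0.008379, 0.008400],
    "High": [0.008377, 0.008412, 0.008463],
    "Low": [0.008332, 0.008373, 0.008388],
    "Close": [0.008368, 0.008400, 0.008450],
    "Volume": [22.678130, 23.101541, 42.334102],
})

# unit="s" (an assumption about the data) tells pandas the integers are
# seconds since the Unix epoch; the result is timezone-naive UTC.
df.index = pd.to_datetime(df["Timestamp"], unit="s")

# The raw integer column is no longer needed once the index holds datetimes.
df = df.drop(columns="Timestamp")

print(df.index)
```

With the datetimes in the index, the default PandasData(dataname=df) call from the question should find them, since each index entry now has to_pydatetime().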
Some more notes here:
https://www.backtrader.com/docu/pandas-datafeed/pandas-datafeed.html?highlight=pandas
Another thing you could look at is the datetime keyword argument in the link above. It looks like your timestamp is not the index.
datetime (default: None)
None : datetime is the "index" in the Pandas Dataframe
-1 : autodetect position or case-wise equal name
>= 0 : numeric index to the column in the pandas dataframe
string : column name (as index) in the pandas dataframe
Having said all that, you now have it working! So it does not matter too much :)
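For reference, a rough sketch of the datetime keyword route, leaving the timestamps in a regular column instead of the index. The helper names (with_datetime_column, make_feed) are made up for illustration, the column is assumed to hold Unix seconds, and the values still have to be converted to real datetimes first; the keyword only tells the feed where to look.

```python
import pandas as pd


def with_datetime_column(raw):
    """Return a copy whose "Timestamp" column holds datetimes, not integers.

    Assumes the integers are seconds since the Unix epoch.
    """
    df = raw.copy()
    df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")
    return df


def make_feed(df):
    """Build a backtrader feed that reads datetimes from the "Timestamp" column."""
    import backtrader as bt  # imported here so the helper above works without it

    # datetime="Timestamp" points the feed at the column by name; the other
    # fields are left on their defaults (autodetected by name).
    return bt.feeds.PandasData(dataname=df, datetime="Timestamp")


# Two sample rows copied from the dataframe printed in the question.
sample = pd.DataFrame({
    "Timestamp": [1483228800, 1483229100],
    "Open": [0.008335, 0.008379],
    "High": [0.008377, 0.008412],
    "Low": [0.008332, 0.008373],
    "Close": [0.008368, 0.008400],
    "Volume": [22.678130, 23.101541],
})
prepared = with_datetime_column(sample)
print(prepared.dtypes)
```

The feed would then be added as usual with cerebro.adddata(make_feed(prepared)).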
-
Hmm... your one-line solution seems more elegant than mine. Thanks! :)
-
@thatblokedave thanks, that works for me.