CSV with 1000 tickers and only monthly revenue and close price data
-
Hi, I'm fairly new to backtrader and to Python in general. Reading the documentation and some examples, I see that most of them deal with only one ticker and assume you will be working with complete daily OHLC data.
I have read how to extend a data feed in order to add other criteria:
- https://community.backtrader.com/topic/2971/multiple-assets-with-custom-pandas-dataframe
- https://community.backtrader.com/topic/158/how-to-feed-backtrader-alternative-data/5

So far, I have an Excel file with different tickers that contains only monthly close prices and revenue figures, which I will convert to a CSV.
So my questions are...
a) What would be the best (easiest) format for the CSV file to use with backtrader? Do I need one file per ticker with date, price and revenue columns, resulting in 1000+ files? Or maybe two files, with only a date or ticker header and the revenue data in each row? Is it possible to keep all the data in just one file (two columns under a common heading)?
b) Since I don't have any more data, how can I tell backtrader to ignore the missing fields (open, high, low) and just use the data I have (close and revenue)?
c) Does it matter that the data is not daily? Could I also test with monthly or yearly figures?
d) I'll be working with only one strategy at a time. If I need to use more than one data file, how can I apply the criteria so that the buy/sell signals check the full sample and produce a single final result/graph covering my whole "universe" of stocks?
Thank you in advance for your help.
-
@atmps While I'm new to BT, here are my thoughts on your questions:
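a/b) I would just load the whole file into a Pandas DataFrame and then add each ticker's subgroup as its own data feed. Say your file has `date,ticker,close,rev` fields; the code might look like this (there might be mistakes in the example!):

```python
import backtrader as bt
import pandas as pd

cerebro = bt.Cerebro()
filename = 'data.csv'  # hypothetical path to the long-format CSV

df = pd.read_csv(filename, parse_dates=['date'])
for tkr, grp in df.groupby('ticker'):
    data = bt.feeds.PandasData(
        dataname=grp.sort_values('date'),  # feeds expect rows in date order
        datetime='date',
        close='close',
        open=None,          # None marks a field as absent from the data
        high=None,
        low=None,
        volume=None,
        openinterest=None,
        plot=False,
    )
    cerebro.adddata(data, name=tkr)
```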
c) No. BT handles data at different time resolutions (monthly, yearly, etc.) fine.
d) I think this is covered by loading all the tickers as in (a), if I understand the question correctly.
As for handling an additional custom field, `rev`, there is a good example in the docs where `GenericCSVData` is extended to add a `pe` line: https://www.backtrader.com/docu/extending-a-datafeed/

In my example above, I would just extend `PandasData` rather than `GenericCSVData` to add the `rev` line, and then pass `rev='rev'` when creating it.
-
@davidavr Thank you for your insight.
As I understand your response:
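In Excel I have a wide layout: row 1 holds the ticker names (ticker1, ticker 2, ...), row 2 holds the column labels for each ticker (close, rev, close, rev, ...), and from row 3 down each row starts with a date (date1, date2, ...).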
and my CSV should be:

```
date,ticker,close,rev
date1,ticker1,close,rev
date2,ticker1,close,rev
...
date1,ticker2,close,rev
date2,ticker2,close,rev
```
Then, in the example code, should I just add the rev like this?

```python
df = pd.read_csv(filename, parse_dates=['date'])
for tkr, grp in df.groupby('ticker'):
    data = bt.feeds.PandasData(
        dataname=grp,
        datetime='date',
        close='close',
        rev='rev',
        open=None,
        high=None,
        low=None,
        volume=None,
        openinterest=None,
        plot=False,
    )
    cerebro.adddata(data, name=tkr)
```
And can the strategy be as simple as this?

```python
def next(self):
    if not self.position:
        if self.rev < 30:
            self.buy(size=1)
    else:
        if self.rev > 50:
            self.sell(size=1)
```
Sorry if I didn't get that right.
-
@atmps Sorry - what I meant in the last part is that you'll need to extend PandasData like so:
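```python
import backtrader as bt


class PandasDataRev(bt.feeds.PandasData):
    lines = ('rev',)           # add a 'rev' line to the feed
    params = (('rev', None),)  # column to read it from, e.g. rev='rev'
```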
Then you can use it like this:
```python
for tkr, grp in df.groupby('ticker'):
    data = PandasDataRev(
        dataname=grp,
        datetime='date',
        close='close',
        rev='rev',
        open=None,
        high=None,
        low=None,
        volume=None,
        openinterest=None,
    )
    cerebro.adddata(data, name=tkr)
```
You will need to figure out how to reshape your data from its Excel layout into a form that is easy to iterate through ticker by ticker. There are many ways to do that: you could load it into Pandas and "unpivot" it, or use it in the form you already have, in which case the indexing will be different.
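One way to do the "unpivot", as a pandas-only sketch: assuming the sheet is read with two header rows so the columns become a (ticker, field) MultiIndex and the dates become the index, the hypothetical `wide_to_long` helper below stacks each ticker's `close`/`rev` pair into the long `date,ticker,close,rev` layout used above. The numbers and the `data.xlsx` file name are made up; if ticker names only appear above the first column of each pair in the real sheet, the top header level may need forward-filling first.

```python
import pandas as pd


def wide_to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Turn (ticker, field) MultiIndex columns into one row per date/ticker.

    Assumes `wide` is indexed by date and its columns have two levels:
    level 0 = ticker, level 1 = field name ('close', 'rev').
    """
    frames = []
    for tkr in wide.columns.get_level_values(0).unique():
        part = wide[tkr].copy()  # sub-frame with columns 'close', 'rev'
        part["ticker"] = tkr
        frames.append(part)
    long = pd.concat(frames).rename_axis("date").reset_index()
    long = long[["date", "ticker", "close", "rev"]]
    return long.sort_values(["ticker", "date"], ignore_index=True)


# Hypothetical wide frame mirroring the Excel layout (tickers across, dates down).
# Reading the real sheet might look like:
#   wide = pd.read_excel("data.xlsx", header=[0, 1], index_col=0)
wide = pd.DataFrame(
    [[10.0, 25.0, 50.0, 60.0],
     [11.0, 40.0, 52.0, 45.0]],
    index=pd.to_datetime(["2020-01-31", "2020-02-29"]),
    columns=pd.MultiIndex.from_product([["ticker1", "ticker2"], ["close", "rev"]]),
)
long = wide_to_long(wide)
print(long)
```

The result can be written out with `long.to_csv(...)` or passed straight to the `groupby('ticker')` loop.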
And to your last question, you would access each data feed as `self.datas[n].close[0]` or `self.datas[n].rev[0]`, where `n` is the zero-based index of the ticker's feed, in the order the feeds were added. (You are loading one data feed per ticker, so `self.datas` now holds one feed for each ticker.)