Formatting of output csv from bt.Writer
-
I am trying to collect some data points during backtest that I can process using Pandas. I am using bt.Writer to output my data feeds and indicators to a csv but the format is not very friendly to importing using pd.read_csv due to multiple duplicated column headers and extra lines that are not part of the csv.
Is there a way to custom format the output of the csv from bt.Writer?
-
@sfkiwi said in Formatting of output csv from bt.Writer:
and indicators to a csv but the format is not very friendly to importing using pd.read_csv
It was not meant to be pure csv output.
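If post-processing is acceptable, the csv portion can be cut out of the writer output before it goes to pandas. The following is only a minimal sketch: it assumes the csv block sits between the first two separator lines made of `=` characters, which should be checked against your own output file. `extract_csv_block` is a hypothetical helper, not part of backtrader, and the sample text is made up.

```python
import csv
import io


def extract_csv_block(text):
    """Return the rows found between the first two '=====' separator lines.

    Assumed layout of the writer output (verify against your own file):
        =====...   <- separator
        Id,...     <- header row
        1,...      <- one row per bar
        =====...   <- separator, followed by the non-csv summary
    """
    block = []
    inside = False
    for line in text.splitlines():
        is_separator = line.startswith("=====")
        if is_separator and not inside:
            inside = True   # first separator: csv starts on the next line
            continue
        if is_separator and inside:
            break           # second separator: the csv part is over
        if inside and line.strip():
            block.append(line)
    return list(csv.reader(io.StringIO("\n".join(block))))


# Made-up sample, only to show the shape of the input
sample = """\
=====================
Id,data,len,close
1,data,1,10.5
2,data,2,11.0
=====================
Cerebro:
  -----------------
"""

rows = extract_csv_block(sample)
header, values = rows[0], rows[1:]
print(header)
print(values)
```

The resulting `header` and `values` could then be passed to `pd.DataFrame(values, columns=header)`. Duplicated header names would still need renaming afterwards.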
@sfkiwi said in Formatting of output csv from bt.Writer:
Is there a way to custom format the output of the csv from bt.Writer?
Yes, by subclassing Writer and customizing the output.
-
In order to read in as a dataframe what I really want is a row of headers that contain the labels for each data point and then each subsequent row represents the value at a point in time.
I looked into subclassing bt.Writer, but the header and each line of values arrive pre-populated from the Strategy and DataSeries classes, and the way they are formatted provides no way of knowing where each indicator starts and ends.
What I ended up doing instead was overriding the getwriterheaders and getwritervalues methods in the Strategy. This allowed me to format the header and the values the way I wanted.
This works for the indicators and any data coming from the Strategy, however it doesn't solve the problem for the data series. I can't see any easy way to override these functions for the data series.
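The kind of labelling such an override could produce can be sketched without backtrader. This is an illustration under assumptions, not the poster's actual code: `prefixed_headers` and `flat_values` are hypothetical helpers, and the indicator names and numbers are made up. In a real strategy they would be fed from the indicators and called from the overridden methods.

```python
def prefixed_headers(blocks):
    """Build one unique label per line, e.g. 'RSI.rsi'.

    `blocks` is a list of (owner_name, [line_alias, ...]) pairs. Plain data
    is used here so the idea can be shown in isolation.
    """
    return [f"{name}.{alias}" for name, aliases in blocks for alias in aliases]


def flat_values(block_values):
    """Flatten the per-owner value lists into one row, in header order."""
    return [value for values in block_values for value in values]


# Hypothetical indicators and values, for illustration only
blocks = [("RSI", ["rsi"]), ("BBands", ["mid", "top", "bot"])]
headers = prefixed_headers(blocks)
row = flat_values([[55.2], [101.0, 104.5, 97.5]])
print(dict(zip(headers, row)))
```

Prefixing each line alias with its owner's name keeps every column label unique, which is what pd.read_csv needs in order to tell the indicators apart.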
-
I have the same need. Someone told me to use an analyzer to write out the CSV file. I think it is doable, but I cannot quite get the code working. Any suggestion where I can find a post about using an analyzer to write out the CSV?
-
@damag00 Could you let us know exactly what info you are looking to capture? Might have examples for you.
-
@damag00 All the analyzers' data is returned to you as part of any cerebro run; you can then take this dict and put it into a df, a csv, or any other data structure.
Take a look at this:
https://www.backtrader.com/docu/analyzers/analyzers/#a-quick-example
In this case, the thestrat variable collects all the analyzer info captured during the cerebro execution.
So you need to choose the right analyzer to collect the most valuable metrics and post-process them on your own.
-
@damag00 From SO:
You need to handle your analyzer a bit differently. You can literally grab data at every bar and then have it available to you at the end.
Create a new analyzer, in my case I made:
    class BarAnalysis(bt.analyzers.Analyzer):
In your analyzer in start you will create a new list.
    def start(self):
        self.rets = list()
Then in next you will add a list of data for each bar. I use a try statement just in case there are any data problems, but it's probably not necessary. The strategy is available as self.strategy, and you can call its methods, for example self.strategy.broker.getvalue().
    def next(self):
        try:
            self.rets.append(
                [
                    self.datas[0].datetime.datetime(),
                    self.datas[0].open[0],
                    self.datas[0].high[0],
                    self.datas[0].low[0],
                    self.datas[0].close[0],
                    self.datas[0].volume[0],
                    self.strategy.getposition().size,
                    self.strategy.broker.getvalue(),
                    self.strategy.broker.getcash(),
                ]
            )
        except:
            pass
Finally create a get_analysis method that you can use to get your results at the end.
    def get_analysis(self):
        return self.rets
Add your analyzer to cerebro before running it. You can name it whatever you want; we'll need the name to retrieve the results.
    cerebro.addanalyzer(BarAnalysis, _name="bar_data")
Make sure you assign the result of the cerebro.run() method to a variable so you can collect the results of the backtest.
    strat = cerebro.run()
Finally, get the data out of strat and do as you wish with it. In this case I'm creating a dataframe and printing.
    bar_data_res = strat[0].analyzers.bar_data.get_analysis()
    df = pd.DataFrame(bar_data_res)
    print(df)
And the printout looks like:
    /home/runout/projects/rb_master/venv/bin/python /home/runout/projects/scratch/20210424_analyzer.py
                                 0       1       2  ...   6         7         8
    0   2020-01-02 23:59:59.999989  212.70  213.36  ...   0  10000.00  10000.00
    1   2020-01-03 23:59:59.999989  210.81  213.28  ...   0  10000.00  10000.00
    2   2020-01-06 23:59:59.999989  210.18  213.59  ...   0  10000.00  10000.00
    3   2020-01-07 23:59:59.999989  213.11  214.13  ...   0  10000.00  10000.00
    4   2020-01-08 23:59:59.999989  212.43  216.47  ...   0  10000.00  10000.00
    ..                         ...     ...     ...  ...  ..       ...       ...
    247 2020-12-23 23:59:59.999989  268.38  269.31  ...   1  10015.38   9747.25
    248 2020-12-24 23:59:59.999989  267.76  269.67  ...   1  10016.48   9747.25
    249 2020-12-28 23:59:59.999989  270.48  270.55  ...   1  10014.82   9747.25
    250 2020-12-29 23:59:59.999989  268.30  268.78  ...   1  10011.78   9747.25
    251 2020-12-30 23:59:59.999989  264.45  265.64  ...   1  10010.86   9747.25

    [252 rows x 9 columns]

    Process finished with exit code 0
The whole code looks like this:
    import datetime

    import backtrader as bt
    import pandas as pd


    class BarAnalysis(bt.analyzers.Analyzer):
        def start(self):
            self.rets = list()

        def next(self):
            try:
                self.rets.append(
                    [
                        self.datas[0].datetime.datetime(),
                        self.datas[0].open[0],
                        self.datas[0].high[0],
                        self.datas[0].low[0],
                        self.datas[0].close[0],
                        self.datas[0].volume[0],
                        self.strategy.getposition().size,
                        self.strategy.broker.getvalue(),
                        self.strategy.broker.getcash(),
                    ]
                )
            except:
                pass

        def get_analysis(self):
            return self.rets


    class Strategy(bt.Strategy):
        params = (
            ("lowerband", 30),
            ("upperband", 70),
        )

        def __init__(self):
            self.rsi = bt.ind.RSI(period=10)

        def next(self):
            if not self.position:
                if self.rsi <= self.p.lowerband:
                    self.buy()
            elif self.rsi >= self.p.upperband:
                self.close()


    if __name__ == "__main__":
        cerebro = bt.Cerebro()

        ticker = "HD"
        data = bt.feeds.YahooFinanceData(
            dataname=ticker,
            timeframe=bt.TimeFrame.Days,
            fromdate=datetime.datetime(2020, 1, 1),
            todate=datetime.datetime(2020, 12, 31),
            reverse=False,
        )
        cerebro.adddata(data, name=ticker)

        cerebro.addanalyzer(BarAnalysis, _name="bar_data")
        cerebro.addstrategy(Strategy)

        # Execute
        strat = cerebro.run()

        bar_data_res = strat[0].analyzers.bar_data.get_analysis()
        df = pd.DataFrame(bar_data_res)
        print(df)
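The numeric column labels 0 to 8 in the printout come from appending plain lists. One possible variation, sketched here with made-up rows instead of a real backtest, is to append one dict per bar so that pandas picks up the keys as column names. The `rows` list below only stands in for what get_analysis() would return in that case; the keys and values are assumptions for illustration.

```python
import datetime

import pandas as pd

# Stand-in for the analyzer result if next() appended dicts instead of lists;
# all numbers are made up
rows = [
    {"date": datetime.datetime(2020, 1, 2), "close": 213.0, "position": 0, "value": 10000.0},
    {"date": datetime.datetime(2020, 1, 3), "close": 211.5, "position": 1, "value": 10001.5},
]

# The dict keys become the column labels, in insertion order
df = pd.DataFrame(rows)
print(df)
```

With lists, the same labels can be supplied afterwards through the `columns` argument of `pd.DataFrame`.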
-
If you want to stay with the writer and don't mind post-processing with pandas, you can use this:
    import itertools
    import sys

    from backtrader import WriterBase
    from backtrader.utils.py3 import map, string_types


    class WriterPlain(WriterBase):
        '''The system wide writer class.

        It can be parametrized with:

          - ``out`` (default: ``sys.stdout``): output stream to write to

            If a string is passed a filename with the content of the parameter
            will be used.

            If you wish to run with ``sys.stdout`` while doing multiprocess
            optimization, leave it as ``None``, which will automatically
            initiate ``sys.stdout`` on the child processes.

          - ``close_out`` (default: ``False``)

            If ``out`` is a stream whether it has to be explicitly closed by
            the writer

          - ``csv`` (default: ``False``)

            If a csv stream of the data feeds, strategies, observers and
            indicators has to be written to the stream during execution

            Which objects actually go into the csv stream can be controlled
            with the ``csv`` attribute of each object (defaults to ``True``
            for ``data feeds`` and ``observers`` / False for ``indicators``)

          - ``csv_filternan`` (default: ``True``) whether ``nan`` values have
            to be purged out of the csv stream (replaced by an empty field)

          - ``csv_counter`` (default: ``True``) if the writer shall keep and
            print out a counter of the lines actually output

          - ``indent`` (default: ``2``) indentation spaces for each level

          - ``separators`` (default: ``['=', '-', '+', '*', '.', '~', '"', '^',
            '#']``)

            Characters used for line separators across
            section/sub(sub)sections

          - ``seplen`` (default: ``79``)

            total length of a line separator including indentation

          - ``rounding`` (default: ``None``)

            Number of decimal places to round floats down to. With ``None`` no
            rounding is performed
        '''
        params = (
            ('out', None),
            ('close_out', False),
            ('csv', False),
            ('csvsep', ','),
            ('csv_filternan', True),
            ('csv_counter', True),
            ('indent', 2),
            ('separators', ['=', '-', '+', '*', '.', '~', '"', '^', '#']),
            ('seplen', 79),
            ('rounding', None),
        )

        def __init__(self):
            self._len = itertools.count(1)
            self.headers = list()
            self.values = list()

        def _start_output(self):
            # open file if needed
            if not hasattr(self, 'out') or not self.out:
                if self.p.out is None:
                    self.out = sys.stdout
                    self.close_out = False
                elif isinstance(self.p.out, string_types):
                    self.out = open(self.p.out, 'w')
                    self.close_out = True
                else:
                    self.out = self.p.out
                    self.close_out = self.p.close_out

        def start(self):
            self._start_output()

            if self.p.csv:
                # self.writelineseparator()
                self.writeiterable(self.headers, counter='Id')

        def stop(self):
            if self.close_out:
                self.out.close()

        def next(self):
            if self.p.csv:
                self.writeiterable(self.values, func=str, counter=next(self._len))
                self.values = list()

        def addheaders(self, headers):
            if self.p.csv:
                self.headers.extend(headers)

        def addvalues(self, values):
            if self.p.csv:
                if self.p.csv_filternan:
                    # nan is the only value not equal to itself
                    values = map(lambda x: x if x == x else '', values)
                self.values.extend(values)

        def writeiterable(self, iterable, func=None, counter=''):
            if self.p.csv_counter:
                iterable = itertools.chain([counter], iterable)

            if func is not None:
                iterable = map(lambda x: func(x), iterable)

            line = self.p.csvsep.join(iterable)
            self.writeline(line)

        def writeline(self, line):
            self.out.write(line + '\n')

        def writelines(self, lines):
            for l in lines:
                self.out.write(l + '\n')

        def writelineseparator(self, level=0):
            sepnum = level % len(self.p.separators)
            separator = self.p.separators[sepnum]

            line = ' ' * (level * self.p.indent)
            line += separator * (self.p.seplen - (level * self.p.indent))
            self.writeline(line)

        def writedict(self, dct, level=0, recurse=False):
            # intentionally empty: no summary is written after the csv rows
            pass
It is WriterFile without the line separator and with an empty writedict.
-
@run-out Thanks for your code.
Is there a way to get rid of the unnamed index column 0, 1, 2, 3...?
    bar_data_res = strat[0].analyzers.bar_data.get_analysis()
    df = pd.DataFrame(bar_data_res)
    df.columns = ['date', 'open', 'high', 'low', 'close', 'volume', 'position', 'value', 'cash']
    df.set_index('date')
    print(df)
    df.to_csv("outpu.csv")
I tried df.set_index('date'), but it didn't remove the numeric index. Also, if I do df = df.iloc[:, 1:] to remove the first column, it skips the numeric column and removes the date column instead.
The output now looks like this:
       date       open         high         low          close        volume    position  value    cash
    0  7/19/2001  15.10000038  15.28999996  15           15.17000008  34994300  0         1000000  1000000
    1  7/20/2001  15.05000019  15.05000019  14.80000019  15.01000023  9238500   0         1000000  1000000
    2  7/23/2001  15           15.01000023  14.55000019  15           7501000   0         1000000  1000000
    3  7/24/2001  14.94999981  14.97000027  14.69999981  14.85999966  3537300   0         1000000  1000000
    4  7/25/2001  14.69999981  14.94999981  14.64999962  14.94999981  4208100   0         1000000  1000000
    5  7/26/2001  14.94999981  14.98999977  14.5         14.5         6335300   0         1000000  1000000
The desired output looks like this:
    date       open         high         low          close        volume    position  value    cash
    7/19/2001  15.10000038  15.28999996  15           15.17000008  34994300  0         1000000  1000000
    7/20/2001  15.05000019  15.05000019  14.80000019  15.01000023  9238500   0         1000000  1000000
    7/23/2001  15           15.01000023  14.55000019  15           7501000   0         1000000  1000000
    7/24/2001  14.94999981  14.97000027  14.69999981  14.85999966  3537300   0         1000000  1000000
    7/25/2001  14.69999981  14.94999981  14.64999962  14.94999981  4208100   0         1000000  1000000
    7/26/2001  14.94999981  14.98999977  14.5         14.5         6335300   0         1000000  1000000
-
@damag00 said in Formatting of output csv from bt.Writer:
df.set_index('date')
set_index returns a new DataFrame instead of modifying df in place, so assign the result back:
    df = df.set_index('date')
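For the csv file itself there are two ways to avoid the 0, 1, 2... column. The sketch below uses a small made-up frame and in-memory buffers instead of the real backtest data and an output file.

```python
import io

import pandas as pd

# Made-up stand-in for the analyzer frame
df = pd.DataFrame(
    {
        "date": ["7/19/2001", "7/20/2001"],
        "open": [15.10, 15.05],
        "close": [15.17, 15.01],
    }
)

# Option 1: keep the result of set_index; 'date' becomes the index, so
# to_csv writes it as the first column and no numeric column appears
by_date = df.set_index("date")
buf1 = io.StringIO()
by_date.to_csv(buf1)

# Option 2: leave the frame unchanged and skip the index when writing
buf2 = io.StringIO()
df.to_csv(buf2, index=False)

print(buf1.getvalue())
print(buf2.getvalue())
```

Option 1 also changes how the frame prints and how rows are looked up; option 2 only affects the file that is written.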