Help required with dataset to use for backtesting
Hi, I have a dataset of stocks (2007-2021) that I want to backtest my strategy on. I have split this dataset into two parts:
- In sample: the earlier data, on which I will run the backtest and optimise the strategy based on the results. This is approximately 2007-2015.
- Out of sample: the data on which I will then run the optimised strategy, to understand how it would have performed between 2016 and 2021.
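A date-based split like the one described above can be sketched in a few lines. This is only an illustration: the function name `split_by_date`, the cutoff of 1 January 2016, and the sample prices are assumptions, not part of the actual dataset.

```python
from datetime import date

def split_by_date(rows, cutoff):
    """Split (date, close) rows into in-sample (before cutoff)
    and out-of-sample (on or after cutoff)."""
    in_sample = [r for r in rows if r[0] < cutoff]
    out_of_sample = [r for r in rows if r[0] >= cutoff]
    return in_sample, out_of_sample

# Hypothetical year-end closes for a single stock, 2007 through 2021.
rows = [(date(year, 12, 31), 100.0 + (year - 2007)) for year in range(2007, 2022)]

# Assumed cutoff: everything before 2016 is used for optimisation.
in_sample, out_of_sample = split_by_date(rows, date(2016, 1, 1))
```

Splitting on a single date keeps the out-of-sample period strictly later than anything the optimisation has seen.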
I have already run some backtests, and one thing I have come to realise is that my strategy is momentum based and hence will work only on stocks that have momentum or have had momentum in the past.
Considering this, does it make sense to exclude the stocks that have not had any momentum in the past and run the strategy only on the set of stocks that have exhibited momentum in the past?
run-out
@vypy1 You can use momentum stocks in your backtest. The main thing you need to be careful of is to only select momentum stocks based on criteria available at the time the selection is made. In other words, if you were looking at year over year momentum, you would start trading in 2008 and use the 2007 data to filter your stocks. This is valid, since you can do this in real life.
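As a rough sketch of that idea, the filter below picks the tradable universe for a given year using only the prior year's prices. The data layout, the function name `momentum_universe`, and the 10% threshold are all assumptions made for illustration; the right momentum criterion depends on the strategy.

```python
# Hypothetical data: ticker -> {year: (first close of the year, last close of the year)}
prices = {
    "AAA": {2007: (100.0, 130.0), 2008: (130.0, 90.0)},
    "BBB": {2007: (50.0, 45.0), 2008: (45.0, 80.0)},
}

def momentum_universe(prices, trade_year, min_return=0.10):
    """Return tickers whose return over the year BEFORE trade_year
    is at least min_return. Data from trade_year itself is never read."""
    prior = trade_year - 1
    picked = []
    for ticker, by_year in prices.items():
        if prior not in by_year:
            continue  # no history yet, so the stock cannot be selected
        first, last = by_year[prior]
        if first > 0 and last / first - 1.0 >= min_return:
            picked.append(ticker)
    return sorted(picked)

universe_2008 = momentum_universe(prices, 2008)  # filtered on 2007 data only
universe_2009 = momentum_universe(prices, 2009)  # filtered on 2008 data only
```

In this made-up data, BBB rallies during 2008, but it is not in the 2008 universe because that rally was not known at the start of 2008.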
Not that you suggested it, but to state the obvious: if you select momentum stocks using the entire 2007-2015 dataset and then run your backtest on those stocks, your results will be invalid due to 'forward looking bias' (also called look-ahead bias).
@run-out Thanks for pointing out the forward looking bias. This is not something I took into account. So it wouldn't make sense to remove the stocks that haven't exhibited momentum over the whole duration of the dataset.
I can keep all stocks and look for momentum as the backtest progresses over the dataset. Then momentum is something I need to define.
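One possible way to define momentum as the backtest progresses is a trailing return over a fixed lookback, evaluated at each bar using only prices up to that bar. This is a sketch under assumptions: the names `trailing_return` and `momentum_flags`, the 3-bar lookback, and the 5% threshold are illustrative choices, not a recommended definition.

```python
def trailing_return(closes, t, lookback):
    """Return over the `lookback` bars ending at index t,
    or None if there is not enough history yet."""
    if t < lookback or closes[t - lookback] <= 0:
        return None
    return closes[t] / closes[t - lookback] - 1.0

def momentum_flags(closes, lookback=3, threshold=0.05):
    """For each bar, flag whether the stock shows momentum,
    using only prices at or before that bar."""
    flags = []
    for t in range(len(closes)):
        r = trailing_return(closes, t, lookback)
        flags.append(r is not None and r > threshold)
    return flags

# Hypothetical closing prices for one stock.
closes = [100.0, 101.0, 103.0, 108.0, 107.0, 104.0, 103.0]
flags = momentum_flags(closes)
```

Because each flag depends only on past and current prices, a stock can move in and out of the tradable set over time without any look-ahead.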