Sharing investing and trading ideas. Helping traders get started.

Saturday, June 13, 2009

What is good quality historical data?

When backtesting a strategy, you typically simulate trades based on historical data. The quality of your data matters: clean data gives you more accurate simulated profits and saves you a lot of time correcting data errors. Good quality data will usually meet the following criteria.

1) No gaps or missing data.

2) No invalid data.
(e.g. price fields filled with symbols instead of numbers.)

3) No identical price or volume data pasted over several days or even months.

4) Chronologically correct.
(i.e. the date and time stamps are correct.)

5) Consistent in tick frequency.
(i.e. if it claims to be five-minute data, it remains so for the entire file.)

6) Clear about whether the data has been padded over market holidays and weekends.
(e.g. whether it uses every Friday's close as data for Saturday and Sunday.)

7) Clear about whether the data has been adjusted for splits and bonus issues.

8) Free of decimalization errors.
(e.g. the price should be quoted to at most 4 decimal places, but the data shows 5 on some days.)

9) Free of outliers that are clearly due to errors in recording.
(Some outliers, such as huge jumps or falls in price, are genuine, while others are due to errors in recording.)
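
Several of the criteria above can be checked mechanically. What follows is only a minimal sketch, assuming the data has already been read into a list of (timestamp, price) string pairs. The function name check_bars, the timestamp format and all thresholds are hypothetical choices for illustration, not part of any vendor's format. Rows that get flagged are candidates for a manual look rather than confirmed errors: a genuine price jump or a legitimate market break will be flagged as well.

```python
from collections import Counter
from datetime import datetime


def check_bars(rows, fmt="%Y-%m-%d %H:%M", max_decimals=4, stale_run=5, max_jump=0.2):
    """Flag suspect rows in a list of (timestamp, price) string pairs.

    Hypothetical helper: returns a dict mapping an issue name to row indices.
    """
    issues = {name: [] for name in (
        "invalid", "weekend", "too_many_decimals",
        "out_of_order", "irregular_step", "stale", "jump")}
    parsed = []  # (row index, datetime, float) for rows that parse cleanly
    for i, (stamp, price) in enumerate(rows):
        price = price.strip()
        try:
            t, p = datetime.strptime(stamp, fmt), float(price)
        except ValueError:
            issues["invalid"].append(i)  # e.g. symbols where a price should be
            continue
        parsed.append((i, t, p))
        if t.weekday() >= 5:  # Saturday or Sunday: possibly padded data
            issues["weekend"].append(i)
        # Count the digits after the decimal point in the raw text.
        if "." in price and len(price.split(".")[1]) > max_decimals:
            issues["too_many_decimals"].append(i)

    # Treat the most common spacing between consecutive bars as the nominal frequency.
    steps = [b[1] - a[1] for a, b in zip(parsed, parsed[1:])]
    nominal = Counter(steps).most_common(1)[0][0] if steps else None

    run = 1  # length of the current run of identical prices
    for (_, prev_t, prev_p), (i, t, p), step in zip(parsed, parsed[1:], steps):
        if t <= prev_t:
            issues["out_of_order"].append(i)
        elif step != nominal:
            issues["irregular_step"].append(i)  # a gap or a change of frequency
        run = run + 1 if p == prev_p else 1
        if run == stale_run:
            issues["stale"].append(i)  # same price repeated stale_run times
        if prev_p and abs(p / prev_p - 1) > max_jump:
            issues["jump"].append(i)  # candidate outlier, to be reviewed by hand
    return issues


# Made-up five-minute bars with a few deliberate problems.
rows = [
    ("2009-06-12 09:30", "1.4012"),
    ("2009-06-12 09:35", "1.4015"),
    ("2009-06-12 09:40", "#N/A"),
    ("2009-06-12 09:45", "1.40171"),
    ("2009-06-12 09:50", "1.4018"),
    ("2009-06-12 10:05", "1.4018"),
    ("2009-06-12 10:00", "1.4018"),
]
report = check_bars(rows, stale_run=3)
for name, hits in report.items():
    if hits:
        print(name, hits)
```

Whether the data has been adjusted for splits and bonus issues cannot be detected reliably this way; that still has to come from the data provider's documentation.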

Typically, when I download free data from the internet, I do a quick check for these common problems first. The last thing you want is to have to rerun a backtest that just took two hours to complete because of a data error. Don't be surprised if gaps turn up even in data from some of the more reputable sources. Data collection in the 1980s was patchy and not as professionally done as it is today.

Good quality data that dates far back in time is hard to find for free. If you are willing to pay a professional data vendor such as eSignal, you will probably have a better time. Such vendors are usually data feed providers as well, giving you access to real-time data. However, a subscription with eSignal will set you back at least US$100 a month for stocks and futures data alone. Foreign exchange data requires an additional fee. That is a significant cost if you are starting out as a trader with little or no profits in the bank.

I also look out for other good-to-have frills found in even better quality data:

1) Bid-ask quotes, so that I can accurately estimate my transaction costs.
2) Volume data, if possible.
3) A wide range of granularity options, so that I can choose between downloading tick-by-tick data and downloading just the end-of-day data.
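
To illustrate why bid-ask quotes help with transaction costs, here is a minimal sketch that treats the spread as the cost of buying at the ask and immediately selling at the bid. The function names and the sample quote are made up for illustration, and commissions and slippage are deliberately left out.

```python
def round_trip_cost(bid, ask, quantity):
    """Cost of buying `quantity` units at the ask and selling them at the bid."""
    if ask < bid:
        raise ValueError("crossed quote: ask is below bid")
    return (ask - bid) * quantity


def relative_spread(bid, ask):
    """Spread as a fraction of the mid price, handy for comparing instruments."""
    mid = (bid + ask) / 2
    return (ask - bid) / mid


# Hypothetical FX quote: bid 1.4012, ask 1.4015, trading 100,000 units.
print(round_trip_cost(1.4012, 1.4015, 100_000))
print(relative_spread(1.4012, 1.4015))
```

With only last-traded or closing prices, this cost has to be guessed, which is why quote data is worth having when a strategy trades frequently.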

I wrote a post previously on some of the free data sources out there, and I even attached an end-of-day historical FX rates file for those of you who would rather not spend time surfing all those sites. As most free data sources discourage or ban bots, you will have to download the data patiently by hand. Not all of them are good quality, but at least they are free.

[These are not all the issues that you need to check for. For the more technically inclined, Tick Data has several white papers on filtering and cleaning data.]


