The principle of machine learning separates cleanly into three steps: calibration, validation, and verification. Let us for the moment assume that the available data has been separated into two disjoint sets, one for training and one for validation. The routine then consists of the following three steps.
I. Calibration (a.k.a. training) is the process of taking a supervised learning algorithm and finding a set of parameters for which it approximates a desired target to a satisfactory degree when measured on the training data set.
II. Validation is the process of checking the performance of the calibrated learner on the previously unused part of the data.
III. Verification is the process of choosing our desired learner amongst a variety of algorithms based on their calibration and validation performances.
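The three-step routine above can be sketched in code. The toy data, the least-squares learners, and the candidate set below are illustrative assumptions, not the article's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: inputs X and a noisy linear target y, split into
# disjoint training and validation sets.
X = rng.normal(size=(200, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.3, -0.4]) + 0.01 * rng.normal(size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def calibrate(X, y):
    """Step I: fit parameters on the training data (here, least squares)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def validate(w, X, y):
    """Step II: measure performance on the held-out validation data."""
    return np.mean((X @ w - y) ** 2)

# Step III: verify by choosing among several candidate learners; here the
# candidates are linear models restricted to the first k input features.
candidates = {}
for k in (2, 3, 5):
    w = calibrate(X_train[:, :k], y_train)
    candidates[k] = (w, validate(w, X_val[:, :k], y_val))

best_k = min(candidates, key=lambda k: candidates[k][1])
```

In this sketch, verification simply picks the candidate with the lowest validation error; as discussed below, in a data-poor setting the selection criterion needs refinement.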
Once we have calibrated, validated, and verified our learner, we are ready to deploy it on previously unseen data, i.e. we can run it out-of-sample. For trading algorithms, out-of-sample normally refers to the real-time investment process, whereas all previous steps work with historical data.
Let us now look at what exactly artificial intelligence means in this context.
I. A learner A is more intelligent than a learner B if it can learn the solution to at least as many problems as B.
II. General artificial intelligence is achieved when a learner is more intelligent than all other available learners.
When general artificial intelligence is achieved, verification (the third step above) becomes obsolete and machine learning reduces to a two-step routine.
However, more intelligent is not necessarily better. In a data-rich environment, more algorithmic intelligence is indeed an advantage. But in a data-poor environment, algorithmic intelligence can lead to very poor results out-of-sample.
An example of this phenomenon can easily be constructed using neural networks. If the training and validation data sets are finite, then for every calibrated and validated supervised learner that depends on a certain parameter set, another supervised learner can be found which
- has identical training and validation performance
- and which has a much greater parameter set.
The second learner can be constructed by simply appending a piece of neural network to the first learner and choosing the parameters in the added piece such that none of its neurons is ever triggered on the training and validation data.
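This construction can be sketched numerically. The network weights below are hypothetical stand-ins for a trained model; the point is the appended "dead" neuron:

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny "calibrated" learner: one hidden ReLU layer with stand-in weights.
X_data = rng.normal(size=(100, 3))      # the finite training/validation inputs

W1 = rng.normal(size=(3, 4))
b1 = rng.normal(size=4)
W2 = rng.normal(size=4)
b2 = 0.1

def learner_a(X):
    return np.maximum(X @ W1 + b1, 0.0) @ W2 + b2

# Learner B: append one extra hidden neuron whose bias is so negative that
# its ReLU output is zero on every observed sample. B then matches A exactly
# on the data while carrying a strictly larger parameter set.
w_extra = np.array([1.0, 0.0, 0.0])          # illustrative weight direction
b_extra = -(X_data @ w_extra).max() - 1.0    # keeps the neuron dead in-sample
W1_b = np.column_stack([W1, w_extra])
b1_b = np.append(b1, b_extra)
W2_b = np.append(W2, 1.0)                    # output weight on the dead neuron

def learner_b(X):
    return np.maximum(X @ W1_b + b1_b, 0.0) @ W2_b + b2

# Identical on all training and validation inputs ...
assert np.allclose(learner_a(X_data), learner_b(X_data))

# ... but on out-of-sample inputs far from the data, the extra neuron
# fires and the two learners diverge.
X_far = X_data + 10.0 * w_extra
```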
Due to the bigger parameter set of the second learner, differences between the two learners can arise when the algorithms encounter out-of-sample data that is dissimilar from the training and validation sets.
By reverse logic, whenever we use a supervised learner whose parameter set is not minimised for the task at hand, we are consciously accepting out-of-sample errors.
For any application relying on financial market data, data can generally be considered scarce. Hence, this last observation is of enormous importance in finance. Put differently: unless we are minimising the parameters in our supervised learner to the absolute minimum required to achieve the desired training and validation performance, we are creating out-of-sample errors.
For finite data sets, we therefore need to adapt our verification procedure to find a parameter-minimised algorithm which has our desired calibration and validation performances. Unfortunately, this can be a computationally very expensive task since a very large number of different algorithms needs to be tried. The benefit of course is an improvement in out-of-sample performance.
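One possible form of such a parameter-minimised verification loop can be sketched as follows; the least-squares learners of growing size and the error threshold are illustrative assumptions, not the procedure actually used for the figures below:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data with a noisy linear target.
X = rng.normal(size=(300, 5))
y = X @ np.array([0.4, -0.3, 0.2, 0.1, -0.2]) + 0.05 * rng.normal(size=300)
X_tr, y_tr, X_va, y_va = X[:200], y[:200], X[200:], y[200:]

def features(X, degree):
    """Polynomial features up to `degree`; parameter count grows with degree."""
    return np.hstack([X ** d for d in range(1, degree + 1)])

def fit(X, y, degree):
    w, *_ = np.linalg.lstsq(features(X, degree), y, rcond=None)
    return w

def mse(w, X, y, degree):
    return np.mean((features(X, degree) @ w - y) ** 2)

# Try candidate learners ordered by increasing parameter count and keep the
# SMALLEST one whose training and validation errors meet the threshold.
threshold = 0.01
chosen = None
for degree in (1, 2, 3, 4):
    w = fit(X_tr, y_tr, degree)
    if (mse(w, X_tr, y_tr, degree) <= threshold
            and mse(w, X_va, y_va, degree) <= threshold):
        chosen = degree
        break
```

The computational cost mentioned above comes from this outer loop: every candidate architecture must be calibrated and validated before the smallest acceptable one can be identified.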
Let us look at a numerical example. We take data for five US stocks and create a target return by taking the daily average return plus a small white noise error term. Figure 1 shows the calibration results for the daily returns of 2017 and we see that both the shallow and the deep learner calibrate well.
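The construction of the target can be sketched as follows; since the actual stock data is not reproduced here, simulated daily returns stand in for the five US stocks:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for the five stocks' daily returns over one year
# (~250 trading days); the article uses real market data.
returns = 0.01 * rng.normal(size=(250, 5))

# Target: the cross-sectional daily average return
# plus a small white-noise error term.
noise = 0.001 * rng.normal(size=250)
target = returns.mean(axis=1) + noise
```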
In Figure 2 we see the out-of-sample performance on 2018 data. As would be expected, the tracking of the target data is not as good as in training, but, with the naked eye, we cannot yet see a noticeable difference between the shallow and the deep learner.
However, Figure 3 shows the out-of-sample performance on the daily returns for the period 2010 to 2016. Over this longer out-of-sample period, we see that, clearly, the error of the deep learner is much greater than that of the shallow learner. This is the type of out-of-sample error that a verification procedure looking for a parameter-minimised learner would have avoided.
In Figure 4, we see the same 2010-2016 out-of-sample error results for a variety of learners which are ranked on the x-axis by the number of their parameters. (Deep1 has 10 free parameters, Deep2 has 12, Deep3 has 14, and so forth.) All of the plotted learners were calibrated to achieve identical results on the 2017 data. We see that, clearly, the out-of-sample performance decays substantially as the number of parameters increases.
We will briefly summarise our observations, which are twofold. First, general artificial intelligence can never exist in an environment of data scarcity. Second, the importance of high-performance computing is inversely related to the size of the available calibration and validation data sets.
For the creation of algorithmic trading strategies, data can generally be considered scarce. Not because there isn't a lot of financial data around (there is plenty), but because most of it is useless for prediction purposes due to structural changes in the markets (political or otherwise). Consequently, the more computational power you have available for your verification procedure, the better your chance at making correct predictions.
Figure 1: We calibrate a shallow and a deep learner on the daily returns of 2017.
Figure 2: Out-of-sample performance on the daily returns of 2018.
Figure 3: Out-of-sample performance on the daily returns of 2010 to 2016.
Figure 4: Out-of-sample performance on the daily returns of 2010 to 2016 for a variety of learners ordered on the x-axis by increasing number of parameters. (Deep1 has 10 free parameters, Deep2 has 12, Deep3 has 14, and so forth.) We see that, clearly, the out-of-sample performance decays substantially as the number of parameters increases.