May 26, 2019

*The following was posted on a Quantopian forum where I sometimes participate.*

We should separate the problem into two parts: one for selecting over historical data, and one where the data is forthcoming (some future data). These turn out to be quite different problems. Simulating the future should be viewed as either a walk-forward test or some form of paper trading. Neither produces any money, and therefore both are just other forms of simulation. You could paper trade for years if you wanted to. But, in the end, you would still find yourself at the right edge of a price chart with an unknown future.

All historical data is, de facto, known. It is recorded history. Whereas all future data is yet to unfold. Assumptions you make based on historical data might not carry forward that well, especially if they have little economic foundation relative to a future trading environment.

For instance, you might elect to choose the top 100 stocks by market value each time you rebalance your portfolio. On historical data, there is no problem. But you would get only one answer: the actual top 100 stocks at that time.

In fact, you would have settled for a single-occurrence scenario where everybody else would get the same answer had they also picked the top 100 by market value. And from that data, you could add any other criteria you want. In effect, everyone using that selection process would be designing strategies based on the same theme, built on the same 100 stocks.

Whereas, if you looked at all the possibilities, your selection universe would be much, much larger. So large, in fact, that even a 100,000-iteration Monte Carlo simulation would be totally irrelevant, no matter how many iterations you ran or how much computing power you had available.

The number of combinations for taking 100 stocks, as if at random, out of some 8,300+ (the *USEquityPricing* dataset) turns out to be C(8,300, 100) ≈ 4.7 ∙ 10^233. Therefore, picking the 100 top market value stocks is only one solution out of 4.7 ∙ 10^233 − 1 other possibilities.

Could we say that this selection process did not cover the data very well? Yet, it was a reasonable assumption to make, especially for trading purposes: to trade efficiently, you need liquidity on both sides of the trade, and high market value stocks in most cases do provide this liquidity.

From time to time, especially when using an optimizer, you will rebalance your portfolio. This means that each time you rebalance, you will be faced with a new one-shot pick out of 4.7 ∙ 10^233 possibilities.

Say you opt to diversify more and take the top 200 stocks using some criteria. The number of combinations from the same selectable universe would be C(8,300, 200) ≈ 7.4 ∙ 10^407. This does not reduce the size of the problem; it amplifies it to much larger proportions. If you increased the number of stocks in your portfolio further, to 300 or 400 stocks in order to diversify risk even more, you would get 7.3 ∙ 10^558 and 3.8 ∙ 10^694 combinations respectively. These are extremely large numbers. The last is one chance in 3.8 ∙ 10^670 trillion trillions.
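These combination counts can be reproduced directly with Python's `math.comb` (a minimal sketch, assuming a selectable universe of exactly 8,300 stocks; the actual dataset size may differ slightly, which shifts the mantissas a little):

```python
import math

UNIVERSE = 8300  # assumed size of the USEquityPricing selectable universe

for k in (100, 200, 300, 400):
    n = math.comb(UNIVERSE, k)   # C(8300, k): exact integer count
    exponent = len(str(n)) - 1   # order of magnitude (power of ten)
    mantissa = n / 10**exponent
    print(f"C({UNIVERSE}, {k}) ~ {mantissa:.1f} * 10^{exponent}")
```

With this universe size, the loop reproduces the orders of magnitude quoted above: 10^233, 10^407, 10^558 and 10^694 for portfolios of 100, 200, 300 and 400 stocks respectively.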

If you picked one combination out of 4.7 ∙ 10^233, 7.4 ∙ 10^407, 7.3 ∙ 10^558 or 3.8 ∙ 10^694 possible combinations, the pick is still so unique that, as a sample of its population (the *USEquityPricing* dataset), we cannot say it was even close to representative of what was available. We cannot even say that, “on average”, the selected stocks did this or that from a selection perspective.

This should raise a lot of questions.

Will the one selection process you took, out of 4.7 ∙ 10^233 possible choices over some historical data, pan out going forward?

How representative of the market is such a selection?

Will your selection method behave the same going forward?

What kind of comparative justification can you give to your selection method?

If the more you diversify, the more you amplify the selection problem, then how will your selection method compare to others? Will you even be able to enumerate those other methods?

This is why I concentrate on the math and the mechanics of a trade, since those can be carried forward. It is by reengineering the mechanics of the trade that you can force your trading strategy to produce more, as was demonstrated in previous posts.

No matter what the trading method, it will have the following payoff matrix equation: F(t) = F_{0} + Σ(**H**∙Δ**P**). With one caveat: Σ(**H**∙Δ**P**) > q_{0} ∙ (p_{t} – p_{0})_{spy}. Meaning that whatever your trading strategy, it should at least outperform holding SPY for the duration, or beat the quasi buy & hold scenario of a low-cost index fund. Otherwise, why bother trading?
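As an illustrative sketch of that payoff matrix accounting (with made-up holdings and prices, not the author's data), the identity and the SPY benchmark comparison can be written in a few lines of numpy:

```python
import numpy as np

F0 = 100_000.0  # initial capital F_0

# Hypothetical prices for 3 stocks over 5 time steps (rows = time).
P = np.array([
    [10.0, 20.0, 30.0],
    [11.0, 19.5, 31.0],
    [10.5, 21.0, 32.0],
    [11.5, 21.5, 31.5],
    [12.0, 22.0, 33.0],
])

# Hypothetical holdings H: shares held over each of the 4 intervals.
H = np.array([
    [100.0, 50.0, 20.0],
    [120.0, 40.0, 25.0],
    [ 90.0, 60.0, 25.0],
    [110.0, 55.0, 30.0],
])

dP = np.diff(P, axis=0)        # ΔP: price change over each interval
trading_pnl = np.sum(H * dP)   # Σ(H∙ΔP)
F_t = F0 + trading_pnl         # F(t) = F_0 + Σ(H∙ΔP)

# Benchmark: put F_0 into a SPY-like proxy at p_0 and hold to p_t
# (made-up prices).
p0_spy, pt_spy = 250.0, 265.0
q0 = F0 / p0_spy
benchmark_pnl = q0 * (pt_spy - p0_spy)  # q_0 ∙ (p_t – p_0)_spy

print(f"strategy P&L:  {trading_pnl:.2f}")
print(f"benchmark P&L: {benchmark_pnl:.2f}")
```

The caveat is then simply the comparison of those two numbers: if `trading_pnl` does not exceed `benchmark_pnl`, the strategy failed to justify itself against the quasi buy & hold alternative.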

Created. May 26, 2019, © Guy R. Fleury. All rights reserved.