The $ \sigma $-algebra

If you design trading strategies in a $ \sigma $-algebra space built on $ \mathcal {N}(\mu , \sigma^2) $, meaning you rely on averages and standard deviations for data analysis and trading decisions, then everything you see will be confined within that space. It implies that you will most often be dealing with a normal distribution of your own making. This allows setting up stochastic processes with defined properties, things like mean-reversion and quasi-martingales. But it also reduces the weight of outliers in the equation: they will have been smoothed out over the chosen look-back period. And any data prior to that look-back period is simply ignored, for the simple reason that none of it is taken into account.
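To make that concrete, here is a minimal sketch of what "living inside the look-back window" looks like in code. It assumes a synthetic daily price series and an illustrative 50-day window; both choices are arbitrary and only serve to show that every rolling $\mu$ and $\sigma$, and any z-score built from them, sees nothing before the window and smooths whatever sits inside it.

```python
import numpy as np

rng = np.random.default_rng(0)
prices = 100 * np.cumprod(1 + rng.normal(0.0005, 0.01, 1000))   # synthetic daily closes
returns = np.diff(np.log(prices))

lookback = 50   # illustrative look-back; anything earlier is simply never seen
mu = np.array([returns[t - lookback:t].mean() for t in range(lookback, len(returns))])
sigma = np.array([returns[t - lookback:t].std(ddof=1) for t in range(lookback, len(returns))])

# A z-score built from these rolling estimates confines every decision
# to the N(mu, sigma^2) box of the last `lookback` observations.
z = (returns[lookback:] - mu) / sigma
print(z[:5])
```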

There is a big problem with this point of view when applied to stock prices, as they do not quite stay within those boundaries. The data itself is more complicated and quite often moves out of the confines of that self-made box. For example, a Paretian distribution (which would better represent stock prices) has fat tails (outliers) that can severely distort this $ \sigma $-algebra.
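A short numerical sketch of that gap, using a Student-t distribution with 3 degrees of freedom as an illustrative stand-in for the fat-tailed, Paretian-type behaviour described above (the choice of distribution and parameters is mine, not something fitted to market data): after rescaling both samples to unit standard deviation, the fat-tailed one produces far more moves beyond a few $\sigma$s than the Gaussian assumption allows.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

normal = rng.standard_normal(n)
# Student-t (df=3) as an illustrative fat-tailed stand-in for Paretian-type tails
fat = rng.standard_t(df=3, size=n)
fat /= fat.std()   # rescale so both samples have unit standard deviation

for k in (3, 5, 8):
    p_norm = np.mean(np.abs(normal) > k)
    p_fat = np.mean(np.abs(fat) > k)
    print(f"|x| > {k} sigma: normal {p_norm:.2e}   fat-tailed {p_fat:.2e}")
```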

Stock prices, just like any information derived from them, are not that neat! So, why would we treat them as if they were? The probability density function of a normal distribution has been known for quite some time:

$\quad \quad \displaystyle f(x\mid \mu ,\sigma ^{2})={\frac {1}{\sqrt {2\pi \sigma ^{2}}}}e^{-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}}$

but does it really describe what we see in stock prices, where $\mu$ is itself a stochastic process, and $\sigma$ is another stochastic process scaling a Wiener process of its own ($\sigma\, dW$), in an environment where skewness, kurtosis and fat tails are prevalent? It is like wanting to model, alongside everything else, some "black swans" which by their very nature are rare events, 10 to 20+ $\sigma$s away from their mean $\mu$. Consider something like the "Flash Crash" of May 2010, for instance. There were price moves that day that should not have happened in 200 million years, and yet, there they were. Those are things you do not see coming and for which you are not prepared, in the sense that your program might not have been designed to handle such situations. Knowing some $\mu$ and $\sigma$ does not give predictability to tomorrow's price.
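The point about $\mu$ and $\sigma$ being stochastic themselves can be shown with a toy simulation. In the sketch below both parameters follow their own mean-reverting random walks (the speeds and levels are arbitrary, chosen only for illustration); even though each return increment is conditionally Gaussian, the resulting unconditional return distribution already shows excess kurtosis, i.e. fatter tails than $\mathcal{N}(\mu,\sigma^2)$ admits.

```python
import numpy as np

rng = np.random.default_rng(2)
T, dt = 252 * 10, 1 / 252

mu = np.empty(T); sigma = np.empty(T); r = np.empty(T)
mu[0], sigma[0] = 0.05, 0.20
for t in range(1, T):
    # mu and sigma are themselves stochastic (mean-reverting random walks here)
    mu[t] = mu[t-1] + 0.5 * (0.05 - mu[t-1]) * dt + 0.05 * np.sqrt(dt) * rng.standard_normal()
    sigma[t] = abs(sigma[t-1] + 2.0 * (0.20 - sigma[t-1]) * dt + 0.30 * np.sqrt(dt) * rng.standard_normal())
for t in range(T):
    # conditionally Gaussian increment: mu*dt + sigma*dW
    r[t] = mu[t] * dt + sigma[t] * np.sqrt(dt) * rng.standard_normal()

excess_kurtosis = ((r - r.mean())**4).mean() / r.var()**2 - 3
print(f"excess kurtosis of simulated returns: {excess_kurtosis:.2f}")   # > 0: fat tails
```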

Some will simply remove the outliers from their datasets, thereby building an unrealistic data environment where the variance ($\sigma^2$) is more subdued and produces smoother backtest equity curves, until the real-world "black swan" comes knocking, and it will. That will scramble all those nicely aligned $\mu$s and $\sigma$s. On the other hand, leaving all the outliers in will pull up the averages of all the non-outliers, giving a false sense of their real values and, again, a distorted image of the very thing being analyzed.
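Here is a small sketch of the first trap. It assumes fat-tailed daily returns (again a rescaled Student-t as a stand-in for real price data) and a crude $3\sigma$ clipping rule, both purely illustrative: the trimmed $\sigma$ comes out noticeably smaller, which is exactly the risk understatement the next tail event will expose.

```python
import numpy as np

rng = np.random.default_rng(3)
# fat-tailed daily returns (rescaled Student-t) standing in for real price data
r = rng.standard_t(df=3, size=5000) * 0.01

sigma_all = r.std(ddof=1)
clipped = r[np.abs(r - r.mean()) <= 3 * sigma_all]   # drop the "inconvenient" outliers
sigma_clipped = clipped.std(ddof=1)

print(f"sigma with outliers:    {sigma_all:.4%}")
print(f"sigma without outliers: {sigma_clipped:.4%}")
# The trimmed sigma produces smoother backtests, but understates the real risk.
```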

Randomness

Because of the quasi-random features of stock price movements, your sense of their forward probabilities will also be distorted by the stochastic nature of those same $\mu$s and $\sigma$s. And since this stochastic process is still driven by a stochastically scaled Wiener process, you enter the realm of uncertain predictability. It is like playing heads or tails with a randomly biased coin. You thought the probability was 0.50 and made your bets accordingly, but it was not: the probability was randomly distorted, allowing longer winning and losing streaks of larger magnitude (fat tails).
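The randomly biased coin is easy to simulate. In the toy example below the bias is redrawn every 100 flips within an arbitrary 0.30 to 0.70 band (my choice of numbers, only for illustration): the longest streaks it produces are typically noticeably longer than those of a fair 0.50 coin, which is the kind of surprise a fixed-probability bettor is not positioned for.

```python
import numpy as np

def longest_streak(flips):
    """Length of the longest run of identical outcomes."""
    best = run = 1
    for a, b in zip(flips[:-1], flips[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best

rng = np.random.default_rng(4)
n = 10_000

fair = rng.random(n) < 0.50
# Randomly biased coin: the probability itself drifts, redrawn every 100 flips
p = np.repeat(rng.uniform(0.30, 0.70, n // 100), 100)
biased = rng.random(n) < p

print("longest streak, fair coin:  ", longest_streak(fair))
print("longest streak, biased coin:", longest_streak(biased))
```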

You compute some $\mu$s and $\sigma$s over a selected dataset, say some 200 stocks, and get some numbers. But those numbers apply only to that particular dataset, and over that particular look-back period, not necessarily to its future. Therefore, the numbers you obtained might not be that representative going forward. Yet, even knowing this, why would you still use those numbers to make predictions about what is coming next, and place any kind of confidence in those expected probabilities?
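A quick sketch of how weak that carry-over can be, using 200 simulated fat-tailed return series split into an estimation year and the year that follows (everything here is synthetic, so the exact magnitude is only illustrative): the correlation between the in-sample mean returns and what actually came next is typically close to zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n_stocks, n_days = 200, 504   # two years of daily data, split in half

# each stock gets its own hidden parameters, observed only through noisy returns
true_mu = rng.normal(0.0004, 0.0004, n_stocks)
true_sigma = rng.uniform(0.01, 0.03, n_stocks)
returns = rng.standard_t(df=4, size=(n_days, n_stocks)) * true_sigma + true_mu

in_sample, out_sample = returns[:252], returns[252:]
mu_in, mu_out = in_sample.mean(axis=0), out_sample.mean(axis=0)

# how well do the look-back estimates line up with what followed?
corr = np.corrcoef(mu_in, mu_out)[0, 1]
print(f"correlation of in-sample vs out-of-sample mean returns: {corr:.2f}")
```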

The structure of the data itself is there to help you win the game. And if you do not see the inner workings of these huge, seemingly random-like data matrices, how will you be able to design trading systems on fat-tailed quasi-martingale or semi-martingale structures?

There is no single factor that will be universal for all stocks. Period. Keeping on searching for one might just be a waste of time. If there ever was one, it was arbitraged away a long time ago, even prior to the computer age. Why is it that studies show results get worse when you go beyond linear regressions? Or that adding factors beyond 5 or 6 does not seem to improve future results that much? The real question becomes: why keep doing it if it does not work that well? Are we supposed to limit ourselves to the point of not even beating the expected returns of long-term index funds?

The Game

The game itself should help you beat the game. But you need to know how the data is organized and what you can do with it. To know that, you need to know what the data is, and there is a lot of it. It is not just the price matrix $P$ that you have to deal with; it is also all the information related to it. And that information matrix $\mathcal {I}$ is a much larger matrix. It includes all the information you can gather about all the stocks in your $P$ matrix. So, if you analyze some 30 factors in your score-ranking composite, you get an information matrix $\mathcal {I}$ that is 30 times the size of $P$.
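In terms of bookkeeping only, and assuming daily bars for 200 stocks over roughly ten years with 30 factor readings per stock per day (all figures illustrative), the shapes look like this; the 30-to-1 size ratio is the whole point.

```python
import numpy as np

T, N, F = 2520, 200, 30          # ~10 years of daily bars, 200 stocks, 30 factors

P = np.zeros((T, N))             # price matrix: one column per stock
I = np.zeros((T, N, F))          # information matrix: 30 factor readings per price point

print(P.size, I.size, I.size // P.size)   # -> 504000 15120000 30
```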

But that is not all. Having the information is only part of the game. You now need to interpret all that data, decide what is valuable, and make projections on where that data is leading you, even within these randomly changing biases and expectations. All this makes for even larger matrices to analyze.

You are also faced with the problem that all those data interpretations need to be quantifiable through conditionals and equations. Our programs do not have partial or scalable opinions, feelings or prejudices. They just execute the code they were given, oftentimes to the 12$^{th}$ decimal digit. This in itself should raise other problems. When you are down to the 12$^{th}$ decimal digit to differentiate z-scores, those last digits start to behave like random numbers. And the ranking you assign to those stocks starts to exhibit rank-positioning randomness, which will affect their portfolio weights and reverberate through your overall results.
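A small experiment makes the point. Assume composite z-scores for 200 stocks that agree out to roughly the 10$^{th}$ decimal and differ only beyond that (the scale of the spread and of the perturbation below are my choices, purely for illustration): a pinch of noise at that level is enough to reshuffle most of the ranking, and with it the weights.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200

# composite z-scores that agree to ~10 decimals and differ only beyond that
scores = 1.23456789 + rng.uniform(0, 1e-11, n)
noisy = scores + rng.normal(0, 1e-11, n)   # tiny perturbation: data revision, float error, etc.

rank_a = np.argsort(np.argsort(-scores))   # 0 = top-ranked stock
rank_b = np.argsort(np.argsort(-noisy))

moved = np.mean(rank_a != rank_b)
print(f"fraction of stocks whose rank changed: {moved:.0%}")
```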

Another problem is the periodic rebalancing on those numbers. Say your portfolio has 200 stocks or more, and at a rebalance some 50 stocks are liquidated for whatever reason and replaced by 50 new ones. All the portfolio weights have changed. It is not only the 50 new stocks that get new weights; all 200 stocks see their weights moved up or down, even when there might have been no need or reason to do so. The stock selection method itself is forcing the churning of the account. And there are monetary consequences to be paid for this.
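A rough turnover sketch of such a rebalance, assuming 200 random weights beforehand, 50 names swapped out, and the 150 survivors nudged up or down by the re-ranking (all of it simulated, the nudge sizes are arbitrary): even the positions that "did not need" to change generate trades, and every trade carries commissions and slippage.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

old_w = rng.dirichlet(np.ones(n))           # weights before the rebalance (sum to 1)

new_w = old_w.copy()
new_w[:50] = rng.dirichlet(np.ones(50)) * old_w[:50].sum()   # 50 names replaced
new_w[50:] *= rng.uniform(0.9, 1.1, n - 50)                  # survivors nudged by the re-ranking
new_w /= new_w.sum()                                         # renormalize to fully invested

turnover = 0.5 * np.abs(new_w - old_w).sum()   # one-way turnover; each leg costs money
print(f"one-way turnover at this rebalance: {turnover:.1%}")
```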

If we ignore the randomness in price series, it does not go away! If we ignore outliers, they do not disappear; they will just eat your lunch, whether you like it or not. That is why, I think, there is a need to strategize your trading strategy, to make it do more even in the face of unreliable uncertainty.

If the inner workings of your trading strategies do not address these issues, do you think they will go away?