Although asset allocation remains crucial in our investment process, some of the most active retail investors might want to trade with an edge. This is where signals become important: almost any data can be turned into a signal, from technical indicators to macroeconomic variables and order-book data.

The main advantage of using trading signals is the discovery of alpha: the true added value that active strategies seek in their setup.

Alpha comes from a combination of factors, including trading signals and asset allocation. A positive alpha does not guarantee a good strategy, though. First, we must consider its net value, after fees, spreads and slippage. Second, a high-alpha investment might still carry a high beta, making the risk unbearable, or a negative beta, which leads to underperformance during rallies in the benchmark.
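To make the alpha and beta discussion concrete, here is a minimal sketch that estimates both with a simple linear regression of strategy returns on benchmark returns. The function name `alpha_beta`, the flat `cost_per_period` drag and the simulated return series are all assumptions made for illustration; real costs are rarely a constant per period.

```python
import numpy as np

def alpha_beta(strategy_returns, benchmark_returns, cost_per_period=0.0):
    """Estimate per-period alpha and beta by ordinary least squares.

    cost_per_period is a hypothetical flat drag (fees, spreads, slippage)
    subtracted from every strategy return before the regression.
    """
    y = np.asarray(strategy_returns, dtype=float) - cost_per_period
    x = np.asarray(benchmark_returns, dtype=float)
    beta, alpha = np.polyfit(x, y, 1)  # degree-1 fit returns (slope, intercept)
    return alpha, beta

# Simulated daily returns (illustrative only): a leveraged benchmark plus a small edge.
rng = np.random.default_rng(0)
bench = rng.normal(0.0005, 0.01, 1000)
strat = 0.0003 + 1.5 * bench + rng.normal(0.0, 0.002, 1000)

gross_alpha, beta = alpha_beta(strat, bench)
net_alpha, _ = alpha_beta(strat, bench, cost_per_period=0.0004)
print(f"beta={beta:.2f} gross alpha={gross_alpha:.5f} net alpha={net_alpha:.5f}")
```

In this toy setup the assumed cost is larger than the simulated edge, so a strategy with positive gross alpha and a beta well above 1 can end up with negative net alpha.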

Can we achieve outperformance when using trading signals? This initial question is often overlooked by many retail investors (and sometimes even institutional ones). But what we need to address first is: does our signal make sense?

How to obtain trading signals

A trading signal can be generated from any sort of data. If we consider the MACD indicator, for example, we might go long when the faster of its two moving averages moves above the slower one, and short otherwise. Of course, the rules can change: for example, the same signal can drive a long-only strategy by exiting the market instead of going short.
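The crossover rule described above can be sketched in a few lines. This is only an illustration: the helper names `ema` and `macd_positions` are made up here, the 12 and 26 spans are the conventional MACD defaults rather than a recommendation, and the rule uses the sign of the fast-minus-slow difference instead of the separate signal line that many charting packages add.

```python
import numpy as np

def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    values = np.asarray(values, dtype=float)
    k = 2.0 / (span + 1.0)
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, len(values)):
        out[i] = k * values[i] + (1.0 - k) * out[i - 1]
    return out

def macd_positions(close, fast=12, slow=26, long_only=False):
    """Return +1 (long) when the fast EMA is above the slow EMA, else -1 (short).

    With long_only=True the short leg is replaced by 0 (out of the market).
    """
    macd_line = ema(close, fast) - ema(close, slow)
    positions = np.where(macd_line > 0, 1, -1)
    if long_only:
        positions = np.where(positions > 0, 1, 0)
    return positions

# Toy example: a steadily rising price series ends in a long position.
rising = np.linspace(100.0, 200.0, 100)
positions = macd_positions(rising)
```

When turning positions into returns, the position computed at the close of one bar should only earn the return of the following bar, otherwise the test looks into the future.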

Many datasets, including the widely available open-high-low-close-volume data (OHLCV), allow us to calculate some indicator or signal to enter into a position. But with the data revolution happening in recent years, institutional investors are looking for much more sophisticated datasets, which can allow them to outperform peers by accessing unique information, such as insider transactions, earnings forecasts or announcements, web traffic, meteorological data, etc.

What truly matters is the way we process the data. Even if just using OHLCV datasets, there might be some information lying under the surface that we can access with statistical calculations and adjustments; a simple example is momentum trading, which just uses past returns to identify a universe of assets to invest in. We then need an asset-allocation stage that allows us to expose ourselves to these factors in the proper way.
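As a rough illustration of the momentum example, the sketch below ranks assets by their trailing return and keeps the strongest ones. The function name `momentum_universe`, the lookback length and the toy price table are assumptions for illustration, and the later asset-allocation stage is deliberately left out.

```python
import numpy as np

def momentum_universe(prices, lookback=126, top_n=2):
    """Return the column indices of the top_n assets by trailing return.

    prices: 2D array with one row per date (oldest first) and one column per asset.
    """
    prices = np.asarray(prices, dtype=float)
    if len(prices) <= lookback:
        raise ValueError("need more than `lookback` rows of prices")
    trailing = prices[-1] / prices[-1 - lookback] - 1.0
    ranked = np.argsort(trailing)[::-1]  # strongest trailing return first
    return ranked[:top_n]

# Toy price table: three assets over five dates.
prices = np.array([
    [100.0, 100.0, 100.0],
    [101.0,  99.0, 102.0],
    [103.0,  98.0, 104.0],
    [105.0,  97.0, 110.0],
    [108.0,  96.0, 120.0],
])
selected = momentum_universe(prices, lookback=3, top_n=2)
```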

But before trading or backtesting, it is useful to assess what a signal means, why it matters, why it should work and, most importantly, whether it works.

How to test a signal before using it

It is always difficult to research strategies in finance: it is very easy to see what does not work, but anything that works seems fishy.

A backtest is not the right tool to check if a signal works. Unfortunately, many still think that running thousands of backtests and selecting the top-performing one is the process followed by most professionals and institutional investors.

Why is this wrong? Because if we run enough backtests, one of them will eventually work, and we will not be able to tell whether that is due to chance or to the actual solidity of the strategy. This is especially true when testing different combinations of parameters: with the MACD example above, it might mean testing all lags from 1 to 10 for one of the two moving averages and from 20 to 50 for the other. Selecting the optimal combination according to the backtest is a recipe for disaster. There is no actual rationale behind the selection of those numbers; they are simply what overfits past data, with no implication for the future. This becomes very dangerous with datasets that are sensitive to specific events: imagine a stock with specific business cycles, mergers or acquisitions, or product launches, or macro events such as the Covid-19 pandemic or inflationary regimes.
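The danger of picking the best parameters from a grid can be explored on pure noise. The sketch below, under stated assumptions, simulates a driftless random-walk price series, searches the lag grid mentioned above (1 to 10 against 20 to 50) on the first half, and then applies the winning pair to the second half. It uses simple moving averages for brevity, and the function names, seed and series length are all illustrative choices.

```python
import numpy as np

def sma(x, window):
    """Simple moving average; the first window - 1 entries are NaN."""
    c = np.cumsum(np.insert(x, 0, 0.0))
    out = np.full(len(x), np.nan)
    out[window - 1:] = (c[window:] - c[:-window]) / window
    return out

def crossover_sharpe(prices, fast, slow):
    """Annualised Sharpe ratio of a long/short fast-vs-slow SMA crossover."""
    signal = np.where(sma(prices, fast) > sma(prices, slow), 1.0, -1.0)
    rets = np.diff(prices) / prices[:-1]
    # The position decided at bar t earns the return from t to t + 1;
    # the warm-up period of the slow average is skipped.
    pnl = signal[slow - 1:-1] * rets[slow - 1:]
    return np.sqrt(252) * pnl.mean() / pnl.std()

# Pure noise: by construction there is no signal to find.
rng = np.random.default_rng(42)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 2000)))
in_sample, out_sample = prices[:1000], prices[1000:]

grid = {
    (fast, slow): crossover_sharpe(in_sample, fast, slow)
    for fast in range(1, 11)
    for slow in range(20, 51)
}
best = max(grid, key=grid.get)
print("best in-sample lags:", best, "Sharpe:", round(grid[best], 2))
print("same lags out of sample:", round(crossover_sharpe(out_sample, *best), 2))
```

Any gap between the two printed numbers comes from selection alone, since the data contain no exploitable structure. Changing the seed changes the "optimal" lags, which is another way to see that they carry no rationale.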

Even if we run a single backtest, we need to take into account the 'luck' factor: how many trades were successful, when were they successful (are they clustered in time or spread consistently across the sample), and how much do we make on winning trades compared with what we lose on losing ones?

In summary, even if the backtest gives a positive return and Sharpe Ratio, we still need evidence that the strategy works.
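These questions can be turned into a few summary numbers. The sketch below is one possible set, assuming we already have a list of per-trade profits and losses in chronological order; the function name `trade_stats` and the choice of statistics are illustrative, not a standard.

```python
import numpy as np

def trade_stats(trade_pnls):
    """Summarise a chronological list of per-trade profits and losses."""
    pnl = np.asarray(trade_pnls, dtype=float)
    wins, losses = pnl[pnl > 0], pnl[pnl < 0]

    # Longest run of consecutive winners: a rough check on time clustering.
    streak = longest = 0
    for p in pnl:
        streak = streak + 1 if p > 0 else 0
        longest = max(longest, streak)

    has_both = len(wins) > 0 and len(losses) > 0
    return {
        "hit_rate": len(wins) / len(pnl),
        # Average win divided by average loss (undefined without both kinds).
        "payoff_ratio": wins.mean() / abs(losses.mean()) if has_both else float("nan"),
        "expectancy": pnl.mean(),
        "longest_win_streak": longest,
    }

stats = trade_stats([2.0, -1.0, 3.0, -1.0, -1.0, 2.0])
```

A strategy whose profit comes from one long winning streak, or from a single outsized trade, deserves more suspicion than one with the same total spread evenly over time.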

Before backtesting, and using only in-sample data, we need to assess whether the signal makes sense. The problem is that, if we split our data historically, the signal might work in the past but not in the future (Type I error, or false positive, in statistics), or it might work in the future but not in the past (Type II error, or false negative). While the former leads to financial loss, the latter leads to opportunity costs.

How can we address this issue and avoid both errors to the best of our capabilities? Most importantly, how can we ensure we identify consistently working signals?

There are two main paths:

1) Mathematical Optimization. Some problems, especially in strategies like time series modelling or statistical arbitrage, might have an analytical solution that can be found with a specific formula or optimization routine.

2) Synthetic Data. If we build large datasets of random data, similar to the data we are going to test, we might be able to avoid overfitting.
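One way the synthetic-data path could look in practice is sketched below. It is not a method taken from the article: it assumes a toy trailing-return rule, generates driftless Gaussian return paths with the same volatility and length as the series under test, and reports how often pure noise matches or beats the observed Sharpe ratio. The function names, the lookback and the number of paths are all hypothetical.

```python
import numpy as np

def signal_sharpe(returns, lookback=20):
    """Per-period Sharpe of a toy rule: hold the sign of the trailing return sum."""
    returns = np.asarray(returns, dtype=float)
    c = np.cumsum(np.insert(returns, 0, 0.0))
    trailing = c[lookback:] - c[:-lookback]  # rolling sum over `lookback` bars
    position = np.sign(trailing[:-1])        # decided at the end of each window
    pnl = position * returns[lookback:]      # earns the next bar's return
    return pnl.mean() / pnl.std()

def synthetic_p_value(returns, n_paths=500, lookback=20, seed=0):
    """Share of synthetic noise paths whose Sharpe matches or beats the observed one."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    observed = signal_sharpe(returns, lookback)
    sigma = returns.std()
    null = np.array([
        signal_sharpe(rng.normal(0.0, sigma, len(returns)), lookback)
        for _ in range(n_paths)
    ])
    # Add-one correction so the estimate is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_paths)
    return observed, p_value

# Example on a series that is itself pure noise (illustrative only).
rng = np.random.default_rng(1)
noise_returns = rng.normal(0.0, 0.01, 750)
observed, p_value = synthetic_p_value(noise_returns)
print(f"observed Sharpe={observed:.3f}, p-value={p_value:.3f}")
```

A driftless null is a strong assumption: it credits the rule with any drift present in the real series, so a generator that preserves drift, fat tails or volatility clustering would be a stricter and more realistic benchmark.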

A possible third path, suggested by Graham Giller, is to select a different asset and test our alpha on it. The basic assumption is that if the signal also works on an asset with low correlation to the original one, it is more likely to work on the actual target. The low correlation mimics the fact that future performance might differ greatly from past performance, so the training set says little about the returns in the test set.

For the moment, we will focus on the methodology to identify meaningful signals. Method 1) might not need a grid search algorithm, since it would have an analytical solution for the best parameters in our model, but it is often infeasible or very complicated for retail investors, and it does not apply to all signals. Both methods 2) and 3), plus the (suboptimal) option of testing on past data, are suitable for signal testing on any setup. Additionally, we are not targeting the optimal selection of signal parameters, but rather the significance of any signal, with any set of parameters, found in any possible way, whether optimized or randomly selected. [Full article available on Quant Evolution]

