Abstract Tech

Backtesting 180 Million Options Strategies - Insights from ORATS’ Latest Research

Matt Amberson, Principal and Founder of Option Research & Technology Services (ORATS)

You read that right: in the largest set of options backtests ever studied, Option Research and Technology Services (ORATS) recently published backtests of over 180 million options strategy variations spanning 100+ stocks and 11 strategy types. In this article, we share exclusive insights from this massive study that can help you improve and optimize your trading strategies.

This article is divided into subtopics to help you understand backtesting in detail.

  • Why is backtesting important?
  • Common pitfalls of options backtesting
  • Choosing the right entry and exit criteria
  • Measuring strategy performance
  • The future of backtesting

By the end of this article, you’ll have learned tips and tricks for how to limit failure and maximize success in your own backtesting journey.

Why is backtesting important?

A robust options trading strategy depends on accurate backtesting practices. A backtest shows you how a strategy performed historically and gives you an idea of what to expect. Although historical performance does not guarantee future performance, a backtest can help you identify things like:

  1. The risk and reward potential of a strategy
  2. The relative performance of different entry criteria like days to expiration and strike deltas
  3. The impact of market conditions on strategy performance
  4. The importance of exit criteria like stop losses and profit targets

Backtesting offers these benefits, but it also comes with complications. Let’s look at some of the common issues traders face when backtesting options.

Common pitfalls of options backtesting

In more than a decade in the options backtesting business, we have seen many traders fall into the same traps. Here are methods to help you avoid them:

  1. Use accurate and realistic trade execution prices: Traders often default to the option's end-of-day closing price for a backtest. However, we’ve found that closing quotes are not the best representation of the option's true end-of-day value. Our data shows that quotes taken 14 minutes before the close are as late as you can go before quote quality deteriorates, so our backtester uses those quotes to simulate trades. Slippage assumptions are also key to a realistic backtest, and ours are based on years of experience: we use slippage of 75% of the bid-ask width for single legs, scaling down to 56% for four-leg spreads. Assuming a fill past the mid-price is reasonable, but for multi-leg strategies the fill travels less far past the mid.
  2. Avoid overfitting, which produces unrealistic returns: When you run many backtests on the same data set, you risk overfitting. For example, you might test 10 different days to expiration for the same symbol and strategy, vary no other parameters, and then pick the best-performing backtest to trade. Instead, check whether the similar backtests also performed well. Backtests with similar inputs but widely varying results are a bad sign.
  3. Account for path dependency: Path dependency refers to how the order and timing of trades can significantly affect a strategy's performance. Ignoring it can lead to misleading results and unrealistic expectations. For example, a one-trade-at-a-time backtest started on the first day of the year might produce very different results than the same backtest started a few weeks later. Our backtester handles path dependency in a distinctive way: it opens one new trade on every day that meets the entry criteria. This means you can have 10 trades on at the same time, each entered one day after the previous one. While this isn’t how most people trade in real life, it gives more accurate performance metrics because it eliminates the start-date bias that arises when you enter only one trade at a time.
  4. Understand the difference between notional and margin returns: Notional and margin returns are both crucial for assessing a strategy's profitability and efficiency, and understanding them helps investors and traders select the strategies that are right for them. In options trading, the notional value of an option is the value of the underlying shares it controls, i.e., the shares that would change hands if the option were exercised or assigned. For example, if I buy a $5 call option on a $100 stock, the option controls 100 shares of the underlying, so the notional value of the trade is 100 * $100 = $10,000. However, I only need to pay $5 * 100 = $500 to open the trade. The notional return measures performance relative to the $10,000, while the margin return measures performance relative to the $500, so notional returns are much lower than margin returns. We use notional returns in the backtester because they standardize and normalize performance across all types of strategies and symbols. We also show the margin return to highlight how efficiently the strategy used the capital at its disposal. Be careful with margin returns, however, as they are not always an accurate reflection of a real-world trading environment. Generally, if you trade both options and stock, or if you want to normalize and compare disparate trading strategies, the notional calculation is best. If you want to see how much you would make on the amount at risk from a brokerage firm's perspective, margin returns are best. Brokerage margin requirements do not always reflect the actual risk, so use them with care. For example, a short put may require 20% margin in a portfolio-margined account or 100% margin in a cash account, and the margin returns would differ significantly between the two.
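The arithmetic behind these four pitfalls can be sketched in Python. This is a minimal illustration under stated assumptions, not the ORATS backtester: slippage is assumed to be a fraction of the bid-ask width measured from the side that favors you (so 0.5 is the mid-price and a 75% buy fills at bid + 0.75 * width); `one_at_a_time` assumes every day qualifies and every trade is held for exactly `hold_days`; and all function names plus the `max_gap` threshold are hypothetical.

```python
from statistics import mean

# Slippage fractions quoted above; the article gives only the 1-leg and 4-leg values.
SLIPPAGE = {1: 0.75, 4: 0.56}


def fill_price(bid: float, ask: float, side: str, slippage: float) -> float:
    """Simulated fill price. Assumes slippage is a fraction of the bid-ask width
    measured from the favorable side: 0.5 is the mid, 1.0 pays the full spread."""
    width = ask - bid
    if side == "buy":
        return bid + slippage * width
    if side == "sell":
        return ask - slippage * width
    raise ValueError("side must be 'buy' or 'sell'")


def isolated_winner(returns_by_dte: dict, window: int = 1, max_gap: float = 0.05):
    """Overfitting check: how far does the best DTE beat its neighboring DTEs?
    Returns (best_dte, gap, flagged); max_gap is a hypothetical threshold."""
    dtes = sorted(returns_by_dte)
    best = max(range(len(dtes)), key=lambda i: returns_by_dte[dtes[i]])
    lo, hi = max(0, best - window), min(len(dtes), best + window + 1)
    neighbors = [returns_by_dte[dtes[i]] for i in range(lo, hi) if i != best]
    if not neighbors:
        raise ValueError("need at least two DTEs to compare")
    gap = returns_by_dte[dtes[best]] - mean(neighbors)
    return dtes[best], gap, gap > max_gap


def one_at_a_time(trade_pnls: list, hold_days: int, start_offset: int) -> list:
    """Sequential trading: enter on start_offset, re-enter when the prior trade closes.
    Assumes trade_pnls[i] is the P&L of a trade opened on day i and held hold_days."""
    return trade_pnls[start_offset::hold_days]


def notional_return(pnl: float, stock_price: float, shares: int = 100) -> float:
    """P&L relative to the value of the shares the position controls."""
    return pnl / (stock_price * shares)


def margin_return(pnl: float, capital_required: float) -> float:
    """P&L relative to the premium paid or margin posted."""
    return pnl / capital_required
```

With the $5 call on a $100 stock from item 4, a hypothetical $200 profit gives `notional_return(200, 100)` of 2% but `margin_return(200, 500)` of 40%. Calling `one_at_a_time` with different `start_offset` values selects different subsets of the same daily trades, which is the start-date sensitivity that opening a trade on every qualifying day is meant to remove.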

Choosing the right entry and exit criteria

There is a nearly infinite number of settings you can test when running an options backtest. Through our research, we’ve identified the most important entry criteria, technical indicators, and exit triggers so you can run your own backtests with more confidence.

Entry criteria:

  • Days to Expiration (DTE): DTE is the time remaining until an option contract expires. We like to test a diverse range of DTEs, from as few as 2 days to over 300 days. Evaluating a strategy's performance across different time horizons gives you insight into the best periods for executing specific strategies.
  • Strike Deltas: The strike delta is a measurement of how much an option's price is likely to move with each $1 move in the underlying security. It’s important to test across a variety of absolute deltas, including in-the-money and out-of-the-money strikes.
  • Spread Yield: Spread Yield is a measure of the price paid for the options spread relative to the price of the underlying stock. It's calculated by dividing the price paid for the spread by the stock price. We like to categorize the spread yield target for each backtest as low, moderate, or high, relative to other backtests with comparable DTE and strike deltas. This additional context allows for more informed analysis.
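A minimal Python sketch of these entry calculations follows. The DTE and spread yield formulas come straight from the definitions above; the tercile rule for labeling a spread yield low, moderate, or high relative to comparable backtests is our assumption, since the cutoffs are not specified here, and the example dates and prices are hypothetical.

```python
from datetime import date


def days_to_expiration(trade_date: date, expiration: date) -> int:
    """Calendar days remaining until the option expires."""
    return (expiration - trade_date).days


def spread_yield(spread_price: float, stock_price: float) -> float:
    """Price paid for the spread divided by the underlying stock price."""
    return spread_price / stock_price


def yield_bucket(value: float, comparable_yields: list) -> str:
    """Label a spread yield relative to backtests with comparable DTE and deltas.
    Assumption: tercile cutoffs on the share of comparable yields below this one."""
    share_below = sum(1 for y in comparable_yields if y < value) / len(comparable_yields)
    if share_below < 1 / 3:
        return "low"
    if share_below < 2 / 3:
        return "moderate"
    return "high"


# Hypothetical example: a $2.50 spread on a $400 stock.
print(days_to_expiration(date(2024, 1, 2), date(2024, 2, 16)))  # 45
print(spread_yield(2.50, 400.0))  # 0.00625
```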

It’s interesting to explore how different backtest entry criteria relate to performance. For example, when we filtered SPY short put spreads by a low VIX entry trigger and ranked them by overall performance, the spread yield (spread / stock) of the top performers was almost always low. This showed us that in a low-volatility environment, this strategy performed better when we targeted a lower spread yield.
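The filter-and-rank exploration described above can be sketched as follows. The record fields (`vix_bucket`, `annual_return`, `yield_bucket`) are hypothetical names rather than the ORATS schema, and the sketch only counts how often each spread yield bucket appears among the top-ranked backtests.

```python
from collections import Counter


def top_yield_buckets(backtests: list, vix_bucket: str = "low", top_n: int = 10) -> Counter:
    """Filter backtests by their VIX entry bucket, rank them by annual return,
    and count the spread yield buckets among the top_n results."""
    filtered = [b for b in backtests if b["vix_bucket"] == vix_bucket]
    ranked = sorted(filtered, key=lambda b: b["annual_return"], reverse=True)
    return Counter(b["yield_bucket"] for b in ranked[:top_n])
```

If one bucket dominates the returned counts, as "low" did for SPY short put spreads, that is a hint worth testing further rather than a conclusion on its own.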

Technical indicators: Technical indicators can provide guidance on when to enter a trade. The following five indicators are a good starting point when assessing what to add to your backtest.

  • VIX Price: The VIX, or volatility index, reflects the market's expectation of 30-day forward-looking volatility. Low VIX levels (<15) suggest a calm market, moderate levels (15-20) indicate normal volatility, while high levels (>20) imply increased uncertainty.
  • Simple Moving Average (SMA): SMA is a commonly used technical indicator that smooths out price data to capture trends over specific periods. Testing if the price is above or below the 50 or 200-day SMA is a good place to start.
  • 14d RSI: The 14-day Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements. A reading of less than 40 indicates oversold conditions, 40-60 suggests moderate momentum, and above 60 signals overbought conditions.
  • IV Percentile 1 Year: The IV percentile shows where the current implied volatility of the underlying stands relative to its 1-year range. It's categorized as low (<33), moderate (33-66), or high (>66).
  • Slope Percentile 1 Year: This trigger shows where the current slope of the implied volatility skew stands relative to its 1-year range, categorized as low (<33), moderate (33-66), or high (>66).
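The indicators and their buckets can be computed along these lines. This is a sketch under assumptions: RSI uses Wilder's smoothing (one common convention), and `percentile_rank` measures the percent of past observations below the current value; some platforms instead compute an "IV rank" from the position within the 1-year min-max range, so check which definition your data vendor uses. The thresholds are the ones listed above.

```python
def sma(prices: list, period: int) -> float:
    """Simple moving average of the last `period` prices."""
    if len(prices) < period:
        raise ValueError("not enough prices")
    return sum(prices[-period:]) / period


def rsi(prices: list, period: int = 14) -> float:
    """Relative Strength Index with Wilder's smoothing."""
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    changes = [b - a for a, b in zip(prices, prices[1:])]
    avg_gain = sum(max(c, 0.0) for c in changes[:period]) / period
    avg_loss = sum(max(-c, 0.0) for c in changes[:period]) / period
    for c in changes[period:]:
        avg_gain = (avg_gain * (period - 1) + max(c, 0.0)) / period
        avg_loss = (avg_loss * (period - 1) + max(-c, 0.0)) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat prices: no momentum either way
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def percentile_rank(current: float, history: list) -> float:
    """Percent (0-100) of past observations below the current value."""
    return 100.0 * sum(1 for h in history if h < current) / len(history)


def bucket(value: float, low: float, high: float, labels=("low", "moderate", "high")) -> str:
    """Three-way bucket: below `low`, between `low` and `high` inclusive, above `high`."""
    if value < low:
        return labels[0]
    if value > high:
        return labels[2]
    return labels[1]


# Thresholds from the list above:
#   VIX:                 bucket(vix, 15, 20)
#   14d RSI:             bucket(rsi(closes), 40, 60, ("oversold", "moderate", "overbought"))
#   IV / slope pctile:   bucket(percentile_rank(iv_today, iv_past_year), 33, 66)
```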

Exit Criteria: Exit triggers play a crucial role in risk management and profit protection. It’s important to test stop loss levels of -25%, -50%, and -75% to protect against excessive losses. To lock in profits, we suggest testing profit targets of +25%, +50%, and +75%, plus +100%, +150%, and +300% for debit strategies.
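Here is a minimal sketch of how a stop loss and profit target might be applied to a single trade, plus the grid of levels suggested above. It assumes P&L is expressed as a fraction of the entry debit or credit and is checked only at daily marks, so intraday touches are ignored; the function and constant names are hypothetical.

```python
from typing import Optional

STOP_LOSSES = [-0.25, -0.50, -0.75]
BASE_TARGETS = [0.25, 0.50, 0.75]
DEBIT_ONLY_TARGETS = [1.00, 1.50, 3.00]


def simulate_exit(pnl_path: list, stop_loss: Optional[float] = None,
                  profit_target: Optional[float] = None):
    """Walk a trade's daily P&L (fraction of entry cost or credit) and return
    (day_index, pnl, reason) for the first trigger hit, else exit on the last day."""
    for day, pnl in enumerate(pnl_path):
        if stop_loss is not None and pnl <= stop_loss:
            return day, pnl, "stop_loss"
        if profit_target is not None and pnl >= profit_target:
            return day, pnl, "profit_target"
    return len(pnl_path) - 1, pnl_path[-1], "expiration"


def exit_grid(is_debit: bool) -> list:
    """All (stop_loss, profit_target) pairs to backtest for a strategy."""
    targets = BASE_TARGETS + (DEBIT_ONLY_TARGETS if is_debit else [])
    return [(s, t) for s in STOP_LOSSES for t in targets]
```

Running each pair in `exit_grid` as its own backtest (18 for debit strategies, 9 for credit strategies) lets you compare exit rules side by side.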

Measuring strategy performance

It’s important to measure the right performance metrics when backtesting, otherwise you’ll end up with a lot of data and no idea what to do with it. Before diving into the specifics, let's outline four categories of metrics we believe are critical to analyzing performance:

Return: Metrics such as annual returns (overall, 1 year, 5 years, bearish and bullish markets), annual margin return, and best/worst monthly and annual returns.

Risk: Quantitative measurements such as Sharpe Ratio, Sortino Ratio, Annual Volatility, Max Drawdown %, Drawdown Days, and Reward to Risk Average.

Profit & Loss: Comprehensive data like average P&L % per day, best and worst trade P&L (both in dollar and percentage), average P&L per trade and per day, and total strategy P&L.

Other: Metrics including % of time in the market, strategy win rate, average days in trade, total strategy trades, credit/debit per trade average, margin per trade average, and margin to stock %.
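A few of these metrics are simple to compute from an equity curve and a list of trade P&Ls. The sketch below assumes a positive daily account value series (for example, on a notional basis); "drawdown days" is taken to mean the longest run of consecutive days below the running peak, which is one possible definition and not necessarily the one ORATS uses.

```python
def max_drawdown_pct(equity: list) -> float:
    """Largest peak-to-trough decline as a fraction of the peak (zero or negative)."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, (value - peak) / peak)
    return worst


def max_drawdown_days(equity: list) -> int:
    """Longest run of consecutive days spent below the running peak."""
    peak, run, longest = equity[0], 0, 0
    for value in equity:
        if value >= peak:
            peak, run = value, 0
        else:
            run += 1
            longest = max(longest, run)
    return longest


def win_rate(trade_pnls: list) -> float:
    """Share of trades that closed with a positive P&L."""
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)
```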

Let's dive deeper into some of the most important metrics:

Sharpe vs. Sortino ratio: While they might seem similar at first, each ratio offers a distinct perspective on risk and should be understood separately. The Sharpe ratio gauges how much excess return a strategy provides relative to the risk taken, using standard deviation as a proxy for risk. It's excellent for understanding a strategy's overall risk-adjusted return, but it falls short in one critical area – it doesn't distinguish between upside and downside volatility. That's where the Sortino ratio comes in. The Sortino ratio, like the Sharpe ratio, evaluates risk-adjusted return, but it only considers downside volatility. In doing so, it addresses an important asymmetry in trading – traders generally welcome upside volatility while fearing its downside counterpart. A high Sortino ratio signals that a strategy minimizes damaging losses while potentially capitalizing on desirable volatility.
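The difference between the two ratios is easiest to see side by side. This sketch uses common textbook definitions with two stated assumptions: 252 trading days per year for annualization, and a downside deviation computed over all periods with the risk-free rate as the target. Other conventions exist, so results may not match a given platform exactly.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # annualization assumption


def sharpe_ratio(daily_returns: list, annual_rf: float = 0.0) -> float:
    """Annualized mean excess return over the standard deviation of excess returns."""
    excess = [r - annual_rf / TRADING_DAYS for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(TRADING_DAYS)


def sortino_ratio(daily_returns: list, annual_rf: float = 0.0) -> float:
    """Like Sharpe, but the denominator only penalizes returns below the target."""
    excess = [r - annual_rf / TRADING_DAYS for r in daily_returns]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0:
        return math.inf  # no downside observations at all
    return mean(excess) / downside * math.sqrt(TRADING_DAYS)
```

For a series whose large moves are mostly upside, the Sortino ratio comes out well above the Sharpe ratio: the upside swings inflate the Sharpe denominator but never enter the Sortino denominator.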

Percent of time in market: The percent of time in market is an important metric in backtesting because it can help reduce overfitting. By setting a minimum filter, you can focus on backtests with a statistically meaningful number of market days. Imagine two strategies: Strategy A has been in the market for 300 days, and Strategy B for only 30 days. While Strategy B might show an impressive return for its short time, its performance metrics might be unreliable due to the small sample size. Filtering on percent of time in market reduces the risk of overfitting and gives a more robust evaluation of each strategy's true performance. In the same way, consider filtering on other metrics such as drawdown days, total strategy trades, and margin to stock % to match your risk tolerance and trading preferences.
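A minimal sketch of the metric and the filter follows. It assumes trades are given as inclusive (entry_day, exit_day) index pairs that may overlap, as they do when a new trade is opened every qualifying day, so overlapping days are counted once; the `pct_time_in_market` field name and the threshold are hypothetical.

```python
def percent_time_in_market(trades: list, total_days: int) -> float:
    """Share of days with at least one open position; overlapping trades count once."""
    days_in_market = set()
    for entry, exit_day in trades:
        days_in_market.update(range(entry, exit_day + 1))
    return len(days_in_market) / total_days


def filter_by_time_in_market(backtests: list, min_pct: float) -> list:
    """Keep only backtests that spent at least min_pct of days in the market."""
    return [b for b in backtests if b["pct_time_in_market"] >= min_pct]
```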

Profit and loss metrics: Calculating several measures of profit and loss gives a comprehensive view of a strategy's performance. The best and worst trade P&L metrics make it easy to quantify the strategy's upside potential versus its downside risk. It can also help to filter strategies by P&L % per day, which provides a useful baseline for an effective strategy.
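These per-trade metrics can be summarized in one pass. The trade record keys (`pnl` in dollars, `pnl_pct` as a fraction of entry cost or notional, `days` in trade) are hypothetical, and "average P&L % per day" is computed here as the mean of each trade's P&L % divided by its days in trade, which is one plausible definition rather than a documented formula.

```python
from statistics import mean


def pnl_summary(trades: list) -> dict:
    """Best/worst trade, average per trade, average P&L % per day, and total P&L.
    Each trade is a dict with 'pnl', 'pnl_pct', and 'days' (at least 1)."""
    return {
        "best_trade_pnl": max(t["pnl"] for t in trades),
        "worst_trade_pnl": min(t["pnl"] for t in trades),
        "best_trade_pct": max(t["pnl_pct"] for t in trades),
        "worst_trade_pct": min(t["pnl_pct"] for t in trades),
        "avg_pnl_per_trade": mean(t["pnl"] for t in trades),
        "avg_pnl_pct_per_day": mean(t["pnl_pct"] / t["days"] for t in trades),
        "total_pnl": sum(t["pnl"] for t in trades),
    }
```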

The future of backtesting

I want to end with a brief discussion about the future of backtesting. Gone are the days of tiresome, manual backtesting in Excel spreadsheets with poor-quality data. With high-quality minute-by-minute historical data and powerful cloud computing now available, traders of the future will answer their backtesting questions in seconds with AI-powered research assistants. Going from a backtest to live implementation will take no more than a single click of the mouse, seamlessly integrated into the trading lifecycle. Today, you’ve learned about the key principles of backtesting that are driving this cultural shift. Stay tuned for the next installment from ORATS to keep up with the latest trends in options trading.

You can browse all 180 million of ORATS’ backtests on their website by visiting orats.com/backtester. This article was originally published on orats.com/blog.
