Overfit

Your Sharpe ratio is a random variable

A backtested Sharpe is one draw from a distribution, not a fact. Computing its standard error, and why two years of data barely separates a Sharpe of 1 from zero.

A backtest hands you a single number for the Sharpe ratio and invites you to treat it as a fact. It is not a fact. It is one draw from a distribution, and if you reran the same strategy on a different sample of the same length you would get a different number, sometimes a very different one. The Sharpe you are quoting is a point estimate of a population quantity you will never observe directly, and like any estimate it carries a standard error that almost nobody bothers to compute.

I care about this because I have talked myself into strategies on the strength of a backtested Sharpe near 1.5, put them live, and watched them settle closer to 0.4. The gap was not always bad luck. Often the 1.5 never meant what I assumed it meant.

The estimator has error, and you can put a number on it

The Sharpe ratio is mean excess return divided by the standard deviation of returns. Both pieces are estimated from a finite sample, so the ratio inherits their sampling error. Andrew Lo derived the distribution in his 2002 Financial Analysts Journal paper, and the headline result is simple enough to keep in your head. Under returns that are independent, identically distributed and normal, the standard error of the estimated Sharpe ratio is approximately

SE(SR) = sqrt((1 + 0.5 * SR^2) / T)

where SR and T share the same periodicity. Feed it a monthly Sharpe estimated from T monthly returns and you get the standard error of the monthly Sharpe. There is a free full text hosted at Berkeley if you want to follow the derivation rather than take my word for it.

What that does to a real track record

Put numbers through it. Say you hold five years of monthly returns, so T is 60, and your estimated annualised Sharpe is a respectable 1.0. Working in monthly terms, the monthly Sharpe is 1.0 divided by the square root of 12, roughly 0.289. The monthly standard error lands near 0.132, and multiplying by the square root of 12 to annualise gives a standard error on the annual Sharpe of about 0.46.

A 95 percent confidence interval is then roughly 1.0 give or take 1.96 times 0.46, which runs from about 0.10 to about 1.90. Read that again. A perfectly ordinary Sharpe of 1.0, measured over five years, is statistically consistent with a true Sharpe anywhere from just above zero to nearly 2. You do not have a 1.0 strategy. You have a strategy whose Sharpe you can barely separate from nothing.

Shorten the window and it deteriorates quickly. Two years of monthly data, T of 24, the same estimated Sharpe of 1.0, and the annual standard error climbs to about 0.72. The confidence interval now stretches from roughly minus 0.41 to about 2.41. It straddles zero. A two year record with a Sharpe of 1 is not, by itself, evidence that the strategy makes money at all.

The two lines of code nobody runs

import numpy as np

def sharpe_se(sharpe_per_period, n_periods):
    # standard error of the per-period Sharpe under IID normal returns
    return np.sqrt((1 + 0.5 * sharpe_per_period**2) / n_periods)

# annualised Sharpe of 1.0 from 60 monthly observations
monthly_sr = 1.0 / np.sqrt(12)
se_annual = sharpe_se(monthly_sr, 60) * np.sqrt(12)
print(round(se_annual, 3))  # 0.456

That is the entire calculation. NumPy and one formula. There is no good reason to quote a Sharpe without also quoting how wide the error bars around it are, yet nearly every backtest writeup does precisely that.
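To turn that standard error into the intervals quoted earlier, a small extension is enough. This is a minimal sketch under the same clean assumptions (independent, normal returns, plus a normal approximation for the interval). The helper name `sharpe_ci` and its arguments are my own choices for illustration.

```python
import math

def sharpe_ci(annual_sr, n_months, z=1.96):
    """Approximate confidence interval for an annualised Sharpe ratio
    estimated from monthly returns, assuming IID normal returns."""
    monthly_sr = annual_sr / math.sqrt(12)
    se_monthly = math.sqrt((1 + 0.5 * monthly_sr**2) / n_months)
    se_annual = se_monthly * math.sqrt(12)
    return annual_sr - z * se_annual, annual_sr + z * se_annual, se_annual

# Same estimated Sharpe of 1.0, different track record lengths
for n_months in (24, 60, 120, 240):
    low, high, se = sharpe_ci(1.0, n_months)
    print(f"{n_months:4d} months: SE {se:.2f}, 95% CI [{low:+.2f}, {high:+.2f}]")
```

Running it across record lengths makes the point of the previous section visible at a glance: the interval narrows only with the square root of the sample size, so doubling the history buys much less than it feels like it should.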

It is worse than the clean case, in two directions

The formula assumes returns are independent and normal. Real returns are neither. Fat tails and negative skew, the sort you get from selling options or running carry, mean the clean standard error understates the true uncertainty. That direction is intuitive.

Serial correlation is the subtler problem. When returns are positively autocorrelated, which happens whenever you hold illiquid instruments or mark to stale prices, the naive Sharpe is inflated and you cannot annualise a monthly figure by multiplying by the square root of 12. Lo shows that ignoring this can overstate an annual Sharpe by as much as 65 percent for some hedge fund return streams. The remedy is a serial-correlation-robust standard error of the Newey-West type, which is what the paper recommends once you step outside the independent case. Rob Carver has made the same argument for systematic futures traders more than once: the number is noisier than it looks, and smoothing makes it lie.
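To see how much the square root of 12 can mislead, here is a sketch of the time-aggregation factor as I read it from Lo's paper: the q-period Sharpe equals the one-period Sharpe times q / sqrt(q + 2 * sum over k of (q - k) * rho_k), where rho_k is the lag-k autocorrelation. The function name, the AR(1) return stream and the autocorrelation of 0.3 are hypothetical choices for illustration, so check the formula against the paper before relying on it.

```python
import math

def annualisation_factor(rho, q=12):
    """Scale factor from a one-period Sharpe to a q-period Sharpe.

    rho[0] is the lag-1 autocorrelation, rho[1] the lag-2, and so on;
    at least q - 1 lags are needed. With all autocorrelations equal
    to zero the factor reduces to sqrt(q).
    """
    weighted = sum((q - k) * rho[k - 1] for k in range(1, q))
    return q / math.sqrt(q + 2 * weighted)

# Hypothetical AR(1) return stream: lag-k autocorrelation is rho1**k
rho1 = 0.3
ar1 = [rho1**k for k in range(1, 12)]

naive = math.sqrt(12)
adjusted = annualisation_factor(ar1)
print(f"naive factor {naive:.2f}, adjusted factor {adjusted:.2f}")
print(f"naive annual Sharpe overstated by {naive / adjusted - 1:.0%}")
```

Positive autocorrelation pushes the adjusted factor below the square root of 12, which is the sense in which smoothed returns flatter the annualised number.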

Selection bias is the real killer

Everything above assumes you tested one strategy. You did not. You tried a dozen lookback windows, a couple of position sizing rules, three different universes, and you kept whatever combination backtested best. The Sharpe you are admiring is the maximum over many trials, and the maximum of a set of noisy estimates is biased upward by construction. This is the mechanism behind most strategies that shine on paper and expire in production.
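A small simulation makes the upward bias concrete. This is a sketch under loud assumptions: every variation has a true Sharpe of exactly zero, the variations are independent of each other (real parameter sweeps are correlated, which softens the effect), and the trial count, volatility and seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 50      # hypothetical number of variations tried
n_months = 60      # five years of monthly returns per backtest
n_repeats = 1000   # how many times the whole research process is rerun

# Pure noise: zero mean return, so the true Sharpe of every trial is zero
returns = rng.normal(0.0, 0.04, size=(n_repeats, n_trials, n_months))
monthly_sr = returns.mean(axis=2) / returns.std(axis=2, ddof=1)
annual_sr = monthly_sr * np.sqrt(12)

# The backtest you would have kept from each research process
best = annual_sr.max(axis=1)

print(f"mean Sharpe of a single trial: {annual_sr.mean():.2f}")
print(f"mean Sharpe of the best of {n_trials}: {best.mean():.2f}")
```

The single-trial average sits near zero, as it must, while the best-of-many average lands well above it even though nothing in the simulation has any edge.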

The tool for it is the Deflated Sharpe Ratio from David Bailey and Marcos López de Prado, which adjusts the significance threshold for the number of trials you ran and for the non-normality of the returns. The Wikipedia entry is a reasonable place to start if the paper is heavy going. Selection bias gets its own post here, because it is the single most common reason a backtest flatters you and it deserves the space.
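For orientation, here is a rough sketch of the calculation as I understand it from the Bailey and López de Prado paper: estimate the Sharpe you would expect from the best of N skill-free trials, then ask how likely the observed Sharpe is to exceed that benchmark given the sample length, skewness and kurtosis. The function names, the variance of Sharpe estimates across trials (`sr_variance`) and the example inputs are assumptions of mine, and all Sharpe ratios here are per period, not annualised.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_norm = NormalDist()

def expected_max_sharpe(n_trials, sr_variance):
    """Approximate expected maximum Sharpe across n_trials (>= 2)
    trials with no skill, given the variance of the trial Sharpes."""
    a = _norm.inv_cdf(1 - 1 / n_trials)
    b = _norm.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)

def deflated_sharpe(sr, n_obs, n_trials, sr_variance, skew=0.0, kurt=3.0):
    """Probability-style score that the observed per-period Sharpe beats
    the best-of-n_trials benchmark. kurt=3 and skew=0 mean normal returns."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr**2)
    return _norm.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)

# Hypothetical inputs: annualised Sharpe of 1.0 on 60 months, and an assumed
# cross-trial variance of 1/60 (roughly the sampling variance at zero Sharpe)
monthly_sr = 1.0 / math.sqrt(12)
for n_trials in (2, 10, 50):
    score = deflated_sharpe(monthly_sr, 60, n_trials, sr_variance=1 / 60)
    print(f"{n_trials:3d} trials: deflated Sharpe score {score:.2f}")
```

With normal returns the denominator collapses to the same 1 + 0.5 * SR^2 term as the standard error formula earlier, so the only new ingredient is the benchmark that rises with the trial count. Under these made-up inputs, a Sharpe that looks comfortable after two trials falls well short of the usual 0.95 threshold after fifty.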

What I actually do about it

None of this makes the Sharpe ratio useless. It makes it a measurement with error bars, which is all any statistic has ever been. In practice I do a few dull things. I compute the standard error every time, and I refuse to get excited about a high Sharpe on less than three or four years of data. I compare confidence intervals rather than point estimates, because two Sharpes of 1.2 and 0.9 are usually the same number wearing different clothes. And I discount hard for the count of variations I ran before settling on the one I liked.
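The comparison of 1.2 against 0.9 can be made concrete with a rough z-score for the difference. This sketch assumes both Sharpes come from five years of monthly data and, more importantly, that the two estimates are independent. Strategies run over the same period usually are not, so treat the result as a rough guide only. The function names are mine.

```python
import math

def annual_sharpe_se(annual_sr, n_months):
    """Standard error of an annualised Sharpe, IID normal returns assumed."""
    monthly_sr = annual_sr / math.sqrt(12)
    return math.sqrt((1 + 0.5 * monthly_sr**2) / n_months) * math.sqrt(12)

def sharpe_difference_z(sr_a, sr_b, n_months):
    """z-score for sr_a - sr_b, treating the two estimates as independent."""
    se_diff = math.hypot(annual_sharpe_se(sr_a, n_months),
                         annual_sharpe_se(sr_b, n_months))
    return (sr_a - sr_b) / se_diff

z = sharpe_difference_z(1.2, 0.9, n_months=60)
print(f"z-score for 1.2 vs 0.9 over five years: {z:.2f}")
```

A z-score this far below 1.96 says the data cannot tell the two strategies apart, which is the formal version of the same number wearing different clothes.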

If you keep one habit from this, make it asking how long the track record is before asking what the Sharpe is. A Sharpe of 2 over eight months tells you close to nothing. A Sharpe of 0.7 over fifteen years tells you a lot.
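One way to act on that habit is to invert the standard error formula and ask how many months you need before the approximate 95 percent interval clears zero. Setting the per-period Sharpe equal to z times its standard error and solving for T gives T = z^2 * (1 + 0.5 * SR^2) / SR^2. The sketch below assumes independent normal returns, and the function name is hypothetical.

```python
import math

def months_needed(annual_sr, z=1.96):
    """Smallest number of monthly observations at which the approximate
    confidence interval around annual_sr no longer includes zero."""
    monthly_sr = annual_sr / math.sqrt(12)
    return math.ceil(z**2 * (1 + 0.5 * monthly_sr**2) / monthly_sr**2)

for annual_sr in (0.5, 0.7, 1.0, 2.0):
    print(f"annual Sharpe {annual_sr}: about {months_needed(annual_sr)} months")
```

Eight months is short of the requirement even for a Sharpe of 2, while fifteen years comfortably exceeds what a Sharpe of 0.7 needs, and that is before any allowance for fat tails, autocorrelation or the number of variations tried.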

Nothing here is advice to trade any particular strategy. It is a warning about a number. The point is to leave you more sceptical of your own backtests, mine included.