I Generated 1,500 Trading Strategies. 114 "Passed" on Bitcoin. None Were Real.
A hands-on demonstration of why in-sample / out-of-sample / holdout testing isn't enough to catch overfitting — and the statistics that actually are.
Open any trading forum and you'll find tools — some free, some sold for real money — that promise to discover profitable strategies for you. Point them at a market, they try thousands of indicator combinations, and they hand you the ones that "work."
I wanted to know what these tools actually produce. So I built one.
The generator — and its "rigorous" filter
My generator does what the commercial ones do: it assembles random combinations of simple indicators (moving averages, RSI, momentum, breakouts) into trading rules, then runs each one through three sequential gates:
- In-sample — does it beat buy & hold on the first chunk of history?
- Out-of-sample — does it still beat it on a later, unseen chunk?
- Holdout — and again on a final, untouched period?
A strategy survives only if it beats the benchmark in all three. That sounds rigorous — it's exactly the "train / test / holdout" discipline you're told to use.
I ran it on daily Bitcoin. Out of 1,500 random strategies, 114 made it through all three gates — out-of-sample-confirmed, holdout-confirmed. On paper, a goldmine.
Then I checked them properly
Here's the problem nobody mentions: when you try 1,500 strategies and keep the ones that passed, you're not finding edges. You're finding the luckiest coin flips.
Think of a student who takes the same multiple-choice exam 50 times and shows you only his best score. You don't need to know how he studied to suspect "best of 50" is inflated — you just need to know he had 50 tries. The same logic destroys most backtest "winners": the more strategies you search, the better the best one looks by chance alone. Three gates don't fix this, because the luck rides straight through all three.
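The best-of-N effect is easy to reproduce. Here is a minimal sketch (not the generator used in this article) that gives every "strategy" zero true edge by drawing its daily returns as pure Gaussian noise, then tracks the best annualized Sharpe seen as the number of tries grows. The 3% daily volatility, 500-day history, and 365-day year are illustrative assumptions.

```python
import math
import random

PERIODS_PER_YEAR = 365  # assumption: the market trades every calendar day


def annualized_sharpe(returns):
    """Annualized Sharpe ratio of a list of per-day returns (risk-free rate = 0)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(PERIODS_PER_YEAR)


def running_best_sharpe(n_strategies, n_days, seed=0):
    """Best Sharpe seen so far after each additional zero-edge strategy."""
    rng = random.Random(seed)
    best, running = float("-inf"), []
    for _ in range(n_strategies):
        # Pure noise: mean 0, so any positive Sharpe is luck by construction.
        noise = [rng.gauss(0.0, 0.03) for _ in range(n_days)]
        best = max(best, annualized_sharpe(noise))
        running.append(best)
    return running


running = running_best_sharpe(n_strategies=1500, n_days=500)
for n in (1, 50, 1500):
    print(f"best of {n:>4} zero-edge strategies: Sharpe {running[n - 1]:+.2f}")
```

The running maximum can only go up as tries are added, which is the whole problem: nothing about the strategies improved, only the number of attempts.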
The statistics that do account for the search tell a brutal story for my 114 Bitcoin "winners":
- Deflated Sharpe Ratio (read it as the probability that the Sharpe is real, after correcting for how many strategies you tried): 0.48. You want > 0.95. Not close.
- Probability of Backtest Overfitting: 0.87. Across resampled in-sample / out-of-sample splits, the in-sample winner lands below the out-of-sample median 87% of the time. Worse than a coin flip.
Strategies that survived a proper, multiple-testing-corrected test: 0 out of 114.
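For readers who want to see the mechanics, here is a rough sketch of the Deflated Sharpe Ratio following the published Bailey and López de Prado formula. It is not the code behind the numbers above. The function names are my own, the Sharpe passed in is per-period (not annualized), and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sharpe_variance):
    """Sharpe the best of n_trials zero-edge strategies is expected to show.

    sharpe_variance is the variance of (per-period) Sharpe ratios across
    the strategies tried. Requires n_trials >= 2.
    """
    a = _NORMAL.inv_cdf(1 - 1 / n_trials)
    b = _NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sr, n_obs, n_trials, sharpe_variance, skew=0.0, kurt=3.0):
    """Probability that the observed per-period Sharpe `sr` beats the
    best-of-n_trials luck benchmark, adjusting for sample length and for
    non-normal returns (kurt=3.0 means normal tails)."""
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return _NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


# Hypothetical inputs, not this article's data: a daily Sharpe of 0.08
# (about 1.5 annualized at 365 days/year) over 1,000 days, with a variance
# of 0.001 in daily Sharpe across the strategies tried.
dsr_few = deflated_sharpe(0.08, n_obs=1000, n_trials=2, sharpe_variance=0.001)
dsr_many = deflated_sharpe(0.08, n_obs=1000, n_trials=1500, sharpe_variance=0.001)
print(f"same backtest, 2 tries:     DSR = {dsr_few:.2f}")
print(f"same backtest, 1,500 tries: DSR = {dsr_many:.2f}")
```

The backtest is identical in both calls. Only the number of tries changes, and that alone is enough to move the verdict.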
The proof that removes all doubt
You could argue: "Maybe Bitcoin just had no edge to find." Fair. So I ran the exact same generator on pure random-walk data — synthetic prices with no edge by construction. There is literally nothing to find.
It still "validated" 7 strategies. The best had an annualized Sharpe of +0.52 — a number most traders would happily deploy real money on. On data that is pure noise.
That's the whole point: in-sample / out-of-sample / holdout cannot tell luck from skill. It will hand you confident, "validated," tradeable-looking strategies built on nothing at all. (The proper tests flagged all 7 instantly — 0 of 7 survived.)
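You can try a small version of this experiment yourself. The sketch below is a simplified stand-in, not my actual generator: it uses only long/flat momentum rules with a random lookback, defines "beats buy & hold" as a higher total log return in each of three equal segments, and pools results over several independent random walks. All names and parameters are illustrative assumptions.

```python
import random


def cumulative(log_rets):
    """Cumulative log price; cum[t] uses only returns before day t."""
    cum = [0.0]
    for r in log_rets:
        cum.append(cum[-1] + r)
    return cum


def momentum_returns(log_rets, cum, lookback, contrarian):
    """Long/flat rule decided from past prices only (no look-ahead).

    Holds on day t if the trailing lookback-day return was positive
    (or negative when contrarian=True); otherwise stays flat.
    """
    out = []
    for t, r in enumerate(log_rets):
        if t < lookback:
            out.append(0.0)  # warm-up: not enough history, stay flat
            continue
        trailing = cum[t] - cum[t - lookback]
        hold = trailing < 0 if contrarian else trailing > 0
        out.append(r if hold else 0.0)
    return out


def passes_three_gates(strategy, benchmark, n_gates=3):
    """True if the strategy out-earns buy & hold in every segment."""
    size = len(benchmark) // n_gates
    for g in range(n_gates):
        lo, hi = g * size, (g + 1) * size
        if sum(strategy[lo:hi]) <= sum(benchmark[lo:hi]):
            return False
    return True


def count_survivors(n_paths=10, n_strategies=100, n_days=1200, seed=0):
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n_paths):
        # Driftless random walk: there is no edge to find by construction.
        rets = [rng.gauss(0.0, 0.03) for _ in range(n_days)]
        cum = cumulative(rets)
        for _ in range(n_strategies):
            lookback = rng.randint(5, 200)
            contrarian = rng.random() < 0.5
            strat = momentum_returns(rets, cum, lookback, contrarian)
            survivors += passes_three_gates(strat, rets)
    return survivors


total = 10 * 100
survivors = count_survivors()
print(f"{survivors} of {total} zero-edge strategies passed all three gates")
```

The exact count depends on the seed and on how the gates are defined, so it will not match the 7 reported above. The point it illustrates is the same: a filter of this shape lets some pure-noise strategies through.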
What this means for your backtest
If you've ever:
- optimized parameters and kept the best combination,
- tried several entry/exit variations and shipped the one that "worked," or
- bought a strategy that came with a beautiful out-of-sample equity curve,
…then your result is a best-of-N — and its real significance depends on N, the number of attempts behind it. A great-looking Sharpe with a t-stat above 2 can still be pure search luck once you account for the dozens or thousands of tries.
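To put numbers on that, here is a back-of-the-envelope sketch. It assumes the tries are independent and that t-stats are approximately normal, which real strategy searches only loosely satisfy (strategies built from the same indicators are correlated), so treat the output as an illustration rather than an exact correction.

```python
from statistics import NormalDist

_NORMAL = NormalDist()


def prob_any_false_positive(n_trials, t_threshold=2.0):
    """Chance that at least one of n_trials zero-edge strategies shows a
    t-stat above t_threshold (one-sided, independent trials)."""
    p_single = 1.0 - _NORMAL.cdf(t_threshold)
    return 1.0 - (1.0 - p_single) ** n_trials


def bonferroni_t_threshold(n_trials, alpha=0.05):
    """One-sided t-stat needed to keep the overall false-positive rate
    at alpha across n_trials tries (Bonferroni correction)."""
    return _NORMAL.inv_cdf(1.0 - alpha / n_trials)


for n in (1, 20, 1500):
    print(
        f"N = {n:>4}: P(some zero-edge strategy has t > 2) = "
        f"{prob_any_false_positive(n):.3f}, "
        f"t-stat needed at 5% = {bonferroni_t_threshold(n):.2f}"
    )
```

Bonferroni is the bluntest of the multiple-testing corrections, but it makes the dependence on N visible: the bar you have to clear rises with every extra try.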
The fix isn't "test harder" with more splits. It's the right statistics — the Deflated Sharpe Ratio, PBO, and data-snooping tests like White's Reality Check that explicitly correct for the size of the search.
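As one example of what "correcting for the search" looks like in code, here is a compact sketch of PBO estimated by combinatorially symmetric cross-validation (CSCV), the procedure from the original PBO paper. It is a simplified illustration, not the implementation behind my numbers: it ranks by per-period Sharpe, uses 8 blocks, and the demo data is synthetic.

```python
import itertools
import math
import random


def sharpe_from_sums(total, total_sq, n):
    """Per-period Sharpe from a sum and a sum of squares of n returns."""
    mean = total / n
    var = (total_sq - n * mean * mean) / (n - 1)
    return mean / math.sqrt(var) if var > 0 else 0.0


def pbo_cscv(returns_by_strategy, n_blocks=8):
    """Share of in-sample/out-of-sample splits in which the in-sample
    winner ranks at or below the out-of-sample median."""
    size = len(returns_by_strategy[0]) // n_blocks  # leftover rows dropped
    # Per-strategy, per-block (sum, sum of squares) so each split is cheap.
    stats = [
        [(sum(c), sum(r * r for r in c))
         for c in (rets[b * size:(b + 1) * size] for b in range(n_blocks))]
        for rets in returns_by_strategy
    ]

    def sharpes(blocks):
        n = size * len(blocks)
        return [
            sharpe_from_sums(sum(row[b][0] for b in blocks),
                             sum(row[b][1] for b in blocks), n)
            for row in stats
        ]

    below_median, n_splits = 0, 0
    for in_sample in itertools.combinations(range(n_blocks), n_blocks // 2):
        out_sample = [b for b in range(n_blocks) if b not in in_sample]
        is_sr, oos_sr = sharpes(in_sample), sharpes(out_sample)
        winner = max(range(len(is_sr)), key=is_sr.__getitem__)
        # Relative out-of-sample rank of the in-sample winner, in (0, 1).
        rank = sum(1 for s in oos_sr if s <= oos_sr[winner])
        omega = rank / (len(oos_sr) + 1)
        below_median += math.log(omega / (1 - omega)) <= 0
        n_splits += 1
    return below_median / n_splits


rng = random.Random(1)
noise = [[rng.gauss(0.0, 0.03) for _ in range(800)] for _ in range(40)]
# Synthetic strategy with a deliberately huge edge, for contrast only.
skilled = [rng.gauss(0.03, 0.03) for _ in range(800)]
pbo_noise = pbo_cscv(noise)
pbo_with_edge = pbo_cscv(noise + [skilled])
print(f"PBO, 40 noise strategies:         {pbo_noise:.2f}")
print(f"PBO, same plus one real edge:     {pbo_with_edge:.2f}")
```

With only independent noise, the in-sample winner's out-of-sample rank is essentially random, so the estimate tends to hover around one half. Add a strategy with a genuine, large edge and the same procedure should drive it toward zero.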
Check yours in two minutes — free
You don't need to code any of this. I turned these exact tests into a free tool: upload your backtest's returns and get a plain-language verdict — real edge, or best of a thousand coin flips?
👉 Try it free: quantcheck — Deflated Sharpe Ratio, PBO and a bootstrap on your own data, no signup for the free verdict.
If this show-me-the-math approach to systematic trading is your thing, subscribe to the QuantDojo newsletter — I publish experiments like this one, plus new tests as I build them.
Nothing here is financial advice. It's a lesson in statistics — which, for backtests, is the only thing standing between you and a very expensive lie.