Backtesting Lies: What Our Data Showed
Our backtests showed 87% win rates. Live paper trading showed 60%. Here's why backtests lie, what they don't capture, and how to use them without getting fooled.
Uncomfortable truth: If your backtest looks too good to be true, it's lying to you. We learned this the hard way. Here's what the data actually showed.
The Backtest That Looked Perfect
Before launching our 90-day challenge, we backtested our mean reversion strategy on 12 months of historical data.
Backtest Results (12 months)
- Win Rate: 87%
- Profit Factor: 2.4
- Max Drawdown: 14%
- Trades: 156
This looked incredible. 87% win rate? We almost went live immediately.
Instead, we started paper trading. Three weeks in, here's what actually happened:
Live Paper Trading (21 days)
- Win Rate: 60%
- Profit Factor: 1.6
- Max Drawdown: 0.9%
- Trades: 8
87% → 60%. That's not a small gap. That's the difference between a profitable strategy and a mediocre one.
Here's what the backtest didn't tell us.
Lie #1: No Slippage Modeling
Our backtest assumed we get the exact close price. We don't.
Live trading reality:
- Market orders: Slippage of 0.1-0.5% on BTC/ETH
- Thin order books: 1-2% slippage on alts during volatility
- Polymarket: 3-5% slippage on markets with <$10K liquidity
Over 8 trades, slippage cost us ~$12 in paper trading. Extrapolated to 156 trades (the backtest sample), that's ~$230 in slippage the backtest didn't account for.
Lesson: If your backtest doesn't model slippage, add 0.3% per trade minimum. For thin markets, add 1-2%.
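One way to bake that haircut into a backtest is to worsen every fill by a fixed percentage. The sketch below is illustrative only: `apply_slippage` and `round_trip_return` are hypothetical helper names, the flat-percentage model is an assumption (real slippage varies with order size and book depth), and the last lines simply re-run the extrapolation from our 8 paper trades.

```python
def apply_slippage(price: float, side: str, slippage_pct: float = 0.3) -> float:
    """Return a pessimistic fill price: buys fill higher, sells fill lower."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    factor = slippage_pct / 100.0
    return price * (1 + factor) if side == "buy" else price * (1 - factor)


def round_trip_return(entry: float, exit_: float, slippage_pct: float = 0.3) -> float:
    """Return of a long trade after paying slippage on both entry and exit."""
    buy_fill = apply_slippage(entry, "buy", slippage_pct)
    sell_fill = apply_slippage(exit_, "sell", slippage_pct)
    return sell_fill / buy_fill - 1


# A 1% gross winner shrinks noticeably once both legs pay 0.3%.
gross = 101.0 / 100.0 - 1
net = round_trip_return(100.0, 101.0, slippage_pct=0.3)
print(f"gross {gross:.2%}, net {net:.2%}")

# Extrapolating the observed paper-trading cost to the backtest sample size.
cost_per_trade = 12 / 8             # ~$12 over 8 trades
extrapolated = cost_per_trade * 156  # 156 backtest trades
print(f"extrapolated slippage: ${extrapolated:.0f}")
```

With a flat 0.3% on each leg, a trade needs to clear roughly 0.6% gross just to break even, which is why small-target mean reversion setups are hit hardest.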
Lie #2: Perfect Execution
Backtest: Signal fires at 3:00 PM, executes at 3:00 PM close.
Reality: Signal fires at 3:00 PM, API rate limit hit, executes at 3:15 PM at a worse price.
We've hit rate limits 4 times in 21 days. Each time, execution was delayed by 15-30 minutes. In two cases, the entry price moved against us by 0.5%+.
Lesson: Backtests assume infinite liquidity and instant execution. Neither is true. Add latency modeling (15-30 minute delays) and rate limit simulation.
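A simple way to approximate this in a bar-based backtest is to fill at a later bar than the one that fired the signal. This is a minimal sketch under stated assumptions: bars are 15 minutes apart, `delayed_fill` and `delay_cost_pct` are hypothetical names, and the prices are made up for illustration.

```python
from typing import Optional, Sequence


def delayed_fill(prices: Sequence[float], signal_idx: int,
                 delay_bars: int = 1) -> Optional[float]:
    """Fill at the bar `delay_bars` after the signal; None if data runs out."""
    fill_idx = signal_idx + delay_bars
    if signal_idx < 0 or fill_idx >= len(prices):
        return None
    return prices[fill_idx]


def delay_cost_pct(prices: Sequence[float], signal_idx: int,
                   delay_bars: int = 1) -> Optional[float]:
    """Percent by which a long entry got worse because of the delay."""
    ideal = prices[signal_idx]
    actual = delayed_fill(prices, signal_idx, delay_bars)
    if actual is None:
        return None
    return (actual - ideal) / ideal * 100.0


bars = [100.0, 100.5, 101.0, 100.2]  # hypothetical 15-minute closes
cost = delay_cost_pct(bars, signal_idx=0, delay_bars=1)  # one bar = 15 minutes late
print(f"entry worsened by {cost:.2f}%")
```

Running the backtest with `delay_bars` set to 0, 1, and 2 shows how much of the edge depends on instant execution.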
Lie #3: Regime Assumptions
Our backtest period (March 2025 - March 2026) was mostly ranging markets. Mean reversion thrives in ranging markets.
Week 3 of our challenge: the market flipped to trending for the first time. The regime detector suspended our mean reversion strategy. Without that suspension, it would have taken 3 losing trades.
The backtest didn't tell us the strategy is regime-dependent. It just showed "87% win rate" across the entire period.
Lesson: Segment backtests by regime. If a strategy only works in ranging markets, label it as such. Don't run it when ADX > 25.
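Segmenting can be as simple as tagging each backtest trade with the ADX reading at entry and reporting win rate per bucket. The sketch below assumes ADX has already been computed elsewhere and stored on each trade; the trade records are made-up illustrative data, not our results, and `win_rate_by_regime` is a hypothetical helper.

```python
from collections import defaultdict

ADX_TREND_THRESHOLD = 25.0  # above this we call the market "trending"


def win_rate_by_regime(trades):
    """Group trades by regime at entry and return the win rate for each."""
    buckets = defaultdict(list)
    for trade in trades:
        regime = "trending" if trade["adx"] > ADX_TREND_THRESHOLD else "ranging"
        buckets[regime].append(trade["pnl"] > 0)
    return {regime: sum(wins) / len(wins) for regime, wins in buckets.items()}


# Hypothetical trades: pnl in dollars, adx measured at entry.
trades = [
    {"pnl": 12.0, "adx": 14.0},
    {"pnl": 8.0, "adx": 18.0},
    {"pnl": -5.0, "adx": 22.0},
    {"pnl": 9.0, "adx": 19.0},
    {"pnl": -7.0, "adx": 31.0},
    {"pnl": -4.0, "adx": 28.0},
]
for regime, rate in win_rate_by_regime(trades).items():
    print(f"{regime}: {rate:.0%} win rate")
```

A single blended win rate would hide exactly this kind of split.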
Lie #4: No Psychological Pressure
This one's subtle but critical.
Backtest: After 3 consecutive losses, the strategy keeps trading normally.
Reality: After 3 consecutive losses, humans want to intervene. "Maybe the strategy is broken? Maybe we should pause?"
We coded circuit breakers because of this. After 3 consecutive losses, the bot pauses and requires manual reset. The backtest doesn't capture this — it assumes robotic discipline forever.
Lesson: If you're building automated systems, code in the psychological breaks you'll want. Then backtest with those breaks included.
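Here is a rough sketch of what backtesting with the breaker included could look like. It is not our production code: `CircuitBreaker` and `backtest_with_breaker` are illustrative names, and because a backtest has no human to press reset, the sketch assumes the reset happens after a fixed number of skipped signals (`reset_after_skipped`, a made-up parameter).

```python
class CircuitBreaker:
    """Pauses trading after a run of consecutive losing trades."""

    def __init__(self, max_consecutive_losses: int = 3):
        self.max_consecutive_losses = max_consecutive_losses
        self.loss_streak = 0
        self.paused = False

    def record(self, pnl: float) -> None:
        # Break-even or winning trades reset the streak.
        self.loss_streak = self.loss_streak + 1 if pnl < 0 else 0
        if self.loss_streak >= self.max_consecutive_losses:
            self.paused = True

    def can_trade(self) -> bool:
        return not self.paused

    def reset(self) -> None:
        """Stands in for the manual reset."""
        self.loss_streak = 0
        self.paused = False


def backtest_with_breaker(pnls, max_losses=3, reset_after_skipped=2):
    """Replay trade PnLs, skipping signals while the breaker is paused."""
    breaker = CircuitBreaker(max_losses)
    taken, skipped = [], 0
    for pnl in pnls:
        if not breaker.can_trade():
            skipped += 1
            if skipped >= reset_after_skipped:  # assumed reset delay
                breaker.reset()
                skipped = 0
            continue
        taken.append(pnl)
        breaker.record(pnl)
    return taken


history = [1.0, -1.0, -1.0, -1.0, 2.0, 2.0, 1.0]  # hypothetical PnLs
print(backtest_with_breaker(history))
```

In this toy run the two winners right after the losing streak are skipped, which is the cost the plain backtest never shows.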
Lie #5: Overfitting to History
We caught ourselves doing this. The backtest looked mediocre at first (72% win rate). We tweaked the RSI threshold from 30 to 28. Then we added a volume filter. Then we adjusted the Bollinger Band multiplier.
Final backtest: 87% win rate.
We'd optimized our way into overfitting. The parameters worked perfectly for the historical period — but had no guarantee of working going forward.
Lesson: If you're tweaking parameters to improve backtest results, you're overfitting. Use walk-forward analysis: optimize on period A, test on period B (unseen data).
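A rolling version of that idea can be sketched in a few lines. Everything here is a placeholder: `walk_forward_splits` and `walk_forward` are hypothetical helpers, and `toy_score` is a stand-in for whatever function runs your real backtest on a slice of data and returns a number to maximize.

```python
def walk_forward_splits(n, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through n samples."""
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def walk_forward(data, params, score, train_size, test_size):
    """Pick the best param on each train window, then score it out of sample."""
    results = []
    for train, test in walk_forward_splits(len(data), train_size, test_size):
        train_data = [data[i] for i in train]
        test_data = [data[i] for i in test]
        best = max(params, key=lambda p: score(p, train_data))
        results.append((best, score(best, test_data)))
    return results


def toy_score(param, window):
    """Placeholder objective: higher is better (not a trading metric)."""
    return -sum((x - param) ** 2 for x in window)


data = list(range(12))  # stand-in for 12 periods of market data
for best, oos in walk_forward(data, [2, 5, 9], toy_score, train_size=6, test_size=2):
    print(f"chosen param {best}, out-of-sample score {oos}")
```

The numbers that matter are the out-of-sample scores. If they are much worse than the in-sample ones, the tuning was fitting noise.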
What Backtests Are Actually Good For
We're not saying backtests are useless. They're just not predictive.
Here's what backtests are actually useful for:
1. Sanity Checking
If your backtest shows 95% win rates, your logic is almost certainly wrong. If it shows 30% win rates with positive expectancy, that's plausible. Backtests catch obvious bugs.
2. Strategy Comparison
Mean reversion backtested at 2.4 profit factor. Momentum backtested at 1.8. That's useful information — even if both numbers are inflated, the relative difference is meaningful.
3. Edge Identification
Does your strategy have positive expectancy in the backtest? If yes, there might be a real edge. If no, there's almost certainly no edge worth chasing. Backtests are better at ruling out bad ideas than confirming good ones.
4. Parameter Sensitivity
Test RSI 25, 28, 30, 35. If performance varies wildly, your strategy is fragile. If it's stable across a range, that's a good sign. Robustness matters more than peak performance.
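One crude way to put a number on "varies wildly" is the spread of a metric across the parameter sweep relative to its average. The sketch below uses invented profit factors for two imaginary strategies; `relative_range` is a hypothetical helper and any cutoff you apply to it is a judgment call, not a standard.

```python
from statistics import mean


def relative_range(scores: dict) -> float:
    """(max - min) / |mean| of a metric across parameter values."""
    values = list(scores.values())
    avg = mean(values)
    if avg == 0:
        raise ValueError("mean of scores is zero; relative range is undefined")
    return (max(values) - min(values)) / abs(avg)


# Hypothetical profit factors keyed by RSI threshold.
stable = {25: 1.50, 28: 1.60, 30: 1.55, 35: 1.45}
fragile = {25: 0.90, 28: 2.40, 30: 1.10, 35: 0.80}

print(f"stable spread:  {relative_range(stable):.2f}")
print(f"fragile spread: {relative_range(fragile):.2f}")
```

The fragile profile is the one to worry about: a great result at exactly one setting, with mediocre results on either side of it.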
Our New Validation Framework
After getting burned by backtests, here's what we do now:
Phase 1: Backtest (1-2 days)
- Model slippage (0.3% minimum)
- Add 15-minute execution delay
- Segment by regime (ranging vs. trending)
- Walk-forward analysis (optimize on A, test on B)
- Gate: Profit factor > 1.3 → proceed
Phase 2: Paper Trading (30 days minimum)
- Real-time data, simulated execution
- Full risk management stack active
- LLM decision layer in the loop
- All bugs and edge cases surface here
- Gate: 50+ trades, positive Sharpe → proceed
Phase 3: Live Trading (90 days)
- Small position sizes (25% of target)
- Real money, real psychological pressure
- Monitor for deviation from paper results
- Gate: Performance within 20% of paper → scale up
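The three gates are simple enough to write down as checks. This is a sketch under assumptions, not our actual tooling: the function names are illustrative, profit factor uses the usual gross profit over gross loss definition, and "within 20% of paper" is interpreted here as relative deviation on a single metric of your choosing.

```python
def profit_factor(pnls):
    """Gross profit divided by gross loss."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def passes_backtest_gate(pnls, min_profit_factor=1.3):
    """Phase 1: profit factor must exceed the threshold."""
    return profit_factor(pnls) > min_profit_factor


def passes_paper_gate(n_trades, sharpe, min_trades=50):
    """Phase 2: enough trades and a positive Sharpe ratio."""
    return n_trades >= min_trades and sharpe > 0


def passes_live_gate(live_metric, paper_metric, tolerance=0.20):
    """Phase 3: live result within `tolerance` of paper (relative deviation)."""
    if paper_metric == 0:
        return False
    return abs(live_metric - paper_metric) / abs(paper_metric) <= tolerance


print(profit_factor([10.0, -5.0, 20.0, -10.0]))   # hypothetical PnLs
print(passes_paper_gate(n_trades=8, sharpe=1.0))  # 8 trades is not enough yet
print(passes_live_gate(live_metric=1.4, paper_metric=1.6))
```

Writing the gates as code removes the temptation to wave a strategy through because it "feels close enough".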
Backtests are the starting point. Paper trading is the truth. Live trading is the final exam.
The Honest Answer
Someone's going to ask: "So what's your actual expected win rate?"
Honest answer: we don't know yet.
21 days, 8 trades, 60% win rate. Eight trades is far too small a sample to be statistically significant. We need 50+ trades minimum. That's why we're doing 90 days of paper trading before going live.
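To see how little 8 trades can tell us, here is a quick confidence interval on the win rate. The Wilson score interval is our choice for this illustration (it behaves better than the plain normal approximation on small samples), `wilson_interval` is a hypothetical helper, and the 50-trade line assumes the win rate stays at 60%, which is purely hypothetical.

```python
import math


def wilson_interval(win_rate: float, n_trades: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for an observed win rate."""
    if n_trades <= 0:
        raise ValueError("n_trades must be positive")
    z2 = z * z
    denom = 1 + z2 / n_trades
    center = (win_rate + z2 / (2 * n_trades)) / denom
    half_width = (z / denom) * math.sqrt(
        win_rate * (1 - win_rate) / n_trades + z2 / (4 * n_trades ** 2)
    )
    return center - half_width, center + half_width


low_8, high_8 = wilson_interval(0.60, 8)
low_50, high_50 = wilson_interval(0.60, 50)
print(f"8 trades:  {low_8:.0%} to {high_8:.0%}")
print(f"50 trades: {low_50:.0%} to {high_50:.0%}")
```

With 8 trades the plausible range runs from roughly 29% to 85%, which is compatible with almost any story. At 50 trades it tightens to roughly 46% to 72%: still wide, but narrow enough to start making decisions.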
The backtest said 87%. Paper trading says 60%. The truth is probably somewhere in between — but closer to 60%.
We'll publish the full 90-day results. Every trade. Every veto. Every bug. No selective reporting.
That's the only way to know if this actually works.
Follow the 90-Day Challenge
We're 21 days and 8 trades in, with 42+ trades to go before we reach our 50-trade minimum. Every trade logged publicly: wins, losses, and everything in between. No backtest lies. Just real paper trading data.