The LLM Critic: How AI Validates AI Trading Decisions
Our trading bot uses an adversarial AI layer that reviews every trade before execution. 18% veto rate. Here's how the critic agent works, the veto conditions that matter, and why it blocks bad trades before they execute.
Quick stats: 18% veto rate, 42.9% critic quality score (being fixed), 200ms target latency. If the critic isn't vetoing trades, it's not doing its job.
The Problem: AI Makes Confident Mistakes
LLMs are great at reasoning. They're also great at sounding confident while being completely wrong.
We learned this early. Our trading bot's primary agent would generate trade signals with 85%+ confidence — only for us to realize it had ignored a contradictory indicator or misread the market regime.
The problem isn't the LLM's reasoning capability. It's that optimization pressure pushes it toward action, not caution. Every trading bot wants to trade more, not less.
We needed something to say "no."
Our Solution: Adversarial AI Review
Enter the critic agent — a separate LLM call that reviews every trade decision before execution. It's adversarial by design. Its job isn't to confirm the trade. It's to find flaws.
The critic can return 5 verdicts:
The critic fires after the Trader agent makes a decision but before anything executes. It typically runs on Claude Sonnet for speed and escalates to Opus for complex cases.
Target latency: <200ms. Timeout: 3 seconds. If the critic doesn't respond in time, we default to escalate — better to slow down than blindly proceed.
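The timeout behaviour can be sketched in a few lines. This is a minimal illustration, not our production code: `review_with_timeout` and the `call_critic` coroutine are hypothetical names, and the verdict is represented as a plain string.

```python
import asyncio

CRITIC_TIMEOUT_S = 3.0  # timeout stated above


async def review_with_timeout(call_critic, package, timeout_s=CRITIC_TIMEOUT_S):
    """Return the critic's verdict, or "escalate" if it does not answer in time.

    `call_critic` is a hypothetical coroutine that wraps the LLM request.
    """
    try:
        return await asyncio.wait_for(call_critic(package), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Slow critic: fall back to escalation rather than letting the trade through.
        return "escalate"


async def slow_critic(package):
    # Stand-in for an LLM call that takes too long.
    await asyncio.sleep(0.2)
    return "veto"


async def fast_critic(package):
    # Stand-in for an LLM call that answers immediately.
    return "veto"
```

With a short timeout, `asyncio.run(review_with_timeout(slow_critic, {}, timeout_s=0.05))` falls back to escalation, while the fast critic's verdict is passed through unchanged.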
What the Critic Sees
The critic doesn't make decisions in a vacuum. It receives the full decision package:
- Proposed action — buy/sell/adjust_risk, position size, confidence score, entry quality score
- Market snapshot — price, ATR, RSI, EMA50/200, ADX, volume vs. average, regime classification
- Portfolio state — current exposure, correlation matrix, VaR, open positions, account balance
- Recent performance — consecutive losses, drawdown %, Sharpe/Sortino ratios, loss streak history
- External signals — Polymarket probabilities, social sentiment, Fear & Greed Index, any contradictory researcher signals
This context matters. The critic can check whether the Trader's reasoning aligns with actual market conditions. If the Trader cites RSI < 40 as a buy signal, the critic checks whether that's regime-appropriate.
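One way to picture the decision package is as a set of nested records. The sketch below is illustrative only: every class and field name is an assumption chosen to mirror the list above, not the actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    action: str           # "buy", "sell", or "adjust_risk"
    size_pct: float       # position size as a percentage
    confidence: float     # Trader confidence, 0 to 1
    entry_quality: float  # entry quality score


@dataclass
class MarketSnapshot:
    price: float
    atr: float
    rsi: float
    ema50: float
    ema200: float
    adx: float
    volume_ratio: float   # current volume vs. average
    regime: str           # regime classification label


@dataclass
class DecisionPackage:
    proposed: ProposedAction
    market: MarketSnapshot
    # The remaining groups are left as free-form dicts in this sketch.
    portfolio: dict = field(default_factory=dict)
    performance: dict = field(default_factory=dict)
    external: dict = field(default_factory=dict)


package = DecisionPackage(
    proposed=ProposedAction("buy", size_pct=5.0, confidence=0.82, entry_quality=74.0),
    market=MarketSnapshot(price=95.0, atr=2.1, rsi=38.0, ema50=100.0,
                          ema200=110.0, adx=28.0, volume_ratio=1.2, regime="downtrend"),
    performance={"consecutive_losses": 2},
)
```

Bundling everything into one object means the critic prompt can be built from a single serializable value, and the same value can be logged alongside the verdict.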
The Veto Conditions That Matter
We don't use binary hard rules for most vetoes. Instead, we use probabilistic conditions that scale with market context. Here are the key veto triggers in our production system:
1. Trend Disagreement (Probabilistic)
If price is below EMA50, ADX > 25 (strong trend), and DI- is above DI+, the critic vetoes with 90% probability. This is a strong downtrend — buying dips is statistically dangerous.
If ADX is between 20 and 25 (weaker trend), veto probability drops to 50%. In that range the critic might suggest a smaller size instead of a full veto.
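The two ADX bands can be sketched as a small function. This is a hedged illustration: the function names are hypothetical, and it assumes the weaker band uses the same price and DI conditions as the strong one, and that ADX below 20 triggers no trend veto at all.

```python
import random


def trend_veto_probability(price, ema50, adx, di_plus, di_minus):
    """Veto probability for a proposed buy, based on trend disagreement."""
    downtrend = price < ema50 and di_minus > di_plus
    if not downtrend:
        return 0.0
    if adx > 25:    # strong trend
        return 0.9
    if adx >= 20:   # weaker trend
        return 0.5
    return 0.0      # assumption: no trend veto below ADX 20


def should_veto(probability, rng=random.random):
    """Draw once against the veto probability; `rng` is injectable for testing."""
    return rng() < probability
```

Passing the random source as a parameter keeps the probabilistic step reproducible in tests and replays.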
2. Dynamic Cooldown After Exit
After exiting a position, there's a cooldown period before re-entry is allowed. This isn't fixed — it scales with ATR:
cooldown_min = max(15, min(60, int(30 * atr_ratio)))
# High vol = 60min cooldown
# Low vol = 15min minimum
High volatility = longer cooldown (up to 60 minutes). Low volatility = shorter cooldown (15 minutes minimum). This prevents the bot from whipsawing in choppy conditions.
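A few worked values make the clamping easier to see. The wrapper function name is hypothetical, and the sketch assumes `atr_ratio` is current ATR divided by some baseline ATR, which the formula itself does not spell out.

```python
def cooldown_minutes(atr_ratio: float) -> int:
    """Clamp 30 * atr_ratio into the 15 to 60 minute range."""
    return max(15, min(60, int(30 * atr_ratio)))


# Quiet, normal, elevated, and very high volatility.
for ratio in (0.4, 1.0, 1.5, 2.5):
    print(f"atr_ratio={ratio}: {cooldown_minutes(ratio)} min")
```

A ratio of 0.4 is lifted to the 15-minute floor, 1.0 gives 30 minutes, 1.5 gives 45, and 2.5 is capped at 60.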
3. Entry Quality + Volume Spike
If entry quality score is below 70 and volume is >2× the 20-period average, the critic vetoes. This is a classic fakeout pattern — sudden volume spike on a low-quality entry is often a stop hunt or liquidity grab.
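As a rough sketch, the check could look like the following. The function name is hypothetical, and it assumes the 20-period average excludes the current bar so the spike does not inflate its own baseline.

```python
from statistics import fmean


def fakeout_veto(entry_quality, volumes):
    """Return True when a low-quality entry coincides with a volume spike.

    `volumes` holds per-bar volume with the current bar last.
    """
    if len(volumes) < 21:
        raise ValueError("need the current bar plus 20 prior bars")
    current = volumes[-1]
    avg_20 = fmean(volumes[-21:-1])  # the 20 bars before the current one
    return entry_quality < 70 and current > 2 * avg_20
```

Both conditions must hold: a spike on a high-quality entry, or a low-quality entry on normal volume, passes this particular check.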
4. Drawdown Tiers
Drawdown vetoes are tiered:
- >5% drawdown: Hard veto — no new positions allowed
- 3-5% drawdown: Position size halved automatically
- 1.5-3% drawdown: Routes to Opus (stronger model) for review
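The tiers above map naturally onto a single lookup. In this sketch the return labels are hypothetical, and the boundary handling is an assumption: exactly 5% and exactly 3% fall into the size-halving tier, and exactly 1.5% is treated as normal.

```python
def drawdown_action(drawdown_pct: float) -> str:
    """Map current drawdown (in percent) to the tier that applies."""
    if drawdown_pct > 5.0:
        return "hard_veto"      # no new positions
    if drawdown_pct >= 3.0:
        return "halve_size"     # position size halved
    if drawdown_pct > 1.5:
        return "route_to_opus"  # stronger model reviews the trade
    return "normal"
```

Checking the most severe tier first means a deep drawdown can never be handled by a milder rule.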
5. Consecutive Losses
After 2 consecutive losses, the critic routes to Opus instead of the faster Sonnet model. After 4 consecutive losses, it escalates to human review and no automatic trades are placed. Loss streaks often indicate regime mismatch, not bad luck.
6. Portfolio Exposure Cap
Any trade that would push crypto exposure above 10% of total portfolio is vetoed. This is a hard cap — no exceptions. Diversification isn't optional.
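The cap is a simple projection of exposure after the trade. The sketch below uses hypothetical names and assumes exposure is measured as crypto notional value over total portfolio value, with the trade funded from cash already inside the portfolio.

```python
EXPOSURE_CAP = 0.10  # 10% of total portfolio


def exceeds_exposure_cap(crypto_value, trade_value, portfolio_value, cap=EXPOSURE_CAP):
    """Return True if the trade would push crypto exposure above the cap."""
    if portfolio_value <= 0:
        raise ValueError("portfolio_value must be positive")
    projected = (crypto_value + trade_value) / portfolio_value
    return projected > cap
```

For a 100,000 portfolio already holding 8,000 in crypto, a 1,500 buy projects to 9.5% and passes, while a 2,500 buy projects to 10.5% and is vetoed.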
7. Open Interest Squeeze Risk
We calculate an OI squeeze score (0-100) based on current OI vs. 7-day average, funding rate extremes, long/short ratio imbalance, and taker buy/sell pressure.
If squeeze score ≥ 80, the critic vetoes new long positions. This is a crowded trade — the risk of a long squeeze (forced liquidations cascading) is too high.
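The gating step can be sketched as below. The weighting is purely an assumption for illustration: it takes four sub-scores already normalised to 0-100 and averages them equally, which is not necessarily how the production score is built. All names are hypothetical, and a "buy" is assumed to open a new long.

```python
def squeeze_score(oi_score, funding_score, ls_imbalance_score, taker_score):
    """Combine four 0-100 sub-scores into one 0-100 score (equal weights assumed)."""
    parts = (oi_score, funding_score, ls_imbalance_score, taker_score)
    for part in parts:
        if not 0 <= part <= 100:
            raise ValueError("sub-scores must be in the range 0-100")
    return sum(parts) / len(parts)


def squeeze_veto(score, action, threshold=80):
    """Veto only new longs when the squeeze score reaches the threshold."""
    return action == "buy" and score >= threshold
```

The veto is one-sided: a high squeeze score blocks new longs but leaves sells and risk adjustments alone.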
8. Cluster Detection
If more than 2 capital actions are proposed within a 10-minute window, the critic forces review on all of them. This prevents herding behavior — multiple similar trades fired in quick succession during volatile conditions.
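A sliding window over timestamps is one straightforward way to detect such clusters. This is a sketch with hypothetical names; it assumes timestamps are in seconds and arrive in non-decreasing order.

```python
from collections import deque


class ClusterDetector:
    """Flag when more than `max_actions` proposals land inside one time window."""

    def __init__(self, max_actions=2, window_s=600):
        self.max_actions = max_actions
        self.window_s = window_s
        self._times = deque()

    def record(self, ts: float) -> bool:
        """Record a proposed capital action; return True if a cluster is present."""
        self._times.append(ts)
        # Drop proposals that have aged out of the window.
        while ts - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_actions
```

When `record` returns True, every proposal still in the window would be sent for forced review, not only the newest one.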
Example: How a Veto Plays Out
Trader proposes: buy ETH-USD, 5% size, confidence 0.82
Critic response: VETO — ADX=28 downtrend, price below EMA50, consecutive losses = 2. Alternative: Consider hold until ADX < 25 or price reclaims EMA50.
Result: Trade blocked. Logged to decisions.jsonl with full reasoning. Telegram notification sent with veto reason and alternative.
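Appending a veto like this one to a JSON Lines file takes only a few lines. The field names below are assumptions for illustration, and the demo writes to a temporary directory instead of the real decisions.jsonl.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def log_decision(path, record):
    """Append one decision as a single JSON line with a UTC timestamp."""
    entry = {"ts": datetime.now(timezone.utc).isoformat(), **record}
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


# Demo: log the veto from the example above, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "decisions.jsonl")
log_decision(demo_path, {
    "symbol": "ETH-USD",
    "action": "buy",
    "size_pct": 5,
    "confidence": 0.82,
    "verdict": "veto",
    "reason": "ADX=28 downtrend, price below EMA50, consecutive losses = 2",
    "alternative": "hold until ADX < 25 or price reclaims EMA50",
})
with open(demo_path, encoding="utf-8") as fh:
    logged = [json.loads(line) for line in fh]
```

One JSON object per line keeps the log append-only and easy to stream or grep later.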
Model Routing: When We Use Opus vs. Sonnet
Running the critic through a top-tier LLM for every decision would be prohibitively expensive. We use intelligent model routing:
Default: Claude Sonnet — fast and cheap, suitable for routine decisions in normal conditions.
Escalate to Opus when any of:
- Consecutive losses ≥ 2
- Drawdown > 1.5%
- Trader confidence < 0.75
- Researcher signals contradict Trader
Opus is slower and more expensive, but it's better at nuanced reasoning in edge cases. We only pay for that capability when the situation warrants it.
Cost monitor: If monthly critic spend exceeds $50, we log a warning and route non-high-risk decisions to Sonnet regardless of loss count. Risk management can't bankrupt the operation.
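Put together, the routing rules could be sketched like this. The function and parameter names are hypothetical, and since "high-risk" is not defined precisely here, the sketch takes it as a flag supplied by the caller.

```python
import logging

logger = logging.getLogger("critic.routing")
MONTHLY_BUDGET_USD = 50.0


def choose_model(consecutive_losses, drawdown_pct, confidence,
                 researcher_contradicts, monthly_spend_usd=0.0, high_risk=False):
    """Pick "sonnet" or "opus" for the critic call."""
    escalate = (
        consecutive_losses >= 2
        or drawdown_pct > 1.5
        or confidence < 0.75
        or researcher_contradicts
    )
    if monthly_spend_usd > MONTHLY_BUDGET_USD:
        logger.warning("Critic spend $%.2f is over the monthly budget", monthly_spend_usd)
        if not high_risk:
            # Over budget: only high-risk decisions keep access to Opus.
            return "sonnet"
    return "opus" if escalate else "sonnet"
```

The budget check runs before the escalation result is used, so an over-budget month downgrades routine escalations but never upgrades anything.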
Performance (So Far)
We're three weeks into the 90-day challenge. Here's what the data shows:
Critic Agent — Week 3 Stats
- Veto Rate: 18%
- Target Latency: 200ms
- Quality Score: 42.9%
- GPT-5.4 Avg: 7.2s
Quality score being fixed — test data pollution in 57% of critic reviews.
The 18% veto rate is right in our target range (15-25%). Common veto reasons:
- ADX downtrend + Trader trying to buy dips (40% of vetoes)
- Consecutive losses triggering escalation (25% of vetoes)
- OI squeeze risk on crowded longs (15% of vetoes)
- Cluster detection — too many actions in short window (12% of vetoes)
- Entry quality + volume spike fakeout pattern (8% of vetoes)
The 42.9% quality score is unacceptable — we're fixing test data pollution (placeholder "AAAA..." strings in 57% of critic reviews). This is a bug, not a feature.
Why This Architecture Matters
The critic agent isn't just a risk control — it's a philosophical statement about how to build AI trading systems.
Most teams build bots that optimize for being right. They want the highest win rate, the sharpest backtest, the most impressive P&L chart. The problem: being right 60% of the time still leaves you exposed to ruin on the 40% you're wrong.
We optimize for not being wrong in dangerous ways. The critic doesn't care about win rate. It cares about:
- Is this trade appropriate for the current regime?
- Are we ignoring contradictory signals?
- Would this position expose us to unacceptable drawdown?
- Are we repeating a pattern that's lost money before?
This is adversarial by design. The critic's job is to find flaws, not confirmations. It's the difference between a rubber-stamp compliance team and a skeptical risk committee at a systematic trading firm.
The Results
Week 2: The critic blocked a correlated ETH cluster that would've lost ~$70 with full exposure. We lost $23.10 instead, roughly two-thirds less damage.
Week 3: The critic vetoed 18% of proposed trades. Common reasons: downtrend buys, consecutive loss escalation, OI squeeze risk.
We'll publish full critic performance data at the end of the 90-day challenge — veto accuracy, false positive rate, and estimated P&L impact from blocked trades.
Transparency is part of the commitment.
See the Critic Agent in Action
Every critic verdict, every veto, every escalation — logged publicly as part of our 90-Day Paper Trading Challenge. No selective reporting. No spin.