TACAVAR
Trading Systems

The LLM Critic: How AI Validates AI Trading Decisions

Our trading bot uses an adversarial AI layer that reviews every trade before execution. 18% veto rate. Here's how the critic agent works, the veto conditions that matter, and why it blocks bad trades before they execute.

Quick stats: 18% veto rate, 42.9% critic quality score (being fixed), 200ms target latency. If the critic isn't vetoing trades, it's not doing its job.

The Problem: AI Makes Confident Mistakes

LLMs are great at reasoning. They're also great at sounding confident while being completely wrong.

We learned this early. Our trading bot's primary agent would generate trade signals with 85%+ confidence — only for us to realize it had ignored a contradictory indicator or misread the market regime.

The problem isn't the LLM's reasoning capability. It's that optimization pressure pushes it toward action, not caution. Every trading bot wants to trade more, not less.

We needed something to say "no."

Our Solution: Adversarial AI Review

Enter the critic agent — a separate LLM call that reviews every trade decision before execution. It's adversarial by design. Its job isn't to confirm the trade. It's to find flaws.

The critic can return 5 verdicts:

  • AGREE: Trade proceeds through normal routing
  • MONITOR: Proceed but flag for post-trade watch (ADX shift, drawdown acceleration)
  • ADJUST: Suggests smaller size, tighter stop, or different entry (max 2 adjustment cycles)
  • VETO: Blocks trade entirely, logs reasoning, suggests alternative
  • ESCALATE: Uncertain; sends to human with 5-minute timeout, then auto-veto

The critic fires after the Trader agent makes a decision but before anything executes. It's a separate LLM call — typically Claude Sonnet for speed, escalating to Opus for complex cases.

Target latency: <200ms. Timeout: 3 seconds. If the critic doesn't respond in time, we default to escalate — better to slow down than blindly proceed.
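To make the timeout behaviour concrete, here is a minimal sketch of the five verdicts and the 3-second fallback. It assumes the critic is exposed as an async callable; `review_with_timeout`, `critic_call`, and the `Verdict` enum are illustrative names, not our production code.

```python
import asyncio
from enum import Enum


class Verdict(Enum):
    AGREE = "agree"
    MONITOR = "monitor"
    ADJUST = "adjust"
    VETO = "veto"
    ESCALATE = "escalate"


CRITIC_TIMEOUT_S = 3.0  # hard timeout described in the text


async def review_with_timeout(critic_call, package, timeout_s=CRITIC_TIMEOUT_S):
    """Await the critic; fall back to ESCALATE if it does not answer in time.

    `critic_call` is a hypothetical async function that takes the decision
    package and returns a Verdict.
    """
    try:
        return await asyncio.wait_for(critic_call(package), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Slow critic: hand the decision to a human rather than proceeding.
        return Verdict.ESCALATE
```

The point of the design is that a missing answer is never treated as approval.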

What the Critic Sees

The critic doesn't make decisions in a vacuum. It receives the full decision package:

  • Proposed action — buy/sell/adjust_risk, position size, confidence score, entry quality score
  • Market snapshot — price, ATR, RSI, EMA50/200, ADX, volume vs. average, regime classification
  • Portfolio state — current exposure, correlation matrix, VaR, open positions, account balance
  • Recent performance — consecutive losses, drawdown %, Sharpe/Sortino ratios, loss streak history
  • External signals — Polymarket probabilities, social sentiment, Fear & Greed Index, any contradictory researcher signals

This context matters. The critic can check whether the Trader's reasoning aligns with actual market conditions. If the Trader cites RSI < 40 as a buy signal, the critic checks whether that's regime-appropriate.
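One way to picture the decision package is as a small typed container that gets serialized into the critic's prompt. The sketch below is an assumption about shape only: the class names, field names, and `to_prompt_json` are hypothetical, and the four context groups are kept as plain dicts for brevity.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProposedAction:
    side: str             # "buy", "sell", or "adjust_risk"
    size_pct: float       # position size as a percentage
    confidence: float     # Trader confidence, 0.0 to 1.0
    entry_quality: float  # entry quality score


@dataclass
class DecisionPackage:
    action: ProposedAction
    market: dict = field(default_factory=dict)       # price, ATR, RSI, ADX, regime...
    portfolio: dict = field(default_factory=dict)    # exposure, VaR, open positions...
    performance: dict = field(default_factory=dict)  # loss streak, drawdown, Sharpe...
    external: dict = field(default_factory=dict)     # sentiment, prediction markets...

    def to_prompt_json(self) -> str:
        """Serialize the whole package deterministically for the critic prompt."""
        return json.dumps(asdict(self), sort_keys=True)
```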

The Veto Conditions That Matter

We don't use binary hard rules for most vetoes. Instead, we use probabilistic conditions that scale with market context. Here are the key veto triggers in our production system:

1. Trend Disagreement (Probabilistic)

If price is below EMA50, ADX > 25 (strong trend), and DI- is above DI+, the critic vetoes with 90% probability. This is a strong downtrend — buying dips is statistically dangerous.

If ADX is between 20-25 (weaker trend), veto probability drops to 50%. The critic might suggest a smaller size instead of a full veto.
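The two ADX bands can be sketched as a small probability function. This is an illustration, not our production code: it assumes the 20-25 band also requires price below EMA50 and DI- above DI+, that ADX below 20 produces no trend veto from this rule, and that the veto is sampled with a random draw. The function names are made up for the example.

```python
import random


def trend_veto_probability(price, ema50, adx, di_plus, di_minus):
    """Return the veto probability for a proposed buy under the trend rule."""
    downtrend = price < ema50 and di_minus > di_plus
    if not downtrend:
        return 0.0
    if adx > 25:   # strong trend
        return 0.9
    if adx >= 20:  # weaker trend
        return 0.5
    return 0.0     # assumption: no trend veto below ADX 20


def should_veto_buy(probability, rng=random):
    """Sample the veto; `rng` only needs a random() method returning [0, 1)."""
    return rng.random() < probability
```

Passing a seeded `random.Random` as `rng` makes the sampled outcome reproducible when replaying decisions.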

2. Dynamic Cooldown After Exit

After exiting a position, there's a cooldown period before re-entry is allowed. This isn't fixed — it scales with ATR:

cooldown_min = max(15, min(60, int(30 * atr_ratio)))
# atr_ratio >= 2.0 -> capped at 60 min (high volatility)
# atr_ratio <= 0.5 -> floored at 15 min (low volatility)

High volatility = longer cooldown (up to 60 minutes). Low volatility = shorter cooldown (15 minutes minimum). This prevents the bot from whipsawing in choppy conditions.

3. Entry Quality + Volume Spike

If entry quality score is below 70 and volume is >2× the 20-period average, the critic vetoes. This is a classic fakeout pattern — sudden volume spike on a low-quality entry is often a stop hunt or liquidity grab.
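As a rough sketch, the fakeout check reduces to two comparisons. The function name and arguments below are hypothetical; the thresholds (70 and 2× the 20-period average) come from the rule above, and the empty-history behaviour is an assumption.

```python
def is_fakeout(entry_quality, volume, recent_volumes,
               quality_floor=70, spike_mult=2.0, lookback=20):
    """True when a low-quality entry coincides with a volume spike."""
    window = recent_volumes[-lookback:]
    if not window:
        return False  # assumption: no volume history means no spike verdict
    avg_volume = sum(window) / len(window)
    return entry_quality < quality_floor and volume > spike_mult * avg_volume
```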

4. Drawdown Tiers

Drawdown vetoes are tiered:

  • >5% drawdown: Hard veto — no new positions allowed
  • 3-5% drawdown: Position size halved automatically
  • 1.5-3% drawdown: Routes to Opus (stronger model) for review
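The tiers map naturally onto a single lookup. In this sketch the action labels are placeholders, and the handling of exact boundaries is an assumption: exactly 3% takes the stricter tier, exactly 5% is still size-halving, and Opus routing starts strictly above 1.5%.

```python
def drawdown_action(drawdown_pct):
    """Map current drawdown (in percent) to the tier it triggers."""
    if drawdown_pct > 5.0:
        return "hard_veto"      # no new positions
    if drawdown_pct >= 3.0:
        return "halve_size"     # position size cut in half
    if drawdown_pct > 1.5:
        return "route_to_opus"  # stronger model reviews the trade
    return "normal"
```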

5. Consecutive Losses

After 4 consecutive losses, the critic escalates to human review — no automatic trades. After 2 consecutive losses, it routes to Opus instead of the faster Sonnet model. This is a recognition that loss streaks often indicate regime mismatch, not bad luck.

6. Portfolio Exposure Cap

Any trade that would push crypto exposure above 10% of total portfolio is vetoed. This is a hard cap — no exceptions. Diversification isn't optional.
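A minimal sketch of the cap check, assuming exposure is measured as crypto value divided by total portfolio value and that the trade moves cash into crypto without changing the total. The function and parameter names are illustrative.

```python
def breaches_exposure_cap(crypto_value, trade_value, portfolio_value, cap=0.10):
    """True if the trade would push crypto exposure above the cap."""
    if portfolio_value <= 0:
        return True  # assumption: fail closed when the portfolio value is unusable
    projected = (crypto_value + trade_value) / portfolio_value
    return projected > cap
```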

7. Open Interest Squeeze Risk

We calculate an OI squeeze score (0-100) based on current OI vs. 7-day average, funding rate extremes, long/short ratio imbalance, and taker buy/sell pressure.

If squeeze score ≥ 80, the critic vetoes new long positions. This is a crowded trade — the risk of a long squeeze (forced liquidations cascading) is too high.

8. Cluster Detection

If more than 2 capital actions are proposed within a 10-minute window, the critic forces review on all of them. This prevents herding behavior — multiple similar trades fired in quick succession during volatile conditions.
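Cluster detection is essentially a sliding-window counter. The sketch below is one possible shape, with a hypothetical `ClusterDetector` class and timestamps in seconds; it only reports when the threshold is crossed, and flagging every action in the window is left to the caller.

```python
from collections import deque


class ClusterDetector:
    def __init__(self, window_s=600, max_actions=2):
        self.window_s = window_s        # 10-minute window
        self.max_actions = max_actions  # more than this forces review
        self._times = deque()

    def record(self, ts):
        """Register a proposed capital action at time `ts` (seconds).

        Returns True when the window now holds more than `max_actions`.
        """
        self._times.append(ts)
        # Drop actions that have aged out of the window.
        while self._times and ts - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_actions
```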

Example: How a Veto Plays Out

Trader proposes: buy ETH-USD, 5% size, confidence 0.82

Critic response: VETO — ADX=28 downtrend, price below EMA50, consecutive losses = 2. Alternative: Consider hold until ADX < 25 or price reclaims EMA50.

Result: Trade blocked. Logged to decisions.jsonl with full reasoning. Telegram notification sent with veto reason and alternative.
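Logging to a JSON Lines file means appending one JSON object per line. The record fields below are an assumed schema for illustration, not the actual layout of decisions.jsonl.

```python
import json
from datetime import datetime, timezone


def log_decision(path, symbol, verdict, reason, alternative=None):
    """Append one critic decision to a JSON Lines file and return the record."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "symbol": symbol,
        "verdict": verdict,
        "reason": reason,
        "alternative": alternative,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

For the example above, the call would look like `log_decision("decisions.jsonl", "ETH-USD", "VETO", "ADX=28 downtrend, price below EMA50, consecutive losses = 2", "hold until ADX < 25 or price reclaims EMA50")`.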

Model Routing: When We Use Opus vs. Sonnet

Running the critic through a top-tier LLM for every decision would be prohibitively expensive. We use intelligent model routing:

Default: Claude Sonnet — fast and cheap, suitable for routine decisions in normal conditions.

Escalate to Opus when any of:

  • Consecutive losses ≥ 2
  • Drawdown > 1.5%
  • Trader confidence < 0.75
  • Researcher signals contradict Trader

Opus is slower and more expensive, but it's better at nuanced reasoning in edge cases. We only pay for that capability when the situation warrants it.
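The four escalation conditions combine with a simple OR. In this sketch `pick_model` is a hypothetical helper and the return values are placeholder labels rather than real model identifiers.

```python
def pick_model(consecutive_losses, drawdown_pct, confidence, researcher_contradicts):
    """Choose the critic model tier from the escalation conditions."""
    escalate = (
        consecutive_losses >= 2
        or drawdown_pct > 1.5
        or confidence < 0.75
        or researcher_contradicts
    )
    return "opus" if escalate else "sonnet"
```

The ETH-USD example earlier has confidence 0.82 but two consecutive losses, so it would be reviewed by the stronger model.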

Cost monitor: If monthly critic spend exceeds $50, we log a warning and route non-high-risk decisions to Sonnet regardless of loss count. Risk management can't bankrupt the operation.

Performance (So Far)

We're three weeks into the 90-day challenge. Here's what the data shows:

Critic Agent, Week 3 stats:

  • Veto rate: 18%
  • Target latency: 200ms
  • Quality score: 42.9%
  • GPT-5.4 avg: 7.2s

Quality score being fixed — test data pollution in 57% of critic reviews.

The 18% veto rate is right in our target range (15-25%). Common veto reasons:

  • ADX downtrend + Trader trying to buy dips (40% of vetoes)
  • Consecutive losses triggering escalation (25% of vetoes)
  • OI squeeze risk on crowded longs (15% of vetoes)
  • Cluster detection — too many actions in short window (12% of vetoes)
  • Entry quality + volume spike fakeout pattern (8% of vetoes)

The 42.9% quality score is unacceptable — we're fixing test data pollution (placeholder "AAAA..." strings in 57% of critic reviews). This is a bug, not a feature.

Why This Architecture Matters

The critic agent isn't just a risk control — it's a philosophical statement about how to build AI trading systems.

Most teams build bots that optimize for being right. They want the highest win rate, the sharpest backtest, the most impressive P&L chart. The problem: being right 60% of the time still leaves you exposed to ruin on the 40% you're wrong.

We optimize for not being wrong in dangerous ways. The critic doesn't care about win rate. It cares about:

  • Is this trade appropriate for the current regime?
  • Are we ignoring contradictory signals?
  • Would this position expose us to unacceptable drawdown?
  • Are we repeating a pattern that's lost money before?

This is adversarial by design. The critic's job is to find flaws, not confirmations. It's the difference between a rubber-stamp compliance team and a skeptical risk committee at a systematic trading firm.

The Results

Week 2: The critic blocked a correlated ETH cluster that would've lost ~$70 with full exposure. We lost $23.10 instead, roughly two-thirds (about 67%) less damage.

Week 3: The critic vetoed 18% of proposed trades. Common reasons: downtrend buys, consecutive loss escalation, OI squeeze risk.

We'll publish full critic performance data at the end of the 90-day challenge — veto accuracy, false positive rate, and estimated P&L impact from blocked trades.

Transparency is part of the commitment.


See the Critic Agent in Action

Every critic verdict, every veto, every escalation — logged publicly as part of our 90-Day Paper Trading Challenge. No selective reporting. No spin.