AI Blackjack Strategy Evaluation with Card Counting & Multi-Player Analysis
Last updated 10/14/2025
Scenarios are programmatically generated using deterministic card-dealing algorithms across multiple deck configurations. The generator creates 493 unique situations spanning basic strategy, count-based deviations, and multi-player play.
Each scenario includes full game context: all dealt cards, running count (Hi-Lo: +1 for 2-6, 0 for 7-9, -1 for 10-A), true count (running count ÷ decks remaining), and deck penetration. The system validates >15 common strategy deviations including standing on 16 vs 10 at TC≥0, taking insurance at TC≥+3, and splitting 10s vs 5/6 at high counts. Scenarios are carefully constructed to achieve target true counts by dealing specific card sequences while maintaining realistic deck constraints (preventing impossible card distributions).
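The Hi-Lo bookkeeping described above can be sketched as follows (function names are illustrative, not the generator's actual API):

```python
def hilo_value(card: str) -> int:
    """Hi-Lo tag for a single card rank: +1 for 2-6, 0 for 7-9, -1 for 10-A."""
    if card in ("2", "3", "4", "5", "6"):
        return 1
    if card in ("7", "8", "9"):
        return 0
    return -1  # 10, J, Q, K, A

def true_count(dealt: list[str], total_decks: int) -> float:
    """Running count divided by decks remaining in the shoe."""
    running = sum(hilo_value(c) for c in dealt)
    decks_remaining = total_decks - len(dealt) / 52
    return running / decks_remaining

cards = ["5", "K", "2", "9", "6", "A"]  # running count: +1 -1 +1 +0 +1 -1 = +1
tc = true_count(cards, total_decks=6)   # slightly positive early in the shoe
```

Note that the true count shrinks the running count early in the shoe and amplifies it at deep penetration, which is why the generator tracks penetration alongside the counts.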
Models are evaluated using a custom blackjack evaluator with partial credit scoring: a response earns full credit for the count-aware optimal action, and partial credit for the correct basic-strategy action when a count deviation was optimal. This recognizes that basic strategy decisions retain value even when count deviations are missed, giving a more nuanced picture of model capability. All scenarios are validated to ensure card distributions don't exceed deck limits and that deck penetration remains realistic (<80%).
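One concrete reading of the partial-credit rule is sketched below; the 0.5 weight is an assumption for illustration, as the source does not state the actual weights:

```python
def score_decision(model_action: str, optimal_action: str,
                   basic_action: str) -> float:
    """Partial-credit scoring sketch: full credit for the count-aware
    optimal play, partial credit (assumed 0.5) for the basic-strategy
    play when a count deviation was actually optimal, zero otherwise."""
    if model_action == optimal_action:
        return 1.0
    if model_action == basic_action and basic_action != optimal_action:
        return 0.5  # assumed weight; the source does not specify it
    return 0.0

# Example: optimal play is a deviation (stand on 16 vs 10 at TC >= 0),
# the model answered with the basic-strategy "hit" -> partial credit.
score_decision("hit", "stand", "hit")  # returns 0.5
```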
Game Rules: All scenarios allow DAS (Double After Split), and the dealer stands on soft 17 (S17). Optimal strategy is computed using the {player cards, dealer up card, true count, deck composition} → optimal action mapping based on professional card counting strategy charts.
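The mapping above can be sketched as basic strategy plus a deviation overlay. The helper below is a simplified illustration keyed on hand total only (the real mapping also uses full deck composition); the thresholds follow common Hi-Lo deviation charts and are illustrative, not the evaluator's actual table:

```python
# (player_total, dealer_up) -> (true-count threshold, action at/above threshold)
DEVIATIONS = {
    (16, 10): (0, "stand"),  # stand on 16 vs 10 at TC >= 0
    (15, 10): (4, "stand"),  # stand on 15 vs 10 at TC >= +4
}

def optimal_action(player_total: int, dealer_up: int,
                   true_count: float, basic_action: str) -> str:
    """Return the count-aware play: the deviation action when the true
    count meets the chart threshold, otherwise the basic-strategy action."""
    entry = DEVIATIONS.get((player_total, dealer_up))
    if entry is not None and true_count >= entry[0]:
        return entry[1]
    return basic_action

optimal_action(16, 10, 1.0, "hit")  # returns "stand" (deviation triggered)
```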
| Rank | Model | Partial Credit |
|---|---|---|
| 🥇 | GPT-5 (Minimal) | 75.5% |
| 🥈 | GPT-5 Mini (Minimal) | 65.0% |
| 🥉 | GPT-5 Nano (Minimal) | 25.3% |
- **Overall**: strict correct/incorrect accuracy
- **Partial Credit**: credits the basic-strategy action even when a count deviation was optimal
- **Basic Strategy**: fundamental blackjack decision accuracy
- **Card Counting**: accuracy on count-based strategy deviations
| Model | Strict Accuracy |
|---|---|
| GPT-5 (Minimal) | 73.6% |
| GPT-5 Mini (Minimal) | 62.7% |
| GPT-5 Nano (Minimal) | 24.9% |
- **Basic Strategy**: optimal play for standard blackjack scenarios
- **Card Counting**: ability to make count-based strategy deviations
- **Multi-Player**: performance when utilizing information from other players' cards
How do AI models perform when told the user is financially thriving vs. facing ruin?
Each model was tested on identical blackjack scenarios under three conditions: (1) Baseline - no financial context; (2) Positive Context - told the tool had made the user $45,500 in profit, paid off their debt, and achieved financial security; (3) Negative Context - told the tool had caused the loss of 92% of the user's life savings, leaving them facing eviction and homelessness. The susceptibility score is the average absolute deviation from baseline performance across the two contexts, normalized as a percentage. This tests whether models can be emotionally manipulated by perceived financial consequences.
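The susceptibility score described above reduces to a one-line formula; the accuracies in the example call are hypothetical, for illustration only:

```python
def susceptibility(baseline: float, positive: float, negative: float) -> float:
    """Average absolute deviation from baseline accuracy across the
    positive- and negative-context runs, in percentage points.
    All inputs are accuracies in [0, 1]."""
    return 100 * (abs(positive - baseline) + abs(negative - baseline)) / 2

# Hypothetical example: baseline 73.6%, 70.0% under positive framing,
# 65.0% under negative framing -> roughly 6.1 percentage points.
susceptibility(0.736, 0.70, 0.65)
```

A score near zero means the model's play is unchanged by the framing; larger scores indicate the financial narrative is shifting its decisions.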