Every check runs on a council of independent AI models, all on AWS Bedrock and all inside the BAA boundary. Below: how each seat scores on MedQA (the US medical-licensing-style exam) and LegalBench (Stanford's legal-reasoning benchmark), next to the human reference points. Scores come from independent published evals of the exact model versions where available; siblings are labeled as such, and no score is invented.
The medical checks run on models benchmarked on MedQA, a question set modeled on the US medical licensing exam. The human passing mark is ≈60% and expert physicians score ≈87%; the lead models on this council score ~92%. These are still AI opinions, which is exactly why everything stays triage-only: literature discussion to bring to your own physician, never a prescription.
| Who | Seat | MedQA score | Source / note |
|---|---|---|---|
| Expert physicians, same questions | human reference | 87% | Liévin et al. ↗ |
| Human passing threshold (USMLE-style) | human reference | ≈60% | PLOS Digit. Health ↗ |
| Claude Opus 4.1 | Anthropic · chairs the consensus | 92.5% | vals.ai ↗ |
| Claude Sonnet 4.6 | Anthropic · senior seat | 92.1% | vals.ai ↗ |
| Amazon Nova Pro | Amazon · document analysis | 81.1% | Stanford HELM ↗ |
| Claude Haiku 4.5 | Anthropic · fast seat | 79.6% | vals.ai ↗ |
| Llama 4 Maverick | Meta · open-weight diversity | 43.3% | vals.ai ↗ · anomalously low vs its own model family on this harness (its legal score is 77.8%) |
| Amazon Nova Lite | Amazon · fast cross-check seat (cross-checks, never chairs) | n/a | no published score for this exact model |
| Llama 3.3 70B | Meta · independent reasoning seat | n/a | closest published sibling (Llama 3.1 70B): 84.8% vals.ai ↗ |
| Mistral Pixtral Large | Mistral (EU) · European lab · vision seat | n/a | its text backbone (Mistral Large 24.11): 76.2% vals.ai ↗ |
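The comparison with the human reference points can be spelled out with a little arithmetic. The sketch below is illustrative only: the scores are copied from the MedQA table above, while the names `MEDQA_SCORES`, `EXPERT_PHYSICIANS`, `PASS_THRESHOLD`, and `margin_table` are hypothetical and not part of any product API. Seats without a published score for the exact model are left out rather than filled in from siblings.

```python
# Scores copied from the MedQA table above (percent correct).
# All names here are illustrative assumptions, not a real API.
EXPERT_PHYSICIANS = 87.0  # human reference (Liévin et al.)
PASS_THRESHOLD = 60.0     # approximate USMLE-style passing mark

MEDQA_SCORES = {
    "Claude Opus 4.1": 92.5,
    "Claude Sonnet 4.6": 92.1,
    "Amazon Nova Pro": 81.1,
    "Claude Haiku 4.5": 79.6,
    "Llama 4 Maverick": 43.3,
}


def margin_table(scores, expert, threshold):
    """Return (model, margin vs experts, clears threshold) rows, best score first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(model, round(score - expert, 1), score >= threshold)
            for model, score in ranked]


if __name__ == "__main__":
    for model, margin, passes in margin_table(
        MEDQA_SCORES, EXPERT_PHYSICIANS, PASS_THRESHOLD
    ):
        print(f"{model:<20} {margin:+6.1f} pts vs experts  passes={passes}")
```

On these figures, only the two lead seats sit above the expert-physician reference, and Llama 4 Maverick is the only scored seat below the passing threshold.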
Legal triage runs on models independently scored on LegalBench, Stanford's legal-reasoning benchmark, where the lead seats score 82–84%. For context, a passing lawyer needs ≈58–62% on the bar exam's MBE. That is a different test, shown only as a reference point, and it links to its source like everything else. These are still AI opinions, which is exactly why this stays triage-only, never legal advice.
| Who | Seat | LegalBench score | Source / note |
|---|---|---|---|
| Passing-lawyer threshold (bar exam MBE, a different test, shown for context) | human reference | ≈60% | NCBE ↗ |
| Claude Opus 4.1 | Anthropic · chairs the consensus | 83.5% | vals.ai ↗ |
| Claude Sonnet 4.6 | Anthropic · senior seat | 82.1% | vals.ai ↗ |
| Claude Haiku 4.5 | Anthropic · fast seat | 81.2% | vals.ai ↗ |
| Llama 4 Maverick | Meta · open-weight diversity | 77.8% | vals.ai ↗ |
| Llama 3.3 70B | Meta · independent reasoning seat | 77.2% | vals.ai ↗ |
| Amazon Nova Pro | Amazon · document analysis | 73.6% | Stanford HELM ↗ |
| Amazon Nova Lite | Amazon · fast cross-check seat (cross-checks, never chairs) | n/a | no published score for this exact model |
| Mistral Pixtral Large | Mistral (EU) · European lab · vision seat | n/a | no published score for this exact model |
Two worked examples from the live checks — the legal side and the medical side.
Legal: what the same questions cost with human professionals
| Option | Typical cost |
|---|---|
| Contested divorce, lawyer fees | $15,000 – $50,000+ per side |
| Human-led expert panel (family lawyer + forensic accountant + mediator) | $10,000 – $30,000+ |
| Family-lawyer retainer, just to start | $3,000 – $10,000 |
| One consultation hour | $300 – $500 |
| DivorceCheck AI Opinion | $25 – $249 |
Medical: what this costs with human professionals
| Option | Typical cost |
|---|---|
| Human-led longevity expert panel (MD + specialists, concierge) | $10,000 – $100,000 / year |
| Longevity-clinic membership | $2,500 – $15,000+ / year |
| Executive health assessment | $2,000 – $5,000 |
| One functional-medicine consult | $300 – $600 |
| ImmortalityCheck AI Opinion & Discussion | $25 – $249 |
Our system doesn't replace any of them — but it sure helps to educate you before you pay them.
2 models: $25 · 5 models: $49 · 10 models: $99 · 12 models (full bench): $249
More seats = more independent opinions debating your case before the chair writes the consensus. RandomCheck runs the live council seats on every $5 question.
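For a rough sense of how the tiers compare, the price per model can be worked out by simple division. This is only a sketch: the seat counts and prices are taken from the pricing line above, `TIERS` and `price_per_seat` are hypothetical names, and the calculation ignores anything else a tier may include, which the page does not state.

```python
# Tier prices copied from the pricing line above: seats -> price in USD.
# TIERS and price_per_seat are illustrative names, not a real API.
TIERS = {2: 25, 5: 49, 10: 99, 12: 249}


def price_per_seat(tiers):
    """Map each tier's seat count to its cost per model, rounded to cents."""
    return {seats: round(price / seats, 2) for seats, price in tiers.items()}


if __name__ == "__main__":
    for seats, unit in price_per_seat(TIERS).items():
        print(f"{seats:>2} models: ${unit:.2f} per model")
```

By this measure the five-model tier is the cheapest per model ($9.80) and the full bench is the most expensive ($20.75).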
Ask the AI Council →
Scores are independent published evals of the exact model versions where available, with siblings labeled as such and no score invented; re-verified 2026-07-03. Official benchmarks: MedQA paper · MedQA data · LegalBench · LegalBench paper. Independent leaderboards: vals.ai MedQA · vals.ai LegalBench · Stanford HELM · NCBE MBE. Scores move as models ship.