THESE ARE AI OPINIONS, NOT ADVICE. The checks are 100% AI-generated and stay triage-only — never medical or legal advice. That is exactly why every number on this page links to its source.
Which AI? · the scores behind the opinions

The AI Council, benchmarked — every number links to its source.

Every check runs on a council of independent AI models, all on AWS Bedrock and all inside the BAA boundary. Below is how each seat scores on MedQA (the US medical-licensing-style exam) and LegalBench (Stanford's legal-reasoning benchmark), next to the human reference points. Scores come from independent published evals of the exact model versions where available; siblings are labeled as such, and no score is invented.

Medicine — MedQA (USMLE-style)

The medical checks run on models benchmarked against the US medical licensing exam (MedQA). Human pass ≈ 60%, expert physicians ≈ 87% — the lead models on this council score ~92%. These are still AI opinions, which is exactly why everything stays triage-only: literature discussion for your own physician, never a prescription.

Who (role): MedQA score (source)
Expert physicians, same questions (human reference): 87% (Liévin et al. ↗)
Human passing threshold, USMLE-style (human reference): ≈60% (PLOS Digit. Health ↗)
Claude Opus 4.1 (Anthropic · chairs the consensus): 92.5% (vals.ai ↗)
Claude Sonnet 4.6 (Anthropic · senior seat): 92.1% (vals.ai ↗)
Amazon Nova Pro (Amazon · document analysis): 81.1% (Stanford HELM ↗)
Claude Haiku 4.5 (Anthropic · fast seat): 79.6% (vals.ai ↗)
Llama 4 Maverick (Meta · open-weight diversity): 43.3% (vals.ai ↗); anomalously low vs its own model family on this harness, its legal score is 77.8%
Amazon Nova Lite (Amazon · fast cross-check seat; cross-checks, never chairs): no published score for this exact model
Llama 3.3 70B (Meta · independent reasoning seat): closest published sibling (Llama 3.1 70B) scores 84.8% (vals.ai ↗)
Mistral Pixtral Large (Mistral, EU · European lab · vision seat): its text backbone (Mistral Large 24.11) scores 76.2% (vals.ai ↗)
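
To make the comparison with the human reference points concrete, here is a minimal Python sketch that places each exact-version MedQA score from the table relative to the two reference lines. The names `MEDQA_SCORES`, `classify`, and `summarize` are hypothetical and only for illustration; this is not how the council itself is implemented. Seats with only a sibling or backbone score, or no published score, are left out rather than filled in.

```python
# Exact-version MedQA scores copied from the table above, in percent.
# Seats with only a sibling or backbone score, or no score, are left out.
MEDQA_SCORES = {
    "Claude Opus 4.1": 92.5,
    "Claude Sonnet 4.6": 92.1,
    "Amazon Nova Pro": 81.1,
    "Claude Haiku 4.5": 79.6,
    "Llama 4 Maverick": 43.3,
}

# Human reference points from the same table (the passing threshold is approximate).
EXPERT_PHYSICIANS = 87.0
PASSING_THRESHOLD = 60.0


def classify(score: float) -> str:
    """Place a score relative to the two human reference points."""
    if score >= EXPERT_PHYSICIANS:
        return "at or above expert physicians"
    if score >= PASSING_THRESHOLD:
        return "above passing, below expert physicians"
    return "below passing threshold"


def summarize(scores: dict[str, float]) -> dict[str, tuple[float, str]]:
    """Map each seat to (margin over expert physicians in points, band)."""
    return {
        name: (round(score - EXPERT_PHYSICIANS, 1), classify(score))
        for name, score in scores.items()
    }


if __name__ == "__main__":
    for name, (margin, band) in summarize(MEDQA_SCORES).items():
        print(f"{name}: {margin:+.1f} points vs expert physicians ({band})")
```

Only the two lead seats land in the top band, which matches the "~92%" claim above; the other scored seats fall between the passing threshold and the expert line, except Llama 4 Maverick, whose score is flagged as anomalous.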

Law — LegalBench (legal reasoning)

Legal triage runs on models independently scored on LegalBench — Stanford's legal-reasoning benchmark — where the lead seats score 82–84%. For context, a passing lawyer needs ≈58–62% on the bar exam's MBE (a different test; shown for context, and it links to its source like everything else). These are still AI opinions, which is exactly why this stays triage-only, never legal advice.

Who (role): LegalBench score (source)
Passing-lawyer threshold, bar exam MBE (human reference; a different test, shown for context): ≈60% (NCBE ↗)
Claude Opus 4.1 (Anthropic · chairs the consensus): 83.5% (vals.ai ↗)
Claude Sonnet 4.6 (Anthropic · senior seat): 82.1% (vals.ai ↗)
Claude Haiku 4.5 (Anthropic · fast seat): 81.2% (vals.ai ↗)
Llama 4 Maverick (Meta · open-weight diversity): 77.8% (vals.ai ↗)
Llama 3.3 70B (Meta · independent reasoning seat): 77.2% (vals.ai ↗)
Amazon Nova Pro (Amazon · document analysis): 73.6% (Stanford HELM ↗)
Amazon Nova Lite (Amazon · fast cross-check seat; cross-checks, never chairs): no published score for this exact model
Mistral Pixtral Large (Mistral, EU · European lab · vision seat): no published score for this exact model

What the same questions cost with humans

Two worked examples from the live checks — the legal side and the medical side.

Legal — what the same questions cost per side, with humans

Contested divorce, lawyer fees: $15,000 – $50,000+ per side
Human-led expert panel (family lawyer + forensic accountant + mediator): $10,000 – $30,000+
Family-lawyer retainer, just to start: $3,000 – $10,000
One consultation hour: $300 – $500
DivorceCheck AI Opinion: $25 – $249

Medical — what this costs with humans

Human-led longevity expert panel (MD + specialists, concierge): $10,000 – $100,000 / year
Longevity-clinic membership: $2,500 – $15,000+ / year
Executive health assessment: $2,000 – $5,000
One functional-medicine consult: $300 – $600
ImmortalityCheck AI Opinion & Discussion: $25 – $249

Our system doesn't replace any of them — but it sure helps to educate you before you pay them.

Pick your council size

2 models — $25  ·  5 — $49  ·  10 — $99  ·  12 (full bench) — $249

More seats = more independent opinions debating your case before the chair writes the consensus. RandomCheck runs the live council seats on every $5 question.
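
As a quick arithmetic check on the tiers above, this short Python sketch divides each tier's price by its seat count. The `TIERS` mapping and `price_per_seat` helper are hypothetical names used only for this illustration; the prices themselves are the ones listed above.

```python
# Council sizes and prices as listed above: seats -> price in USD.
TIERS = {2: 25, 5: 49, 10: 99, 12: 249}


def price_per_seat(tiers: dict[int, int]) -> dict[int, float]:
    """Return the price per model seat for each tier, rounded to cents."""
    return {seats: round(price / seats, 2) for seats, price in tiers.items()}


if __name__ == "__main__":
    for seats, per_seat in price_per_seat(TIERS).items():
        print(f"{seats} seats: ${per_seat:.2f} per seat")
```

On these numbers the 5-seat and 10-seat tiers come out cheapest per seat (about $9.80 and $9.90), while the 12-seat full bench costs the most per seat ($20.75).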


Sources

Scores come from independent published evals of the exact model versions where available; siblings are labeled as such, and no score is invented. Last re-verified 2026-07-03. Official benchmarks: MedQA paper · MedQA data · LegalBench · LegalBench paper. Independent leaderboards: vals.ai MedQA · vals.ai LegalBench · Stanford HELM · NCBE MBE. Scores move as models ship.