“GPT-4 ranked higher than the majority of physicians in psychiatry… it performed similarly to the median physician in general surgery & internal medicine… GPT-4 performance was lower in pediatrics & OB/GYN but remained higher than a considerable fraction” of active doctors.
GPT-4 passes 4 out of 5 medical board exams. It matches doctors on the exams for general surgery and internal medicine, and beats them in psychiatry. And although it does slightly worse overall in pediatrics and OB/GYN, it still outperforms many doctors. stevestewartwilliams.com/p/a-new-milest…
AI is entering mental healthcare, and models are becoming more task-autonomous. Here, we propose the concept of “task-autonomous AI in mental healthcare” (TAIMH). We discuss a proposed structure, default behaviors, and failure modes. 🧵 (1/4)
AIs have a bad reputation for truthfulness, so here are three important findings from this paper:
1) “LLM agents can achieve superhuman rating performance” on fact checking when given access to Google!
2) Bigger models are more factual
3) LLMs are 20x cheaper than humans. arxiv.org/pdf/2403.18802…