GPT-4’s Bar Exam Performance: Is AI Really Good at Lawyering?
In the context of the legal profession, there are various reasons to doubt the usefulness of UBE percentile as a proxy for lawyerly competence, for both humans and AI systems. For example: (a) the content of the UBE is highly general and does not pertain to the legal doctrine of any particular U.S. jurisdiction, so knowledge (or ignorance) of that content does not necessarily translate into knowledge (or ignorance) of the legal doctrine relevant to a practicing lawyer in any jurisdiction; and (b) the tasks on the bar exam, particularly the multiple-choice questions, do not reflect the tasks of practicing lawyers, so mastery (or lack of mastery) of those tasks does not necessarily indicate mastery (or lack of mastery) of the tasks lawyers actually perform.
To the extent that one does believe the UBE to be a valid proxy for lawyerly competence, we will show that the results suggest GPT-4 is substantially less competent than previously assumed: its score measured against likely attorneys (i.e., those who actually passed the bar) is only ∼48th percentile. On the essays alone, which more closely resemble the tasks of practicing lawyers, GPT-4's performance falls to roughly the ∼15th percentile.
The findings of this research carry timely insights into the desirability and feasibility of outsourcing legally relevant tasks to AI models, as well as into the importance of AI developers implementing rigorous and transparent capabilities evaluations to help secure safe and trustworthy AI.
- Date: April 11, 2024
- Time: 1:00 PM – 2:30 PM
- CLE Credits: 1.5
- Format: Webinar (Virtual Participation)
- Speakers: Eric Martinez, Esq., Brain and Cognitive Sciences, Massachusetts Institute of Technology (MIT); Luca CM Melchionna, Esq., Melchionna PLLC
- Course Code: 0NV11
- Sponsors: Committee on Continuing Legal Education; Technology and Venture Law Committee; Task Force on Artificial Intelligence; Business Law Section