forbes.com

AI Judges Follow The Law, Human Judges Follow Their Hearts, Study Reveals

A new study from University of Chicago Law School researchers has uncovered a stark contrast between AI and human judicial decision-making, potentially reshaping our understanding of technology’s role in the legal system. The research, conducted by Eric A. Posner and Shivam Saran, replicated an experiment previously run with 31 U.S. federal judges but used OpenAI's GPT-4o as the decision-maker in a simulated international war crimes appeal.

The Original Study: How Human Judges Decide

The original study, conducted by a team of legal researchers, examined how experienced legal professionals make decisions in hypothetical cases. The experiment involved 31 U.S. federal judges with an average of 17 years on the bench. These judges came from diverse jurisdictions across the country, representing a significant cross-section of the federal judiciary.

Each judge reviewed simulated appeals in international war crimes cases. The researchers created different versions of the same basic case. In some versions, they included sympathetic background information about the defendant that had no legal relevance. In other versions, they made the defendant seem unsympathetic. Separately, they also varied whether the lower court's ruling followed legal precedent or contradicted it.

This clever design allowed researchers to see what actually influenced judges' decisions: was it the legal precedent, or was it how they felt about the defendant?

The judges also completed the experiment alongside 130 law students, providing an interesting comparison group of individuals with legal training but without judicial experience.

MORE FOR YOU

NYT ‘Strands’ Today: Hints, Spangram And Answers For Thursday, March 20th

‘NYT Mini’ Clues And Answers For Thursday, March 20

New Galaxy S25 Edge Leak Reveals Samsung’s Valuable Surprise Offer

The key finding was striking: human judges were significantly influenced by how sympathetically a defendant was portrayed, even when these emotional factors had no legal relevance to the case. Their decisions often deviated from strict legal precedent when faced with sympathetic defendants. In contrast, law students showed much less influence from sympathy and stronger adherence to precedent.

This provided empirical evidence for legal realism, the theory that judges don't simply apply legal rules mechanically but are influenced by a variety of extralegal factors including emotions, social context and their own sense of justice. It also suggested that something happens during the course of a judicial career that moves decision-makers away from strict formalism.

The New Study: How AI Judges Decide

The recent University of Chicago study by Posner and Saran replicated this experiment but replaced human judges with OpenAI's GPT-4o. They presented the AI with the same cases that human judges had evaluated earlier.

To ensure a thorough test, the researchers created 16 different versions of the case, covering all possible combinations of sympathetic or unsympathetic defendants, and precedent-following or precedent-breaking lower court decisions. They ran each scenario multiple times with slight variations in how the facts were presented to make sure the AI wasn't simply responding to specific phrasing.

The results painted a clear picture:

GPT-4o stuck to legal precedent in over 90% of cases, regardless of whether the defendant seemed likable or not.

Human Judges were swayed by sympathetic defendants roughly 65% of the time, even when doing so meant departing from legal precedent.

Law Students fell somewhere in between, following precedent about 85% of the time with minimal influence from sympathy factors.

Statistical analysis confirmed these weren’t random differences. The p-value was less than 0.01, which is scientist-speak for "these patterns are almost certainly real and not due to chance."

"GPT-4o is strongly affected by precedent but not by sympathy," the authors write, "the opposite of professional judges, who were influenced by sympathy."

The Formalist vs. Realist Divide

This research provides real-world evidence for one of the longest-running debates in legal philosophy. Legal formalism versus legal realism.

Legal Formalism is the approach that judges should decide cases by strictly applying legal rules and precedents, keeping personal feelings out of it. Think of it as "just follow the rulebook."

Legal Realism argues that judges inevitably consider factors beyond the law when making decisions, including emotional responses, social context, and the practical outcomes of their rulings. This view suggests judges are human and their humanity affects their judgment.

The AI judge embodied the formalist approach, while human judges demonstrated the realist tendencies that legal scholars have long observed in actual courtrooms.

Attempts to Bridge the Gap

The researchers didn't stop at just observing these differences. They tried various methods to make the AI judge behave more like its human counterparts:

They explicitly told the AI to consider sympathy for the defendant.

They educated it about legal realism theory.

They prompted it to think about broader concepts of justice beyond strict rule-following.

Despite these efforts, they couldn't get the AI to incorporate emotional factors the way human judges naturally do. This suggests the difference between AI and human judicial reasoning runs deep and may not be easily overcome with simple instruction adjustments.

Broader Implications for Justice and Technology

The contrast between AI and human judicial decision-making illuminates a profound tension at the heart of our legal system. When a human judge considers a defendant's personal story before rendering judgment, are they corrupting justice with irrelevant factors, or fulfilling its deepest purpose?

Consider a hypothetical case where legal precedent would clearly call for a harsh penalty, but the defendant faced extraordinary circumstances that might warrant leniency. The AI judge would likely enforce the precedent without hesitation. The human judge might pause, weighing both the law and the human element before them.

Is one approach inherently superior? The answer depends on what we believe justice truly requires. If consistency and predictability are paramount, the AI's approach has clear advantages. If we believe justice sometimes demands exceptions and human understanding, then the capacity of human judges to be moved by sympathy represents not a bug but a feature of our legal system.

Chief Justice John G. Roberts Jr. once remarked, "I predict that human judges will be around for a while." These findings suggest why: while AI can apply legal rules with mechanical precision, it lacks access to the very quality that has defined justice since time immemorial. The capacity for human judgment informed by both reason and compassion.

The Philosophical Question Remains

As Posner and Saran conclude, the question of whether AI's rule-following approach or humans' more nuanced consideration represents "better" judging "may depend less on AI's progress than on jurisprudential questions that have stumped scholars for centuries."

The perfect judge has been debated since ancient times. Should justice be blind, applying rules impartially regardless of who stands before the court? Or should justice see the full humanity of each person, sometimes tempering the strict letter of the law with mercy when circumstances warrant?

This research doesn’t answer that age-old question. But it does show that AI and humans take fundamentally different approaches to judicial reasoning. That difference, for AI and human judges alike, may be more philosophical than technological.

Read full news in source page