Your Honor
Published 20 January 2026

In recent years, Artificial Intelligence has increasingly entered the decision-making processes that govern collective life, driven by the belief that the use of algorithms makes decisions more objective and consistent. However, a recent study published in the prestigious academic journal PNAS by a group of researchers from Harvard, Yale, and Carnegie Mellon shows that this is not always the case. The paper, Does AI Help Humans Make Better Decisions? (Ben-Michael et al.), aims to empirically assess whether and when Artificial Intelligence improves the quality of human decision-making.
The authors apply a causal evaluation methodology to an intervention carried out within the U.S. judicial system, where judges were, for a period, assisted by an algorithm known as the Public Safety Assessment (PSA). This instrument assigns each defendant a risk score based on age and criminal history, estimating the probability that the individual will commit a new offense or fail to appear in court.
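To make the mechanics concrete, here is a minimal sketch of how an instrument of this kind might turn case features into a risk estimate. The function, weights, and logistic form below are invented for illustration only; they are not the actual PSA formula, which uses its own documented risk factors and point scale.

```python
import math

def toy_risk_score(age: int, priors: int, prior_ftas: int) -> float:
    """Toy PSA-style score: a weighted sum of case features mapped to a
    probability with a logistic link. All weights are invented for
    illustration; this is not the real PSA scoring rule."""
    linear = (
        -1.5
        + 0.04 * max(0, 30 - age)   # hypothetical: youth raises the score
        + 0.30 * priors             # hypothetical: prior convictions
        + 0.50 * prior_ftas         # hypothetical: prior failures to appear
    )
    return 1.0 / (1.0 + math.exp(-linear))

# A 22-year-old with two priors and one prior failure to appear:
print(f"estimated risk: {toy_risk_score(22, 2, 1):.2f}")
```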
The hypothesis was that AI could help judges make more accurate and uniform decisions. The results, however, contradict this expectation. The decisions of judges who received algorithmic recommendations were no more accurate than those of their colleagues who decided independently. On the contrary, the AI more frequently recommended pretrial detention, increasing cases of unnecessary incarceration. In essence, human–machine collaboration did not reduce error but rather shifted it: less uncertainty and perhaps greater speed, but at the cost of increased severity.
From a methodological standpoint, the authors propose a “counterfactual” statistical approach that allows for a rigorous comparison of three decision-making systems (human, human-assisted or hybrid, and purely algorithmic) even when not all decision outcomes can be observed. More specifically, they compare the three systems by their classification ability, that is, their capacity to correctly distinguish cases in which detention is necessary from those in which it is not. The difficulty is that the consequences of a decision are observed only under the decision actually made: we can see whether a released defendant reoffends or fails to appear, but never what a detained defendant would have done if released. It is therefore impossible to know with certainty what would have happened had the opposite choice been made (the so-called counterfactual).
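In the standard potential-outcomes notation (a common framing for this kind of problem, not necessarily the paper’s exact formalism), each defendant has an outcome under release, Y(0), and an outcome under detention, Y(1), but only the one corresponding to the decision actually made is observed. A minimal sketch of this missing-data structure:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Case:
    detained: bool                  # D: the decision actually made
    y_if_released: Optional[bool]   # Y(0): new offense / failure to appear if released
    y_if_detained: Optional[bool]   # Y(1): outcome under detention

def observed_outcome(case: Case) -> Optional[bool]:
    """We observe only the potential outcome under the decision actually
    taken; the other branch is the unobservable counterfactual."""
    return case.y_if_detained if case.detained else case.y_if_released

# For a detained defendant, Y(0) is missing by construction: we never learn
# whether detention was actually necessary for this individual.
c = Case(detained=True, y_if_released=None, y_if_detained=False)
print(observed_outcome(c))  # False; y_if_released stays unknown
```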
The approach proposed by the authors nevertheless overcomes this structural limitation through an experimental design in which the provision of AI recommendations to judges is randomized: for each case, whether the judge sees the algorithm’s suggestion is decided by chance, without the defendant’s knowledge. Receiving the recommendation therefore does not depend on the defendant’s characteristics, and the differences observed between the two groups can be interpreted as causal effects of AI support, even though not all counterfactuals can be directly observed.
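Under such a design, a simple difference in means between the randomized arms estimates the causal effect of providing the recommendation. A hedged simulation sketch follows; the baseline rate and effect size are invented for illustration and are not the study’s estimates.

```python
import random

random.seed(0)

def simulate_case() -> tuple[bool, bool]:
    """One case: the AI recommendation is shown to the judge at random,
    independently of defendant characteristics by construction."""
    assisted = random.random() < 0.5
    p_detain = 0.30 + (0.05 if assisted else 0.0)  # invented baseline and effect
    detained = random.random() < p_detain
    return assisted, detained

cases = [simulate_case() for _ in range(100_000)]

def detention_rate(group):
    return sum(detained for _, detained in group) / len(group)

treated = [c for c in cases if c[0]]
control = [c for c in cases if not c[0]]

# Because assignment is random, this difference is an unbiased estimate of
# the causal effect of AI provision on the detention rate.
print(f"estimated effect: {detention_rate(treated) - detention_rate(control):+.3f}")
```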
But the core of the study is conceptual and opens the door to broader normative questions. The issue is no longer whether AI can be “fairer” than humans, but rather what we mean by a “better decision.” The study by Ben-Michael and colleagues does not reject technology, but challenges the idea that efficiency is equivalent to wisdom. Artificial Intelligence can be a useful support, but its effectiveness depends on how it is integrated into the human decision-making process and, above all, on the decision-maker’s ability to understand its limitations. If used uncritically, it risks becoming an opaque filter that reduces complexity to a sequence of probabilities, flattening nuances and reinforcing existing forms of discrimination.
The challenge, for both research and institutions, is therefore not to pit humans against machines, but to integrate their capabilities in a way that produces added value, knowledge, and inclusive well-being, while preserving intention, understanding, and responsibility—the dimensions that remain (for now) distinctly human in every decision.
Pierluigi Conzo
NP November 2025




