Study Reveals AI’s Inability to Replicate Human Judgment, Leading to Stricter Rule Violations

According to a recent study by researchers at institutions including MIT, machine-learning models used to decide whether rules have been violated struggle to emulate human judgment.

The study finds that when models are not trained on the right kind of data, their assessments frequently diverge from human ones, often coming down too harshly.

What’s the Main Issue?

The primary challenge lies in the training data. Typically, such data carries descriptive labels: humans are asked to identify factual attributes. For example, when assessing whether a meal violates a school policy prohibiting fried food, descriptive labelers simply report whether a photograph shows fried food.

However, when these descriptively trained models are used to judge rule violations, they tend to over-predict violations. This loss of accuracy carries significant implications. Consider, for instance, a descriptive model used to assess the likelihood that an individual will reoffend. The study suggests that such models could render harsher judgments than humans would, which in turn could mean higher bail amounts or longer sentences.

Marzyeh Ghassemi, an assistant professor and leader of the Healthy ML Group in MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), notes that the models' failure to replicate human judgments stems from the training data itself. Had the labelers known their labels would be used to make judgments, they would likely have labeled the images and text differently. This has significant implications for machine-learning systems embedded in human-driven processes.


Labeling Discrepancy

To probe the disparity between descriptive and normative labels, the research team conducted a user study. They curated four datasets representing various policies and asked participants to provide either descriptive or normative labels.

The findings indicated that in the descriptive setting, humans tended to label an object as a violation more frequently. The divergence in labeling ranged from 8 percent for dress code violations to 20 percent for dog images. To gain further insights into the ramifications of utilizing descriptive data, the researchers trained two models. One model was trained using descriptive data, while the other utilized normative data, both aimed at judging rule violations.

The analysis revealed that the model trained on descriptive data performed worse than the model trained on normative data. The descriptive model was more likely to misclassify inputs, in particular by falsely predicting rule violations, and its accuracy dropped sharply on objects that human labelers had disagreed about.
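The gap between the two labeling schemes can be illustrated with a toy simulation. This is not code or data from the study; the thresholds and rates below are invented purely to show how a model that imitates descriptive labels ends up flagging more violations than a normative judge would:

```python
import random

random.seed(0)

# Hypothetical setup: each item has a "fried-ness" score in [0, 1].
# Descriptive labelers flag any visible frying (score > 0.3), while
# normative labelers, weighing the policy's intent, flag only clear
# violations (score > 0.6). A model trained to imitate its labels
# inherits whichever threshold its labelers applied.
items = [random.random() for _ in range(10_000)]

descriptive_labels = [s > 0.3 for s in items]
normative_labels = [s > 0.6 for s in items]

# Violation rate implied by each labeling scheme.
descriptive_rate = sum(descriptive_labels) / len(items)
normative_rate = sum(normative_labels) / len(items)

# Items the descriptive scheme flags that a normative judge would excuse.
over_predicted = sum(d and not n
                     for d, n in zip(descriptive_labels, normative_labels))
```

Under these invented thresholds, `descriptive_rate` exceeds `normative_rate`, and every item in the gap between the two cutoffs becomes an over-predicted violation, mirroring the direction of the effect the study reports.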


Why Is Dataset Transparency Important in AI Research?

Dataset transparency is crucial for responsible and trustworthy AI research. It lets researchers understand the strengths and limitations of the data used to train machine-learning models, helps identify and mitigate biases, promotes reproducibility, and facilitates collaboration within the research community. When datasets are documented and shared transparently, other researchers can replicate experiments, validate findings, and build on existing work, which advances knowledge and safeguards the integrity of scientific research.

Transparency also allows researchers to assess the quality and representativeness of training data. With insight into how a dataset was curated, including the labeling criteria and potential sources of bias, researchers can make more informed decisions about its suitability for a given task.

How Can AI Researchers Promote More Dataset Transparency?

To promote dataset transparency, researchers and organizations should adopt best practices. This includes providing detailed documentation of data collection methodologies, annotation processes, and any preprocessing steps applied to the dataset. Additionally, sharing the dataset itself, or a representative subset of it, can facilitate independent evaluation and verification.
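One lightweight way to operationalize that documentation is a machine-readable dataset record checked for completeness before release. The sketch below is hypothetical: the field names are illustrative, loosely modeled on datasheet-style practice, not any standard schema:

```python
# Hypothetical dataset documentation record. All field names and values
# are illustrative, not part of any real dataset or standard.
dataset_card = {
    "name": "meal-policy-images",  # invented dataset name
    "collection_method": "photographs submitted by volunteer participants",
    "labeling_scheme": "normative",  # "descriptive" vs "normative" matters
    "labeling_instructions": (
        "Judge whether each meal violates the no-fried-food policy, "
        "not merely whether fried food is visible."
    ),
    "known_biases": [
        "images skew toward one cafeteria's menu",
        "annotators disagreed on ambiguous items such as sauteed dishes",
    ],
    "preprocessing": ["resized to 224x224", "duplicates removed"],
    "shared_subset_available": True,
}


def missing_fields(card: dict) -> list:
    """Return required documentation fields absent from the card."""
    required = {
        "name", "collection_method", "labeling_scheme",
        "labeling_instructions", "known_biases", "preprocessing",
    }
    return sorted(required - set(card))
```

Recording the labeling scheme explicitly, alongside the instructions given to annotators, is what lets downstream users tell whether a dataset's labels are descriptive or normative before training on it.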