Research: Artificial Intelligence Models Fall Short in Replicating Human Judgments on Rule Violations

Introduction:

Machine-learning models are often designed to mimic human decision-making in order to improve fairness or reduce backlogs. However, a recent study by researchers from MIT and other institutions reveals that these models often fail to replicate human decisions about rule violations. The researchers found that models not trained with the right data tend to make different, and often harsher, judgments than humans would. The “right” data here are data labeled by humans who were explicitly asked whether items violate a rule; when models are trained on descriptive data instead, they tend to over-predict rule violations. This discrepancy between human and machine judgments could have serious consequences, leading to stricter decisions in real-world settings such as criminal justice or risk assessment. The researchers emphasize the importance of using appropriate data and improving dataset transparency to mitigate these issues.

Full Article: Research: Artificial Intelligence Models Fall Short in Replicating Human Judgments on Rule Violations

Machine Learning Models Often Do Not Replicate Human Decisions About Rule Violations, MIT Researchers Find

In an effort to enhance fairness and reduce backlogs, machine-learning models are sometimes designed to mimic human decision-making processes. However, a recent study conducted by researchers from MIT and other institutions reveals that these models often fail to replicate human decisions accurately when it comes to rule violations. The researchers found that if machine-learning models are not trained with the right data, they tend to make different and often harsher judgments than humans would.

The Importance of “Normative Data”

The “right” data, according to the researchers, are data labeled by humans who were explicitly asked whether specific items violate a particular rule. During training, a machine-learning model is shown millions of examples of this “normative data” so that it can learn the task.


The Problem with Descriptive Data

However, the data typically used to train machine-learning models are labeled descriptively. In other words, humans are asked to identify factual features, such as the presence of fried food in a photo. When models that judge rule violations, such as whether a meal violates a school policy on fried food, are trained using this “descriptive data,” they tend to over-predict rule violations.
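As a rough illustration (not drawn from the study itself), the sketch below contrasts the two annotation framings for the school fried-food example; all item names and labels are hypothetical.

```python
# Minimal sketch (not from the study): the same item can receive different
# labels depending on how the annotation question is framed. All names and
# values below are hypothetical illustrations.

# Descriptive framing: annotators report a factual feature of the item.
descriptive_question = "Does this meal photo contain fried food? (yes/no)"

# Normative framing: annotators judge the item against the rule directly.
normative_question = "Does this meal violate the school's policy on fried food? (yes/no)"

# Hypothetical annotations for the same three meals. An annotator may note
# that fried food is present (descriptive) yet still judge the meal compliant
# (normative), e.g. because the fried portion seems incidental.
meals = ["meal_001", "meal_002", "meal_003"]
descriptive_labels = {"meal_001": 1, "meal_002": 1, "meal_003": 0}  # fried food present?
normative_labels   = {"meal_001": 1, "meal_002": 0, "meal_003": 0}  # policy violated?

# Training a rule-violation model on the descriptive column treats every
# "fried food present" answer as a violation, which is how over-prediction
# can creep in.
for m in meals:
    print(m, "descriptive:", descriptive_labels[m], "normative:", normative_labels[m])
```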

Implications in the Real World

This over-prediction, and the resulting drop in accuracy, could have significant real-world consequences, particularly when these models are used to make decisions about individuals. For example, if a descriptively trained machine-learning model is used to judge how likely an individual is to reoffend, it may render harsher judgments than a human would, which could lead to higher bail amounts or longer criminal sentences.

The Flaw in Data Collection

Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), explains that this research demonstrates that the data used to train these models have a fundamental flaw. The flaw lies in the fact that humans would label the features of images and text differently if they knew those features would be used for making a judgment. This flaw has significant ramifications for machine learning systems used in human processes.

The Study’s Findings

The researchers conducted a study to further explore this labeling discrepancy. They gathered four datasets representing different policies and asked participants to provide either descriptive or normative labels. They found that humans were significantly more likely to label an object as a violation when asked for descriptive labels, compared to normative labels. This disparity in labeling was observed across all four datasets.
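The sketch below illustrates the kind of per-dataset comparison described above, i.e., how often items are flagged as violations under each labeling condition. The dataset names and labels are randomly generated placeholders, not the study's data.

```python
# Compare violation rates under descriptive vs. normative labeling for
# several datasets. All labels are synthetic placeholders.
import random

random.seed(0)
datasets = ["policy_a", "policy_b", "policy_c", "policy_d"]  # hypothetical names

for name in datasets:
    # Simulate 200 items labeled under each condition.
    descriptive = [random.random() < 0.35 for _ in range(200)]  # flagged via factual feature
    normative = [random.random() < 0.25 for _ in range(200)]    # flagged as a rule violation
    desc_rate = sum(descriptive) / len(descriptive)
    norm_rate = sum(normative) / len(normative)
    print(f"{name}: descriptive {desc_rate:.0%} vs normative {norm_rate:.0%}")
```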

Training Troubles and Solutions


The researchers trained two models—one using descriptive data and the other using normative data—to judge rule violations. They found that the model trained with descriptive data underperformed in comparison to the model trained with normative data. Specifically, the descriptive model was more likely to misclassify inputs and falsely predict rule violations. The researchers suggest that improving dataset transparency and fine-tuning descriptively trained models on small amounts of normative data could help mitigate this problem.
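A minimal sketch of the suggested mitigation follows, assuming a simple PyTorch classifier: pre-train on plentiful descriptive labels, then fine-tune on a small normative set. The feature dimensions, dataset sizes, and hyperparameters are placeholders, not values from the study.

```python
import torch
from torch import nn

torch.manual_seed(0)

n_features = 32
model = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()

# Stage 1: train on a large descriptively labeled set (synthetic stand-in).
x_desc = torch.randn(2000, n_features)
y_desc = torch.randint(0, 2, (2000, 1)).float()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss = loss_fn(model(x_desc), y_desc)
    loss.backward()
    opt.step()

# Stage 2: fine-tune on a small normatively labeled set with a lower learning
# rate, nudging the model's judgments toward the normative standard without
# discarding what was learned from the larger descriptive set.
x_norm = torch.randn(100, n_features)
y_norm = torch.randint(0, 2, (100, 1)).float()
opt_ft = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(20):
    opt_ft.zero_grad()
    loss = loss_fn(model(x_norm), y_norm)
    loss.backward()
    opt_ft.step()

# Predicted violation probability for a new input.
print(torch.sigmoid(model(torch.randn(1, n_features))).item())
```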

Future Work

The researchers intend to conduct a similar study using expert labelers, such as doctors or lawyers, to see whether the same label disparity appears. They also stress the need for dataset transparency: if the goal is to reproduce human judgment accurately, models should be trained on data collected in that judgment setting, and dataset creators should disclose how their labels were gathered.
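As one illustration of what such transparency could look like in practice, the sketch below records how a dataset's labels were collected; the field names are illustrative, not a standard proposed in the article.

```python
# A small provenance record a dataset release could carry so downstream users
# know whether its labels are descriptive or normative. Field names are
# hypothetical.
from dataclasses import dataclass, asdict
import json

@dataclass
class LabelProvenance:
    dataset: str
    labeling_question: str
    label_type: str           # "descriptive" or "normative"
    rule_shown_to_annotators: bool
    annotator_expertise: str  # e.g. "crowdworker", "clinician", "lawyer"

record = LabelProvenance(
    dataset="school-meals-example",  # hypothetical name
    labeling_question="Does this meal violate the fried-food policy?",
    label_type="normative",
    rule_shown_to_annotators=True,
    annotator_expertise="crowdworker",
)
print(json.dumps(asdict(record), indent=2))
```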

Funding and Conclusion

This research was partly funded by the Schwartz Reisman Institute for Technology and Society, Microsoft Research, the Vector Institute, and a Canada Research Council Chair. The findings highlight the importance of training machine-learning models on appropriate data for accurate decision-making, and of weighing the implications of using descriptive data to make normative judgments.

Summary: Research: Artificial Intelligence Models Fall Short in Replicating Human Judgments on Rule Violations

Machine-learning models designed to mimic human decision making often do not accurately replicate human judgments about rule violations, according to researchers from MIT and other institutions. The accuracy of these models decreases if they are not trained with the right data, often resulting in harsher judgments than humans would make. The researchers found that when models are trained with descriptive data rather than normative data, they tend to over-predict rule violations. This has significant implications for machine learning systems used in human processes, such as criminal justice systems. Improved dataset transparency and fine-tuning models with normative data are possible solutions to this problem.

Frequently Asked Questions:

Q1: What is Artificial Intelligence (AI)?

A1: Artificial Intelligence, or AI, refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. It involves the creation of intelligent algorithms and computer systems that can perform tasks, make decisions, and solve problems without direct human intervention.


Q2: How does Artificial Intelligence work?

A2: Artificial Intelligence works by utilizing complex algorithms, statistical models, and vast datasets to enable machines to analyze, learn, and make predictions or decisions. AI systems go through a process of training and learning from data to improve their performance over time. This includes techniques like machine learning, natural language processing, and computer vision to enable tasks such as speech recognition, image recognition, and language translation.

Q3: What are the different types of Artificial Intelligence?

A3: There are two main types of Artificial Intelligence: Narrow AI and General AI. Narrow AI, also known as weak AI, is designed to perform specific tasks and is limited to those tasks; it can excel in a particular domain, such as facial recognition or voice assistants. General AI, also known as strong AI, refers to machines with human-like cognitive abilities that can perform any intellectual task a human can. General AI remains largely hypothetical and has not yet been achieved.

Q4: What are the practical applications of Artificial Intelligence?

A4: Artificial Intelligence has a broad range of practical applications across various industries. It is used in business for tasks like data analysis, customer service, and automation of repetitive processes. In healthcare, AI is employed for disease diagnosis, drug discovery, and personalized medicine. AI is also utilized in transportation for self-driving cars and traffic control. Other applications include finance, gaming, robotics, and cybersecurity.

Q5: What are the potential benefits and concerns of Artificial Intelligence?

A5: The potential benefits of Artificial Intelligence include increased efficiency, accuracy, and productivity in various domains, as well as improved decision-making capabilities. AI can also contribute to scientific advancements and drive innovation. However, there are concerns related to job displacement, privacy and security issues, ethical considerations, and the potential for AI systems to exhibit biased behavior. It is crucial to develop responsible AI solutions that address these concerns and ensure transparency and fairness in its implementation.