Some artificial intelligence tools for health care could be confused by the way people of different genders and races speak, according to a new study led by Theodora Chaspari, a computer scientist at CU Boulder.
The study hinges on an often unspoken reality of human society: not everyone speaks the same way. Women, for example, tend to speak at a higher pitch than men, and similar differences can emerge between white and black speakers.
Researchers have found that these natural variations could disrupt algorithms that screen humans for mental health issues like anxiety or depression. The findings add to a growing body of research showing that AI, like humans, can make assumptions based on race or gender.
“If AI is not well trained or does not include enough representative data, it can propagate these human or societal biases,” said Chaspari, an associate professor in the Department of Computer Science.
She and her colleagues published their findings July 24 in the journal Frontiers in Digital Health.
Chaspari pointed out that AI could be a promising technology in the health field. Finely tuned algorithms can sift through recordings of people’s conversations, looking for subtle changes in the way they speak that could indicate underlying mental health issues.
But these tools must perform consistently across patients from many demographic groups, the computer scientist said. To find out whether AI was up to the task, the researchers fed audio samples from real people into a common set of machine-learning algorithms. The results raised some red flags: the AI tools, for example, appeared to underdiagnose women at risk for depression more often than men, a finding that, in the real world, could keep people from getting the care they need.
“With artificial intelligence, we can identify these precise patterns that humans can’t always see,” said Chaspari, who conducted the work as a faculty member at Texas A&M University. “However, while that opportunity exists, it also comes with a lot of risks.”
Speech and emotions
The way humans speak can be a powerful window into their underlying emotions and well-being, Chaspari added, something poets and playwrights have long known.
Research suggests that people with clinical depression often speak more quietly and in a more monotone voice than others. People with anxiety disorders, on the other hand, tend to speak at a higher pitch and with more “jitter,” a measure of the shakiness in speech.
“We know that speech is strongly influenced by anatomy,” Chaspari said. “For depression, some studies have shown changes in how vocal cord vibrations occur, or even how the voice is modulated by the vocal tract.”
Over the years, scientists have developed AI tools to look for precisely these types of changes.
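As a rough illustration of the kinds of acoustic cues such tools measure, the sketch below estimates average pitch, loudness, and a simplified jitter-like value from a single recording. It is not the study’s pipeline: it assumes the open-source librosa library, and the jitter calculation here is a simplified proxy rather than the clinical definition.

```python
# Minimal, illustrative sketch of speech-feature extraction (not the study's
# pipeline). Assumes the open-source `librosa` library; the jitter value is a
# simplified proxy, not the standard clinical measurement.
import numpy as np
import librosa

def extract_speech_features(path):
    y, sr = librosa.load(path, sr=16000)  # load audio, resample to 16 kHz

    # Fundamental frequency (pitch) track via probabilistic YIN
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    f0 = f0[voiced_flag & ~np.isnan(f0)]  # keep only voiced, defined frames

    # Loudness proxy: root-mean-square energy per frame
    rms = librosa.feature.rms(y=y)[0]

    # Jitter proxy: mean frame-to-frame pitch variation relative to mean pitch
    jitter = np.mean(np.abs(np.diff(f0))) / np.mean(f0) if f0.size > 1 else 0.0

    return {
        "mean_pitch_hz": float(np.mean(f0)) if f0.size else 0.0,
        "pitch_variability_hz": float(np.std(f0)) if f0.size else 0.0,  # low value ~ "monotone"
        "mean_loudness_rms": float(np.mean(rms)),
        "jitter_proxy": float(jitter),
    }
```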
Chaspari and her colleagues decided to put the algorithms to the test. To do so, the team drew on recordings of people talking in different scenarios: in one, participants had to speak for 10 to 15 minutes to a group of strangers; in another, men and women talked for longer in a setting resembling a doctor’s visit. In both cases, participants also filled out questionnaires about their mental health. The study was conducted with Michael Yang and Abd-Allah El-Attar, undergraduate students at Texas A&M.
Correcting biases
The results were all over the map.
In recordings of the public speaking exercise, for example, Latino participants reported feeling significantly more nervous on average than white or black speakers, but the AI failed to detect that heightened anxiety. In the second experiment, the algorithms flagged roughly equal numbers of men and women as being at risk for depression, even though women actually experienced depressive symptoms at much higher rates.
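One common way to surface this kind of gap is to break a screening model’s error rates down by demographic group. The sketch below is a generic illustration of that check, not the study’s code: the DataFrame columns ("at_risk", "flagged", "gender") are hypothetical, and it assumes pandas and scikit-learn are available.

```python
# Generic illustration of a per-group bias check (not the study's code).
# Assumes a hypothetical DataFrame with columns: "at_risk" (true label),
# "flagged" (model output), and "gender" (demographic group).
import pandas as pd
from sklearn.metrics import recall_score

def per_group_miss_rates(df: pd.DataFrame, group_col: str = "gender") -> pd.Series:
    """False-negative rate (missed at-risk cases) for each demographic group."""
    rates = {}
    for group, sub in df.groupby(group_col):
        recall = recall_score(sub["at_risk"], sub["flagged"], zero_division=0)
        rates[group] = 1.0 - recall  # share of truly at-risk people the model missed
    return pd.Series(rates, name="false_negative_rate")

# A large gap between groups (e.g., women vs. men) would signal the kind of
# underdiagnosis the study describes:
# print(per_group_miss_rates(screening_results))
```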
Chaspari stressed that the team’s findings are just a first step. Researchers will need to analyze recordings from more people across a wide range of demographic groups before they can understand why the AI made mistakes in some cases and how to correct those errors.
But, she said, the study is a sign that AI developers should proceed with caution before introducing AI tools into the medical world:
“If we think an algorithm is actually underestimating depression for a specific group, we need to inform clinicians.”