People open up more to smart speakers that listen actively

By Matt Swayne, '05 Journ

A team of researchers suggests that smart speakers, like Alexa and Siri, that demonstrate active listening cues might be able to serve a therapeutic role for users. (Credit: Brandon Romanchuk/Unsplash)

Adding random, short expressions of understanding in a conversation may turn smart speakers, such as Alexa and Siri, into robot therapists that allow people to open up more without violating their privacy, according to a team of researchers.

In a study, the team programmed a smart speaker to interject backchannels (short words and non-words, such as “hmmm,” “right” and “uh-huh”) intermittently during chats with users. These utterances served as the sonic equivalent of head nods and facial gestures, leading users to think that the virtual assistant was actively listening to them, and they prompted more emotional disclosure, said the researchers.

Because users perceived the smart speakers as good listeners and disclosed more about themselves as a result, the speakers could serve as emotional support devices for people, especially isolated people, said Saeed Abdullah, assistant professor of information sciences and technology and affiliate of the Institute of Computational and Data Sciences.

The users also tended to respond with more positive words when they interacted with the devices, which lends additional support to the idea that smart-speaker interactions could have a beneficial therapeutic effect, said Abdullah.

“This represents an initial step in terms of rethinking or reimagining the types of interactions that can be supported through smart speakers and other voice interfaces,” said Abdullah. “We are quite interested in figuring out if we can use smart speakers to provide some sort of therapeutic support for individuals with mental illnesses, or even individuals who might be socially isolated, which is a growing concern for mental healthcare professionals.”

According to the researchers, the ability to signal backchanneling at random is a win-win because it can promote the perception of active listening while maintaining the user’s privacy. Coding a smart-speaker tool that actually listens and responds to a person would require the device to take in information from the user, which might be uncomfortable for some users, according to S. Shyam Sundar, James P. Jimirro Professor of Media Effects in the Donald P. Bellisario College of Communications and co-director of the Media Effects Research Laboratory at Penn State.

“In some ways, this is more socially responsible because we are doing it in a privacy-protected way,” said Sundar, who worked with Abdullah on the study. “We are not collecting any data, but still giving people the social cue that they need for them to continue with their disclosure and benefit from the cathartic aspect of the process.”

The approach could be both helpful and ethical as long as companies and designers avoid false advertising about the device's abilities, he added. For example, companies should not fool users into thinking the smart speaker is really listening to them, said Sundar, who is also director of the Penn State Center for Socially Responsible Artificial Intelligence (CSRAI).

Random backchanneling also solves some significant technical issues, added Abdullah.

“One technical challenge is that it’s quite difficult to understand when to provide those backchanneling reactions because it requires lots of computational processing to understand the context,” said Abdullah.
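To make the idea concrete, here is a minimal Python sketch of random backchanneling. It is not the researchers' code. The three utterances come from the article, but the function names, the notion of a "pause" as the trigger point, and the 0.3 probability are assumptions chosen only for illustration. The point it shows is that the decision to respond never looks at what the user said.

```python
import random

# Utterances quoted in the article; everything else here is hypothetical.
BACKCHANNELS = ["hmmm", "right", "uh-huh"]


def maybe_backchannel(rng, probability=0.3):
    """Return a random backchannel or None.

    The decision depends only on a random draw, never on the content
    of the user's speech, so nothing about the user is analyzed or stored.
    The default probability is an assumed value, not one from the study.
    """
    if rng.random() < probability:
        return rng.choice(BACKCHANNELS)
    return None


def respond_to_pauses(num_pauses, seed=None, probability=0.3):
    """Simulate the speaker's reaction at each of `num_pauses` pauses."""
    rng = random.Random(seed)
    return [maybe_backchannel(rng, probability) for _ in range(num_pauses)]
```

Calling `respond_to_pauses(10, seed=42)` returns a list of ten entries, each either `None` (the speaker stays silent) or one of the three utterances. Because no speech understanding is involved, the computational cost Abdullah describes does not arise.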

The researchers recruited 40 participants for the study. The participants were randomly assigned to one of two experimental conditions: one featured a smart speaker app programmed for random backchanneling, and the other was a control condition that did not offer any backchanneling. The participants were asked to interact with the smart speaker to express their thoughts regarding certain personal life matters.

Following the interaction, the participants completed questionnaires to determine their perception of active listening, emotional states, perceived emotional support from the speaker and the perceived quality of the smart speaker’s backchanneling ability.

To measure self-disclosure, the researchers recorded the time that users spent interacting with the smart speaker and the number of emotional words they used.
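As a rough illustration of those two measures, the Python sketch below computes an interaction's duration and its count of emotional words. The word list, the function name, and the timestamp arguments are all hypothetical; the article does not say which lexicon or tooling the researchers used.

```python
import re

# Illustrative word list only; the study's actual lexicon is not given.
EMOTION_WORDS = {"happy", "sad", "angry", "worried", "lonely", "grateful"}


def self_disclosure_metrics(transcript, start_seconds, end_seconds):
    """Return interaction duration and the number of emotional words used."""
    # Lowercase the transcript and split it into simple word tokens.
    words = re.findall(r"[a-z']+", transcript.lower())
    emotional = sum(1 for word in words if word in EMOTION_WORDS)
    return {
        "duration_seconds": end_seconds - start_seconds,
        "emotional_words": emotional,
    }
```

For a transcript such as "I felt sad and lonely, but also grateful." recorded between 0.0 and 12.5 seconds, the function reports a duration of 12.5 seconds and three emotional words.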

The team also included Eugene Cho, first author of the paper and assistant professor of communications at the College of New Jersey, and Nasim Motalebi, doctoral candidate in information sciences and technology.

The researchers published their work in the Proceedings of the ACM on Human-Computer Interaction and presented it at the recent annual conference on Computer-Supported Cooperative Work and Social Computing.