Last year I blogged about the creepy phenomenon of cyranoids. A cyranoid is a person who speaks the words of another person. With the help of a hidden earpiece, a ‘source’ whispers words into the ear of a ‘shadower’, who repeats them. In research published last year, British psychologists Kevin Corti and Alex Gillespie showed that cyranoids are hard to spot: if you were speaking to one, you probably wouldn’t know it, even if the source was an adult and the shadower a child, or vice versa.
Now Corti and Gillespie are back with an even more striking experiment. In their new research, published in Frontiers in Psychology, they set up a scenario in which a human’s words were controlled by a computer chat-bot. They call this computerized variant of the cyranoid idea the echoborg. Here’s how it works:
In one room, a normal person (‘interactant’) sits down with another person, the ‘shadower’. The interactant begins the conversation (e.g. “What’s your name?”). A researcher in another room is listening in on what the interactant says, via a hidden microphone, and types the interactant’s words into a chat-bot program. The bot generates a text response (e.g. “My name is Kim”). The researcher then reads this response into a microphone, and the shadower listens to the response via a hidden earpiece. They then repeat (echo) what they hear. And so the conversation goes.
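The relay loop described above can be sketched as a short Python program. This is purely illustrative: the function names and the canned-reply stub are my own invention, standing in for the human researcher and for whichever chat-bot (Cleverbot, Rose, or Mitsuku) was used in the actual study.

```python
def chatbot_reply(utterance: str) -> str:
    """Stand-in stub for the chat-bot; the real studies used
    Cleverbot, Rose, or Mitsuku."""
    canned = {"What's your name?": "My name is Kim."}
    return canned.get(utterance, "Tell me more.")

def echoborg_turn(interactant_says: str) -> str:
    """One conversational turn of the echoborg relay."""
    # 1. Hidden microphone: the researcher hears the interactant's words.
    heard = interactant_says
    # 2. The researcher types them into the chat-bot, which generates a reply.
    reply = chatbot_reply(heard)
    # 3. The researcher reads the reply into the shadower's hidden earpiece,
    #    and the shadower repeats (echoes) it verbatim to the interactant.
    return reply

print(echoborg_turn("What's your name?"))
```

Each turn of the conversation is one pass through this loop; the delay introduced by steps 2 and 3 is the ‘audio latency’ discussed later in the post.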
To meet an echoborg is to meet a chat-bot, in other words – but without knowing it. As Corti and Gillespie put it, echoborgs “allow the possibility of studying social interactions with artificial agents that have truly human interfaces.”
So the authors conducted a study in which 41 adult volunteers met and conversed with a stranger. Unbeknownst to them, the stranger’s words were being controlled by a chat-bot (either Cleverbot, Rose, or Mitsuku). The conversation was conducted either via text chat, or face-to-face (i.e. an echoborg). The volunteers were not told about the presence of the chat-bot. They were simply told:
That the study concerned how strangers conversed when speaking for the first time, that it involved simply holding a 10-min conversation with another research participant, and that they were free to decide on topics for discussion so long as vulgarity was avoided. The researcher made no mention of chat bots or of anything related to artificial intelligence. Furthermore, the participant was given no indication that their interlocutor would behave non-autonomously or abnormally.
In post-conversation debriefing, it turned out that echoborgs were much less likely than text chats to be spotted as chat-bots:
In the Text Chat condition, 14 of 21 (67%) of participants mentioned (prior to the researcher making any allusion to chat bots or anything computer-related) that they felt they had spoken to a computer program or robot… only 3 of 20 participants (15%) in the Echoborg condition stated this.
However, despite this, most of the participants felt that something strange was going on when speaking with an echoborg. 15 of 20 participants said that “their interlocutor had been acting or giving scripted responses that did not align with their actual persona.” Some participants thought the true purpose of the study was “to see how people communicated with those who were shy / introverted”. Others thought that the study was about people with autism or a speech impairment.
In other words, while unsuspecting people are unlikely to guess that an echoborg’s words come from a chat-bot, they do sense that the echoborg is not behaving like a normal human being.
This may be partly because the echoborgs had very slow reactions. They paused before speaking, due to the time required for the researcher to type what they heard into the chat-bot and then read the response out loud to the shadower. The mean ‘audio latency’ was around 5 seconds per statement. Corti and Gillespie say that
Minimizing this latency is a major research priority as we continue to refine the echoborg methodology.
You can see the method in action (along with the latency) in a YouTube video of an echoborg conversation, uploaded by Corti and Gillespie:
An interesting comparison, I would say, would be to see what unsuspecting people make of someone who speaks their own words, but who pauses for 5 seconds before saying anything. Maybe this ‘audio-latency-matched’ condition would not be perceived very differently from an echoborg?
Corti, K., & Gillespie, A. (2015). A truly human interface: interacting face-to-face with someone whose words are determined by a computer program. Frontiers in Psychology, 6. DOI: 10.3389/fpsyg.2015.00634