In May I attended the all-virtual Women in Voice (WiV) 2022 Summit and participated in a panel discussion titled “Meet Me Where I'm At—Conversational AI Across Devices.” It was a thoughtful and thought-provoking discussion around conversational AI’s challenges, impacts, and future, and so I wanted to share my impressions here.
Feel free to watch the 36-minute recorded webcast instead. It’s pretty lively!
I was honored to be on this panel of amazing women:
Full disclosure: I’m the founder of WiV Canada and 7.ai was a WiV Summit sponsor.
Kris, who focuses on voice and emerging platforms, expressed it was hard to wrap her mind around what the user’s actual workflow experience would be like. “Even though I knew technically what would work,” she said, “I had many assumptions as a developer that kept me from really thinking about how it would all play out in real life. I ditched those assumptions over time, which made me a much better developer.”
Lisa said she’s learned the key to successful conversation design is understanding the context and the customer’s goals. “Some Alexa interactions are highly transactional, and some are more delightful,” she said, “like when you ask it for a joke or you want to talk about the latest celebrity news. Where you start sets the frame for where you’re going and how you get there.”
My top learnings were the classic ones: the myriad ways humans think and talk, and how we must analyze every word of every interaction to get to conversations that work. Conversational AI has to take a lot into account. Like demographics: certain population segments are uncomfortable with natural language but quite happy to Press One for this and Press Two for that. So we adapt.
7.ai has a lot of enterprise clients, and their goals drive our conversation design. For example, in some situations our clients are hoping to boost customer satisfaction, and other times they’re trying to lower costs—we’ll use language and create pathways that specifically support those outcomes and we’ll establish appropriate metrics to measure our success.
Lisa noted how different domains and environments need to be reflected in the design because of how each one impacts the user experience. “A lot depends on the device,” she said. “People may talk to an Echo device in their home, or in the car through a headset. Home devices are communal, used by kids and parents. The possible contexts are multiplying, and all those interactions are different.”
Lisa continued: “One of the reasons I like working in conversation design is I get to see how these different form factors influence how people speak and what grabs their attention. My perspective has shifted quite a bit as the devices themselves and the access points have changed over the years.”
Kris said she was really excited about the possibility for greater multimodality. “It would be cool to enhance talk and touch features for people who have trouble interacting with devices in the usual ways, because of an impairment or whatever,” she said, “so they don't have to necessarily answer a question definitively to get to the next thing. A lot of that is coming online now, such as a camera that wakes just when you look at it. Very exciting but maybe a little creepy too.”
Lisa picked up on that theme: “People are naturally multimodal and there's an escalation path, right?” she said. “If I call someone's name and they don't answer, then I wave at them. If that doesn’t get their attention, I’ll walk over and tap them on the shoulder. So it makes sense to create accessibility strategies for people who can’t or don’t want to exercise a particular sense. One of my favorite things in our Echo Show is enabling visually impaired users to show the device a can or a box, and it uses image recognition to let them know they’re holding chicken noodle and not beef vegetable soup, and so on.”
Our team at 7.ai looks at conversational experience from both the user and company perspectives. What's apparent from our optimization analysis of all the data, and this accords with most people's lived experience, is that people's patience is wearing thin and they expect more human-like, more personal engagement right off the bat. So, your AI must know who the customer is, where they are in their journey, and what their intent is. If it does, maybe they'll talk to you. If not, you've lost them for sure.
There is a lot of channel jumping from voice to digital and back again, which speaks to the asynchronous side of things. For example, I was looking for “soil delivery” the other night—long story—and this one company had only a phone number. Sorry, but I need to be able to send you a message that you can respond to when you’re open, or at least give me an automated response. They didn’t get my business.
Upshot: Personalization and asynchronous communications have permanently changed customer expectations and the way we interact with brands.
Emerging trends in conversational AI are one of my favorite topics. For example, the “Be Right Back” episode of Black Mirror told the story of a woman talking to the AI of a deceased family member. This is not really science fiction anymore. Today there are companies working on enabling you to “talk” to the deceased by feeding an individual’s social media transcripts and other data-rich sources through machine learning to create an interactive model. It's not quite there yet, but it’s definitely coming. Super cool, but very creepy. The ethical and philosophical considerations are almost overwhelming. Of course, what challenges me more practically is making those interactions human-like enough that people would even consider having them.
Lisa is a huge fan of this space and, to my surprise, she is working on her own version! Caveat: She’ll share it only with her family, not publicly.
Maddy jumped in to say how impressed she is with text-to-speech engines. “The neural TTS quality is just unbelievable, the way you can tune it.”
“There's some really interesting synthetic work being done by our friends at Vocal ID, for example,” said Lisa. “In the next 10 to 15 years, custom text-to-speech is going to become much more prevalent, where you can create original utterances that sound like my voice from way fewer recordings than was possible before. Voice talent used to sit in a sound booth for days and record from a script. Now you can create custom text-to-speech in 10 minutes. What are the implications?”
Lisa brought up the use of other modalities, like gesture and sign recognition, to expand beyond voice. “I talk with my hands a lot,” she said, “maybe there is some expression recognition there as well.” She noted she was interested in conversational AI’s ability to recognize emotion and to do things like detect speech degradation, which can indicate cognitive decline over time or even a stroke. And of course, we already use emotion recognition for escalation purposes in customer care; there's a difference between someone calmly asking for their balance and someone whose credit card was stolen needing to speak to a person right away.
One reason we still do call listening today is that voice is just so rich compared to digital transcripts, even though call listening takes a lot more time. You can hear so much more emotion, and we get so many additional recommendations for improving our AI applications this way.
Of course, we brought up what 7.ai does best: complementing AI technology with human insight (HI), bringing the two worlds together to create the best possible experience for customers.
Overall, I felt so lucky to be part of this panel of powerhouse women who know the conversation design space so well. It felt like we had met in a coffee shop for a chat back in the days before COVID. I thank them and Women in Voice for their time and expertise as they continue to bring great minds together for collaboration and innovation!
Join the community of 1,000+ global Women in Voice and their allies! Sign up for free WiV membership.
For all the panel discussion details and to see the bigger picture, watch the recorded webcast and visit the Women in Voice (WiV) 2022 Summit web page.
And check out these resources!