AI's Next Big Step: Detecting Human Emotion and Expression

The AI field has made remarkable progress with incomplete data. Leading generative models like Claude, Gemini, GPT-4, and Llama can understand text but not emotion. These models can't process your tone of voice, rhythm of speech, or emphasis on words. They can't read your facial expressions. They are effectively unable to process any of the non-verbal information at the heart of communication. And to advance further, they'll need to learn it.
Though much of the AI sector is currently focused on making generative models larger via more data, compute, and energy, the field's next leap may come from teaching emotional intelligence to the models. The problem is already captivating Mark Zuckerberg and attracting millions in startup funding, and there's good reason to believe progress may be close.
"So much of the human brain is just dedicated to understanding people and understanding your expressions and emotions, and that's its own whole modality, right?" Zuckerberg told podcaster Dwarkesh Patel last month. "You could say, okay, maybe it's just video or an image. But it's clearly a very specialized version of those two."
One of Zuckerberg's former employees might be the furthest along in teaching emotion to AI. Alan Cowen, CEO of Hume AI, is a former Meta and Google researcher who's built AI technology that can read the tune, timbre, and rhythm of your voice, as well as your facial expressions, to discern your emotions.
As you speak with Hume's bot, EVI, it processes the emotions you're showing — like excitement, surprise, joy, anger, and awkwardness — and expresses its responses with 'emotions' of its own. Yell at it, for instance, and it will get sheepish and try to defuse the situation. It will display its calculations on screen, indicating what it's reading in your voice and what it's giving back. And it's quite sticky. Across 100,000 unique conversations, the average interaction between humans and EVI is ten minutes long, a company spokesperson said.
"Every word carries not just the phonetics, but also a ton of detail in its tune, rhythm, and timbre that is very informative in a lot of different ways," Cowen told me on Big Technology Podcast last week. "You can predict a lot of things. You can predict whether somebody has depression or Parkinson's to some extent, not perfectly… You can predict in a customer service call, whether somebody's having a good or bad call much more accurately."
Hume, which raised $50 million in March, already offers the technology that reads emotion in voices via its API, and it has working tech that reads facial expressions that it has yet to release. The idea is to deliver much more data to AI models than they would get by simply transcribing text, enabling them to do a better job of making the end user happy. "Pretty much any outcome," Cowen said, "it benefits to include measures of voice modulation and not just language."
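To make that concrete, here's a minimal sketch of what calling an expression-measurement API might look like from a developer's seat. The endpoint, credential, payload shape, and response fields below are illustrative assumptions for this sketch, not Hume's documented interface; consult the company's docs for the real thing.

```python
# A minimal, hypothetical sketch of calling a voice expression-measurement API.
# The URL, auth scheme, and response format are invented for illustration.
import requests

API_KEY = "your-api-key"  # hypothetical credential
ENDPOINT = "https://api.example.com/v1/voice/expressions"  # hypothetical URL

def measure_voice_expressions(audio_path: str) -> dict:
    """Upload an audio clip and return per-emotion scores."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            ENDPOINT,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"audio": f},
        )
    response.raise_for_status()
    # Assumed response shape, e.g. {"joy": 0.72, "anger": 0.03, "awkwardness": 0.11}
    return response.json()

if __name__ == "__main__":
    scores = measure_voice_expressions("customer_call.wav")
    top = max(scores, key=scores.get)
    print(f"Dominant expression: {top} ({scores[top]:.2f})")
```

The point is less the plumbing than the payload: instead of a transcript alone, the downstream model gets a set of emotion scores it can condition its reply on.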
Text is indeed a limited communication medium. Whenever anything gets somewhat complicated in text interactions, humans tend to get on a call, send a voice note, or meet in person. We use emojis or write things like "heyy" in a text to connote some emotion, but those signals have their limits, Cowen said. Text is a good way to convey complex thoughts (as we're doing here, for instance) but not to exchange them. To communicate effectively, we need non-verbal signals.
Voice assistants like Siri and Alexa have been so disappointing, for instance, because they transcribe what people say and strip out all the emotion when digesting the meaning. That generative AI bots can deliver quality experiences in their current form is notable, but it also shows how much better they could get, given how much information they lack.
To program 'emotional intelligence' into machine learning models, the Hume team had more than 1 million people rate how they were feeling on survey platforms, then connected those ratings to their facial expressions and speech. "We had people recording themselves and rating their expressions, and what they're feeling, and responding to music, and videos, and talking to other participants," Cowen said. "Across all of this data, we just look at what's consistent between different people."
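As a toy illustration of "what's consistent between different people": given several raters scoring the same clips, one simple move is to measure inter-rater agreement and keep the consensus. This is my own sketch of the general idea, with fabricated demo data, not Hume's methodology.

```python
# Toy sketch: keep the emotion signal that multiple raters agree on.
# All data here is synthetic, generated purely for demonstration.
import numpy as np

rng = np.random.default_rng(0)
true_signal = rng.random(6)                          # shared impression of 6 clips
ratings = true_signal + rng.normal(0, 0.1, (5, 6))   # 5 raters, each a bit noisy

# Consistency: average pairwise correlation between raters' score vectors
corr = np.corrcoef(ratings)
pairs = corr[np.triu_indices_from(corr, k=1)]
print(f"Mean inter-rater correlation: {pairs.mean():.2f}")

# Consensus label per clip: the mean across raters, which damps individual
# quirks and keeps what's consistent between people
consensus = ratings.mean(axis=0)
print("Consensus scores:", np.round(consensus, 2))
```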
Today, Hume's technology can predict how people will respond before it replies, and uses that to modulate its response. "This model basically acquires all of the abilities that come with understanding and predicting expression," Cowen said. "It can predict if you're going to laugh at something — which means it has to understand something about humor that it didn't understand before — or it can predict if you're going to be frustrated or if you're going to be confused."
The current set of AI products has been understandably limited given the incomplete information they're working with, but that could change with emotional intelligence. AI friends or companions could become less painful to speak with, though a New York Times columnist has already found a way to make friends with 18 of them. Elderly care, Cowen suggested, could improve with AI that looks out for people's everyday problems and is also there as a companion.
Ultimately, Cowen's vision is to build AI into products, allowing an AI assistant to read your speech, emotion, and expressions, and guide you through the experience. Imagine a banking app, for instance, that takes you to the correct pages to transfer money, or adjusts your financial plan, as you speak with it. "When it's really customer service, and it's really about a product," Cowen said, "the product should be part of the conversation, should be integrated with it."
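As a toy sketch of that idea, here's how routing logic that uses both intent and emotion to drive an app might look. Everything here (intents, screen names, thresholds) is invented for illustration, not how Hume or any bank actually wires this up.

```python
# Toy sketch of "the product should be part of the conversation": pick the
# next app screen from a parsed intent plus emotion scores. All names and
# thresholds are invented for illustration.

def route(intent: str, emotions: dict) -> str:
    """Pick the next screen from a parsed intent plus emotion scores."""
    if emotions.get("frustration", 0.0) > 0.6:
        return "screen://human_support"  # escalate before anything else
    screens = {
        "transfer_money": "screen://transfers",
        "adjust_plan": "screen://financial_plan",
    }
    return screens.get(intent, "screen://home")

print(route("transfer_money", {"frustration": 0.1, "calmness": 0.8}))
# -> screen://transfers
print(route("adjust_plan", {"frustration": 0.7}))
# -> screen://human_support
```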
Increasingly, AI researchers are discussing the likelihood of slamming into a resource wall, given the limits on the amount of data, compute, and energy they can throw at the problem. Model innovation, at least in the short term, seems like the most likely way around some of those constraints. Programming emotional intelligence into AI may not end up being the path that advances the field, but it has a real chance. And it points a way forward, toward building deeper intelligence into this already impressive technology.
Big Technology Talks: We’re in NYC on Wednesday, 5/15 with Box CEO Aaron Levie. Doors open at 6 p.m., live podcast at 6:45. Open to paid Big Technology subscribers. Apply here to join:

What Else I’m Reading, Etc.

Nantucket’s first Cybertruck had a bad day [Nantucket Current]
Laid-off Tesla worker shares his story [LinkedIn]
AI-generated junk journalism is filling the web [Futurism]
Political campaigns in India resurrect the dead via AI [Rest of World]
Apple made a bad ad that got a lot of people upset [Washington Post]
The $7,000 Herman Miller Eames lounge chair is tech’s new status symbol [Business Insider]
In the courtroom with Stormy Daniels [New York]
Man reflects on a year taking Ozempic [New York Times]

Listen to Tools and Weapons with Brad Smith (sponsor)

Tools and Weapons, the podcast hosted by Microsoft Vice Chair and President Brad Smith, takes listeners into the center of the conversation surrounding AI. Featuring conversations with prime ministers and CEOs, as well as Brad’s inimitable perspective forged at one of the foremost companies in tech, the podcast is an essential listen for those looking to see emerging technology from all sides. Find Tools and Weapons wherever you get your podcasts.
Advertise on Big Technology?
Reach 165,000+ plugged-in tech readers with your company’s latest campaign, product, or thought leadership. To learn more, write alex@bigtechnology.com or reply to this email.

Quote Of The Week

I would be surprised if LLMs are the only things we need to make progress. We are investing a lot of computing and resources, our AI researchers' talent, in driving the next generation set of breakthroughs.
Google CEO Sundar Pichai on whether LLMs are about to plateau.

Number of The Week

$100 billion+
Market opportunity Google is pursuing in bioscience after its latest AlphaFold release, according to Google DeepMind CEO Demis Hassabis

This Week on Big Technology Podcast: Economics of OpenAI, Tesla’s Robotics Pivot, Hedonic Treadmill — With Slate Money

Felix Salmon, Emily Peck, and Elizabeth Spiers are the hosts of the Slate Money podcast. They join Big Technology to discuss the economics and societal implications of artificial intelligence and robotics. Tune in to hear their nuanced take on the costs, challenges, and potential paths forward for companies like OpenAI and Tesla as they pursue ambitious goals in AI and robotics. We also cover the realities of retirement in modern economies and the ongoing debate over raising retirement ages. Join us for a thought-provoking conversation at the intersection of tech, business, and society, featuring experts who aren't afraid to challenge assumptions and dive deep into the details.
Thanks again for reading. Please share Big Technology if you like it!
And hit that Like Button and enjoy the human monopoly on emotion while you can!
My book Always Day One digs into the tech giants’ inner workings, focusing on automation and culture. I’d be thrilled if you’d give it a read. You can find it here.
Questions? News tips? Email me by responding to this email, or by writing alex@bigtechnology.com. Or find me on Signal at 516-695-8680.