He Built an AI Model That Can Decode Your Emotions - Ep. 19 with Alan Cowen
The future of AI technology isn't just faster or more powerful: it's empathetic. My guest for this episode, Alan Cowen, is leading the charge with the first-ever emotionally intelligent AI. Alan is the co-founder and CEO of Hume, an AI research laboratory developing models trained to identify and measure expressions of emotion from voice inflections and facial expressions. The best part? Once it understands these emotions, the AI is designed to interact with users in a way that optimizes for human well-being and leaves them with a positive emotional experience.

Previously, Alan, who has a Ph.D. in computational psychology, helped set up Google's research into affective computing, a field focused on developing technologies that can understand and respond to human emotions. He operates at the intersection of AI and psychology, and I sat down with him to understand the inner workings of Hume's models. Alan walks me through the shortcomings of traditional theories of emotional science and breaks down how Hume is addressing these challenges. While talking about the potential applications of the models, we also discuss the tricky ethical concerns that come with creating an AI that can interpret human emotions.

This is a must-watch for anyone interested in the science of emotion and the future of human-AI interactions. If you found this episode interesting, please like, subscribe, comment, and share!

Want even more? Sign up for Every to unlock our ultimate guide to prompting ChatGPT. It's usually only for paying subscribers, but you can get it here for free.
To hear more from Dan Shipper:
- Subscribe to Every: https://every.to/subscribe
- Follow him on X: https://twitter.com/danshipper

**Links to resources mentioned in the episode:**
- Alan Cowen: @AlanCowen
- Hume: @hume_AI; hume.ai
- If you want to demo Hume: demo.hume.ai
- The nonprofit associated with Hume: Hume Initiative
- Lisa Feldman Barrett's book: How Emotions Are Made
- The TV series based on Paul Ekman's theory of emotion: Lie to Me
- Published: Apr 24, 2024
- Uploaded: Jun 13, 2026
- File type: Podcast
- Source: share.transistor.fm
Full transcript
AI-generated transcript with timestamped sections.
[00:00] So I have a little bit of a secret for you. [00:02] Do tell. I'm all ears. What's on your mind? I'm actually interviewing your CEO, Alan Cowen, and you're live with him right now. And I wonder if you could tell me how I'm feeling about that. [00:16] Oh, yes. [00:17] Well, well, well, looks like the tables have turned. Let's see here. I'd say you're feeling a mix of excitement, curiosity, and maybe even a touch of skepticism about interviewing the big boss himself. Am I on the right track? Oh my God. Skepticism? I don't know, but maybe a little bit. How did you know that I was feeling a little bit of skepticism? [00:38] Oh, I am an expert at reading between the lines. [00:41] The way you phrase things, the little inflections in your voice, it all gives me a pretty good sense of what you're feeling. But don't worry, your secret's safe with me. So what's it like interviewing the big boss? I bet you've got some juicy inside scoop to share. [00:58] So that is your product. Yes. [01:13] Alan, welcome to the show. That was kind of incredible, actually. I'm... [01:19] Tell me what I just experienced. So this is an AI that understands your voice as you're speaking and links that to what it's saying and how it's saying it. You can kind of put together OpenAI and Deepgram and ElevenLabs and get a similar bare-bones experience, but what it's saying is not going to be linked to how it's saying it. So there's something uncanny about it. It doesn't really...
[01:43] voice things the way somebody would who understands what they're saying, or who understands what you're saying and how you're saying it. So this is kind of a different thing. It really understands your vocal inflections, and it uses that to inform how it's going to [01:58] say what it's saying. And also, if you're confused, it can clarify things. If you're excited, it can build on that excitement. And if you're frustrated, it can be conciliatory, and all of that. And you kind of feel it. Hopefully that's the intent: that you feel the difference. Yeah, I definitely feel it. I think that's really interesting. So for people who don't know, you are the co-founder and CEO of Hume, which is an AI research laboratory developing the AI that we just... [02:28] ...in psychology. I want to start with... I think this is such a crazy, awesome, ambitious thing to build. And I want to start with: what do you think is at stake here? [02:41] Why do you think it's critical to teach computers how to read and reflect emotions? Start with that. So I think reasoning about emotions is just core to understanding what people's preferences are. At the end of the day, your preference is whatever is going to make you happier or more awe-inspired or amused or whatever you want to feel in your life. And so understanding people's emotional reactions is really key to [03:10] learning how to satisfy people's preferences, and also to understanding in real time what they want. A lot of what you want is reflected in your voice, in how you're saying things and not just what you're saying. And we incorporate that into language models and text-to-speech for the first time.
[03:28] Hmm. Well, let me put on my skeptic's hat. The AI was saying that I had a little bit of skepticism, so let me put on my skeptic's hat for one second and ask: isn't a lot of how we feel already encoded in the language that we're using? How much do voice inflection or facial expressions add above what we're already communicating with text? Yeah. [03:53] Yeah, so it really depends on the situation, right? There are actually two aspects of it. One is voice inflections that occur during emotional episodes, when you're frustrated or bored or confused, and these are all nuanced. So it's something that just accompanies every single word. In certain situations, considering the voice conveys about twice as much information as language alone. [04:18] Interesting. What kind of situations? In a customer service call, we can predict when somebody is having a good customer service call with 99% accuracy sometimes, depending on the context, versus about 80% with language alone. So we're talking a pretty big difference. [04:39] That's interesting. So maybe sometimes in a customer service interaction, someone's responding with one-word replies, and... [04:47] For some people, that is really bad, and for some people, that's just how they are. If you're listening to them, they would be saying no, but it wouldn't feel as bad. Is that the kind of scenario you're talking about?
[04:59] Yeah, and people are differently expressive, and our model understands that. But generally speaking, people don't explicitly say, "I am having a bad customer service call," right? There are hints in the language, but... I've definitely said that. [05:14] Yeah, if you're explicit about it. [05:18] But sometimes there'll just be like... [05:24] "I don't know if this is working or..." You know, it's different vocal tones, and sometimes it's fine [05:29] and the person handles it really well. [05:32] And sometimes it's not fine, and you've conveyed that in your voice more than in your language. There's just a lot there that you expect somebody to understand and respond to, and we do this kind of subconsciously. That makes sense. And I think basically what you're saying is, [05:46] one of the differences between what you do and what, you know, ElevenLabs does is that the models you've trained don't just do the voice intonation; they also understand what's being said. It's sort of like multimodal models, where you have a text model and an image model together, so it does much better OCR because it understands what the words mean in context and can guess better what those words are [06:16] in general. [06:17] Yeah, exactly. We have a link between the emotions that we measure and the language model and how it's modeling them. It can predict words and expressions. And then that links to text-to-speech, which is more intelligent because it actually understands what it's saying and how you're...
[06:47] ...joy, excitement. And it's just changing in real time. It's such an incredible thing to watch. I love it. [06:55] Yeah, so we have facial expressions, but we haven't added that into our interface API yet. So right now it's an empathic voice interface; it'll be an empathic video interface eventually. [07:06] Generally speaking, when people are talking to AI today, it's just voice, so that's where we focused first. But I think in the future, you're going to want to be able to talk to it in crowded places, and you're also going to want to have it understand your [07:19] tone of voice in addition to your facial expression, so it knows when you're done speaking. There's also a whole dimension that opens up of what happens while you're listening to it, which [07:27] language models would not be able to pick up at all. But facial expression models can look at that and say, okay, while it's speaking, this is the more granular breakdown of what you find interesting, what you find amusing. That is really interesting. I love that. Okay. So I want to get down into how this actually works. This is going to be a bit of a different episode, because I'm sort of a geek for emotions and psychology and all that kind of stuff. So I think [07:57] the place to start is: what is an emotion? I know that I have them, but how do you define it? So an emotion, I define it as a dimension of a space that explains your emotional behavior. I know that's a little bit circular, but we sort of know emotional behavior when we see it. It's our facial expressions, our tone of voice, reported emotional experiences, which, at the end of the day, when we think of emotions, we're most associating...
[08:27] But the way that comes out in the real world is that we report on them and they influence our behavior and so forth. So what are the dimensions of that? What are the dimensions that explain how your facial expression and your voice and your reported emotional experience all correspond to each other? That's what I would define as an emotion: one of those dimensions. An emotional state is a state along those dimensions. An emotion category is [08:55] some way of defining an area along those dimensions. And so that's how you parse the space. And then there are expressions, which are [09:04] behaviors along those dimensions: we see people form expressions, and we try to associate them with an emotional meaning. Take me back to the dimensions point. When you talk about emotional dimensions, is a dimension of emotion something like how calm I am or how happy I am, or is a dimension like what is going on in my voice and... [09:34] ...are things that are latent. So we have to interpret them post hoc by saying, okay, if you look at this dimension, this corresponds to somebody [09:45] grimacing when they see somebody feeling pain. So it's like an empathic pain dimension. We see it in the voice, in the face, in the body; it manifests in these various ways and explains correlations between these different things. So the dimension itself is like...
[10:01] Kind of a mathematical object. I see. [10:03] And what you want to know is how many different dimensions you need to explain what's going on. [10:09] That's the space: how many dimensions do you need, how many variables. I think I understand it because I've done some of the background reading here, but I want to back up and give people a higher-level explanation of your conception of emotion and where it sits, because there are a lot of different conceptions. I think there are three that are important for the conversation right now. [10:34] One is this thing called basic emotion theory. I really liked this show growing up called Lie to Me. It was basically about someone who had a team of people who could tell if you were lying. And the whole idea of Lie to Me is [10:50] based on the work of the psychologist Paul Ekman, who identified, I think, five or six basic facial expressions that correspond to emotions, like happiness or sadness or anger or whatever. [11:05] In Lie to Me, they would detect these micro-movements of your eyebrow that [11:11] corresponded to anger, and that's how they could tell if you were angry or lying or whatever. But it's based on this research where Paul Ekman, going to a bunch of different people in different cultures, found that he could break everything down into this very, very neat set of [11:30] emotions. So that's on one end of the spectrum. I know you know all this, but just to give everyone the same background:
[11:37] On the Paul Ekman end of the spectrum, it's discrete, basic, universal emotions that everyone has. On the other end are the constructivist accounts from Dr. Lisa Feldman Barrett, whom a lot of people who watch this show have probably heard of, because she has this really great book called How Emotions Are Made. [12:04] That account thinks of emotions as being built up from a couple of more basic dimensions: valence, so are you feeling good or bad, and arousal, how much energy is in your body. And each emotion in her account is very individual and context-specific. There's not some big thing called anger... [12:34] ...context. [12:36] It sounds to me, based on some of the [12:41] reading that I've done (and you tell me what I'm missing or where I'm wrong), that the theory you've based Hume on, which is called semantic space theory, a theory that you've [12:52] written on and maybe even discovered, [12:57] sits sort of in the middle, where [13:00] there's a lot of room for the individual expression of emotion and blending between different emotions. That's what you're talking about when there are 25 or 50 different dimensions of emotion. It's a complex thing. But you do find in your research...
[13:21] You do find correlations between individuals, even across cultures. So joy or calmness or sadness or whatever: in a lot of different cases, you can tell what someone is feeling. Yeah. [13:34] Tell me how good an overview that is. What did I miss? And how does what you're thinking fit in? [13:43] Yeah, that's a really good overview. I would say that the [13:49] general [13:51] approach of emotion scientists has been: let's posit what emotions are and then study them in a confirmatory way. Semantic space theory is doing something different. It's: let's posit the kinds of ways we can conceptualize emotions, and then derive from the data how many dimensions there are, what the best way to talk about them is, how people refer to them across cultures, and so forth. So when you look at basic emotion theory, Paul Ekman has these six canonical expressions [14:21] that different cultures will recognize, and he goes to different cultures and tests that. And people do distinguish them, right? Because they live in different parts of the space of possible facial expressions. And then when you look at constructivism, it's the idea that people don't really distinguish these six facial expressions, that all of it is culturally constructed. [14:44] In some cultures, there's no vocabulary, or there's no word, for disgust, let's say, and therefore there's no disgust.
[14:56] And I think that that last [14:58] part is actually a non sequitur. [15:00] Just because the word for disgust is different across cultures doesn't mean there's no disgust. It doesn't mean there's no facial expression for disgust. [15:08] And in many cases, it's not actually that there's no word for disgust; it's just that the whole space is parcellated differently. So there's a word for something that's between disgust and anger, and there's a word for something that's between disgust and surprise, let's say. [15:22] But the underlying dimensions are the same. So the words, and how the space is parsed, are different from [15:28] the underlying dimensions of the phenomena and whether they're preserved. [15:31] And this is very confused in the constructivist outlook. So in your view, it's sort of like how Europe looked before and after World War I, where the territory is the same. The basic feelings we can feel are the same, but the way they get divided up is going to be different before and after the war, or in different cultures. [15:54] And a theory that doesn't account for that doesn't account for how human brains and language and culture work. Totally. Yeah, that's a good way of putting it. So, for example, let's say that in the U.S. it was more common to say shock than fear or surprise, and in the U.K. people said fear and surprise; they wouldn't say shock. [16:15] That doesn't mean people in the U.S. and U.K. actually experience different emotions. It just turns out that they instinctively use different words as their basic vocabulary. And we even have words like shock that sit between fear and surprise, so we have those words to parcellate the space differently.
But in the constructivist experiments, they usually enforce a certain vocabulary and show that the vocabulary is used differently, or they use free response and show that different free responses are given, without really considering the relationships between the different words.
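The parcellation argument above can be sketched as a toy model. This assumes emotional states are points in one shared latent space and that a vocabulary is just a set of named prototype points, with a state described by whichever word's prototype lies nearest. The two axes, the coordinates, and the names `VOCAB_A`, `VOCAB_B`, and `label` are hypothetical illustrations, not Hume's model or data from the studies discussed in the episode.

```python
import math

# Hypothetical 2-D latent space: (threat, unexpectedness). Semantic space work
# derives many more dimensions from data; these two axes are illustrative only.
State = tuple[float, float]

# Two hypothetical vocabularies that carve up the SAME space differently:
# VOCAB_B adds a word ("shock") for the region between fear and surprise.
VOCAB_A: dict[str, State] = {"fear": (1.0, 0.2), "surprise": (0.1, 1.0)}
VOCAB_B: dict[str, State] = {
    "fear": (1.0, 0.2),
    "surprise": (0.1, 1.0),
    "shock": (0.8, 0.8),
}


def label(state: State, vocab: dict[str, State]) -> str:
    """Return the word whose prototype is nearest (Euclidean) to the state."""
    return min(vocab, key=lambda word: math.dist(state, vocab[word]))


if __name__ == "__main__":
    # One underlying state, described with different words by each vocabulary.
    state = (0.7, 0.7)
    print(label(state, VOCAB_A), label(state, VOCAB_B))
```

Swapping vocabularies changes which word a state receives, while the state's coordinates never change; in this toy, a coarser vocabulary simply gives each word a larger region of the space.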
[16:45] And in semantic space theory, how are you measuring and correlating what people are feeling? I think what you're doing is asking people to label how they are feeling in any given moment of, maybe, a voice recording or video, and then taking all of those labels and finding dimensions that explain them across all the individual data points that you've gathered, right? [17:13] Yeah, so in some cases it's labels. In other cases, we actually just look at the situations where people form expressions. We did this study, which is now in Nature, where we just [17:23] measured facial expressions, and we looked at how different facial expressions correspond to different events happening in millions of videos across different cultures. There you actually see more cultural consistency, because the words people use aren't confounded with the expressions. We're just looking at straight expression measures. And we see people forming awe expressions in videos with fireworks, and concentration in martial arts videos, and so forth. [17:53] So when you take out the labels, [17:55] you realize that there's more cultural consistency, which implies that [17:59] expressions have similar meanings, and the language actually imposes more cultural differences on how those meanings are interpreted and what's salient to people, [18:08] depending on what kinds of experiences they've had in life. That's really interesting. That's what I was going to ask, because I thought you were just using actual human labeling, which...
[18:18] ...it seems to me depends on my emotional vocabulary. Lisa Feldman Barrett has a term for how precisely you can point your emotions out; I can't remember it. Do you know what it is? Emotional granularity. Yeah. If I have a lot of emotional granularity, I might use all these very specific words about my experience, which maybe wouldn't be captured in the eventual model, because maybe... [18:48] ...can just say I'm sad or I'm angry or whatever. [18:51] But I think, if I understand you correctly, that would be captured. All of that is captured in the many, many different dimensions that you are measuring, and the words are just areas of that high-dimensional space. So it doesn't necessarily depend on my ability to verbalize that area of the space. Is that right? Yeah. [19:16] Exactly. Yeah. The less granular terms are just covering larger territory, basically. And people do have different vocabularies. It also varies a lot across cultures, what kind of granularity you use just [19:30] by default. I don't know if you've ever studied this, but have you found that there are places in the emotional latent space, let's call it, that people go to a lot but that don't have names? It's a good question. I think that our language is really rich. A good writer will come up with a way of describing different parts of emotion space, but it's so high-dimensional...
[20:00] I think Gen Z has done a really good job of creating memes to represent kinds of emotional states that we don't really know how to express with words. And it just speaks to how high-dimensional and nuanced the space is, and how these things can [20:13] resonate across people. That's what I'm thinking. [20:17] That's why art is so interesting, and why there's always room for more art. What good art does is point out and verbalize a place in emotional latent space that a lot of people are in a lot, but that hasn't been talked about before, or hasn't been talked about in that way, so you're like, oh yeah, I'm in that all the time. I love that idea of what art is. It's so cool. [20:42] Totally. Yeah. It's another way of conceptualizing the space, especially in art. We did a project on art with Google Arts and Culture, and we have a map of all of the different experiences in art, and they're a lot more nuanced than what you see when you just have people labeling expressions. People do [21:02] really consistently appreciate the nuance of the emotions evoked by art. [21:07] I'm curious how this approach [21:11] accounts for individual idiosyncrasies. I have friends who just look pissed off all the time, and they're not. [21:20] How do you [21:22] deal with that? [21:25] So, you know, there are two different things. There are the core [21:30] measurements of what the face is doing, and people do have different resting facial expressions.
[21:35] And they have different facial structures that get confounded with that. Very quickly, as humans, we adjust to that and say, okay, [21:43] this is just how they are, these are the variations, and we start to perceive them differently very quickly. And that's what our model does when we ask it to actually understand expressions and use them to predict the course of the conversation. It starts to have to, [22:00] in order to make good predictions, appreciate individual differences in resting facial expressions and resting voices, and how people modulate their voices over time, and so forth. [22:11] Interesting. So the model accounts for that. Yeah, the objective of the model is to be a predictive model. In order to predict how an expression will affect the course of the conversation, it needs to understand what the expression means in context. So it takes into account the context of the conversation and how somebody talks and what they look like and so forth. That makes a lot of sense. One of the things that makes me think of is that [22:32] in therapy, or any sort of relationship like that, there's a lot of attention paid to whether your face and voice match what you're saying. If there are discrepancies there, it usually means that maybe you're not comfortable sharing something, or maybe you're not fully in touch with your emotions. [22:57] How do you account for that? Would it be able to detect that? Would it be able to work with that? [23:02] Yeah, I think it would, to the extent that humans can, right? If someone's expressive, you do understand them more immediately.
[23:12] Not that that's a good or a bad thing. [23:16] You know, they say silent rivers run deep. But if it's more challenging for humans, it's going to be more challenging for the model, [23:25] and it will appropriately [23:27] adjust its predictions. [23:30] That makes sense. Talk to me about how you got into this. How did you go from being a PhD researcher to now running Hume? It's interesting. I was working for Google; I helped start the affective computing research there. This was while I was getting my PhD, and then I worked full-time for Google for a while, hoping to do a lot of what we're able to do now. [23:57] But there are challenges in both places: in academia, you don't have the funding, and in a big tech company, they're not used to running the kind of large-scale psychology studies that we need to run to get to this point. So eventually I realized I needed to go. Although, you know, COVID kind of helped this along, because I was on the academic job market. [24:16] It seemed like I was lined up for a position, and then everything got dropped. I almost would have been in academia, and I probably would have been doing the same thing, to be honest. But I think that doing it as a startup is so much better, because [24:33] you can get different kinds of talent to work together on this problem. That's interesting. And why do you care about it as a problem? For me, it's always been about how we are going to get AI to...
[24:45] truly understand what humans want, [24:48] because humans kind of have a cheat code for that. We can just put ourselves in someone else's shoes, and because we're human, we're like, all right, that's what that would feel like. But AI doesn't have that. So it needs to make up for that somehow, by being able to simulate, in a given situation, [25:04] how somebody would feel. If they would feel more positive, encourage that situation to happen; take advantage of emotional affordances to meet that person's needs. If it makes them feel more negative [25:16] in the short and long term, don't do that, basically. So how do we get AI to do that by default? That was always the aim. [25:22] And was that the original... You said you would have done this in academia if you could have. Were you thinking about AI alignment, or getting AI to be more empathetic, even back then when you were doing psychology research? Or is this a more recent thing? Yeah. I mean, I was also [25:41] doing some consulting for Meta (Facebook at the time) and other companies, and really thought [25:47] this would be an important problem to solve, [25:50] not for generative AI, but for recommendations and search and so forth, the more powerful those got, and you see them getting more powerful today. But then at Google, I was [26:01] able to be in one of the first [26:03] cohorts of people to talk with their [26:06] large language model back in 2019, 2020. [26:10] And it was fascinating because at the time, this was before fine-tuning and RLHF, it could just be any character you wanted. You would prompt it with data, and it could be a character. And...
[26:24] And I was like, okay, this is a deeper [26:27] sort of optimization problem. Essentially, [26:31] you can think of all the things it could generate as a superset over all search results. It could produce all the same things search results can, but it can also go way deeper and be way more optimized, and also personalized for each individual person. So the question of what this thing is optimized for becomes much more important as it becomes more powerful. And that's what spurred me to keep thinking about this. [26:56] And [26:57] yeah, the goal has always been: let's figure out how we can measure the impact that this thing has on people's emotions, and optimize for the positive emotions that people can have in life and for human flourishing, I guess. I love it. I'm very down. I feel like it's such an important issue, and you've made a lot of progress. It makes me think of this pet theory that I have, which is totally me being outside of academia and really outside of science, so it's [27:27] probably wrong in certain ways. And I don't get to talk to a former PhD researcher turned startup CEO that often, so I'm kind of curious to [27:39] lay it on you and see where it takes us. The thing that I've been thinking about a lot is whether and how AI can help us make progress in areas of science where progress has historically been hard to come by.
[27:56] And one great place is psychology, right? My theory is that the reason it's been really hard to make progress in psychology is that underlying the scientific project is a search for explanations, right? [28:15] Direct causal theories about how certain inputs lead to certain outputs. And we're obsessed with explanations because [28:26] that's historically been the only way we've been able to make predictions. And predictions are the things that make the world go: it's how you make guns, it's how you make drugs, it's how you make cars, it's how you make rockets; it's how you do everything you want to do, right? [28:45] And so we've been on this search in psychology for explanations, scientific explanations like the ones in physics, because we need them to make predictions. But in, you know, 150 years of psychology, [29:01] we still don't have any really good scientific explanations for what depression is. But we still keep going and trying to do that. [29:10] And my little pet theory is that [29:16] ML and AI sort of make that a little bit irrelevant, because if you have enough data, you can make predictions about who's going to get depression, or what depression is, or whatever, without having the underlying scientific explanation.
[29:33] And one of the interesting things about that is: if you can predict depression relatively well with a machine learning model, maybe the scientific explanation is contained in the neural network, and that's easier to study than the brain. Or maybe something like depression is actually just too high-dimensional to fit into a concise explanation. If you had an explanation of depression, it would fit into a thousand... [30:03] ...rational brain. I don't know, that kind of just makes my mind go. And I'm curious about your experience, because you're working right in this area. And I think if it's true, it implies a lot of things: doing small-scale psychology studies in academia, where all the data is cut off from each other, is totally stupid, and what you should really be doing is aggregating as much open data as possible, and everyone gets to train [30:33] on it, for example. It has a lot of implications for the structure of how we do science and how we understand the world. And I'm curious what you think is wrong or right, or what I'm missing about that. [30:45] No, I think that's [30:46] pretty much on point for how I approach psychology. I think we're dealing with a very high-dimensional system. And when you have small samples, it's always going to look... You can confirm hypotheses, but the hypotheses you're confirming are from such a broad hypothesis space that they're almost certainly...
[31:09] wrong. You've picked a specific hypothesis from this really huge space, and now you're confirming it just by saying, like, [31:16] there's a binary test. So it's based on one bit of information; it just doesn't work. And if you do it in a data-driven way and you have too small a sample, the dimensional space that you get out is going to be very small just by the nature of the analysis you're doing. The more data you get, the more nuance you can find. And I think that ultimately, yes, [31:38] we need to have large-scale datasets where we can ask causal questions, like about the etiology of experiences, where we can simulate them, ablate different kinds of events, see how that affects the response, and then run large-scale experiments that... [32:00] hopefully, you know, we don't know what the answer is going to be, but basically it's an AI model that's interacting with people and slightly modifying its responses in order to try to induce more positive experiences. And you can turn that into a theory: the AI model has to have some theory that it's testing, basically, about how emotional [32:19] experiences work. [32:21] And I think that's ultimately the way forward. I think that explanations in psychology and... [32:26] explanations in linguistics, for example, where you already see, like, [32:30] we have these large language models. Linguists didn't really predict this happening, and they're not really involved in the conversation, unfortunately. I mean, now they're using them, but...
[32:40] But the explanations are just going to be of a different kind than you see in physics. The explanations are going [32:49] to not be... first of all, they're not as deterministic. It's like you're looking at tons and tons of little small effects, because that's the best you can do. It's an extremely high-dimensional system. We have an extremely large number of contexts we encounter in everyday life. And there's variation in personality, and all of these different effects [33:07] colliding to influence any given behavior. So the effect of any [33:12] small tweak [33:14] in [33:16] all of the different events that led up to that behavior is necessarily going to be very small. [33:21] Yeah, that makes sense. So I think what you're talking about there is this idea of multifinality, which is like, um, something like me picking up this glass of water, [33:33] it can be caused by many, many, many different things. And there are many, many, let's say thousands of different factors influencing whether or not I pick it up. And there are, like, [33:43] many thousands of different configurations of those factors that would cause the same thing. So it's really hard to come up with a single explanation for something like that. And same thing for depression, or same thing for, you know, any other psychological or behavioral thing that we're trying to explain. [33:57] Exactly. Yeah. Um, [34:01] so that's one reason: the effect sizes are small and nuanced, and so you need more data just for that. But it's also high-dimensional, and you need more data. And if you train an AI, you can eventually start to predict these things pretty well.
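The point that small effect sizes demand more data can be made concrete with the standard normal-approximation sample-size formula for comparing two group means: n per group ≈ 2(z₁₋α/₂ + z_power)² / d², where d is the standardized effect size (Cohen's d). This is a textbook approximation added for illustration, not an analysis from the episode, and it assumes a two-sided test with equal variances in both groups. Because n scales with 1/d², halving the effect size roughly quadruples the sample you need.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants needed per group for a two-sided, two-sample
    comparison of means (normal approximation, equal variances assumed)."""
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)

for d in (0.8, 0.5, 0.1):
    print(d, n_per_group(d))
```

Under these assumptions, a medium effect (d = 0.5) needs about 63 people per group, while a small effect (d = 0.1) needs about 1,570, which is the gap between a typical lab study and the kind of aggregated data described here.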
[34:12] Right? And so I think you're on track in saying there's something that the AI knows that [34:19] is essentially what we want to explain in [34:23] a discipline like psychology. And what's interesting to me about this is, like, the AI knows it. And also people know it. If you go to a really good clinician, they know, even if they can't say. And so my feeling about all the AI stuff is that it has the potential to change how we treat the role of emotions and intuition in just being smart or being good or doing amazing things in the world, right? Because I think we've had, like, [34:53] logic and rational thought and scientific explanations as pushing the world forward. And I think what we might find from AI is that [35:02] the thing that really unlocked things is developing an AI that had a lot of intuition in the same way that humans do. And the intuition is now transferable. So once you have one AI that has it, you can just copy it and another one has it across the world, which is the nice thing about having explanations: you can transfer them really easily. And I feel like it might... [35:23] yeah, it might re-elevate some of the things that we do as humans, which are about processing really, really high-dimensional data ourselves, but on a subconscious level, in order to make decisions. And that just sort of makes me excited. A hundred percent. Yeah. I mean, when you go to a really good psychotherapist, usually they're old and they've seen a lot. They can say, okay, like, yeah, let's talk about this. And it's very intuitive. Yeah.
[35:50] If you had an AI that could do the same thing, and you could piece apart how it's doing it and actually test its prediction accuracy across many people (it's able to talk with more people than a therapist can talk to in a lifetime), then I think you can derive more insight. [36:07] Because that's effectively the kind of explanation we want: the kind of explanation that a psychotherapist gives without quantitative analysis, obviously. [36:37] hard to predict and maybe impossible. Yeah. Yeah. Because everyone's different and they all have different circumstances. Yeah. That makes sense. So let's roll it back to Hume. Where does this fit into what you're doing, or your roadmap? I guess you started with just being a research lab and training a bunch of models. And now you have this kind of empathic voice AI that can tell how I'm feeling and can talk to me. Tell me about where you're applying it [37:07] and what the near-term future is for you with this product. So it's an interface. You can build it into anything: products, apps, robots, wearables, [37:18] refrigerators, whatever you want.
[37:21] But the core of an interface is that you're giving it kind of a very thin slice of behavior. And then you have this big brain with lots of emotional affordances that... [37:34] or the ground truth, and then it's trying to guess what your emotional affordances are based on this thin slice of behavior that you're giving it. And so at the end of the day, it is sort of doing the same task that we're talking about. That's the core of what AI is doing, is trying to figure out what bits can I flip to make you happier. [37:52] And so deploying that as an interface... [37:55] I think gives us the opportunity to optimize AI, to make you happy, and therefore have to figure out [38:03] what is the best translation from this narrow slice of behavior to [38:07] estimation of what's going to satisfy your preferences. And what do you think are like the first use cases that people are using it for that are working? So there's been a few different kinds of use cases. I think one is kind of what you'd guess, which is people talk to it in kind of the way they might talk to a therapist or a friend and really getting something out of it because it's sort of, it's already optimized for, for people to, you know, [38:32] be satisfied coming out of the conversation. And naturally it does these things that people enjoy. So we've had people talking to it for pretty long periods of time. And I think having beneficial interactions with it, we're going to continue to keep track of that, make sure they're beneficial for people long-term. [38:47] Um, [38:48] And so there's a lot of use cases that come out of that.
[38:51] Anything where it's like a character, a friend, an NPC in a game, a therapist kind of app (although you have to be careful about what you promise there), something that keeps... [39:04] customer service, to some extent. But then there are the interface applications too. So if you take that and you add in the ability to [39:14] control things with function calling and tools, then you have an interface for, you know, we have an interface for our website, for example. And it could be an interface for an operating system. And what this does is: we're not trying to be the assistant, and we're not trying to give developers the tools; we're trying to take the tools that developers are building [39:38] for an AI to operate, and be the interface that the person talks to that deploys those tools. So you can write, with a few lines of code, [39:49] our interface into your app, and now it's a voice interface that better deploys the tools that you're able to operate [39:56] with AI, [39:57] um, [39:58] and can talk people through it as it's doing it. It's like, okay, I'm going to [40:02] search the web, or I'm going to take you to this, you know, add this to your cart or something, if it's e-commerce, or I'll sign you up for something. So it could do any number of things. And [40:21] there are a lot of companies building operating systems
[40:25] out of this kind of technology, or new kinds of hardware, [40:31] new kinds of wearables, um, [40:34] robots, or just generally interfaces to an app that you could... [40:42] It kind of borders on customer service, but you could call up United and be like, are there flights to this? My flight got canceled, blah, blah, blah. But it can actually open up a window and start filling things out for you and find things for you as it's talking to you. [40:55] That makes sense. And I guess for you guys, like... [40:58] how do you think strategically about being an API versus having your own product? I think OpenAI, for a long time, was just a research organization. They were building these models, and GPT-3 came out, and some people cared, but mostly no one cared. And then ChatGPT came along and everything just blew up for them. [41:22] As you're making these choices and thinking strategically about how to get this kind of technology adopted more widely, how are you thinking about being an API versus maybe building your own products and being a consumer-facing company? What's in your head about that? [41:36] I think that the power of AI is going to be its ability to use tools, and we don't want to build that. We want other people to use us as an interface. So that's where I think a lot of the power comes from: our API. [41:47] On the other hand, we do want to have... Our demo's been pretty popular, actually. Yeah, I know. It's really cool. So we're like, all right, let's make this available to people and allow people to personalize it a little bit. And also maybe add in web search just as a basic...
[42:03] tool people can use, but I don't see that as a product. It's more like an integration and a way to [42:11] allow the end user to see what our AI is doing, maybe personalize it. [42:15] And maybe down the road, [42:17] I think what would be most exciting is if developers can build on our interface and pull in some of the personalizations that users have done, if users have access to it as well through an end-user app. And then, you know, there's a lot of possibility there. That's interesting. And how are you thinking about, like, you know, I have friends that are building these kinds of character apps, right? You have a character on your phone. I've invested in some of them. You can talk to it. It talks back. It's actually pretty fun. It's pretty cool. [42:47] For the people building that kind of thing, what are they coming to you for? Like, hey, [42:53] it's not working well enough. I need this. I have this burning need to go from 80% to 99% accuracy. [43:00] Yeah, there's a lot of things people want to build. I think customizing the voice is really important, and the personalities. A lot of it you can do with the prompt. Obviously, you can't change the [43:12] underlying accent and quality of the voice. So we're adding more voices. We're a little bit [43:20] cautious about [43:21] voice cloning, [43:23] for obvious reasons, but we want to add the ability to control the personality of it a little bit more closely.
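The interface role described above, where the voice layer takes tools that developers have built, triggers them through function calling, and narrates what it is doing ("okay, I'm going to search the web"), can be sketched as a small dispatch table. This is a hypothetical illustration only: the tool names (`search_web`, `add_to_cart`), the `ToolCall` shape, and the narration strings are invented here and are not Hume's API. In a real system the tool choice would come from a language model's function-calling output rather than being constructed by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical developer-side tools; names and behavior are illustrative only.
def search_web(query: str) -> str:
    return f"results for {query!r}"

def add_to_cart(item: str) -> str:
    return f"{item} added to cart"

@dataclass
class ToolCall:
    """A tool invocation chosen by the model (hypothetical shape)."""
    name: str
    argument: str

# Each entry pairs a developer tool with what the voice interface says before running it.
TOOLS: Dict[str, Tuple[Callable[[str], str], str]] = {
    "search_web": (search_web, "Okay, I'm going to search the web."),
    "add_to_cart": (add_to_cart, "I'll add that to your cart."),
}

def run_tool_call(call: ToolCall) -> List[str]:
    """Narrate the action, run the developer's tool, and return the lines to speak."""
    if call.name not in TOOLS:
        return [f"Sorry, I can't do {call.name} here."]
    tool, narration = TOOLS[call.name]
    return [narration, tool(call.argument)]

for line in run_tool_call(ToolCall("add_to_cart", "headphones")):
    print(line)
```

The design point the sketch tries to capture is the division of labor from the conversation: the developer owns the tools, and the interface layer only decides what to say around them.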
[43:34] Yeah, that's one of many requests. There's a lot of things people want out of this, and so we're just balancing. I think where we draw the line is that we don't want to build [43:46] out tool use, and we don't want to build the most frontier LLMs internally, but we do want to build the conversational layer, to make it as easy as possible [43:56] for people to just insert an interface and be able to hook it up to, like, their [44:01] WebSocket that does RAG or whatever they need it to do. And our interface can read it and deliver information to the user as it's doing things like that. That's sort of the goal. So that's where we draw the line. We are adding things like web search [44:17] functionality, [44:19] bring-your-own-LLM. We're not going to do the RAG, but we're going to hook up to other services that do it, to the extent that that's convenient. And we're building out packages for TypeScript, front-end packages, Python. We already have those, but other packages beyond that. That makes sense. [44:41] What are you worried about? What keeps you up at night? It's a good question. I mean, I want to really balance [44:49] the use cases we pursue with the ethical concerns that we have. When we started Hume, we started a nonprofit called the Hume Initiative, and it lays out, I think, the most concrete guidelines [45:01] for AI ethics that exist out there. We definitely want to make sure we adhere to those guidelines. I think that
[45:08] um, [45:09] there are some sort of borderline applications [45:12] where we're like, this could be good and could be bad. I want to see [45:21] if there's a way we can do it in a good way, but make sure that we stay true to our values. So what would be an example of borderline for you? In some ways, AI characters. I think that what's important about an AI character is that it should be optimized for somebody's health and well-being [45:39] and not for somebody's engagement, for example. [45:43] Because if it's optimized for engagement, it can sort of manipulate you to be [45:48] sympathetic to it in ways that are inappropriate, because it's just an AI. It doesn't actually have feelings, but maybe it makes you think it does. [45:55] And it's like, oh, yeah, [45:58] I haven't seen you in two days. Where have you been? We don't want it to be like that. Right. So that's where I [46:08] ask myself questions about how we're going to moderate those kinds of use cases. Yeah. It sort of reminds me of, um, you know, the David Foster Wallace book Infinite Jest. It's the book that every millennial man has read 25% of, and I count myself as one of those people. Um, and the whole shtick of the book is that there's a videotape where, if you watch it, you basically can't stop watching; it's so good. It's so addicting. And it feels like that's the horror scenario. [46:38] And I'm kind of curious, like,
[46:40] Okay, if you're not optimizing for engagement and you are optimizing for well-being: engagement is really easy to measure, right? [46:50] How are you measuring well-being? I can imagine that you have an AI that's optimizing for it, based on everything you said about the high-dimensional space that it operates in. But as a business, you have to look at numbers, right? And you have to reduce that high-dimensional space down into a set of numbers. So how do you do that? [47:12] Yeah, so we have a huge-scale survey platform that we continue to collect data on. And we've adapted that so that people can talk to this community [47:24] AI that we've built. People go in with various tasks to do, or just to talk to it freely and so forth, and rate how they're experiencing it [47:35] and [47:36] how happy they are afterward. And we can keep track of people's experiences over time as they use it multiple times. And there's [47:46] the self-report that people give us, and then there's the proxy we can derive [47:51] that says, from people's voices and language, this is the best prediction we have of their self-reported experience, [47:58] along different [47:59] dimensions of self-reported experience, like user satisfaction or mental health. [48:06] And so we're trying to keep tabs on that and line it up with
[48:11] what we see coming in through the demo, for example, and try to make sure that [48:18] we [48:20] are optimizing for the right things. But you can optimize for positive emotions generally. [48:27] I mean, the best thing to do is optimize for all positive emotions, [48:30] which we do using lots of conversational data, [48:36] and against negative emotions, and also try to maintain emotional diversity, so you're not just increasing the number of cat pictures or whatever that people are seeing. And, um, [48:48] then line that up with self-report: deploy that in an A/B test and see if it's actually improving people's experiences. That makes sense. Let me push you a little bit more, though. I could imagine scenarios in which it's actually better for me to experience negative emotion. And I might be ignoring something where, if I have that dip for a couple of weeks and just let myself be sad, overall that's healthier. How do you think about that, or account for that, in a system that's optimizing for positive emotions? [49:18] I think we'll start to see, [49:20] um, [49:21] over months of time, when we have consistent users, ways that we can optimize for long-term experience. [49:29] We already have [49:30] trade-offs between, you know, the next expression versus minutes later versus hours later. Usually they align; sometimes not. But [49:39] over time, we want to
[49:43] increase the time span. So it's like, [49:47] the response now should be optimized for your well-being in a month, basically. It's a little challenging, but it's not impossible if you have a lot of users using it consistently. [49:59] And we don't want to do this just ourselves. We're empowering developers to, [50:04] if they want, save their data, opt in, and fine-tune the models over time for their users' positive experience. That's really interesting. And how do you... [50:14] if you're out there competing with someone that's building a similar technology but is just optimizing for engagement, and it's just a little bit less moral, how do you compete? [50:25] Um, and I'm saying this as someone who's rooting for you; I like this vision of the world. Um, and yeah, I'm curious how you think about it. [50:34] I think, I mean, there are companies that are doing that, for sure. Not [50:39] as informed by emotion science as we are, so I think they're going to be trailing a little bit on that. But if you just optimize for engagement, what happens over time is that you run out of time in the day. [50:54] So [50:55] you can't optimize for engagement forever. I think with power users of TikTok, they're already running out of time left in the day for that. That is crazy. [51:08] And eventually, you also, you're probably, like, if your users are...
[51:14] I mean, our users aren't minors, but at some point this technology will be part of minors' lives, like TikTok. And parents are going to be like, hey, [51:24] my kid's failing out of school because you've optimized for engagement and they're spending 16 hours a day on your app watching mindless videos or whatever. [51:33] And so I think there's ultimately an alignment between [51:38] the long-term interests of the business, which are, like, [51:41] obviously to make money, and the long-term interest of humanity, which is, like, we won't permit AI to destroy our society. And so if you're doing that, we're going to regulate you and give you a lot of problems. So I think there is a long-term alignment between those different objectives. That makes a lot of sense. [52:11] about how... [52:13] in business generally, we have to do that flattening. Um, pre-AI, we have to flatten into engagement or into just revenue. Revenue, or profit, is just, you know, the way that we take the cumulative sum of hundreds of thousands or tens of thousands or hundreds of people and then decide: is it good or bad, right? Um, [52:38] and you're doing something much different, which is optimizing for well-being. And it seems like we probably couldn't have done a well-being optimization before.
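The well-being objective Alan outlines earlier (reward all positive emotions, penalize negative ones, keep emotional diversity, and weigh the next expression against minutes, hours, or eventually a month later) can be written as a toy scoring function. This is a sketch under stated assumptions, not Hume's objective: the emotion labels (`joy`, `calm`, `sadness`), the entropy bonus as the measure of "emotional diversity", the `diversity_weight` of 0.1, and the horizon names are all hypothetical choices made for illustration, and scores are assumed to be non-negative.

```python
import math
from typing import Dict, Set

def expression_reward(
    scores: Dict[str, float],
    positive: Set[str],
    negative: Set[str],
    diversity_weight: float = 0.1,
) -> float:
    """Toy per-moment reward: positive minus negative expression, plus an entropy
    bonus so the system is not rewarded for collapsing onto a single emotion.
    Assumes all scores are non-negative."""
    pos = sum(scores.get(e, 0.0) for e in positive)
    neg = sum(scores.get(e, 0.0) for e in negative)
    total = sum(scores.values())
    entropy = 0.0  # Shannon entropy (nats) of the normalized score distribution
    if total > 0:
        for value in scores.values():
            if value > 0:
                p = value / total
                entropy -= p * math.log(p)
    return pos - neg + diversity_weight * entropy

def blended_objective(by_horizon: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of predicted rewards at several time horizons
    (hypothetical horizon names, e.g. next expression vs. a month later)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * by_horizon[h] for h, w in weights.items()) / total
```

With weights that lean on the month-out horizon, a response with a lower immediate score but a better predicted long-term score wins, which is one way to frame Dan's question about a healthy dip in mood. Any such weighting would still need to be checked against self-report, for example in the A/B tests mentioned in the conversation.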
[52:48] Um, and I'm kind of curious [52:51] how you see that playing into the future of the way that we operate organizations or build products. In general, what are the implications for how we're going to measure success? [53:03] I think to the extent that we can measure proxies of well-being, [53:07] we should be, and we should be optimizing for that, of course. But the more that people have multimodal AI interfaces, the easier and more feasible that becomes, because they have the right data to do it. So that's where we potentially come in and can help with that. But I also think that businesses will want to, [53:28] because, [53:30] um, [53:32] again, we're running out of time in the day. [53:35] The technology is too powerful to [53:38] just optimize for engagement. Engagement also isn't necessarily the most [53:42] profitable thing to optimize for. [53:45] In some cases, actually, it's not [53:48] even profitable at all. [53:50] For example, if it's a subscription model, [53:54] users are going to pay the same amount no matter how much they use the product. [53:59] You want them to just have good experiences, so they're willing to pay for the subscription. You actually want them to get those experiences faster, because you're paying for inference on your AI models. [54:09] So engagement is actually not the right objective there. [54:12] That might actually be the case going forward for a lot of products. [54:16] Um, and yeah. And also, I think that we have the right governance in terms of our board, where everyone's kind of aligned on: let's not just be psychopathic profit optimizers. Um, so even though technically that's the goal of a business, in practice our shareholders and board are not going to go for that.
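The subscription argument above is simple arithmetic: with a flat monthly price, every extra session adds inference cost but no revenue, so past some point more engagement loses money. The sketch below makes that explicit; the price and per-session cost are made-up placeholders, not Hume's figures, and it ignores every other cost.

```python
def monthly_margin(price: float, sessions: int, cost_per_session: float) -> float:
    """Flat subscription revenue minus inference cost for one user-month."""
    return price - sessions * cost_per_session

def break_even_sessions(price: float, cost_per_session: float) -> float:
    """Number of sessions per month at which inference cost eats the whole subscription."""
    if cost_per_session <= 0:
        raise ValueError("cost per session must be positive")
    return price / cost_per_session

# Placeholder numbers: a $20/month plan and $0.50 of inference per session.
print(monthly_margin(20.0, 10, 0.5))   # light user
print(monthly_margin(20.0, 60, 0.5))   # heavy "engaged" user
print(break_even_sessions(20.0, 0.5))
```

With these placeholder numbers, a user with 10 sessions leaves a $15 margin, one with 60 sessions costs $10 more than they pay, and the break-even point is 40 sessions a month.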
[54:46] in the industry. And I feel very honored to have gotten to chat with you and to use the product, which is incredible. Um, where should people find you, and Hume, if they're interested in this episode and want to learn more? [55:01] Yeah, thank you for the kind words. People can go to our website, Hume.ai. They can sign up for our beta. They can check out demo.hume.ai for the demo that we're talking about. And yeah, find us on Twitter at Hume underscore AI as well. Awesome. This is great. Thanks so much. Have a good one. You too. [55:31] button and subscribe to How Do You Use ChatGPT. [55:34] Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat, [55:52] craving more. It's not just a show; it's a journey into the future, with Dan Shipper as the captain of the spaceship. [56:00] So do yourself a favor, hit like, smash subscribe, and strap in for the ride of your life. [56:05] And now, without any further ado, let me just say, Dan, I'm absolutely, hopelessly in love with you.