Architects of Intelligence

by Martin Ford


  MARTIN FORD: I want to begin by exploring your background; I’m especially interested in how you became involved with AI and how your trajectory went from an academic background to where you are today with your company, Affectiva.

  RANA EL KALIOUBY: I grew up in the Middle East; I was born in Cairo, Egypt and spent much of my childhood in Kuwait. During that time, I found myself experimenting with early computers. Both of my parents worked in technology, and my dad would bring home old Atari machines that we would pick apart. Fast-forward, and that grew into my undergraduate degree, where I majored in Computer Science at the American University in Cairo. I guess you could say this is where the thinking behind Affectiva first came into play, because it was then that I became fascinated by how technology changes how humans connect with one another. Nowadays a lot of our communication is mediated by technology, and so the particular ways we connect with technology, and through it with one another, fascinate me.

  The next step was to do a PhD. I received a scholarship to work with the Computer Science department at Cambridge University, which, on a side note, was quite an unusual thing for a young Egyptian and Muslim woman to do. This was in the year 2000, before we all had smartphones, but at the time I was very interested in this idea of human-computer interaction and how our interfaces were going to evolve over the coming years.

  Through my own experience, I realized that I was spending a lot of time in front of my machine, coding and writing all these research papers, and that led me to two realizations. The first was that the laptop I was using (remember, no smartphones yet) was supposedly quite intimate with me. I was spending many hours with it, and while it knew a lot of things about me—like whether I was writing a Word document or coding—it had no idea how I was feeling. It knew my location, it knew my identity, but it was completely oblivious to my emotional and cognitive state.

  In that sense my laptop reminded me of Microsoft Clippy, where you would be writing a paper, and then this paper-clip would show up, do a little twirl, and it would say, “Oh, it looks like you’re writing a letter! Do you need any help?” Clippy would often show up at the weirdest times, for example when I was super-stressed and my deadline was in 15 minutes... and the paperclip would do its funny little cheesy thing. Clippy helped me realize that we have an opportunity here, because there’s an emotional intelligence gap with our technology.

  The other thing that became very clear was that this machine mediated a lot of my communication with my family back home. During my PhD, there were times when I was terribly homesick, and I would be chatting with my family in tears, yet they’d have no idea because I was hiding behind my screen. It made me feel very lonely, and I realized how all of the rich non-verbal communication we have when we’re face to face, on a phone call, or in a video conference is lost in cyberspace when we interact digitally.

  MARTIN FORD: So, your own life experiences led you to become interested in the idea of technology that could understand human emotions. Did your PhD focus much on exploring this idea?

  RANA EL KALIOUBY: Yes, I became intrigued by the idea that we’re building a lot of smartness into our technologies but not a lot of emotional intelligence, and this was an idea that I started to explore during my PhD. It all began during one of my very early presentations at Cambridge, where I was talking to an audience about how curious I was about how we might build computers that could read emotions. I explained during the presentation how I am, myself, a very expressive person—that I’m very attuned to people’s facial expressions, and how intriguing I found it to think about how we could get a computer to do the same. A fellow PhD student popped up and said, “Have you looked into autism because people on the autism spectrum also find it very challenging to read facial expressions and non-verbal behaviors?” As a result of that question, I ended up collaborating very closely with the Cambridge Autism Research Center during my PhD. They had an amazing dataset that they’d compiled to help kids on the autism spectrum to learn about different facial expressions.

  Machine learning needs a lot of data, so I borrowed their dataset to train the algorithms I was creating to read different emotions, and the results were really promising. This data opened up an opportunity to focus not just on the happy/sad emotions, but also on the many nuanced emotions that we see in everyday life, such as confusion, interest, anxiety or boredom.

  I could soon see that we had a tool that we could package up and provide as a training aid for individuals on the autism spectrum. This is where I realized that my work wasn’t just about improving human-machine interfaces, but also about improving human communication and human connection.

  When I completed my PhD at Cambridge, I met the MIT professor Rosalind Picard, who authored the book Affective Computing and would later co-found Affectiva with me. Back in 1998, Rosalind had posited that technology needs to be able to identify human emotions and respond to them.

  Long story short, we ended up chatting, and Rosalind invited me to join her group at the MIT Media Lab. The project that brought me over to the US was a National Science Foundation project that would take my emotion-reading technology, integrate it with a camera, and apply it to help kids on the autism spectrum.

  MARTIN FORD: In one of the articles I read about you, I think you described an “emotional hearing aid” for autistic kids. Is this what you are referring to? Did that invention stay at the conceptual level or did it become a practical product?

  RANA EL KALIOUBY: I joined MIT in 2006, and between then and 2009 we partnered with a school in Providence, Rhode Island that was focused on kids on the autism spectrum. We deployed our technology there: we would take prototypes to the kids and have them try them, and they would say, “this doesn’t feel quite right,” so we iterated the system until it began to succeed. Eventually, we were able to demonstrate that the kids who were using the technology were making a lot more eye contact, and doing a lot more than just looking at people’s faces.

  These kids, somewhere on the autism spectrum, would wear a pair of glasses with a camera facing outwards. When we first started doing this research, a lot of the camera data we got was just of the floor or the ceiling: the kids weren’t even looking at faces. But the input we got from working with these kids allowed us to build real-time feedback that encouraged them to look at people’s faces. Once they started to do that, we gave them feedback on what kind of emotions people were displaying. It all looked very promising.

  You’ve got to remember that the Media Lab is a unique academic department at MIT, in the sense that it has very strong ties to industry, to the point where about 80% of the lab’s funding comes from Fortune 500 companies. So twice a year, we would host these companies for what we called Sponsor Week, which was very much demo-or-die because you had to actually show what you were working on. A PowerPoint wouldn’t cut it!

  So, twice a year between 2006 and 2008 we’d invite all these folks over to MIT, and we would demo the autism prototype. During these events, companies like Pepsi would ask if we’d thought about applying this work to test whether advertising was effective. Procter & Gamble wanted to use it to test its latest shower gels, because it wanted to know whether people liked the smells or not. Toyota wanted to use it for driver state monitoring, and Bank of America wanted to optimize the banking experience. We explored getting more research assistants to help develop the ideas that our funders wanted, but we soon realized that this was not research anymore; it was in fact a commercial opportunity.

  I was apprehensive about leaving academia, but I was starting to get a little frustrated that in academia you do all these prototypes, but they never get deployed at scale. With a company, I felt we had an opportunity to scale and bring products to market, and to change how people communicate and do things on a day-to-day basis.

  MARTIN FORD: It sounds like Affectiva has been very customer-driven. Many startups try to create a product in anticipation of a market being there; but in your case, the customers told you exactly what they wanted, and you responded directly to that.

  RANA EL KALIOUBY: You’re absolutely right, and it quickly became apparent that we were sitting on a potentially huge commercial opportunity. Rosalind and I felt that between us we had started this field, that we were its thought leaders, and that we wanted to do this in a very ethical way—which was core to us.

  MARTIN FORD: What are you working on at Affectiva now, and what’s your overall vision for where it’s going to go in the future?

  RANA EL KALIOUBY: Our overall vision is that we’re on a mission to humanize technology. We’re starting to see technology permeate every aspect of our life. We’re also starting to see how interfaces are becoming conversational, and that our devices are becoming more perceptual—and a lot more potentially relational. We’re forming these tight relationships with our cars, our phones, and our smart-enabled devices like Amazon’s Alexa or Apple’s Siri.

  If you think about a lot of people who are building these devices, right now, they’re focused on the cognitive intelligence aspect of these devices, and they’re not paying much attention to the emotional intelligence. But if you look at humans, it’s not just your IQ that matters in how successful you are in your professional and personal life; it’s often really about your emotional and social intelligence. Are you able to understand the mental states of people around you? Are you able to adapt your behavior to take that into consideration and then motivate them to change their behavior, or persuade them to take action?

  In all of these situations where we are asking people to take action, we need to be emotionally intelligent to get to that point. I think that this is equally true for technology that is going to be interfacing with you on a day-to-day basis and potentially asking you to do things.

  Whether that is helping you sleep better, eat better, exercise more, work more productively, or be more social, whatever that technology is, it needs to consider your mental state when it tries to persuade you to take those actions.

  My thesis is that this kind of interface between humans and machines is going to become ubiquitous, that it will just be ingrained in the future human-machine interfaces, whether it’s our car, our phone or smart devices at our home or in the office. We will just be coexisting and collaborating with these new devices, and new kinds of interfaces.

  MARTIN FORD: Could you sketch out some of the specific things you’re working on? I know you’re doing something with monitoring drivers in cars to make sure they are attentive.

  RANA EL KALIOUBY: Yes. The issue today around monitoring drivers in cars is that there are so many situations to cater for that Affectiva, as a company, has focused specifically on use cases that are ethical, where there’s a good product-market fit, and of course where the markets are ready.

  When Affectiva started in 2009, the first kind of low-hanging market opportunities were in advertising testing, as I mentioned, and today Affectiva works with a quarter of the Fortune Global 500 companies to help them understand the emotional connection their advertising creates with their consumers.

  Often, companies will spend millions of dollars to create an advertisement that’s funny or one that tugs at your heart. But they have no idea if they struck the right emotional chord with you. The only way that they could find that sort of thing out, before our technology existed, was to ask people. So, if you, Martin Ford, were the person watching the ad, then you’d get a survey, and it would say, “Hey, did you like this ad? Did you think it was funny? Are you going to buy the product?” And the problem with that is that it’s very unreliable and very biased data.

  So now, with our technology, as you’re watching the ad, with your consent it will analyze on a moment-by-moment basis all your facial expressions and aggregate that over the thousands of people who watched that same ad. The result is an unbiased, objective set of data around how people respond emotionally to the advertising. We can then correlate that data with things like customer purchase intent, or even actual sales data and virality.

  Today we have all these KPIs that can be tracked, and we’re able to tie the emotional response to actual consumer behavior. That’s a product of ours that’s in 87 countries, from the US and China to India, but also smaller countries like Iraq and Vietnam. It’s a pretty robust product at this point, and it’s been amazing because it allows us to collect data from all over the world, and it’s all very spontaneous data. It’s data that, I would argue, even Facebook and Google don’t have because it’s not just your profile picture, it’s you sitting in your bedroom one night, watching a shampoo ad. That’s the data we have, and that’s what drives our algorithm.
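  To make that aggregation step concrete, here is a minimal sketch of the idea in Python, using pandas. The column names, smile scores and purchase-intent figures are all hypothetical; it illustrates pooling moment-by-moment expression scores across viewers and relating them to a KPI, and is not Affectiva’s actual pipeline.

```python
import pandas as pd

# Hypothetical per-frame facial-coding output: one smile score per viewer per second.
frames = pd.DataFrame({
    "viewer_id":   [1, 1, 2, 2, 3, 3],
    "timestamp_s": [0, 1, 0, 1, 0, 1],
    "smile_score": [0.1, 0.7, 0.0, 0.4, 0.2, 0.9],
})

# Aggregate across viewers: the average smile trace for the ad, second by second.
smile_trace = frames.groupby("timestamp_s")["smile_score"].mean()

# One summary metric per viewer (here, peak smile), set against made-up survey KPIs.
peak_smile = frames.groupby("viewer_id")["smile_score"].max()
purchase_intent = pd.Series({1: 0.8, 2: 0.3, 3: 0.9})  # hypothetical survey responses

print(smile_trace)
print(peak_smile.corr(purchase_intent))  # how the emotional response tracks the KPI
```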

  MARTIN FORD: What are you analyzing? Is it mostly based on facial expressions or also on other things like voice?

  RANA EL KALIOUBY: Well, when we first started, we worked with just the face, but about eighteen months ago we went back to the drawing board and asked: how do we as humans monitor the responses of other humans?

  People are pretty good at monitoring the mental states of the people around them, and we know that about 55% of the signal we use is in facial expressions and gestures, while about 38% is in the tone of voice: how fast someone is speaking, the pitch, and how much energy is in the voice. Only 7% of the signal is in the text, the actual choice of words that someone uses!

  Now, when you think of the entire industry of sentiment analysis, that multi-billion-dollar industry of listening to tweets and analyzing text messages and all that, it only accounts for 7% of how humans communicate. The way I like to think about what we’re doing here is that we’re trying to capture the other 93%: the non-verbal communication.

  So, back to your question: about eighteen months ago I started a speech team that looks at these prosodic and paralinguistic features. They look at the tone of voice and at the occurrence of speech events, such as how many times you say “um” or how many times you laugh. All of these speech events are independent of the actual words being said. Affectiva technology now combines these signals in what we call a multimodal approach, where different modalities are combined to truly understand a person’s cognitive, social or emotional state.
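  As an illustration of what a multimodal combination can look like, here is a toy sketch. The modality names, valence scores and weights are invented for the example (the weights loosely echo the 55/38/7 split mentioned above); a production system would learn how to fuse modalities from data rather than hard-code weights.

```python
from dataclasses import dataclass

@dataclass
class ModalityEstimate:
    name: str          # "face", "voice" or "words"
    valence: float     # -1 (negative) .. +1 (positive), from that modality's model
    confidence: float  # 0 .. 1, e.g. low if the face is occluded or the audio is noisy

def fuse(estimates: list, weights: dict) -> float:
    """Confidence- and weight-adjusted average of per-modality valence scores."""
    num = sum(weights[e.name] * e.confidence * e.valence for e in estimates)
    den = sum(weights[e.name] * e.confidence for e in estimates)
    return num / den if den else 0.0

estimates = [
    ModalityEstimate("face",  valence=0.6,  confidence=0.9),  # smiling
    ModalityEstimate("voice", valence=-0.2, confidence=0.5),  # flat, slightly tense tone
    ModalityEstimate("words", valence=0.1,  confidence=0.8),
]
weights = {"face": 0.55, "voice": 0.38, "words": 0.07}
print(f"fused valence: {fuse(estimates, weights):+.2f}")
```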

  MARTIN FORD: Are the emotional indicators you look for consistent across languages and cultures, or are there significant differences between populations?

  RANA EL KALIOUBY: If you take facial expressions, or even the tone of a person’s voice, the underlying expressions are universal. A smile is a smile everywhere in the world. However, we see an additional layer of cultural display norms, or rules, that dictate when people show their emotions, how often, and how intensely. We see examples of people amplifying their emotions, dampening their emotions, or even masking their emotions altogether. We particularly see signs of masking in Asian markets, where populations are less likely to show negative emotions, for instance. So, in Asia we see an increased incidence of what we call a social smile, or a politeness smile. Those are not expressions of joy; they are more about saying, “I acknowledge you,” and in that sense they are a very social signal.

  By and large, everything is universal. There are cultural nuances, of course, and because we have all this data, we’ve been able to build region-specific and sometimes even country-specific norms. We have so much data in China, for instance, that China is its own norm. Instead of comparing a Chinese individual’s response to, say, a chocolate ad against a global baseline, we compare it to the subpopulation that’s most like them. That approach has been critical to our success in monitoring emotional states in different cultures around the world.
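  A simplified way to picture that norming step: score an individual’s response relative to the subpopulation most like them rather than against one global baseline. The regions, means and standard deviations below are made up for illustration; real norms would be estimated from the large regional datasets described above.

```python
# Hypothetical regional norms for, say, smile intensity while watching ads.
REGION_NORMS = {
    "china":  (0.20, 0.10),   # (mean, standard deviation)
    "usa":    (0.35, 0.15),
    "global": (0.30, 0.14),
}

def normed_score(raw_score: float, region: str) -> float:
    """Z-score of a viewer's response against their regional norm (fallback: global)."""
    mean, std = REGION_NORMS.get(region, REGION_NORMS["global"])
    return (raw_score - mean) / std

# The same raw smile intensity reads as well above the Chinese norm
# but slightly below the US norm.
print(normed_score(0.30, "china"))  # +1.0
print(normed_score(0.30, "usa"))    # about -0.33
```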

  MARTIN FORD: I guess then that other applications you’re working on are oriented toward safety, for example monitoring drivers or the operators of dangerous equipment to make sure they stay attentive?

  RANA EL KALIOUBY: Absolutely. In fact in the last year we’ve started to get a ton of inbound interest from the automotive industry. It’s really exciting because it’s a major market opportunity for Affectiva and we’re solving two interesting problems for the car industry.

  In the cars of today, where there is an active driver, safety is a huge issue. And safety will continue to be an issue, even when we have semi-autonomous vehicles like Tesla that can drive themselves for a while but do still need a co-pilot to be paying attention.

  Using Affectiva software, we’re able to monitor the driver or the co-pilot for things like drowsiness, distraction, fatigue and even intoxication. In those cases, we would alert the driver, or even potentially have the car intervene. An intervention could be anything from changing the music or blasting a little bit of cold air, to tightening the seat belt, all the way to the car potentially saying, “You know what? I’m the car, and I feel I could be a safer driver than you are right now. I’m taking over control.” There are a lot of actions the car can take once it understands the level of attention and how impaired a driver is. So, that’s one class of use cases.
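  The kind of escalation described here can be pictured as a simple policy over an estimated driver-state score. The thresholds and actions below are invented for illustration; they are not Affectiva’s, or any carmaker’s, actual logic.

```python
def choose_intervention(drowsiness: float, distraction: float) -> str:
    """Map driver-impairment estimates in [0, 1] to an increasingly assertive action."""
    impairment = max(drowsiness, distraction)
    if impairment < 0.3:
        return "no action"
    if impairment < 0.5:
        return "change the music / blast a little cold air"
    if impairment < 0.7:
        return "audible alert and tighten the seat belt"
    if impairment < 0.9:
        return "suggest pulling over at the next safe stop"
    return "hand control to the vehicle, where the car supports it"

print(choose_intervention(drowsiness=0.82, distraction=0.40))  # -> suggest pulling over
```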

  The other problem we’re solving for the automotive industry is around the occupant experience. Let’s look into the future, where we have fully autonomous vehicles and robot-taxis with no driver in the car at all. In those situations, the car needs to understand the state of the occupants: how many people are in the car, what their relationship is, whether they’re in a conversation, or even whether there’s a baby in the car that’s potentially being left behind. Once you understand the mood of the occupants in the car, you can personalize the experience.

  The robot-taxi could make product recommendations or route recommendations. This would also introduce new business models for auto companies, especially premium brands like BMW or Porsche, because right now they’re all about the driving experience. But in the future, it’s not going to be about driving anymore: it’s going to be about transforming and redefining that transport, that mobility experience. Modern transport is a very exciting market, and we’re spending a lot of our mindshare building products for that industry, in partnership with Tier 1 companies.

 
