That period at Oxford in the Robotics Research Group is what really sparked my interest in AI. I found machine perception particularly fascinating: the challenges of how to build learning algorithms for distributed and multi-agent systems, how to use machine learning algorithms to make sense of environments, and how to develop algorithms that could autonomously build models of those environments, in particular, environments where you had no prior knowledge of them and had to learn as you go—like the surface of Mars.
A lot of what I was working on had applications not just in machine vision, but in distributed networks and sensing and sensor fusion. We were building these neural network-based algorithms that were using a combination of Bayesian networks of the kind Judea Pearl had pioneered, Kalman filters and other estimation and prediction algorithms to essentially build machine learning systems. The idea was that these systems could learn from the environment, learn from input data from a wide range of sources of varying quality, and make predictions. They could build maps and gather knowledge of the environments that they were in; and then they might be able to make predictions and decisions, much like intelligent systems would.
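To make that kind of pipeline concrete, here is a minimal, illustrative sketch of the estimation step involved: a one-dimensional Kalman filter fusing readings from two sensors of different quality to track a position. The motion model, noise values, and sensor names are assumptions chosen purely for illustration, not a reconstruction of the actual systems described.

```python
# Minimal, illustrative sketch: a 1D Kalman filter fusing readings from two
# sensors of different quality to track a moving robot's position.
# All numbers and names here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

dt = 1.0                          # time step
true_pos, true_vel = 0.0, 1.0     # ground truth the filter never sees directly

# State: [position, velocity]; constant-velocity motion model.
x = np.array([0.0, 0.0])                 # state estimate
P = np.eye(2) * 10.0                     # estimate covariance (initially uncertain)
F = np.array([[1.0, dt], [0.0, 1.0]])    # state transition
Q = np.eye(2) * 0.01                     # process noise
H = np.array([[1.0, 0.0]])               # we only measure position

# Two sensors of different quality (hypothetical noise levels).
sensor_noise = {"coarse": 2.0, "fine": 0.3}

for t in range(20):
    true_pos += true_vel * dt

    # Predict step.
    x = F @ x
    P = F @ P @ F.T + Q

    # Update step: fuse each sensor's measurement, weighted by its noise.
    for name, sigma in sensor_noise.items():
        z = true_pos + rng.normal(0.0, sigma)   # noisy measurement
        R = np.array([[sigma ** 2]])
        y = z - H @ x                           # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
        x = x + (K @ y)
        P = (np.eye(2) - K @ H) @ P

    print(f"t={t:2d}  true={true_pos:5.2f}  est={x[0]:5.2f}")
```

The weighting of each measurement by its noise is what allows a system like this to learn from input data of varying quality and still make usable predictions.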
So, eventually I met Rodney Brooks, who I'm still friends with today, during my visiting faculty fellowship at MIT, where I was working with the Robotics Group and the Sea Grant project that was building underwater robots. During this time, I also got to know people like Stuart Russell, who's a professor in robotics and AI at Berkeley, because he had spent time at Oxford in my research group. In fact, many of my colleagues from those days have continued to do pioneering work, people like John Leonard, now a robotics professor at MIT, and Andrew Zisserman, now at DeepMind. Despite the fact that I've wandered off into other areas in business and economics, I've stayed close to the work going on in AI and machine learning and try to keep up as best I can.
MARTIN FORD: So, you started out with a very technical orientation, given that you were teaching at Oxford?
JAMES MANYIKA: Yes, I was on the faculty and a fellow at Balliol College, Oxford, and I was teaching students courses in mathematics and computer science, as well as on some of the research we were doing in robotics.
MARTIN FORD: It sounds like a pretty unusual jump from there to business and management consulting at McKinsey.
JAMES MANYIKA: That was actually as much by accident as anything else. I'd recently become engaged, and I had also received an offer from McKinsey to join them in Silicon Valley, and I thought it would be a brief, interesting detour to go to McKinsey.
At the time, like many of my friends and colleagues, such as Bobby Rao, who had also been in the Robotics Research Lab with me, I was interested in building systems that could compete in the DARPA driverless car challenge. This was because a lot of our algorithms were applicable to autonomous vehicles and driverless cars, and back then the DARPA challenge was one of the places where you could apply those algorithms. All of my friends were moving to Silicon Valley then. Bobby was at that time a post-doc at Berkeley working with Stuart Russell and others, and so I thought I should take this McKinsey offer in San Francisco. It was a way to be close to Silicon Valley and to be close to where some of the action, including the DARPA challenge, was taking place.
MARTIN FORD: What is your role now at McKinsey?
JAMES MANYIKA: I’ve ended up doing two kinds of things. One is working with many of the pioneering technology companies in Silicon Valley, where I have been fortunate to work with and advise many founders and CEOs. The other part, which has grown over time, is leading research at the intersection of technology and its impact on business and the economy. I’m the chairman of the McKinsey Global Institute, where we research not just technology but also macroeconomic and global trends to understand their impact on business and the economy. We are privileged to have amazing academic advisors that include economists who also think a lot about technology’s impacts, people like Erik Brynjolfsson, Hal Varian, and Mike Spence, the Nobel laureate, and even Bob Solow in the past.
To link this back to AI, we’ve been looking a lot at disruptive technologies and tracking the progress of AI, and I’ve stayed in constant dialogue with, and collaborated with, AI friends like Eric Horvitz, Jeff Dean, Demis Hassabis, and Fei-Fei Li, while also learning from legends like Barbara Grosz. While I’ve tried to stay close to the technology and the science, my MGI colleagues and I have spent more time thinking about and researching the economic and business impacts of these technologies.
MARTIN FORD: I definitely want to delve into the economic and job market impact, but let’s begin by talking about AI technology.
You mentioned that you were working on neural networks way back in the 1990s. Over the past few years, there’s been this explosion in deep learning. How do you feel about that? Do you see deep learning as the holy grail going forward, or has it been overhyped?
JAMES MANYIKA: We’re only just discovering the power of techniques such as deep learning and neural networks in their many forms, as well as other techniques like reinforcement learning and transfer learning. These techniques all still have enormous headroom; we’re only just scratching the surface of where they can take us.
Deep learning techniques are helping us solve a huge number of particular problems, whether it’s in image and object classification, natural language processing, or generative AI, where we predict and create sequences and outputs, whether it’s speech, images, and so forth. We’re going to make a lot of progress in what is sometimes called “narrow AI,” that is, solving particular areas and problems using these deep learning techniques.
In comparison, we’re making slower progress on what is sometimes called “artificial general intelligence” or AGI. While we’ve made more progress recently than we’ve done in a long time, I still think progress is going to be much, much slower towards AGI, just because it involves a much more complex and difficult set of questions to answer and will require many more breakthroughs.
We need to figure out how to think about problems like transfer learning, because one of the things that humans do extraordinarily well is being able to learn something, over here, and then to be able to apply that learning in totally new environments or on a previously unencountered problem, over there. There are definitely some exciting new techniques coming up, whether in reinforcement learning or even simulated learning—the kinds of things that AlphaZero has begun to do—where you self-learn and self-create structures, as well as start to solve wider and different categories of challenges, which in the case of AlphaZero means different kinds of games. In another direction, the work that Jeff Dean and others at Google Brain are doing with AutoML is really exciting. That’s very interesting from the point of view of helping us make progress toward machines and neural networks that design themselves. These are just a few examples. One could say all of this progress is nudging us towards AGI. But these are really just small steps; much, much more is needed, and there are whole areas, such as high-level reasoning, that we barely know how to tackle. This is why I think AGI is still quite a long way away.
Deep learning is certainly going to help us with narrow AI applications. We’re going to see lots and lots of applications that are already being turned into new products and new companies. At the same time, it’s worth pointing out that there are still some practical limitations to the use and application of machine learning, and we have pointed this out in some of our MGI work.
MARTIN FORD: Do you have any examples?
JAMES MANYIKA: For example, we know that many of these techniques still largely rely on labeled data, and there are still lots of limitations in terms of the availability of labeled data. Often this means that humans must label underlying data, which can be a sizable and error-prone chore. In fact, some autonomous vehicle companies are hiring hundreds of people to manually annotate hours of video from prototype vehicles to help train the algorithms. There are some new techniques that are emerging to get around the issue of needing labeled data, for example, in-stream supervision, pioneered by Eric Horvitz and others, and the use of techniques like Generative Adversarial Networks, or GANs, a semi-supervised technique through which usable data can be generated in a way that reduces the need for datasets that require labeling by humans.
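As a rough illustration of the idea, here is a minimal GAN sketch in Python (PyTorch): a generator learns to mimic a toy one-dimensional distribution from unlabeled samples, while a discriminator tries to tell real data from generated data. The network sizes, learning rates, and the toy Gaussian target are all assumptions for illustration; real systems are far larger.

```python
# Minimal, illustrative GAN sketch (PyTorch): the generator learns to mimic
# samples from N(4, 1) using only unlabeled data, trained adversarially
# against a discriminator.
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_batch(n):
    # "Real" unlabeled data: samples from a Gaussian with mean 4. No labels needed.
    return torch.randn(n, 1) + 4.0

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # discriminator (logits)

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # Train discriminator: push real samples toward label 1, fakes toward 0.
    real = real_batch(64)
    fake = G(torch.randn(64, 8)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train generator: try to make the discriminator label fakes as real.
    fake = G(torch.randn(64, 8))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

samples = G(torch.randn(1000, 8))
print("generated mean/std:", samples.mean().item(), samples.std().item())
# With enough steps the generated samples should roughly approach mean 4, std 1.
```

The point of the sketch is simply that the "supervision" comes from the adversarial game itself rather than from human-labeled examples.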
But then we still have a second challenge of needing such large and rich data sets. It is quite interesting that you can more or less identify those areas that are making spectacular progress simply by observing which areas have access to a huge amount of available data. So, it is no surprise that we have made more progress in machine vision than in other applications, because of the huge volume of images and now video being put on the internet every day. Now, there are some good reasons—regulatory, privacy, security, and otherwise—that may limit data availability to some extent. And this can also, in part, explain why different societies are going to experience differential rates of progress on making data available. Countries with large populations, naturally, generate larger volumes of data, and different data use standards may make it easier to access large health data sets, for example, and use that to train algorithms. So, in China you might see more progress in using AI in genomics and “omics” given larger available data sets.
So, data availability is a big deal and may explain why some areas of AI applications take off much faster in some places than others. But we’ve also got other limitations to deal with, like we still don’t have generalized tools in AI and we still don’t know how to solve general problems in AI. In fact, one of the fun things, and you may have seen this, is that people are now starting to define new forms of what used to be the Turing test.
MARTIN FORD: A new Turing Test? How would that work?
JAMES MANYIKA: Steve Wozniak, the co-founder of Apple, has actually proposed what he calls the “coffee test,” as opposed to Turing tests, which are very narrow in many respects. The coffee test is kind of fun: until you get a system that can enter an average and previously unknown American home and somehow figure out how to make a cup of coffee, we’ve not solved AGI. The reason that sounds trivial but is at the same time quite profound is that you’re solving a large number of unknowable and general problems in order to make that cup of coffee in an unknown home, where you don’t know where things are going to be, what type of coffee maker it is, or what other tools they have. That’s very complex, generalized problem-solving across numerous categories of problems that the system would have to do. It may be that we need Turing tests of that form if we want to test for AGI, and maybe that’s where we need to go.
I should point out the other limitation, which is the question of potential issues not so much in the algorithm, but in the data. This is a big question which tends to divide the AI community. One view is the idea that these machines are probably going to be less biased than humans. You can look at multiple examples, such as human judges and bail decisions, where using an algorithm could take out many of the inherent human biases, including human fallibility and even time-of-day biases. Hiring and advancement decisions could be another area like this, thinking about Marianne Bertrand and Sendhil Mullainathan’s work looking at the difference in callbacks received by different racial groups who submitted identical resumes for jobs.
MARTIN FORD: That’s something that has come up in a number of the conversations I’ve had for this book. The hope should be that AI can rise above human biases, but the catch always seems to be that the data you’re using to train the AI system encapsulates human bias, so the algorithm picks it up.
JAMES MANYIKA: Exactly, that’s the other view of the bias question, which recognizes that the data itself could actually be quite biased, both in its collection and in the sampling rates—either through oversampling or undersampling—and in what that means systematically, either for different groups of people or different kinds of profiles.
The general bias problem has been shown in quite spectacular fashion in lending, in policing and criminal justice cases, and so any dataset that we want to use could have large-scale biases already built in, many likely unintended. Julia Angwin and her colleagues at ProPublica have highlighted such biases in their work, as has MacArthur Fellow Sendhil Mullainathan and his colleagues. One of the most interesting findings to come out of that work, by the way, is that algorithms may be mathematically unable to satisfy different definitions of fairness at the same time, so deciding how we will define fairness is becoming a very important issue.
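One way to see that mathematical tension, as a small illustrative calculation rather than anything drawn from the studies mentioned, is through an identity from the fairness literature (highlighted in Alexandra Chouldechova's analysis of recidivism scores) linking false positive rate, base rate, precision, and false negative rate. The numbers below are made up.

```python
# Illustrative check: Chouldechova's identity
#   FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)
# links false positive rate (FPR), prevalence/base rate (p), precision (PPV),
# and false negative rate (FNR). If two groups have different base rates but
# we insist on equal precision and equal FNR, their FPRs are forced apart.

def fpr(base_rate, ppv, fnr):
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

ppv, fnr = 0.7, 0.2          # same precision and miss rate imposed on both groups
for group, base_rate in [("A", 0.5), ("B", 0.2)]:
    print(group, "base rate", base_rate, "-> FPR", round(fpr(base_rate, ppv, fnr), 3))

# Group A ends up with FPR ~0.343 while group B gets ~0.086: equal error rates
# and equal predictive value cannot both hold once base rates differ.
```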
I think both views are valid. On the one hand, machine systems can help us overcome human bias and fallibility, and yet on the other hand, they could also introduce potentially larger issues of their own. This is another important limitation we’re going to need to work our way through. But here again we are starting to make progress. I am particularly excited about the pioneering work that Silvia Chiappa at DeepMind is doing using counterfactual fairness and causal model approaches to tackle fairness and bias.
MARTIN FORD: That’s because the data directly reflects the biases of people, right? If it’s collected from people as they’re behaving normally, using an online service or something, then the data is going to end up reflecting whatever biases they have.
JAMES MANYIKA: Right, but it can actually be a problem even if individuals aren’t necessarily biased. I’ll give you an example where you can’t actually fault the humans per se, or their own biases, but that instead shows us how our society works in ways that create these challenges. Take the case of policing. We know that, for example, some neighborhoods are more policed than others and by definition, whenever neighborhoods are more policed, there’s a lot more data collected about those neighborhoods for algorithms to use.
So, if we take two neighborhoods, one that is highly policed and one that is not—whether deliberately or not—the fact is that the data sampling differences across those two communities will have an impact on the predictions about crime. The actual collection itself may not have shown any bias, but because of oversampling in one neighborhood and undersampling in another, the use of that data could lead to biased predictions.
Another example of undersampling and oversampling can be seen in lending. In this example, it works the other way, where if you have a population that has more available transactions because they’re using credit cards and making electronic payments, we have more data about that population. The oversampling there actually helps those populations, because we can make better predictions about them, whereas if you then have an undersampled population, because they’re paying in cash and there is little available data, the algorithm could be less accurate for those populations and, as a result, more conservative in choosing to lend, which essentially biases the ultimate decisions. We have this issue too in facial recognition systems, which has been demonstrated in the work of Timnit Gebru, Joy Buolamwini, and others.
It may not be the biases that any human being has in developing the algorithms, but the way in which we’ve collected the data that the algorithms are trained on that introduces bias.
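A tiny simulation makes the lending version of this point concrete; the default rates, group sizes, and risk-averse decision rule below are all hypothetical.

```python
# Illustrative simulation of the sampling point above: two groups have the SAME
# true repayment behavior, but one group generates far less data. A lender that
# penalizes uncertainty ends up more conservative toward the thinly sampled
# group purely because of sample size.
import numpy as np

rng = np.random.default_rng(1)
true_default_rate = 0.10          # identical for both groups
samples = {"well_sampled": 5000, "under_sampled": 50}

for group, n in samples.items():
    defaults = rng.binomial(1, true_default_rate, size=n)
    est = defaults.mean()
    # Standard error of the estimated default rate shrinks with sqrt(n).
    se = np.sqrt(est * (1 - est) / n)
    cautious_risk = est + 2 * se   # a risk-averse "upper bound" on default risk
    print(f"{group:13s} n={n:5d}  estimated risk={est:.3f}  "
          f"cautious risk={cautious_risk:.3f}")

# The under-sampled group's cautious risk is inflated by uncertainty alone,
# which can translate into fewer approvals even with identical behavior.
```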
MARTIN FORD: What about other kinds of risks associated with AI? One issue that’s gotten a lot of attention lately is the possibility of existential risk from superintelligence. What do you think are the things we should legitimately worry about?
JAMES MANYIKA: Well, there are lots of things to worry about. I remember a couple of years ago, a group of us that included many of the AI pioneers and other luminaries, including the likes of Elon Musk and Stuart Russell, met in Puerto Rico to discuss progress in AI as well as concerns and areas that needed more attention. The group ended up writing about some of the issues and what we should worry about, in a paper that was published by Stuart Russell, pointing out where there was not enough attention and research going into analyzing these areas. Since that meeting, the areas to worry about have changed a little bit, but those areas covered a wide range, including things like safety questions.
Here is one example. How do you stop a runaway algorithm? How do you stop a runaway machine that gets out of control? I don’t mean in a Terminator sense, but even just in the narrow sense of an algorithm that is making wrong interpretations, leading to safety questions, or even simply upsetting people. For this we may need what has been referred to as the Big Red Button, something several research teams are working on. DeepMind’s work with gridworlds, for example, has demonstrated that many algorithms could theoretically learn how to turn off their own “off-switches”.
Another issue is explainability. Here, explainability is a term used to describe the problem that, with neural networks, we don’t always know which feature or which dataset influenced the AI’s decision or prediction, one way or the other. This can make it very hard to explain an AI’s decision, or to understand why it might be reaching a wrong decision. This can matter a great deal when predictions and decisions have consequential implications that may affect lives, for example when AI is used in criminal justice situations or lending applications, as we’ve discussed. Recently, we’ve seen new techniques emerge to get at the explainability challenge. One promising technique is the use of Local Interpretable Model-Agnostic Explanations, or LIME. LIME tries to identify which parts of the input data a trained model relies on most to make a particular prediction. Another promising technique is the use of Generalized Additive Models, or GAMs. These use single-feature models additively and therefore limit interactions between features, so changes in predictions can be determined as features are added.
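To illustrate what a LIME-style explanation looks like in practice, here is a compact from-scratch sketch of the underlying idea rather than the lime library's actual API: perturb an instance, query the black-box model, and fit a locally weighted linear surrogate. The toy dataset, feature names, and kernel width are assumptions.

```python
# From-scratch sketch of the idea behind LIME: explain one prediction of a
# black-box model by fitting a simple weighted linear model to the black box's
# behavior in a small neighborhood around that instance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy data: 3 features, label depends mostly on the first two.
X = rng.normal(size=(2000, 3))
y = ((1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=2000)) > 0).astype(int)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Instance whose prediction we want to explain.
x0 = np.array([0.5, -0.2, 1.0])

# 1) Perturb the instance, 2) query the black box, 3) weight perturbations by
# proximity to x0, 4) fit an interpretable (linear) surrogate locally.
Z = x0 + rng.normal(scale=0.5, size=(500, 3))         # local perturbations
p = black_box.predict_proba(Z)[:, 1]                  # black-box outputs
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.5)      # proximity kernel
surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)

for name, coef in zip(["feature_0", "feature_1", "feature_2"], surrogate.coef_):
    print(f"{name}: local weight {coef:+.3f}")

# Features 0 and 1 should dominate, indicating what drove this one prediction
# even though the forest itself is not directly interpretable.
```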
Yet another area we should think about more is the “detection problem,” which is where we might find it very hard to even detect when there’s malicious use of an AI system—which could be anything from a terrorist to a criminal situation. With other weapons systems, like nuclear weapons, we have fairly robust detection systems. It’s hard to set off a nuclear explosion in the world without anybody knowing because you have seismic tests, radioactivity monitoring, and other things. With AI systems, not so much, which leads to an important question: How do we even know when an AI system is being deployed?