Digital Transformation

by Thomas M. Siebel


  Inadequate computing power was just one limitation the early AI practitioners faced. A second core issue was that the underlying mathematical concepts and techniques were not well developed. Some of the early work in AI in the 1960s focused on advanced algorithmic techniques such as neural nets. But these ideas did not progress very far. For example, Minsky and Seymour Papert coauthored the book Perceptrons in 1969.16 Today the book is considered by some a foundational work in the field of artificial neural networks—now a widely used AI algorithmic technique. However, other practitioners at the time interpreted the book as outlining key limitations of these techniques. In the 1970s, the direction of AI research shifted to focus more on symbolic reasoning and systems—ideas that proved unsuccessful in unlocking economic value.

  The Winter of AI

  By the mid-1970s, many funding agencies started to lose interest in supporting AI research. The AI research efforts of the preceding decade had delivered some significant theoretical advancements—including Back-Propagation17,18 to train neural networks. But there were few tangible applications beyond some rudimentary examples. Capabilities AI researchers had promised, such as speech understanding and autonomous vehicles, had not advanced significantly. Reports such as the Lighthill Report, commissioned by the UK Government, were critical of AI.19

  Following the initial burst of AI research and activity in the 1960s and early 1970s, interest in AI started to dwindle.20 Computer science practitioners started to focus on other, more rewarding areas of work, and AI entered a quiet period often referred to as “the first AI winter.”

  AI made a brief resurgence in the 1980s, with much of the work focused on helping machines become smarter by feeding them rules. The idea was that given enough rules, machines would then be able to perform specific useful tasks—and exhibit a sort of emergent intelligence. The concept of “expert systems” evolved and languages like LISP were used to more effectively encode logic.21 The idea behind an expert system was that the knowledge and understanding of domain experts in different fields could be encoded by a computer program based on a set of heuristic rules.

  The concept held the promise that computers could learn from occupational experts (the best doctors, firefighters, lawyers, etc.), encode their knowledge in the expert system, and then make it available to a much broader set of practitioners, so they could benefit from the understanding of the best of their peers.

  These systems achieved some initial commercial success and applications in industry. Ultimately, however, none of the expert systems were effective and the promises seemed far ahead of the technical realities. Expert systems were based on a set of explicitly defined rules or logical building blocks—and not a true learning system that could adapt with changing data. Knowledge acquisition costs were high, since these systems had to get their information from domain experts. And they were also expensive to maintain since rules would have to be modified over time. The machines could not easily learn and adapt to changing situations. By the late 1980s, AI had fallen into a second winter.

  The AI Renaissance

  The field of AI was reinvigorated in the 2000s, driven by three major forces. First was Moore’s Law in action—the rapid improvement of computational power. By the 2000s, computer scientists could leverage dramatic improvements in processing power, reductions in the form factor of computing (mainframe computers, minicomputers, personal computers, laptop computers, and the emergence of mobile computing devices), and the steady decline in computing costs.

  Second, the growth of the internet resulted in a vastly increased amount of data that was rapidly available for analysis. Internet companies like Google, Netflix, and Amazon had access to data from millions to billions of consumers—their search queries, click-throughs, purchases, and entertainment preferences. These companies needed advanced techniques to process and interpret the vast amounts of available data and to use those insights to improve their own products and services. AI was directly aligned with their business interests. The internet also enabled the ubiquitous availability of compute resources through the emergence of cloud computing. As we’ve discussed in chapter 4, inexpensive compute resources were now available in the public cloud—elastic and horizontally scalable. That is, companies could make use of all the computing power they needed, when they needed it.

  Third, significant advances in the mathematical underpinnings of AI were made in the 1990s and continued into the 2000s along with the successful implementation of those techniques. A key breakthrough was in the advancement of the subfield of AI called machine learning, or statistical learning. Important contributions came from researchers then at AT&T Bell Labs—Tin Kam Ho, Corinna Cortes, and Vladimir Vapnik—who created new techniques in applying statistical knowledge to develop and train advanced algorithms.

  Researchers were able to develop mathematical techniques to convert complex nonlinear problems to linear formulations with numerical solutions—and then apply the increased available computational power of the elastic cloud to solve these problems. Machine learning accelerated as practitioners rapidly addressed new problems and built a family of advanced algorithmic techniques.
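
  The text does not name the specific mathematics here, but one well-known example of recasting a nonlinear problem in linear terms is the kernel method behind the support vector machines developed by Cortes and Vapnik. The following is a minimal sketch of that idea, assuming Python and scikit-learn; the data set is synthetic and the parameters are illustrative, not drawn from the book.

    # Illustrative sketch (not from the book): a support vector machine uses a
    # kernel to treat a nonlinearly separable problem as a linear one in a
    # higher-dimensional feature space. Data and parameters are synthetic.
    from sklearn.datasets import make_circles
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Two concentric rings: impossible to separate with a straight line
    X, y = make_circles(n_samples=1000, factor=0.4, noise=0.05, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    linear_svm = SVC(kernel="linear").fit(X_train, y_train)
    kernel_svm = SVC(kernel="rbf").fit(X_train, y_train)   # implicit nonlinear mapping

    print("linear kernel accuracy:", linear_svm.score(X_test, y_test))  # near chance level
    print("RBF kernel accuracy:   ", kernel_svm.score(X_test, y_test))  # close to 1.0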

  Some of the earliest machine learning use cases involved consumer-facing applications driven by companies like Google, Amazon, LinkedIn, Facebook, and Yahoo! Machine learning practitioners at these companies applied their skills to improve search engine results, advertisement placement and click-throughs, and advanced recommender systems for products and offerings.

  Open Source AI Software

  Many of the machine learning practitioners from these companies, as well as many in the academic community, embraced the “open source” software model—in which contributors would make their source code (for core underlying technical capabilities) freely available to the broader community of scientists and developers—with the idea that these contributions would accelerate the pace of innovation for all. The best known of the organizations hosting these open source projects is the Apache Software Foundation.

  At the same time, Python started to emerge as the machine learning programming language of choice—and a significant share of the source code contributions included Python libraries and tools. Many of the most important libraries in use today emerged from this period as open source standards.

  By the mid-2000s, machine learning had started to make its way into other industries. Financial services and retail were among the earliest industries to leverage machine learning techniques. Financial services firms were motivated by the large scale of data available from transaction processing and e-commerce, and started to address use cases such as credit card fraud detection. Retail companies used machine learning technologies to respond to the rapid growth of e-commerce and the need to keep up with Amazon.

  The open source movement was, and continues to be, an important factor in making AI commercially viable and ubiquitous today. The challenge for organizations trying to apply AI is how to harness these disparate open source components into enterprise-ready business applications that can be deployed and operated at scale. Many organizations try to build AI applications by stitching together numerous open source components, an approach that is unlikely to result in applications that can be deployed and maintained at scale. I will outline the complications of that approach in more detail in chapter 10, and describe how an alternative approach addresses the problem.

  Deep Learning Takes Off

  In the mid-2000s, another AI technology started to gain traction—neural networks, or deep learning. This technique employs sophisticated mathematical methods to make inferences from examples. Broad applications of deep neural networks were enabled by the efforts of scientists such as Yann LeCun at New York University, Geoffrey Hinton at the University of Toronto, and Yoshua Bengio at the Université de Montréal—three of the most prominent researchers and innovators in areas like computer vision and speech recognition.

  The field of deep learning started to accelerate rapidly around 2009 because of improvements in hardware and the ability to process large amounts of data. In particular, researchers started using powerful GPUs to train deep learning neural nets—which allowed them to train networks roughly 100 times faster than before. This breakthrough made the application of neural nets much more practical for commercial purposes.

  AI has greatly evolved from the use of symbolic logic and expert systems (in the ’70s and ’80s), to machine learning systems in the 2000s, and to neural networks and deep learning systems in the 2010s.

  Neural networks and deep learning techniques are currently transforming the field of AI, with broad applications across many industries: financial services (fraud detection; credit analysis and scoring; loan application review and processing; trading optimization); medicine and health care (medical image diagnostics; automated drug discovery; disease prediction; genome-specific medical protocols; preventive medicine); manufacturing (inventory optimization; predictive maintenance; quality assurance); oil and gas (predictive oilfield and well production; well production optimization; predictive maintenance); energy (smart grid optimization; revenue protection); and public safety (threat detection). These are just some of the hundreds of current and potential use cases.

  The Overall Field of AI Today

  AI is a broad concept with several key subfields, and the overall taxonomy of the space can be confusing. One of the most important distinctions is the difference between artificial general intelligence (AGI) and AI.

  AGI—which I view as primarily of interest to science fiction enthusiasts—is the idea that computer programs, like humans, can exhibit broad intelligence and reason across all domains. AGI does not seem achievable in the foreseeable future, nor is it relevant to real-world AI applications. It is clear that in any given field we will see the development of AI applications that can outperform humans at some specific task. IBM’s Deep Blue computer defeated Garry Kasparov at chess in 1997. Google DeepMind’s AlphaGo can defeat a Go champion. AI techniques can target a laser and read a radiograph with greater accuracy than a human. I believe it is unlikely, however, that we will see AI applications that can perform all tasks better than a human anytime soon. The computer program that can play chess, play Go, drive a car, target a laser, diagnose cancer, and write poetry is, in my opinion, not a likely development in the first half of this century.

  FIGURE 6.1

  AI, as I use the term throughout this book, is the area relevant to business and government because it relates to practical applications of artificial intelligence—the applications that you, as a business or government leader, will want to harness for your organization. This is the idea that computer programs can be trained to reason about and solve specific, dedicated tasks. For example, AI algorithms can optimize inventory levels, predict customer churn, anticipate potential equipment failure, or identify fraud. As we’ve discussed, this field of AI has advanced rapidly over the last couple of decades.

  While the different AI subfields fall into three broad categories—machine learning, optimization, and logic—the most exciting and powerful advances are happening in machine learning.

  Machine Learning

  Machine learning is a subfield of AI based on the idea that computers can learn from data without being explicitly programmed. Machine learning algorithms employ various statistical techniques on the data they are fed in order to make inferences about the data. The algorithms improve as the amount of data they are fed increases and as the inferences they generate are either confirmed or disconfirmed (sometimes by humans, sometimes by machines). For example, a machine learning algorithm for detecting fraud in purchase transactions becomes more accurate as it is fed more transaction data and as its predictions (fraud, not fraud) are evaluated as correct or incorrect.
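
  As a concrete illustration of learning from data rather than from explicit rules, the sketch below trains a simple classifier on synthetic, labeled purchase transactions. It assumes Python and scikit-learn; the feature names, data, and model choice are illustrative assumptions, not details from the book.

    # Minimal sketch of the fraud-detection example above, on synthetic data.
    # Feature names and labels are illustrative assumptions, not a real model.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 10_000
    amount = rng.exponential(scale=80.0, size=n)   # transaction amount
    foreign = rng.integers(0, 2, size=n)           # 1 if merchant is foreign
    night = rng.integers(0, 2, size=n)             # 1 if made at night
    X = np.column_stack([amount, foreign, night])

    # Synthetic labels: large foreign transactions are marked as fraud
    y = ((amount > 250) & (foreign == 1)).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # The model's rules are inferred from the data, not hand-coded; as more
    # labeled transactions accumulate, retraining on them improves accuracy.
    print("held-out accuracy:", model.score(X_test, y_test))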

  Machine learning has been central to driving the recent growth of AI. It has proven its ability to unlock economic value by solving real-world problems—enabling useful search results, providing personalized recommendations, filtering spam, predicting failure, and identifying fraud, to name just a few.22

  Machine learning is a broad field that includes a range of different techniques described in the following section.

  Supervised and Unsupervised Learning

  There are two main subcategories of machine learning techniques—supervised learning and unsupervised learning.

  Supervised learning techniques require the use of training data in the form of labeled inputs and outputs. A supervised learning algorithm employs sophisticated statistical techniques to analyze the labeled training data, in order to infer a function that maps inputs to outputs. When sufficiently trained, the algorithm can then be fed new input data it has not seen before, and generate answers about the data (i.e., outputs) by applying the inference function to the new inputs.

  For example, a supervised learning algorithm to predict if an engine is likely to fail can be trained by feeding it a large set of labeled inputs—such as historical operating data (e.g., temperature, speed, hours in use, etc.)—and labeled outputs (failure, nonfailure) for many cases of both engine failure and nonfailure. The algorithm uses these training data to develop the appropriate inference function to predict engine failure for new input data it is given. The objective of the algorithm is to predict engine failure with an acceptable degree of precision. The algorithm can improve over time by automatically adjusting its inference function based on feedback about the accuracy of its predictions. In this case, feedback is automatically generated based on whether the failure occurred or not. In other cases, feedback can be human-generated, as with an image classification algorithm where humans evaluate the prediction results.
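
  A minimal sketch of the engine-failure example might look like the following, assuming Python and scikit-learn. The sensor readings, labels, and model are synthetic stand-ins for the historical operating data described above.

    # Supervised learning sketch for the engine-failure example above.
    # All data, feature names, and thresholds are synthetic and illustrative.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score

    rng = np.random.default_rng(1)
    n = 20_000
    temperature = rng.normal(90, 15, size=n)       # degrees C
    speed = rng.normal(3000, 500, size=n)          # RPM
    hours_in_use = rng.uniform(0, 20_000, size=n)  # operating hours
    X = np.column_stack([temperature, speed, hours_in_use])

    # Labeled outputs: 1 = failure, 0 = nonfailure (synthetic rule)
    y = ((temperature > 110) & (hours_in_use > 12_000)).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
    model = RandomForestClassifier(n_estimators=200, random_state=1)
    model.fit(X_train, y_train)   # infer a function mapping inputs to outputs

    # Feedback loop: once real outcomes (failure / nonfailure) are observed for
    # new engines, append them to the training set and refit the model.
    y_pred = model.predict(X_test)
    print("precision on held-out engines:", precision_score(y_test, y_pred))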

  There are two main categories of supervised learning techniques. The first is classification techniques. These predict outputs that are specific categories—such as whether an engine will fail or not, whether a certain transaction represents fraud or not, or whether a certain image is a car or not. The second category is regression techniques. These predict values—such as a forecast of sales over the next week. In the case of forecasting sales over the next week, an oil company might employ an algorithm trained by being fed historical sales data and other relevant data such as weather, market prices, production levels, GDP growth data, etc.
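
  To make the regression case concrete, here is a small sketch of forecasting weekly sales from historical data, in the spirit of the oil-company example. It assumes Python and scikit-learn; the features, data, and choice of a gradient-boosted regression model are illustrative assumptions.

    # Regression sketch: predict a numeric value (weekly sales) rather than a
    # category. Features, data, and model are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(2)
    n_weeks = 500
    avg_temp = rng.normal(15, 10, size=n_weeks)       # weekly average temperature
    market_price = rng.normal(70, 10, size=n_weeks)   # price per barrel
    production = rng.normal(1000, 100, size=n_weeks)  # production level

    # Synthetic weekly sales driven by the features plus noise
    sales = (50 * production - 200 * market_price + 300 * avg_temp
             + rng.normal(0, 5000, size=n_weeks))
    X = np.column_stack([avg_temp, market_price, production])

    X_train, X_test, y_train, y_test = train_test_split(X, sales, random_state=2)
    model = GradientBoostingRegressor(random_state=2).fit(X_train, y_train)

    print("mean absolute error on held-out weeks:",
          round(mean_absolute_error(y_test, model.predict(X_test)), 1))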

  In contrast to supervised learning techniques, unsupervised techniques operate without “labels.” That is, they are not trying to predict any specific outcomes. Instead, they attempt to find patterns within data sets. Examples of unsupervised techniques include clustering algorithms—which try to group data in meaningful ways, such as identifying retail bank customers who are similar and therefore may represent new segments for marketing purposes—or anomaly detection algorithms, which define normal behavior in a data set and identify anomalous patterns, such as detecting banking transaction behavior that could indicate money laundering.
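
  The sketch below illustrates both kinds of unsupervised techniques mentioned here: clustering retail bank customers into segments and flagging anomalous transactions. It assumes Python and scikit-learn, and the features and parameters are invented for illustration.

    # Unsupervised sketch: no labels, only patterns. Data and features are synthetic.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(3)

    # Clustering: group bank customers by balance and monthly transaction count
    customers = np.column_stack([
        rng.normal(5_000, 2_000, size=1_000),   # account balance
        rng.normal(30, 10, size=1_000),         # transactions per month
    ])
    segments = KMeans(n_clusters=4, n_init=10, random_state=3).fit_predict(customers)
    print("customers per segment:", np.bincount(segments))

    # Anomaly detection: flag transaction amounts that deviate from the norm
    transactions = rng.normal(100, 30, size=(5_000, 1))         # typical amounts
    transactions[:25] = rng.normal(9_000, 500, size=(25, 1))    # a few outliers
    flags = IsolationForest(contamination=0.01, random_state=3).fit_predict(transactions)
    print("flagged as anomalous:", int((flags == -1).sum()))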

  Neural Networks

  Neural networks—and deep neural networks in particular—represent a newer and rapidly growing category of machine learning algorithms. In a neural network, data inputs are fed into the input layer, and the output of the neural network is captured in the output layer. The layers in the middle are hidden “activation” layers that perform various transformations on the data to make inferences about different features of the data. Deep neural nets typically have multiple (more than two or three) hidden layers. The number of required layers generally (but not always) increases with the complexity of the use case. For example, a neural net designed to determine whether an image is a car or not would have fewer layers than one designed to label all the different objects in an image—e.g., a computer vision system for a self-driving car, able to recognize and differentiate road signs, traffic signals, lane lines, cyclists, and so on.
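
  As a structural illustration of the input, hidden, and output layers described above, here is a small network for a hypothetical “car or not” classifier, sketched with the Keras API in Python. The layer sizes, input shape, and training setup are assumptions made for the example and are not taken from the book.

    # Sketch of a small deep neural network: an input layer, several hidden
    # "activation" layers, and an output layer. Sizes and data are illustrative.
    import numpy as np
    from tensorflow import keras

    # Pretend inputs: 1,000 images flattened to 32x32x3 = 3,072 values each,
    # labeled 1 (car) or 0 (not a car). Real systems would use far more data.
    rng = np.random.default_rng(4)
    X = rng.random((1_000, 3_072), dtype=np.float32)
    y = rng.integers(0, 2, size=1_000)

    model = keras.Sequential([
        keras.Input(shape=(3_072,)),                 # input layer
        keras.layers.Dense(256, activation="relu"),  # hidden layers transform the
        keras.layers.Dense(128, activation="relu"),  # data to extract features
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"), # output layer: car / not car
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=3, batch_size=64, verbose=0)  # random data, so this
                                                         # only shows the mechanics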

  In 2012, a neural network called AlexNet won the ImageNet Large Scale Visual Recognition Challenge, a contest to classify a set of several million images that had been preclassified by humans into 1,000 categories (including 90 dog breeds). AlexNet correctly identified images 84.7 percent of the time, with an error rate of only 15.3 percent. This was more than 10 percentage points better than the next best system—a remarkably superior result. Deep learning techniques for image processing have continued to advance since AlexNet, achieving accuracy rates of greater than 95 percent—better than the performance of a typical human.23

  Organizations in multiple industries are applying deep learning techniques using neural networks to a range of problems with impressive results. In the utilities sector, neural networks are applied to minimize “non-technical loss,” or NTL. Globally, billions of dollars are lost each year to NTL as a result of measurement and recording errors, electricity theft from tampering with or bypassing meters, unpaid bills, and other related losses. By reducing a utility’s NTL, these AI applications help ensure a more reliable electricity grid and significantly more efficient electricity pricing for customers.

  One of the major advantages of using neural networks is the reduction or elimination of feature engineering, a time-consuming requirement when using traditional machine learning algorithms. Neural networks are capable of learning both the output and the relevant features from the data, without the need for extensive feature engineering. However, they usually require a very large amount of training data and are computationally intensive. This is why the use of GPUs has proven crucial for the success of neural networks.

  Overcoming the Challenges of Machine Learning

  For many AI use cases, organizations can deploy prebuilt, commercially available SaaS applications without having to develop the applications themselves. These include applications for predictive maintenance, inventory optimization, fraud detection, anti–money laundering, customer relationship management, and energy management, among others. In addition to deploying prebuilt SaaS applications, most large organizations will need to develop their own AI applications specifically tailored to their particular needs.

 
