While managing data from multiple systems is by itself a complex undertaking, the challenge gets significantly harder with the sensoring of value chains and the resulting inflow of real-time data. Covered in more depth in chapter 7, this phenomenon has accelerated and resulted in a profusion of high-frequency data. These data are seldom useful by themselves. To yield value, they need to be combined with other data.
For example, readings of the gas exhaust temperature for an offshore low-pressure compressor are only of limited value in monitoring the state of that particular asset. However, these readings are far more useful when correlated in time with ambient temperature, wind speed, compressor pump speed, a history of previous maintenance actions, maintenance logs, and other data. A valuable use case would be to monitor anomalous states of gas exhaust temperature across a portfolio of 1,000 compressors in order to send alarms to the right operator at the right offshore rig—which requires a simultaneous understanding of high-frequency sensor readings, external data like weather conditions, associations between raw data and the assets from which those data are drawn, and workforce logs describing who works at each rig at each point in time.
Building an application to support the use case above requires the ability to do several things at once: rapidly retrieve time series data (typically using a distributed key-value store—a specialized type of database); search and sort workforce logs and offshore asset tags (typically using a relational database—often in separate systems); and alert workers (typically through enterprise software applications or commonly available communication tools).
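To make the moving parts concrete, here is a minimal sketch, in Python, of how such an application might correlate the three systems. A plain dictionary stands in for the distributed key-value time-series store, SQLite stands in for the relational system, and a print statement stands in for the alerting integration; the asset IDs, rig names, operators, and temperature threshold are invented for illustration.

```python
# Minimal sketch of the compressor-monitoring flow described above.
# A plain dict stands in for a distributed key-value time-series store,
# and sqlite3 stands in for the relational system holding workforce logs.
# All identifiers and thresholds are illustrative assumptions.
import sqlite3
from datetime import datetime

# Time-series side: gas exhaust temperature readings per compressor.
exhaust_temps = {
    ("compressor-017", datetime(2021, 3, 1, 12, 0)): 412.5,  # degrees C
    ("compressor-017", datetime(2021, 3, 1, 12, 1)): 468.9,
}

# Relational side: which asset sits on which rig, and who is on shift there.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE assets (asset_id TEXT, rig TEXT)")
db.execute("CREATE TABLE shifts (operator TEXT, rig TEXT)")
db.execute("INSERT INTO assets VALUES ('compressor-017', 'rig-A')")
db.execute("INSERT INTO shifts VALUES ('j.olsen', 'rig-A')")

ANOMALY_THRESHOLD_C = 450.0  # assumed alarm limit

def alert(operator, asset_id, reading):
    # Stand-in for an enterprise alerting integration (email, SMS, ticket).
    print(f"ALERT to {operator}: {asset_id} exhaust temp {reading} C")

# Correlate the systems: anomalous reading -> asset -> rig -> on-shift operator.
for (asset_id, ts), reading in exhaust_temps.items():
    if reading > ANOMALY_THRESHOLD_C:
        row = db.execute(
            "SELECT s.operator FROM assets a JOIN shifts s ON a.rig = s.rig "
            "WHERE a.asset_id = ?", (asset_id,)
        ).fetchone()
        if row:
            alert(row[0], asset_id, reading)
```

In production each stand-in would be replaced by the corresponding enterprise system, but the shape of the correlation, from reading to asset to rig to operator, stays the same.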
3. Working with data lakes (or data swamps)
In the early 2000s, engineers at Yahoo! built a distributed storage and computational framework designed to scale in a massively parallel fashion. The Hadoop Distributed File System (HDFS) and the Hadoop MapReduce framework went through a wave of enterprise adoption over the next 10 to 15 years—promoted by companies that attempted to commercialize these technologies, such as Hortonworks, Cloudera, and MapR. The Apache Software Foundation has also supported related projects such as Apache Pig, Apache Hive, and Apache Sqoop—all designed independently to support the adoption and interoperability of HDFS. The promise of HDFS was a scalable architecture to store virtually all an enterprise’s data—irrespective of form or structure—and a robust way to analyze it using querying and analytic frameworks.
However, corporate adoption of Hadoop technology remains low.16 Over 50 percent of corporate IT leaders do not prioritize it, and of those that do, 70 percent have fewer than 20 users in their organization. Technological, implementation, and deployment challenges all played a part in limiting the widespread adoption of Hadoop in the enterprise. In reality, storing large amounts of disparate data in one infrastructure location does not reduce data complexity any more than letting data sit in siloed enterprise systems. Extracting value from disparate data sets for AI applications typically requires significant manipulation, such as normalizing and deduplicating data—capabilities Hadoop lacks.
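As a simple illustration of what that manipulation involves (independent of Hadoop), the following Python sketch normalizes and deduplicates customer records drawn from two hypothetical source systems; the records and the matching rules are invented for illustration.

```python
# Illustrative sketch of normalization and deduplication: the same customer
# appears in two source systems with different formatting and must be
# reconciled before an AI application can use the data. Records are invented.
crm_records = [
    {"name": "ACME Energy, Inc.", "phone": "(415) 555-0132"},
]
billing_records = [
    {"name": "acme energy inc", "phone": "415-555-0132"},
    {"name": "Beta Industrial", "phone": "212-555-0198"},
]

def normalize(record):
    # Strip punctuation, case, and a common suffix so equivalent records compare equal.
    name = "".join(c for c in record["name"].lower() if c.isalnum() or c == " ")
    name = " ".join(name.replace(" inc", "").split())
    phone = "".join(c for c in record["phone"] if c.isdigit())
    return (name, phone)

# Keep one record per normalized key.
deduplicated = {}
for record in crm_records + billing_records:
    deduplicated.setdefault(normalize(record), record)

print(len(deduplicated), "unique entities")  # -> 2 unique entities
```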
4. Ensuring data consistency, referential integrity, and continuous downstream use
A fourth big data challenge for organizations is to represent all existing data as a unified, federated image. Keeping this image updated in real time, and seamlessly updating all “downstream” analytics that use these data, is still more complex. Data arrival rates vary by system; data formats from source systems can change; and data arrive out of order due to networking delays. More nuanced still is the choice of which analytics to update, and when, in order to support a business workflow.
Take the case of a telecommunications provider wanting to predict churn for an individual cell phone customer. The unified view of that customer and the associated data update frequencies could look like this:
Data Set: Data Frequency
Call, text, and data volumes and metadata: every second
Network strength for every call placed: every few minutes
Number of bad call experiences historically: every few minutes
Number of low data bandwidth experiences historically: every few minutes
Density/congestion at cell towers used by customer: every few minutes
Ongoing billing: once a day (at least)
Time since last handset upgrade: once a day
Telco app usage and requests: every few days
Published billing: every month
Calls to the call center and their disposition: varies
Visits to the store and their disposition: varies
Visits to the customer service website: varies
Visits to the “How do I stop service?” web page or in-app: varies
Calls to a competitor customer service line: varies
Strength/share of in-network calls and texts: every month
Products and services procured: every few months
Customer relationship details: every few months
Third-party customer demographics: every few months
The variance in data arrival frequency is significant. Data errors can further complicate things: if, say, the call center system logging a dissatisfied customer’s complaint somehow misrepresents the customer ID, that record is unusable. More critically, if the churn prediction model relies on compound aggregates over these data (e.g., the cumulative count of calls to the call center in the last six months that occurred within 24 hours of a low bandwidth event or a dropped call and had a disposition of “negative sentiment”), keeping those compound aggregates updated can impose an enormous computational burden or leave the analytics out of date.
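The following Python sketch shows what one such compound aggregate might look like for a single customer, using invented timestamps and dispositions. At enterprise scale, this value must be kept current for millions of customers as new events stream in, which is where the computational burden arises; a production system would maintain it incrementally rather than rescanning history.

```python
# Sketch of the compound aggregate described above: count call-center calls with
# negative sentiment that occur within 24 hours of a low-bandwidth or dropped-call
# event. All timestamps and dispositions are invented examples for one customer.
from datetime import datetime, timedelta

network_events = [  # low-bandwidth or dropped-call events
    datetime(2021, 5, 2, 9, 30),
    datetime(2021, 5, 20, 18, 5),
]
call_center_calls = [  # (timestamp, disposition)
    (datetime(2021, 5, 2, 14, 0), "negative sentiment"),
    (datetime(2021, 5, 21, 19, 0), "resolved"),
    (datetime(2021, 6, 1, 10, 0), "negative sentiment"),
]

WINDOW = timedelta(hours=24)

def compound_aggregate(calls, events):
    # Count negative-sentiment calls that fall within the window of any network event.
    return sum(
        1
        for ts, disposition in calls
        if disposition == "negative sentiment"
        and any(abs(ts - e) <= WINDOW for e in events)
    )

print(compound_aggregate(call_center_calls, network_events))  # -> 1
```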
Enterprises have to understand and plan to solve all these challenges as they digitally transform and embed AI into their operations. They will need the right tools to enable seamless data integration at varying frequencies, ensure referential integrity of the data, and automatically update all analytics that depend on these frequently changing data sets.
5. Enabling new tools and skills for new needs
As the availability of and access to data within an enterprise grow, the skills challenge grows commensurately. For example, business analysts accustomed to using tools like Tableau—a popular data visualization software application for creating reports with graphs and charts—will now need to build machine learning models to predict business key performance indicators (KPIs) instead of just reporting on them. In turn, their managers, with decades of proficiency in spreadsheet tools, now need new skills and tools to verify their analysts’ work in making those predictions.
Enterprise IT and analytics teams need to provide tools that enable employees with different levels of data science proficiency to work with large data sets and perform predictive analytics using a unified data image. These include drag-and-drop tools for novice users and executives; code-light tools for trained business analysts; integrated development environments for highly skilled data scientists and application developers; and data integration and maintenance tools for data engineers and integration architects working behind the scenes to keep the data image updated.
Big Data and the New Technology Stack
Successful digital transformation hinges critically on an organization’s ability to extract value from big data. While big data’s management demands are complex, the availability of next-generation technology gives organizations the tools they need to solve these challenges. In chapter 10, I will describe in more depth how this new technology stack addresses big data management capabilities. With that foundational capability in place, organizations will be able to unleash the transformative power of artificial intelligence—the subject of the next chapter.
Chapter 6
The AI Renaissance
Cloud computing and big data, which we examined in the preceding two chapters, represent respectively the infrastructure and the raw material that make digital transformation possible. In this chapter and the following one, we now turn to the two major technologies that leverage cloud computing and big data to drive transformative change—artificial intelligence and the internet of things. With AI and IoT, organizations can unlock tremendous value, reinvent how they operate, and create new business models and revenue streams.
Advances in AI have dramatically accelerated in recent years. In fact, AI has progressed to such an extent that it is hard to overstate its potential to drive step-function improvements in virtually every business process.
While the potential upside benefits are enormous, AI is admittedly a deep and complex subject, and most organizations will require the services of technology partners who can get them started and on their way. With the proper technology foundation and expert guidance, organizations that make investments today to harness the power of AI will position themselves for both short- and long-term competitive advantage. Conversely, those that fail to seize this opportunity are putting themselves at a severe disadvantage.
In this chapter, I provide an overview of AI, how it differs from traditional computer science that organizations have relied on for many decades, and how it is being applied across a range of use cases with impressive results. To better understand why there is such growing interest and investment in AI today, it is useful to know a bit about its history. I will touch on some of the highlights from its origins in the 1950s to the advances in recent years that today make AI an absolute imperative for every organization. I will also describe the significant challenges that AI presents and how organizations are overcoming those challenges.
A New Paradigm for Computer Science
Logic-based algorithms represent the core of traditional computer science. For decades, computer scientists were trained to think of algorithms as a logical series of steps or processes that can be translated into machine-understandable instructions and effectively used to solve problems. Traditional algorithmic thinking is quite powerful and can be used to solve a range of computer science problems in many areas—including data management, networking, and search.
Logic-based algorithms have delivered transformative value over the last 50 years in all aspects of business—from enterprise resource planning to supply chain, manufacturing, sales, marketing, customer service, and commerce. They have also changed how individuals communicate, work, purchase goods, and access information and entertainment. For example, the application you use to shop online employs numerous algorithms to perform its various tasks. When you search for a particular product by entering a term, the application runs an algorithm to find the products relevant to that term. Algorithms are used to compute taxes, offer you shipping options, process your payment, and send you a receipt.
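As a toy illustration of this kind of explicit, rule-based logic, the following Python sketch hard-codes each step of a simplified shopping flow; the products, tax rate, and shipping charge are invented for illustration.

```python
# Toy example of traditional, logic-based programming: every step (matching a
# search term, applying tax, adding shipping) is an explicit rule written by a
# programmer. Products, tax rate, and shipping charge are invented values.
products = [
    {"name": "USB-C cable", "price": 12.99},
    {"name": "Wireless mouse", "price": 24.50},
]

def search(term):
    # Explicit rule: return products whose name contains the search term.
    return [p for p in products if term.lower() in p["name"].lower()]

def checkout_total(item, tax_rate=0.08, shipping=4.99):
    # Explicit rules: apply tax, then add a flat shipping charge.
    return round(item["price"] * (1 + tax_rate) + shipping, 2)

for item in search("mouse"):
    print(item["name"], checkout_total(item))  # -> Wireless mouse 31.45
```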
Traditional logic-based algorithms effectively handle a range of different problems and tasks. But they are not effective at addressing many tasks that are often quite easy for humans to do. Consider a basic human task such as identifying an image of a cat. Writing a traditional computer program to correctly do this would involve developing a methodology to encode and parametrize all variations of cats—all different sizes, breeds, colors, and their orientation and location within the image field. While a program like this would be enormously complex, a two-year-old child can effortlessly recognize the image of a cat. And a two-year-old can recognize many objects beyond cats.
Similarly, many simple tasks for humans—such as talking, reading or writing a text message, recognizing a person in a photo, or understanding speech—are exceedingly difficult for traditional logic-based algorithms. For years these problems have plagued fields like robotics, autonomous vehicles, and medicine.
AI algorithms take a different approach than traditional logic-based algorithms. Many AI algorithms are based on the idea that, rather than coding a computer program to perform a task, the programmer designs the program to learn directly from data. So instead of being written explicitly to identify pictures of cats, the computer program learns to identify cats using an AI algorithm derived by observing a large number of different cat images. In essence, the algorithm infers what an image of a cat is by analyzing many examples of such images, much as a human learns.
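The following Python sketch illustrates the learning-from-examples idea using scikit-learn's logistic regression. A real cat recognizer would train a deep neural network on raw pixel data; here two invented numeric features stand in for an image so the essential point stays visible: the decision rule is inferred from labeled examples rather than written by hand.

```python
# Minimal sketch of learning from examples instead of coding rules.
# The feature values and labels are invented stand-ins for image data.
from sklearn.linear_model import LogisticRegression

# Each row is a (toy) feature vector for one image; labels: 1 = cat, 0 = not cat.
X_train = [
    [0.9, 0.2], [0.8, 0.3], [0.85, 0.25],   # cat-like examples
    [0.1, 0.9], [0.2, 0.8], [0.15, 0.85],   # non-cat examples
]
y_train = [1, 1, 1, 0, 0, 0]

model = LogisticRegression()
model.fit(X_train, y_train)           # the "learning" step: no cat rules coded by hand
print(model.predict([[0.88, 0.22]]))  # -> [1], classified as a cat
```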
As discussed in previous chapters, we now have the techniques and computing capability to process all the data in very large data sets (big data) and to train AI algorithms to analyze those data. So wherever it is possible to capture sufficiently large data sets across their operations, organizations can transform business processes and customer experiences using AI—making possible the age of AI-driven digital transformation.
Just as the emergence of the commercial internet revolutionized business in the 1990s and 2000s, the ubiquitous use of AI will similarly transform business in the coming decades. AI already touches and shapes our lives today in many ways, and we are still in the infancy of this transition. Google, one of the first companies to embrace AI at scale, uses AI to power all dimensions of its business.1 AI already powers the core of Google’s business: search. The results of any Google search query are provided by an extremely sophisticated AI algorithm that is constantly maintained and refined by a large team of data scientists and engineers.2 Advertising, the core source of revenue for Google, is all driven by sophisticated, AI-backed algorithms—including ad placement, pricing, and targeting.
Google Assistant uses AI and natural language processing (NLP) to deliver sophisticated, speech-based interaction and control to consumers. Google’s parent company, Alphabet, has a self-driving car division called Waymo that already has cars on the streets. Waymo’s core technology—its self-driving algorithms—is powered by AI.
Other consumer-facing companies have similar offerings. Netflix uses AI to power movie recommendations. Amazon uses AI to provide product recommendations on its e-commerce platform, manage pricing, and offer promotions.3 And numerous companies, from Bank of America to Domino’s Pizza, use AI-powered “chat bots” in a variety of use cases including customer service and e-commerce.
While Google, Netflix, and Amazon are early adopters of AI for consumer-facing applications, virtually every type of organization—business-to-consumer, business-to-business, and government—will soon employ AI throughout their operations. The economic benefits will be significant. McKinsey estimates AI will add about $13 trillion to global GDP by 2030, while a 2017 PwC study puts the figure at $15.7 trillion—a 14 percent increase in global GDP.
AI Is Not a New Idea
To understand why there is such heightened interest in AI today, it’s useful to retrace some of its history. Fascinating in its own right, AI’s evolution is an instructive lesson in how a few key innovations can catapult a technology into mainstream prominence.
The field of AI is not new. The earliest ideas of “thinking machines” arose in the 1950s, notably with British computer scientist and mathematician Alan Turing’s paper speculating about the possibility of machines that think. He posited the “Turing test” to establish a definition of thinking.4 To pass the Turing test, a computer would have to demonstrate behavior indistinguishable from that of a human.
The term “artificial intelligence” dates back to 1955, when young Dartmouth math professor John McCarthy coined the term as a neutral way to describe the emerging field.5 McCarthy and others proposed a 1956 summer workshop:
We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire.
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.
An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.6
That workshop is widely cited as the founding of AI as a field of research. A wave of university-led projects followed: MIT launched Project MAC (Mathematics and Computation) with funding from DARPA in 1963,7 Berkeley’s Project Genie started in 1964,8 Stanford launched its Artificial Intelligence Laboratory in 1963,9 and the University of Southern California founded its Information Sciences Institute in 1972.10
Interest in the field grew rapidly with the work of MIT’s Marvin Minsky—who co-founded what is now the MIT Computer Science and Artificial Intelligence Laboratory.11 Minsky and John McCarthy at MIT, Frank Rosenblatt at Cornell, Allen Newell and Herbert Simon at Carnegie Mellon, and Roger Schank at Yale were some of the early AI practitioners.
Based on some of this early work, an “AI buzz” set the world aflame in the 1960s and ’70s. Dramatic predictions flooded popular culture.12 Soon machines would be as smart as or smarter than humans; they would take over tasks currently performed by humans and eventually even surpass human intelligence. Needless to say, none of these predictions came to pass.
The early efforts of AI practitioners were largely unsuccessful: machines were unable to perform even tasks that are simple for humans. One key obstacle was the lack of sufficient computing power. Over the course of the 1960s, ’70s, and ’80s, computing evolved quite rapidly, but machines were still not powerful enough to solve many real-world problems. Over those decades, computers grew in power and shrank in size, evolving from machines the size of entire buildings to mainframes, minicomputers, and personal computers.
One of the first commercially available IBM computers, the IBM 650, cost $500,000 when it was introduced in 1954, had a memory of 2,000 10-digit words, and weighed over 900 kilograms.13 In contrast, the iPhone X, launched in 2017, cost $999, has a 64-bit A11 chip and three gigabytes of RAM, and fits in your pocket.14,15 This dramatic improvement in performance is powerful testimony to Moore’s Law at work. The ubiquitously available commodity computers of today are a factor of 1,000 more powerful than the machines available to Minsky and his colleagues.