Connectivity to the Cloud
Global adoption of cloud services is also propelled by improvements in telecommunications connectivity. Network speeds around the world are increasing significantly, thanks in part to fiber installations in cities and buildings. The average network speed in the U.S. is over 18 megabits per second (Mbps)—only tenth in the world. South Korea tops the list at almost 30 Mbps. Worldwide, the average speed is just over 7 Mbps, increasing 15 percent a year.14,15
Historically, fixed networks offered speeds and latencies superior to mobile networks. But continued innovation in mobile network technology—3G and 4G (third- and fourth-generation), and Long-Term Evolution (LTE)—has rapidly narrowed the performance gap between fixed and mobile networks. And worldwide demand for next-generation tablets and smartphones pushes carriers to invest in mobile networking infrastructure. Even higher speeds of 5G (fifth-generation) technology will further accelerate the adoption of cloud computing.
While 5G networks are only in the very early roll-out stages, and estimates about actual speeds abound, it’s clear they will be significantly faster than 4G. At the 2018 Consumer Electronics Show in Las Vegas, Qualcomm simulated what 5G speeds would be in San Francisco and Frankfurt. The Frankfurt demo showed download speeds greater than 490 Mbps for a typical user—compared with typical rates of just 20-35 Mbps over today’s 4G LTE networks. San Francisco was even faster: 1.4 Gbps (gigabits per second).
The key point for business and governmental leaders is that cloud computing technology and the infrastructure it relies on continue to improve and evolve at a rapid pace. Performance and scalability are getting better all the time—all the more reason to move to the cloud without delay.
Converting CapEx to OpEx
Public cloud IaaS growth of 30-35 percent per year over the last three years16 illustrates that businesses—particularly those undertaking digital transformations—are moving to the cloud for a variety of technical and financial reasons. As enterprises transition to elastic public clouds, they quickly realize the economic appeal of cloud computing, often described as “converting capital expenses to operating expenses” (CapEx to OpEx) through the pay-as-you-go SaaS, PaaS, and IaaS service models.17 Rather than tie up capital to buy or license depreciating assets like servers and storage hardware, organizations can instantly access on-demand resources in the cloud of their choice, for which they are billed in a granular fashion based on usage.
Utility-based pricing allows an organization to purchase compute-hours distributed non-uniformly over time. For example, 100 compute-hours consumed over an eight-hour period cost the same as 100 compute-hours consumed within a two-hour period using quadruple the resources. While usage-based pricing has long been available for network bandwidth, it is a revolutionary concept for compute resources.
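To make the arithmetic concrete, here is a minimal sketch of utility-based billing, assuming a hypothetical flat rate of $0.10 per compute-hour (the rate and the bill function are illustrative, not any provider's actual pricing):

```python
RATE_PER_COMPUTE_HOUR = 0.10  # assumed price per compute-hour, for illustration only

def bill(resource_units: float, hours: float) -> float:
    """Charge = resource units consumed x hours x hourly rate."""
    return resource_units * hours * RATE_PER_COMPUTE_HOUR

# 100 compute-hours spread evenly over eight hours: 12.5 units for 8 hours
steady = bill(resource_units=12.5, hours=8)
# 100 compute-hours packed into two hours with quadruple the resources: 50 units for 2 hours
burst = bill(resource_units=50, hours=2)

assert steady == burst  # $10.00 either way
print(f"${steady:.2f} vs ${burst:.2f}")
```

Under this pricing model, only the total compute-hours matter; how quickly they are consumed does not change the bill.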
The absence of upfront capital expenses, along with savings in the cost of personnel required to manage and maintain diverse hardware platforms, allows organizations to redirect the freed-up money into their digital transformation efforts, such as adding IoT devices to monitor their supply chains or deploying predictive analytics for better business intelligence.
Additional Benefits of the Elastic Public Cloud
While cost, time, and flexibility advantages are the fundamental reasons to move to the elastic public cloud, there are other important benefits:
• Near-zero maintenance: In the public cloud, businesses no longer need to spend significant resources on software and hardware maintenance, such as operating system upgrades and database indexing. Cloud vendors do these for them.
• Guaranteed availability: In 2017, a major global airline suffered an outage because an employee accidentally turned off the power at its data center.18 Such unplanned downtime—due to things like operating system upgrade incompatibility, network issues, or server power outages—virtually vanishes in the public cloud. The leading public cloud providers offer availability guarantees. A 99.99 percent availability guarantee, common in the industry, means less than one hour of downtime a year.19 (The arithmetic is sketched after this list.) It is nearly impossible for an in-house IT team to ensure that level of uptime for an enterprise operating globally.
• Cyber and physical security: With the public cloud, organizations benefit from cloud providers’ extensive investments in both physical and cyber security managed 24/7 to protect information assets. Public cloud providers continuously install patches for the thousands of vulnerabilities discovered every year and perform penetration testing to identify and fix vulnerabilities. Public cloud providers also offer compliance certification, satisfying local and national security and privacy regulations.
• Latency: Minimizing latency—the lag time between user action and system response—is critical to enabling real-time operations, great customer experiences, and more. The single biggest determinant is the round-trip time between an end user’s application (e.g., a web browser) and the infrastructure. Major public cloud providers offer multiple “availability zones”—i.e., physically isolated locations within the same geographic region, connected with low latency, high throughput, and highly redundant networking. For example, AWS spans 53 availability zones in 18 regions globally. With the public cloud, a game developer in Scandinavia, for example, can deploy a mobile application and provide best-in-class latency in every region worldwide without managing a fleet of far-flung data centers.
• Reliable disaster recovery: Today’s globally distributed public clouds ensure cross-region replication and the ability to restore to points in time for comprehensive, reliable disaster recovery. For instance, a business whose East Asian data center is impacted by a local political disruption could operate without any interruption from its replica in Australia. Similarly, if files are accidentally destroyed, cloud services allow businesses to restore back to a time when their systems were operating in a normal state. While it is technically possible for any business to set up, manage, and test its own replication and restore services, it would be prohibitively expensive for most organizations.
• Easier and faster development (DevOps): The shift to the cloud enables the new development methodology known as “DevOps,” which is gaining widespread popularity and adoption. Software engineers traditionally developed applications on their local workstations but are steadily moving toward developing in the cloud. DevOps brings software development (Dev) and IT operations (Ops) into much tighter alignment than was previously the case. The cloud gives developers a wider variety of languages and frameworks, up-to-date cloud-based development environments, and easier collaboration and support. With cloud-based containers, engineers can write code in their preferred development environment that will run reliably across different production environments. All this accelerates the development and deployment of software for production use.
• Subscription pricing: Cloud computing’s utility-based pricing has transitioned software pricing to a subscription model, allowing customers to pay only for their usage. Subscription models for SaaS, PaaS, and IaaS have been popularized in recent years, with pricing typically based on the number of users and compute resources consumed. In most cases, subscription pricing is proportional to different levels of software features selected. This allows businesses to pick and choose what they want, for however long they want, and for any number of users. Even small and medium-sized businesses can optimally access best-in-class software.
• Future-proofing: SaaS allows software producers to rapidly and frequently upgrade products, so customers always have the latest functionality. In the pre-cloud era, businesses often had to wait six months or more between release cycles to get the latest improvements, and rollout could be slow and error-prone. Now, with cloud-based SaaS, businesses continuously receive seamless updates and upgrades, and know they always operate with the newest version.
• Focusing on business, not on IT: In the era of software licenses, businesses had to maintain teams to manage on-premises hosting, software and hardware upgrades, security, performance tuning,
and disaster recovery. SaaS offerings free up staff from those tasks, allowing businesses to become nimble and focus on running the business, serving customers, and differentiating from competitors.
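As promised in the availability bullet above, the downtime arithmetic behind a 99.99 percent guarantee is simple enough to sketch directly (the "three nines" comparison line is added purely for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def max_annual_downtime_hours(availability: float) -> float:
    """Maximum downtime per year permitted by a given availability fraction."""
    return HOURS_PER_YEAR * (1 - availability)

print(max_annual_downtime_hours(0.9999))  # ~0.88 hours, i.e., roughly 53 minutes a year
print(max_annual_downtime_hours(0.999))   # ~8.8 hours a year at "three nines"
```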
Computing without Limits
The elastic cloud has effectively removed limits on the availability and capacity of computing resources—a fundamental prerequisite to building the new classes of AI and IoT applications that are powering digital transformation.
These applications typically deal with massive data sets of terabyte and petabyte scale. Data sets of this size—particularly since they include a wide variety of both structured and unstructured data from numerous sources—present special challenges but are also the essential raw material that makes digital transformation possible. In the next chapter, I turn to the topic of big data in more depth.
Chapter 5
Big Data
As computer processing and storage capacity have increased, it has become possible to process and store increasingly large data sets. Much of the resulting discussion of big data focuses on the significance of that increase. But it’s only part of the story.
What’s most different about big data, in the context of today’s digital transformation, is the fact that we can now store and analyze all the data we generate—regardless of its source, format, frequency, or whether it is structured or unstructured. Big data capabilities also enable us to combine entire data sets, creating massive supersets of data that we can feed into sophisticated AI algorithms.
The quantification of information was first conceived by Claude Shannon, the father of Information Theory, at Bell Labs in 1948. He introduced the idea of the binary digit (or bit, as it came to be known) as a quantifiable unit of information. A bit is a “0” or a “1”. This invention was a prerequisite to the realization of the digital computer, a device that really does nothing more than add sequences of binary numbers—0s and 1s—at high speeds. If we need to subtract, the digital computer adds negative numbers. If we need to multiply, it adds numbers repeatedly. As complex as digital computers may seem, they are essentially nothing more than sophisticated adding machines.
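As an illustration of the "sophisticated adding machine" idea, the following sketch performs subtraction by adding a negative number and multiplication by repeated addition (real CPUs do this with two's-complement hardware adders; the Python functions here are illustrative only):

```python
def subtract(a: int, b: int) -> int:
    """Subtraction performed as addition of a negative number."""
    return a + (-b)

def multiply(a: int, b: int) -> int:
    """Multiplication of a by a non-negative b, performed as repeated addition."""
    total = 0
    for _ in range(b):
        total += a
    return total

assert subtract(9, 4) == 5
assert multiply(6, 7) == 42
```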
Using base-2 arithmetic, we can represent any number. The ASCII encoding system, developed from telegraph code in the 1960s, enables the representation of any character in its set, and therefore any word, as a sequence of zeros and ones.
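For example, a short sketch of ASCII encoding, showing each character's code padded to one 8-bit byte (the to_bits helper is hypothetical, written only for this illustration):

```python
def to_bits(text: str) -> str:
    """Return each character's ASCII code as an 8-bit binary string."""
    return " ".join(format(ord(ch), "08b") for ch in text)

print(to_bits("Hi"))  # 01001000 01101001
```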
As information theory developed and we began to amass increasingly large data sets, a language was developed to describe this phenomenon. The essential unit of information is a bit. A string of eight bits in a sequence is a byte. We measure computer storage capacity as multiples of bytes as follows:
One byte is 8 bits.
One thousand (1000) bytes is a kilobyte.
One million (1000²) bytes is a megabyte.
One billion (1000³) bytes is a gigabyte.
One trillion (1000⁴) bytes is a terabyte.
One quadrillion (1000⁵) bytes is a petabyte.
One quintillion (1000⁶) bytes is an exabyte.
One sextillion (1000⁷) bytes is a zettabyte.
One septillion (1000⁸) bytes is a yottabyte.
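As a small illustration of these decimal units, the sketch below converts a raw byte count into the largest unit under 1,000 (the human_readable helper and its unit labels are assumptions made for this example):

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest decimal (power-of-1000) unit below 1,000."""
    for unit in UNITS[:-1]:
        if num_bytes < 1000:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000
    return f"{num_bytes:.1f} {UNITS[-1]}"

print(human_readable(15e12))  # "15.0 TB" -- on the order of the Library of Congress figure cited below
```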
To put this in perspective, all the information contained in the U.S. Library of Congress is on the order of 15 terabytes.1 It is not uncommon for large corporations today to house scores of petabytes of data. Google, Facebook, Amazon, and Microsoft collectively house on the order of an exabyte of data.2 As we think about big data in today’s computer world, we are commonly addressing petabyte- and exabyte-scale problems.
There are three essential constraints on computing capacity and the resulting complexity of the problem a computer can address. These relate to (1) the amount of available storage, (2) the size of the binary number the central processing unit (CPU) can add, and (3) the rate at which the CPU can execute addition. Over the past 70 years, the capacity of each has increased dramatically.
As storage technology advanced from punch cards, in common use as recently as the 1970s, to today’s solid-state drive (SSD) non-volatile memory storage devices, the cost of storage has plummeted, and the capacity has expanded exponentially. A computer punch card can store 960 bits of information. A modern SSD array can access exabytes of data.
The Intel 8008 processor is a relatively modern invention, introduced in 1972. It was an 8-bit processor, meaning it could add numbers up to 8 bits long. Its CPU clock rate was up to 800 kilohertz, meaning it could add 8-bit binary numbers at rates up to 800,000 times per second.
A more modern processor—for example, the NVIDIA Tesla V100 graphics processing unit (GPU)—addresses 64-bit binary strings and can process them at speeds up to 15.7 trillion operations per second. These speeds are mind-numbing.
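A back-of-the-envelope comparison of the two figures above, taking the stated rates at face value, gives a sense of the scale of the increase:

```python
intel_8008_adds_per_sec = 8e5      # ~800,000 8-bit additions per second (800 kHz)
tesla_v100_ops_per_sec = 15.7e12   # ~15.7 trillion operations per second

ratio = tesla_v100_ops_per_sec / intel_8008_adds_per_sec
print(f"{ratio:,.0f}x")            # roughly 19,625,000x, a ~20-million-fold increase
```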
The point of this discussion is that with these 21st-century advances in processing and storage technology—dramatically accelerated by the power of elastic cloud computing offered by AWS, Azure, IBM, and others—we effectively have infinite storage and computational capacity available at an increasingly low and highly affordable cost. This enables us to solve problems that were previously unsolvable.
How does this relate to big data? Due to the historical computing constraints described above, we tended to rely on statistically significant sample sets of data on which we performed calculations. It was simply not possible to process or even address the entire data set. We would then use statistics to infer conclusions from that sample, which in turn were constrained by sampling error and confidence limits. You may remember some of this from your college statistics class.
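The contrast can be sketched with a toy example: inferring a mean from a 1,000-record sample, with its confidence interval, versus simply computing it over the full data set (the population here is synthetic, invented purely for illustration):

```python
import random
random.seed(42)

# Synthetic "population" standing in for a complete data set
population = [random.gauss(100, 15) for _ in range(1_000_000)]
sample = random.sample(population, 1_000)

true_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

# Standard error of the sample mean: s / sqrt(n)
n = len(sample)
s = (sum((x - sample_mean) ** 2 for x in sample) / (n - 1)) ** 0.5
standard_error = s / n ** 0.5

print(f"sample estimate: {sample_mean:.2f} +/- {1.96 * standard_error:.2f} (95% confidence interval)")
print(f"full data set:   {true_mean:.2f} (no sampling error)")
```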
The significance of the big data phenomenon is less about the size of the data set we are addressing than the completeness of the data set and the absence of sampling error. With the computing and storage capacity commonly available today, we can access, store, and process the entire data set associated with the problem being addressed. This might, for example, relate to a precision health opportunity in which we want to analyze the medical histories and genome sequences of the entire U.S. population.
When the data set is sufficiently complete that we can process all the data, it changes everything about the computing paradigm, enabling us to address a large class of problems that were previously unsolvable. We can build highly accurate predictive engines that generate reliable predictive analytics. This in turn enables AI. That is the promise of big data.
As game-changing as it is, big data presents enormously complex challenges, both in managing the data itself and in building and deploying the large-scale AI and IoT applications it fuels. In this chapter, we discuss the impact of big data on the real-world applications and use cases driving digital transformation, as well as the significant challenges of harnessing it. To get value from big data, organizations will clearly have to adopt new processes and technologies, including new platforms designed to handle big data.
With regard to big data, incumbent organizations have a major advantage over startups and new entrants from other sectors. Incumbents have already amassed a large amount of historical data, and their sizeable customer bases and scale of operations are ongoing sources of new data. Of course, there remain the considerable challenges of accessing, unifying, and extracting value from all these data. But incumbents begin with a significant head start.
To better understand what big data means today, it’s useful to briefly review how data technology has evolved over time—and how we got to where we are.
Computer Storage: A Brief History
The First Storage Device
The first recorded storage device is arguably a clay tablet found in the Mesopotamian city of Uruk. It dates to about 3300 B.C. and is now part of the British Museum’s collection.3 The tablet was a record of payment—in beer rations—to workers. Not only is it an early specimen of cuneiform writing, but it is also an example of recorded and stored transaction data that could be retrieved and copied to support or quash disagreements and legal disputes.
One can imagine a large warehouse in Mesopotamia holding all possible records of this nature to help bureaucrats enforce agreements. In fact, the Royal Library of Ashurbanipal (in Nineveh, located in today’s Iraq) was just this—a collection of 30,000 tablets including the famed Epic of Gilgamesh.4 The library was destroyed in 612 B.C., but many of the clay tablets survived to provide a wealth of data about Mesopotamian literature, religion, and bureaucracy.
Over the ages, the human need to store, retrieve, and manage data has continued to grow. The Great Library at Alexandria, established in the 3rd century B.C.—supposedly inspired by the Royal Library of Ashurbanipal—stored, at its zenith, between 400,000 and a million papyrus scrolls and parchments. The quest to gather these documents—covering mathematics, astronomy, physics, natural sciences, and other subjects—was so important that incoming ships were searched for new books. Information retrieval and copying had a huge cost—one had to pay a top-rated scribe 25 denarii ($3,125 in today’s dollars) to copy 100 lines.5,6
To our modern eye, several more recent projects are recognizable as precursors of big data. Scientists in Europe in the Middle Ages captured astronomical data, so that by the time Copernicus hit his stride in the early 16th century, his heliocentric ideas could be based on the findings of previous generations.
A century later, Londoners John Graunt and William Petty used public records of bubonic plague deaths to develop a “life table” of probabilities of human survival. This is considered an early statistical model for census methods and a precursor to modern demography. Scientists like Antonie van Leeuwenhoek catalogued microscopic creatures, establishing the study of microbiology.
In the early 1800s, U.S. Navy officer Matthew Maury took advantage of his placement in the Depot of Charts and Instruments to data mine decades of captains’ logs to create the revolutionary Wind and Current Chart of the North Atlantic Ocean, thereby transforming transatlantic seafaring.
Later in the century, the U.S. Census Bureau, facing the prospect of spending a decade to collect and collate data for the 1890 census, turned to a young inventor from MIT named Herman Hollerith for a solution. Using punch cards, the Hollerith Electric Tabulating Machine turned a 10-year project into a three-month one. Iterations of the machines were used until they were replaced by computers in the 1950s. Hollerith’s machine was one of the core inventions behind the Computing-Tabulating-Recording Company, formed in 1911 and later renamed the International Business Machines Corporation (IBM).