
Info We Trust


by R J Andrews


  Each one of us arrives at data stories with a slightly different map of reality. Nerdy expertise (the kind drilled into scientists, engineers, and designers) has serendipitously prepared some for the technical challenges of the information age. These disciplines gift valuable perspectives and skills. They are uncommon perspectives if one considers the rest of the population: Only six percent of United States workers are in science, technology, engineering, or mathematics occupations. The rare technical orientation of the nerd should not be confused with the attitude of the craft. Data storytelling does not arrive from peripheral obscurity. It is born out of the common everyday experiences that we all share. Data storytelling belongs to everyone.

  Numbered

  We each get 10 fingers, 10 digits. Because our minds are easily distracted, we use our digits to keep track when we count. Our fingers are a versatile tool for small quantities. They help serve as an easy visual reference for what the count is. But too soon we run out of fingers (and toes) and need to externalize the count beyond our bodies. Externalizing the count also keeps hands available for other tasks. We can scratch quick marks in the dirt to help us keep track of our counting, and just like that, the history of numerals began.

  Unary is the base-1 numeral system. The numbers 1, 2, 3, 4 are represented as: 1, 11, 111, 1111
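  A minimal Python sketch of unary counting, assuming the character "1" as the tally mark (the mark itself is arbitrary; a slash or a notch would work the same way). Reading a unary numeral back is simply counting its marks. The function names are illustrative.

```python
def to_unary(n: int) -> str:
    """Write a non-negative integer in unary (base-1): one mark per item counted."""
    if n < 0:
        raise ValueError("a tally can only record non-negative counts")
    return "1" * n


def from_unary(marks: str) -> int:
    """Read a unary numeral back by counting its marks."""
    if set(marks) - {"1"}:
        raise ValueError("unary numerals contain only the tally mark '1'")
    return len(marks)


print([to_unary(n) for n in range(1, 5)])  # ['1', '11', '111', '1111']
```

  The numeral grows one mark longer with every count, which is why pure tallying becomes unwieldy for large quantities.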

  The word for 20, a score, evolved from the Old Norse skor, meaning “to cut,” as in how one might scar tally marks into a counting stick. Counting in scores, perhaps more meaningful when shoes were less common, was already archaic by the time Lincoln alluded to 1776 by beginning his 1863 Gettysburg Address with “Four score and seven years ago.” We still tally game scores on scoreboards.

  The Ishango bone is a scarred baboon femur thought to be a 20,000-year-old tally stick.

  The first tally marks were scrawled in dirt with a stick or drawn on rock with a piece of charcoal. It soon became useful to preserve these marks for record-keeping and communication. Ancient knotted counting ropes and slashed animal bones survive as examples of preserved counts. In the beginning, every item of the count was represented by a mark. Six is //////. These marks persist in East Asian numeral systems as the first three counting numbers: 一, 二, 三. These slash-characters are also identical to the first three numerals in the Brahmi numeral system, the direct graphic ancestor of the modern Hindu-Arabic numerals the world uses today: 1, 2, 3. To us, these familiar numerals are abstract symbols. Today, numerals are squiggly cultural conventions no longer connected to our physical surroundings. But a long time ago, they were.

  Tallying numbers becomes cumbersome as one counts higher. Large numbers, often multiples of 10, were abstracted with a new idea: sign values to represent a particular group. These symbols cemented our 10-fingered bodies as the base of the number system. If the number ten is † and hundred is ‡, then 114 can be recorded as ‡†////. Sign-value notation was the basis of Ancient Egyptian and Roman numeral systems. These systems yielded to a variety of additive systems which give special names to the first 10 digits (…four, five, six…) and important multiples of 10 (ten, hundred, thousand). These special names are still how we pronounce numbers in both Chinese and English today: two hundred (and) four.
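  The sign-value idea can be sketched in Python with the three signs from the example above: ‡ for one hundred, † for ten, and / for one. This is a purely additive sketch under those assumptions. It ignores later refinements such as the subtractive forms of Roman numerals, and, like the ancient systems, it has no sign for zero.

```python
# Signs from the text's example, largest group first: ‡ = 100, † = 10, / = 1.
SIGNS = [("‡", 100), ("†", 10), ("/", 1)]


def to_sign_value(n: int) -> str:
    """Repeat each group sign as many times as it fits into n, largest group first."""
    if n < 1:
        raise ValueError("this additive notation has no sign for zero or negatives")
    parts = []
    for sign, value in SIGNS:
        count, n = divmod(n, value)  # how many of this group, and what remains
        parts.append(sign * count)   # numbers of 1000 or more simply repeat ‡
    return "".join(parts)


def from_sign_value(text: str) -> int:
    """Add up the value of every sign; in a purely additive system order is irrelevant."""
    values = dict(SIGNS)
    return sum(values[ch] for ch in text)


print(to_sign_value(114))  # ‡†////
```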

  5, 6, 7, 8, 9

  At one time, there was speculation that figures past 4 had come from either the forms of initial letters or syllables of number words of the third century BC Brahmi alphabet. But they may have come from older, untraceable numerical symbols.

  JOSEPH MAZUR, 2014

  A symbol is a visual shape used as a conventional representation, or proxy, of an object or idea.

  Counting numbers struggled to account for expenses that take away more than available (i.e., debts). Another problem was that they could not clearly represent the concept of nothing. Over a thousand years, negative numbers and zero were added to address these issues. Solutions were first formalized in India, synthesized with Greek mathematics in Persia, and then slingshot through North Africa and into Europe by Leonardo filius Bonacci (nicknamed Fibonacci hundreds of years later). The dominant mental picture of numbers shifted from a count of things to the more abstract number line. The positional notation of the Hindu-Arabic numerals made adding and subtracting easier. These new counting methods powered bank accounting innovations across Europe. Decimal fractions and ever-more abstract concepts, like imaginary numbers, would soon help power the scientific revolution and deliver us into the modern world.
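  To see why positional notation eased arithmetic, here is a small Python sketch (function names are illustrative). Each digit's worth depends on its column, and zero holds an empty place, so 204 means two hundreds, no tens, and four ones. Addition then reduces to working one column at a time, right to left, carrying tens.

```python
def place_values(n: int, base: int = 10) -> list:
    """Split a non-negative integer into (digit, place value) pairs, most significant first."""
    if n == 0:
        return [(0, 1)]
    pairs = []
    place = 1
    while n > 0:
        n, digit = divmod(n, base)  # peel off the rightmost digit
        pairs.append((digit, place))
        place *= base
    return list(reversed(pairs))


def add_positional(a: str, b: str) -> str:
    """Add two decimal numerals (digit strings) column by column, carrying tens."""
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad with zeros so columns line up
    result = []
    carry = 0
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        result.append(str(digit))
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))


print(place_values(204))          # [(2, 100), (0, 10), (4, 1)]
print(add_positional("204", "97"))  # 301
```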

  John Tukey advanced a compact base-10 tally system that built a box of dots and dashes with each additional count.

  The emergence of numbers shows how the visual memory practice of counting became hyper-externalized and abstracted to even greater benefit. As numbers evolved, they drifted away from physical reality. Today's everyday experience of numbers is surreal. It requires you to leave your physicality behind and mentally step into an abstract world. But this has not always been so. All of mathematics began with simple vignettes, such as prehistoric shepherds looking across their flocks, counting sheep. It is all rooted in our lived experience.

  Many codes … exist primarily to make life easier for machines and their designers without any consideration of the burden placed upon people.

  DON NORMAN, 2013

  Value types define how data is stored and impact the ways we turn numbers into information. Computer code often demands that certain aspects of value types be declared. Numbers and text can be quickly processed inside the computer's abstract world when they are labeled appropriately. Yet, we must appreciate even more than the computer if we are to build information. Observe the many value types expressed in this statement:

  The roar of the crowd swells as Joe Louis, the 198-¾-pound heavyweight, enters the arena for the final fight of the night, hopeful to exit the ring as the champion.

  In 1847 George Boole introduced the world to the truth values of Boolean logic and their main operations of AND, OR, and NOT with The Mathematical Analysis of Logic.

  A Boolean will record the win-loss outcome. Zero is false. Any nonzero value, usually one, is true. The heavyweight category is stored as a string of text. This particular category is ordinal because it can be positioned in order. There is a non-arbitrary relationship between weight classes: Heavyweight is heavier than middleweight, which itself is heavier than lightweight. The fighter's name, Louis, is also stored as a qualitative string of text. But it is considered just nominal, as there is no meaningful way of ordering fighter names.

  The floating point is able to hold values that arrive from the entire depth and breadth of the number line. It is called a floating point because the value can be re-expressed using scientific notation which moves, or floats, the decimal point.

  An integer, a quantitative non-fraction, counts the fights of the night. It is discrete, with no in-between states. A floating point records the fighter's weight. Floats are associated with how we perceive—and measure—the real physical world: a continuous spectrum that can be zoomed in on. Time can be split into seconds, milliseconds, nanoseconds, and so on. Space can also be sliced into ever-smaller fractions of length or degree. Recognizing value types is one foundation for better information because it helps you see the inherent structure in the data's origin.
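  A sketch of how the fight statement might be recorded with Python's built-in types. The text stores the weight class as a string; to make its ordinal nature explicit, this sketch adds a hypothetical `WEIGHT_CLASSES` list ordered from lighter to heavier. The fight number and the outcome are made-up values, since the statement does not give them.

```python
from dataclasses import dataclass

# Hypothetical ordering for the ordinal category: list position runs lighter -> heavier.
WEIGHT_CLASSES = ["lightweight", "middleweight", "heavyweight"]


@dataclass
class Fight:
    fighter: str       # nominal text: no meaningful order between names
    weight_class: str  # ordinal text: its position in WEIGHT_CLASSES gives its rank
    weight_lb: float   # continuous measurement that can be split ever finer
    fight_number: int  # discrete count: no state between one fight and the next
    won: bool          # truth value: False behaves as 0, True as 1


def heavier_class(a: str, b: str) -> bool:
    """Ordinal comparison: True if weight class a ranks above weight class b."""
    return WEIGHT_CLASSES.index(a) > WEIGHT_CLASSES.index(b)


# 198-3/4 pounds from the text; fight number and outcome are illustrative only.
bout = Fight("Joe Louis", "heavyweight", 198.75, 8, True)

print(heavier_class(bout.weight_class, "middleweight"))  # True
print(f"{bout.weight_lb:e}")  # 1.987500e+02 (scientific notation "floats" the point)
print(int(bout.won), bool(0), bool(2))  # 1 False True
```

  The last line echoes the Boolean convention described above: zero reads as false and any nonzero value reads as true.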

  Enter Data

  A datum is a value stored in a location. The value could be of a variety of types, but is often a float, integer, Boolean, or text string. More than one datum makes data. Data is traditionally expressed as the plural of datum. But today we also refer to it as a singular mass noun, like sand or rain. Whether we say data are or data is, each datum includes a value and a storage location.

  Scalar data, such as temperature, has one value at each position. Vector data, such as air velocity's direction and magnitude, has two values at each position. As such, it is often represented by an arrow. Tensor data has many values at each position. One example of tensor data is a stress-strain tensor, which can distinguish how a material will behave along each of the three dimensions of space.
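  A minimal Python sketch of the three kinds of data at a single position. All readings are hypothetical. The vector is stored as east and north components rather than as direction and magnitude (the two forms carry the same information, and the code converts between them), and the 3×3 grid stands in for a stress tensor with one row and one column per spatial axis.

```python
import math

# Hypothetical readings taken at one position; the numbers are illustrative only.
temperature_c = 21.5     # scalar: one value

wind_ms = (3.0, 4.0)     # vector: two values (east and north components, m/s)

stress_mpa = [           # tensor: nine values, one per pair of axes (x, y, z)
    [10.0, 2.0, 0.0],
    [2.0, 5.0, 1.0],
    [0.0, 1.0, 8.0],
]

# The same vector re-expressed as the arrow's magnitude and direction.
speed = math.hypot(*wind_ms)                                # length of the arrow
heading = math.degrees(math.atan2(wind_ms[1], wind_ms[0]))  # degrees counterclockwise from east

value_counts = {
    "scalar": 1,
    "vector": len(wind_ms),
    "tensor": sum(len(row) for row in stress_mpa),
}
print(value_counts)  # {'scalar': 1, 'vector': 2, 'tensor': 9}
print(speed)         # 5.0
```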

  In some cases, the data value's location is associated with an actual location in the real world. The location might be global, like a map coordinate, or local, like the position of a stent in a heart's coronary artery. In other cases, the data value's location is defined by reference keys and attribute names that have no relation to a real physical place. Data can also be characterized by how many values are stored for each location.

  Just as the value types of data differ, data storage types vary as well. Two-dimensional tables of data position values into neat rows and columns. Hierarchical trees, such as your hard drive's nested folders, stack relationships. Databases manage a variety of data and programs in one unified environment; they create flexible systems sometimes explained with “object” metaphors.
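  Two of these storage types can be sketched with plain Python structures. The table rows (including the second fighter and weight) and the folder names are hypothetical. The table keeps every value in a fixed row and column, while the tree nests each level inside its parent, so reading it means walking down from the root.

```python
# Table: values sit in neat rows and columns (hypothetical rows).
table = [
    ("fighter", "weight_class", "weight_lb"),  # header row names the columns
    ("Joe Louis", "heavyweight", 198.75),
    ("Fighter B", "middleweight", 160.0),
]
header, *rows = table
weights = [row[header.index("weight_lb")] for row in rows]  # read one column

# Tree: nested folders, where each folder holds its children (hypothetical names).
tree = {
    "Documents": {
        "Photos": {"apollo.jpg": None},  # None marks a file, i.e., a leaf
        "Notes": {"todo.txt": None},
    }
}


def paths(node: dict, prefix: str = ""):
    """Walk the tree depth-first and yield the full path to every file."""
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if child is None:
            yield path
        else:
            yield from paths(child, path)


print(weights)            # [198.75, 160.0]
print(list(paths(tree)))  # ['/Documents/Photos/apollo.jpg', '/Documents/Notes/todo.txt']
```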

  Rote learning and drill is not enough. It leaves out understanding. … ideas and understanding are what [it] is centrally about.

  LAKOFF AND NÚÑEZ, 2000

  The diverse data value types and data storage types combine to help create our data, but they are often not enough. Modern data packages also contain metadata, such as summary values and data dictionaries. These metadata provide explanatory context: what the data values contain, how they relate to one another, and what it all means. Many datasets are complex and multilayered combinations. They may contain different structures and file formats. Nonetheless, simple mental models, like the relationship tree, table array, and spatial map, persist.
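  A sketch of a data dictionary as metadata, assuming a hypothetical layout where each column name maps to its declared value type, its scale, and a plain-language description. Because the metadata states what each column should contain, it can be used to flag values that do not fit; the second row deliberately stores its weight as text to show this. The rows, the `check_row` helper, and the summary fields are all illustrative.

```python
# Hypothetical data dictionary: metadata describing each column of a table.
data_dictionary = {
    "fighter": {"type": "text", "scale": "nominal", "description": "Fighter's name"},
    "weight_class": {"type": "text", "scale": "ordinal", "description": "Division, lightest to heaviest"},
    "weight_lb": {"type": "float", "scale": "continuous", "description": "Weight in pounds"},
    "won": {"type": "bool", "scale": "binary", "description": "True if the fighter won"},
}

# Map the dictionary's type names onto Python types.
PY_TYPES = {"text": str, "float": float, "bool": bool}


def check_row(row: dict) -> list:
    """Return the names of columns that are undocumented or hold the wrong type."""
    problems = []
    for column, value in row.items():
        spec = data_dictionary.get(column)
        if spec is None or not isinstance(value, PY_TYPES[spec["type"]]):
            problems.append(column)
    return problems


# Hypothetical rows; the second stores its weight as text by mistake.
rows = [
    {"fighter": "Joe Louis", "weight_class": "heavyweight", "weight_lb": 198.75, "won": True},
    {"fighter": "Fighter B", "weight_class": "middleweight", "weight_lb": "160", "won": False},
]

# Summary values are another kind of metadata, computed from the data itself.
summary = {"row_count": len(rows), "columns": list(data_dictionary)}

print([check_row(r) for r in rows])  # [[], ['weight_lb']]
```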

  It's not the numbers that are interesting. It's what they tell us about the lives behind the numbers.

  HANS ROSLING, 1995

  How do we picture data? We might imagine imperceptible strings of zeros and ones that go on forever, written by tiny machines to solid-state drives. Data lives far away on chilly server racks, ready to serve you at a moment's notice; it is backed up elsewhere, just in case. Data can also be a precious portable thumb drive, pursued by the characters in a Hollywood action film. When we see data in the currency of its medium of storage, we block the creative work we need to do. These impenetrable images of data do not help.

  A MacGuffin is something desired that helps advance a story's plot. The pursuit of the MacGuffin, not the MacGuffin itself, is what is important. The search for the Holy Grail motivated Arthurian legends, while the pursuit of the Maltese Falcon statue was at the heart of Dashiell Hammett's detective novel. Today, the search for a valuable cache of data, often made visible by its portable object of storage, propels action films forward. The most lovable MacGuffin might be Star Wars' R2-D2, the custodian of the stolen plans (i.e., data) that can save the people and restore freedom to the galaxy.

  The first lesson for data storytellers from James Brown's album is an easy one: The magic is in hearing the music, not the nuance of its capture and storage. The second lesson is that Live at the Apollo is not a perfect time capsule. It cannot be. As a sensory event, the album only transports us audibly to that room, and even then, only partially. It is not a total rote recording of that 1962 concert. It is merely a simplification, an encoding that reduced the sensory reality of that evening to a tiny fraction of its original, rich salience.

  But even a virtual reality experience that put us perfectly back at The Apollo would still not be the same as actually witnessing the show. This is because you would have a different frame of reference compared to any 1962 Harlem concert-goer. Furthermore, James Brown's recorded performance will not be motivated anew by audience cheers. Reality does not happen twice. Any recording is but a shadow of the performance. It is an incomplete artifact that lives on.

  The album is a sliver of what that night was, but that does not make it inferior. It is a treasure. The album is a beautiful compression of what that concert was. It helped James Brown rocket to success and still moves our feet today. No one would want to watch a continuous stream of someone's life; there is too much monotonous noise. But compress a life story into a two-hour film and you can move the emotions (and wallets) of millions.

  Storytellers of all stripes must regularly compress all of the possible information their stories could contain into a manageable number of relatable details.

  MICHAEL AUSTIN, 2010

  We often wish we could remember more. Russian neuropsychologist Aleksandr Luria treated a patient whose memory was too sharp for his own good. Referred to as S., he suffered from not having the “art of forgetting”—the automatic disposal of trivial detail as we push information from short-term memory to long-term storage. In When We Are No More, historian Abby Smith Rumsey relates the consequences of S.'s condition of being unable to forget triviality:

  S. suffered from a disorder of distraction. He could not make things dull, and had a hard time maintaining focus on anything for extended periods. He was unable to sort his impressions for value and emotional salience. To him the world was far too vivid far too much of the time. …

  He easily confused what he had remembered (because everything he encountered in his daily life triggered a chain of recollections) with what had actually transpired. Memories were so fresh in affect and spun out in his mind so rapidly that he mistook his recollections for reality. There were periods in his youth when he did not get up in the morning to go to school because even thinking about arising stimulated memories of having done so before. He thought that he had gone to school even as he lay still under the covers.

  Having only a compression—an impression, a model, a shadow—is actually the best we could hope for. Too many stimuli would bore, overwhelm, or make it impossible to understand. Distilling the performance of James Brown into an album made it possible for the performance to reach millions of people. And it makes it possible for us to keep traveling back to the 1962 Apollo.

  To see why encoding is necessary, imagine trying to memorize an event without any simplification taking place; the result might be called a “total rote recording” or “perception without concepts” … In the real world we can't possibly take everything into account all the way down to its most microscopic details, and so we necessarily must ignore almost everything about every situation that we encounter, and that means we unconsciously make a highly selective encoding of it when we store it in memory. We have to strip everything we experience down to a caricature of itself.

  HOFSTADTER AND SANDER, 2013

  Go and get more information.

  BOOK OF SAMUEL, 1:23

  All data is a shadow of what has flowed before. Data is reality distilled with intention. We no longer have to picture data as an impenetrable monolith. When we think about data, we should consider the world that delivered it to us. Pause to reflect: What has been lost from the data's world? Why were some things selected to survive? How has it all been transmitted forward to us, today? Then, we can see data for what it is, whispers from a past world waiting for its music to be heard again.

  CHAPTER

  2

  INFORMATION MURMURS

  “Data! data! data!” he cried impatiently. “I can't make bricks without clay.”

  SHERLOCK HOLMES, 1892

  THE ADVENTURE OF THE COPPER BEECHES

  We should not take data whispers for granted. The past has not been delivering them for very long. We did not always have data, well, at least not how we do today. The impressions and compressions of life that data preserves make the very idea of data beautiful. How we got data is quite a story.

  Wheat centralizes power. It is visible above ground and harvested all at once, so you cannot evade the taxman if the entire village knows who grew what.

  A common telling of our history goes like this: Civilizations flourished across the globe with the ability to harvest and store food. Food surpluses went into centralized granaries. This accumulated wealth spurred development. Bureaucratic management, trade, and a new hierarchy emerged. Increasingly, social power wielded physical power. Caches of food, and the material luxuries they helped achieve, became targets for competing communities. Violence evolved right along with the emergence of civilization. Together, they advanced in organization and efficiency.

  These plants domesticated Homo sapiens, rather than vice versa. … The word “domesticate” comes from the Latin domus, which means “house.” Who's the one living in a house? Not the wheat. It's the Sapiens.

  YUVAL NOAH HARARI, 2015

  That caricature is one way of telling the story. Today, new discoveries are challenging long-held beliefs about life before the Fertile Crescent. Pre-agriculture humanity seems to have actually been more capable and vibrant than ever imagined. One of the reasons we credit the relationship between cultivation and civilization so much, you see, is a massive information-survival bias. The agricultural revolution did not just produce a lot of wheat. It also produced an enormous number of durable information objects. The agricultural revolution birthed a wealth of data. Food, livestock, and people were counted. These numbers were first represented with small sculptures, which were then abstracted into a system of 3-D tokens. Tiny cone and ball tokens were later pressed into clay to create records. Triangular and circular indentations eventually became symbolic. The tokens were abandoned in favor of the more efficient reproduction of their indentation with a reed stylus. Writing was born.

 
