
The Half-Life of Facts


by Samuel Arbesman


  This occurs most clearly in science itself.

  • • •

  WHILE it may be an extreme case, Tai’s mistake is far from the exception in the world of science. When it comes to science, too often knowledge simply doesn’t spread as quickly, or as evenly, as we might expect. Disciplines grow rapidly and ramify; it becomes difficult for any one person to know all that has been discovered in a single area.

  There has been rapid growth in interdisciplinary research in the past few decades. Molecular biologists work with applied mathematicians, sociologists work with physicists, economists even work with geneticists. If you can think of two fields, you can think of a way to combine a prefix from one and a suffix from the other in order to get a new discipline. People often flippantly acknowledge that an area is about to undergo a shift if the physicists begin to move into it. They have already begun a steady colonization of biology, economics, and sociology, as evidenced by such newfangled terms as biophysics, econophysics, and sociophysics.

  This trend is a welcome one, on the whole, because it often leads to ideas that are well-known in one field finding wonderful applications in another area, where they have not yet been considered. This can lead to an exciting synthesis of ideas, yielding something new and vibrant. But when multiple areas are linked together only superficially, and knowledge is not truly combined, it occasionally leads to a situation in which someone thinks they’ve discovered something new, yet they’re only re-creating something that has been known for a long time in another field. Tai’s experience is an extreme example of this, but smaller examples abound.

  My own research, which draws from many different disciplines, has not been immune to this problem. In the fall of 2010, when I was a postdoctoral research fellow at Harvard, I was working with Jukka-Pekka Onnela, a fellow postdoc and currently a professor at Harvard’s School of Public Health, on a project involving a large anonymized data set of cell phone network calls from a country in Europe. In addition to knowing who called whom, which was important for understanding social ties, we also had information about the callers’ locations down to the level of the cell tower. Using our data we could map a community of callers on a country map.

  When you do this sort of mapping you get a nice scatter of points on a grid. As part of our work, we wanted to know whether there were clusters of points on this grid, and if so, how many groups of people were there in the points we were examining. I already knew of many sophisticated ways to cluster data points based on their locations, but you often need to know the number of clusters in advance. For example, if you know that there are three groups of data, these algorithms will take your data and place them into three different groups. But what if you didn’t know the number?

  Onnela and I began searching online for help with this problem, and while we found a lot about clustering, we didn’t find the answer to our problem. At one point in this long process one of us might even have recommended thinking about how to create our own method. Then I came up with a solution: “Why don’t we just go down the hall and speak to Alan?” Alan is Alan Zaslavsky, a statistician in the Department of Health Care Policy at Harvard, and someone you can count on to be knowledgeable about all things relating to statistics and mathematics (and most other subjects as well). So we walked down the hall and knocked on his door. He wasn’t busy, so we went in for a chat, and within five minutes we had the answer: something called the Akaike Information Criterion, created decades ago, was exactly what we needed.

  But if we hadn’t done this, we might well have recapitulated some top-notch work done years ago, and likely would have done it far worse than Akaike himself. As our body of knowledge grows, this problem of how to disseminate facts effectively will only become more acute.
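  For reference, the Akaike Information Criterion mentioned above is usually written in the following textbook form. Here k is the number of parameters the model estimates and L-hat is the maximized value of its likelihood; whether the cell phone project used exactly this form or a variant is not stated, so treat it as an illustration only.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

  To choose the number of clusters, one would fit a model for each candidate number of groups and prefer the one with the lowest score: extra clusters raise the likelihood, but each also adds parameters, and the 2k term charges for them.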

  Of course, these problems are not even close to new. Even prior to our days of information overload we were plagued with instances of knowledge spreading slowly, or simply failing to spread. The Battle of New Orleans only occurred because news of the end of the War of 1812 traveled so slowly in the early nineteenth century: The war had been over for two weeks when this battle was fought between British and American soldiers. And the Pony Express even promoted its speed by advertising that it delivered news of Lincoln’s election in 1860 to the West Coast in only a little more than seven days.

  But why does knowledge spread unevenly? Certainly geography is one component: People within reach of the telegraph knew about Lincoln’s election much more quickly than those in California, and the closer you were to Washington, D.C., the sooner you knew about the end of the War of 1812. But does that fully explain it, or are there other factors?

  One well-studied case of the spread of knowledge, which itself catalyzed how knowledge spread in general, is also one of the most profound innovations of the past thousand years: the printing press.

  • • •

  THE printing press is one of the foundational technologies of our civilization. Unlike the waterwheel, for example, the spread of which is no doubt interesting as well, the printing press has acted as a catalyst for the further dissemination of knowledge. Much like how the production of an enzyme speeds up other chemical reactions, the printing press facilitated the spread and development of new facts. In its first fifty years of existence alone, the price of books in Europe fell by two thirds, a staggering drop in the cost of any item. But did the printing press itself spread rapidly? And were there any patterns in its spread?

  The printing press was invented by Johannes Gutenberg in Mainz around 1440. Despite its groundbreaking nature, and as important as it has been since then, it turns out that it did not immediately spread throughout the world. It did not even spread immediately throughout Europe. Rather, it took decades to diffuse, seeping slowly into the cities surrounding Mainz.

  Much of the initial spread occurred throughout Germany and northern Italy, and it wasn’t until 1476 that the first printing press was found anywhere in England. But it doesn’t take decades for information to spread from Germany to England. Even the Black Death, which people tried actively to stop from spreading, made its way across the European continent and the English Channel far faster, in less than half a decade. Clearly geography is not the only part of the story. So why the delay?

  Jeremiah Dittmar, an economics professor at American University, has examined this spread carefully. He looked at how certain cities were affected and why the technology moved from one place to the next. On the following page are a few maps of cities that demonstrate the spread of the printing press over time.

  Why wasn’t the spread just based on distance? As people got the news of this technology, couldn’t they easily begin implementing it themselves?

  It turns out that the printing press is far from simple. The technological innovations that Gutenberg developed were much more than the modification of a wine press and the addition of the idea of movable type. Gutenberg combined and extended a whole host of technologies and innovations from an astonishing number of areas, and that is what made his work so powerful. He used metallurgical developments to create metal type that not only had a consistent look (Gutenberg insisted on this) but could also be cast easily, allowing whole pages to be printed at once. He used chemical innovations to create a better ink than had ever been used before in printing. Gutenberg even exploited the concept of the division of labor by employing a large team of workers, many of whom were illiterate, to churn out books at a rate never before seen in history. And he even employed elegant error-checking mechanisms to ensure that the type was always set properly: There was a straight line on one side of each piece of type so that the workers could see at a glance whether any letters had been set upside down.

  Figure 6. Diffusion of the movable-type printing press over time. From Dittmar. “Information Technology and Economic Change: The Impact of The Printing Press.” The Quarterly Journal of Economics 126, no. 3 (August 1, 2011): 1133–72, by permission of Oxford University Press.

  Only by having the combined knowledge of all of these technologies does the printing press become possible and cost-effective. So while it’s true that geography—specifically, the distance from Gutenberg’s city of Mainz—can explain a great deal of the delay for when certain cities adopted the printing press technology, that’s far from the entire story.

  The important factor was something more subtle: personal contacts. The cities that got the press first got it because Germans lived there, specifically Germans who had the necessary skills and technologies to make a printing press. These personal contacts allowed for the spread of this semiproprietary technology. If you share a culture with someone, with a common language and traditions, you’re more likely to trust each other. This is exactly what happened here. Just as Jews and Huguenots built widespread trading and financial networks, the Germans used their own social ties that had been built on trust and apprenticeship to spread the technologies of the printing press.

  This is the rule when it comes to how facts spread: social networks spread information. Of course, back in the day of the printing press, geography and social connectivity were harder to disentangle. As mentioned earlier, just a century prior to Gutenberg, when the Black Death swept across Europe, it spread only as fast as people themselves could travel in that century. But social ties are also vital to the spread of knowledge.

  This can be seen by looking at the way the population sizes of cities affected the spread of the printing press. When Dittmar examined city size, he found that larger cities were much more likely to adopt the movable-type printing press technologies soon after their invention, compared to small cities. While only a third of cities in Europe were early adopters, these cities held more than half the population of Europe. This shouldn’t be surprising. Larger cities have more people, yielding more opportunities for there to be a social tie from one city to another. Just as we can look at the German ties, we can see how larger populations mean more ties. Ultimately, when trying to understand how facts spread, it comes down to social networks.

  It’s one thing to know that social ties lead to the spread of facts. That is almost intuitively obvious. But are there regularities to how this happens? Can we quantify how facts spread from person to person? Happily, there is an entire field devoted to understanding such networks, which is known as network science. Network science examines how connections operate, whether they are connections between people or computers, or even interacting proteins. And just as the mathematics of network science doesn’t care what is connected, it is also agnostic about what spreads across these networks. Whether the network is spreading innovations, pieces of news, germs, or pretty much anything else, network science can provide insight.

  So it shouldn’t be surprising that network science has a great deal to say about the ways in which information and facts can spread, like diseases, from one person to another.

  • • •

  WE are all embedded within social networks. We have friends, neighbors, and relatives. They in turn have contacts of their own. Do this a few more times, and you’ve reached nearly every person on the planet. That, simply put, is the concept of six degrees of separation.

  But knowing the social distance from one individual to another is far from the complete picture. Over the past several decades, network science has developed a far more detailed, though still incomplete, picture of our social interactions. We now understand mathematically why the most popular person in a network has so many more friends than the next most popular person, and we have measured the average number of close social connections each person maintains on a regular basis (it’s about four). We understand how social groups are distributed across countries, and even how we make and break friendships over time. In these ways, and many more, we are beginning to truly understand the social structures that we are embedded in, and how these ties influence us.

  Some of the most cutting-edge research that is going on right now is devoted to understanding how our connections influence us, and how things spread. As a postdoc I worked in the laboratory of Nicholas Christakis, one of the giants in this field. If you don’t recognize his name, you may remember some of the New York Times headlines about work done by him and his longtime collaborator James Fowler: “Are Your Friends Making You Fat?”; “Find Yourself Packing It On? Blame Friends”; “Study Finds Big Social Factor in Quitting Smoking”; “Strangers May Cheer You Up, Study Says.”

  What these researchers have found, in study after study, is that our actions have consequences that ripple across our social web to our friends, our friends’ friends, and even our friends’ friends’ friends.

  But just as health behaviors spread, so do facts and bits of knowledge. Since information spreads through social rather than physical space, it is vital that we understand social networks and how they operate. In this globalized age, where we can be anywhere on the planet within a day or so, the ties we have to those we know, rather than where we are, take on greater meaning. Whether we are advertisers trying to gain an advantage in the marketplace, or even just want to lose a few pounds, we crave the answers to a whole host of new kinds of questions about networks. A sample of such questions:

  How does each sort of tie that we have to those around us—whether friend, relative, spouse, neighbor—affect the spread of each individual fact or even each behavior? Are our social ties related to distance, which could have an effect on how information spreads? What do the structures of people’s social networks look like, and do the shapes of these networks—regular, random, or something in-between—affect how we interact? And are our social ties, such as how many friends we have, and even how likely our friends are to know one another, affected by the genes inside us?

  All of these questions are beginning to be asked, and answered, by network scientists. In our specific concern, network scientists have recently begun to explore certain cases where facts spread, or don’t spread, and how this works.

  • • •

  BACK in the 1970s a sociologist named Mark Granovetter created a simple little thought experiment: He imagined each social connection between people as having one of two strengths—weak or strong. Strong ties are those that we have to our parents, our spouses, or our close friends. Weak ties are those that we have with friends from high school or college to whom we seldom speak. Or to the acquaintance at work whom we banter with but don’t generally speak to outside the office. Or, in the modern age, most of our Facebook “friends.”

  Granovetter’s thought experiment: If we have only these two connection strengths, simplistic though that may be, what should our social networks look like? He argued that if two of our friends are close to us, it is very likely that they will know each other, and probably be close to each other as well. Therefore, much of a social network should consist of clusters of tightly knit groups that are connected by their strong ties into little triangles. But these tight-knit groups are occasionally connected to other strong clusters by weak ties. If these weak ties are the only ties that act as bridges between these little clusters, these weak ties should therefore be very important for facilitating the spread of information far and wide through the network, from one cluster to another. What Granovetter argued for, in other words (and in the words of the title of his celebrated paper), was “The Strength of Weak Ties.”

  Granovetter even backed this up with some simple data: He surveyed a group of people on how they got their jobs. Of those who said they got a job through personal contacts, he found that most of these personal contacts were quite “weak.”

  More recently, scientists have been able to test whether Granovetter was right. Jukka-Pekka Onnela, my former coworker whom I mentioned earlier, was actually involved in one of the foundational papers in this area. To understand how information spreads he used a data set that is unbelievably rich and has been the basis for many scientific papers: a collection of anonymized mobile phone calls in a country in Europe.

  Using the data about who calls whom, Onnela and his colleagues were able to construct a social network that spans an entire country. But not only did they have the ties between people, they had the strength of each tie: how many minutes people spoke to one another over the course of several months.

  They were able to conduct a test using this network: They created an abstract contagion in a computer-based simulation—it could be a disease, a bit of gossip, a fact, or anything else—and had it spread in the cell phone network according to one basic assumption: The stronger the tie between two people, the more likely the contagion would spread from one person to another.

  This is entirely reasonable. If you spend more time with someone who has a cold, you’re more likely to get sick. The more often you speak with someone, the more likely they are to tell you a bit of juicy gossip.

  For each of one thousand simulations, the team would begin by randomly choosing a few people to start the contagion. Then, at each step, a weighted coin would be flipped for each neighboring person who could possibly become infected. The stronger the tie, the more weighted the coin would be toward infection. Through running the simulations they were able to see how long it took for everyone to become infected, as well as what happened along the way.
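  The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code Onnela and his colleagues used: the function names, the toy network, and the rule that the chance of transmission equals a tie's call minutes divided by the minutes of the strongest tie are all invented here for the example.

```python
import random


def build_network(edges):
    """Turn (person_a, person_b, minutes) triples into an adjacency map."""
    network = {}
    for a, b, minutes in edges:
        network.setdefault(a, {})[b] = minutes
        network.setdefault(b, {})[a] = minutes
    return network


def simulate_contagion(network, seeds, rng, max_steps=10_000):
    """Spread a contagion step by step; return {person: step of infection}."""
    # Assumption: transmission probability is tie strength scaled by the
    # strongest tie in the network, so it always lies between 0 and 1.
    strongest = max(m for ties in network.values() for m in ties.values())
    infected_at = {person: 0 for person in seeds}
    for step in range(1, max_steps + 1):
        if len(infected_at) == len(network):
            break  # everyone has been reached
        newly_infected = []
        for person in infected_at:
            for neighbor, minutes in network[person].items():
                if neighbor in infected_at:
                    continue
                # The weighted coin: stronger ties make infection more likely.
                if rng.random() < minutes / strongest:
                    newly_infected.append(neighbor)
        # Apply the new infections only after the whole step is evaluated.
        for neighbor in newly_infected:
            infected_at.setdefault(neighbor, step)
    return infected_at


# Hypothetical toy network: two tight triangles joined by one weak tie (c-d).
toy_edges = [
    ("a", "b", 60), ("b", "c", 60), ("a", "c", 60),
    ("d", "e", 60), ("e", "f", 60), ("d", "f", 60),
    ("c", "d", 2),
]
toy_network = build_network(toy_edges)
timeline = simulate_contagion(toy_network, ["a"], random.Random(0))
print(sorted(timeline.items(), key=lambda item: item[1]))
```

  In this toy network the contagion fills the first triangle at once and then waits at the weak bridge between c and d, which is the kind of behavior the real simulations were built to measure. Running it many times with different random seeds and averaging would correspond to the thousand runs mentioned above.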

  When they tested the network and ran this experiment, they discovered that weak ties aren’t that important to spreading knowledge. While weak ties do in fact hold the network together, much as Granovetter suspected, they aren’t integral for spreading facts. Weak ties, while bringing together disparate social groups, aren’t strong enough to spread anything effectively.

  But strong ties also aren’t that important. While they can spread a fact with ease, most of the time they are spreading it to people who already know it, because strong ties only exist in highly clustered groups of people who often all know similar things.

 
