The Long Tail


by Chris Anderson


  The answer is not a simple yes or no, because it is the nature of user-created content to be as messy and uncertain at the microscale, which is the level at which we usually experience it, as it is amazingly successful at the big-picture macroscale. It just has to be understood for what it is.

  Wikipedia, like Google and the collective wisdom of millions of blogs, operates on the alien logic of probabilistic statistics—a matter of likelihood rather than certainty. But our brains aren’t wired to think in terms of statistics and probability. We want to know whether an encyclopedia entry is right or wrong. We want to know that there’s a wise hand (ideally human) guiding Google’s results. We want to trust what we read.

  When professionals—editors, academics, journalists—are running the show, we at least know that it’s someone’s job to look out for such things as accuracy. But now we’re depending more and more on systems where nobody’s in charge; the intelligence is simply “emergent,” which is to say that it appears to arise spontaneously from the number-crunching. These probabilistic systems aren’t perfect, but they are statistically optimized to excel over time and large numbers. They’re designed to “scale,” or improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale.

  But how can that be right when it feels so wrong?

  There’s the rub. This tradeoff is just hard for people to wrap their heads around. There’s a reason why we’re still debating Darwin. And why The Wisdom of Crowds, James Surowiecki’s book on Adam Smith’s invisible hand and how the many can be smarter than the few, is still surprising (and still needs to be read) more than two hundred years after the great Scotsman’s death. Both market economics and evolution are probabilistic systems, which are simply counterintuitive to our mammalian brains. The fact that a few smart humans figured this out and used that insight to build the foundations of our modern economy, from the stock market to Google, is just evidence that our mental software (our collective knowledge) has evolved faster than our hardware (our neural wiring).

  Probability-based systems are, to use writer Kevin Kelly’s term, “out of control.” His seminal book by that name looks at example after example, from democracy to bird-flocking, where order arises from what appears to be chaos, seemingly reversing entropy’s arrow. The book is more than a dozen years old, and decades from now we’ll still find the insight surprising. But it’s right.

  Is Wikipedia “authoritative”? Well, no. But what really is? Britannica is reviewed by a smaller group of reviewers with higher academic degrees on average. There are, to be sure, fewer (if any) total clunkers or fabrications than in Wikipedia. But it’s not infallible either; indeed a 2005 study by Nature, the scientific journal, reported that in forty-two entries on science topics there were an average of four errors per entry in Wikipedia and three in Britannica. And shortly after the report came out, the Wikipedia entries were corrected, while Britannica had to wait for its next reprinting.

  Britannica’s biggest errors are of omission, not commission. It is shallow in some categories and out of date in many others. And then there are the millions of entries that it simply doesn’t—and can’t, given its editorial process—have. But Wikipedia can scale itself to include those and many more. And it is updated constantly.

  The advantage of probabilistic systems is that they benefit from the wisdom of the crowd and as a result can scale nicely both in breadth and depth. But because they do this by sacrificing absolute certainty on the microscale, you need to take any single result with a grain of salt. Wikipedia should be the first source of information, not the last. It should be a site for information exploration, not the definitive source of facts.

  The same is true for blogs, no single one of which is authoritative. Blogs are a Long Tail, and it is always a mistake to generalize about the quality or nature of content in the Long Tail—it is, by definition, variable and diverse. But collectively, blogs are proving themselves more than a match for mainstream media. You just need to read more than one of them before making up your own mind.

  Likewise for Google, which seems both omniscient and inscrutable. It makes connections that you or I might not, because they emerge naturally from math on a scale we can’t comprehend. Google is arguably the first company to be born with the alien intelligence of the Web’s “massive-scale” statistics hardwired into its DNA. That’s why it’s so successful, and so seemingly unstoppable.

  Author Paul Graham puts it like this:

  The Web naturally has a certain grain, and Google is aligned with it. That’s why their success seems so effortless. They’re sailing with the wind, instead of sitting becalmed praying for a business model, like the print media, or trying to tack upwind by suing their customers, like Microsoft and the record labels. Google doesn’t try to force things to happen their way. They try to figure out what’s going to happen, and arrange to be standing there when it does.

  The Web is the ultimate marketplace of ideas, governed by the laws of big numbers. That grain Graham sees is the weave of statistical mechanics, the only logic that such really large systems understand. Perhaps someday we will, too.

  THE POWER OF PEER PRODUCTION

  As a whole, Wikipedia is arguably the best encyclopedia in the world: bigger, more up-to-date, and in many cases deeper than even Britannica. But at the individual entry level, the quality varies. Along with articles of breathtaking scholarship and erudition, there are plenty of “stubs” (placeholder entries) and even autogenerated spam.

  In the popular entries with many eyes watching, Wikipedia shows a remarkable resistance to vandalism and ideological battles. One study by IBM found that the mean repair time for damage in high-profile Wikipedia entries such as “Islam” is less than four minutes. This is not the work of the professional encyclopedia police. It is simply the emergent behavior of a Pro-Am swarm of self-appointed curators. Against all expectations, the system works brilliantly well. And as Wikipedia grows, this rapid self-repairing property will spread to more entries.

  The point is not that every Wikipedia entry is probabilistic, but that the entire encyclopedia behaves probabilistically. Your odds of getting a substantive, up-to-date, and accurate entry for any given subject are excellent on Wikipedia, even if every individual entry isn’t excellent.

  To put it another way, the quality range in Britannica goes from, say, 5 to 9, with an average of 7. Wikipedia goes from 0 to 10, with an average of, say, 5. But given that Wikipedia has twenty times as many entries as Britannica, your chances of finding a reasonable entry on the topic you’re looking for are actually higher on Wikipedia.
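This back-of-the-envelope comparison can be sketched in a few lines. The quality ranges, averages, and the twenty-to-one entry ratio are the illustrative figures from the paragraph above, not real measurements; treating quality as uniformly distributed and calling anything scoring 5 or above "reasonable" are added assumptions for the sake of the sketch.

```python
# Toy model of the quality tradeoff described above. The numbers are the
# illustrative ones from the text; the uniform-quality assumption is mine.
N_BRITANNICA = 120_000              # topics Britannica covers
N_WIKIPEDIA = 20 * N_BRITANNICA     # "twenty times as many entries"
REASONABLE = 5                      # quality threshold for a usable entry

def chance_of_reasonable_entry(n_covered, lo, hi, n_topics):
    """P(a random topic is covered) * P(its entry clears the threshold),
    assuming entry quality is uniform on [lo, hi]."""
    p_covered = min(n_covered / n_topics, 1.0)
    p_good = max(0.0, hi - max(lo, REASONABLE)) / (hi - lo)
    return p_covered * p_good

# Pick a topic at random from the full range of subjects Wikipedia covers:
print(chance_of_reasonable_entry(N_BRITANNICA, 5, 9, N_WIKIPEDIA))
print(chance_of_reasonable_entry(N_WIKIPEDIA, 0, 10, N_WIKIPEDIA))
```

Under these toy assumptions, Britannica's entries are always reasonable but cover only one topic in twenty (a 5 percent chance overall), while Wikipedia covers everything and clears the bar half the time (50 percent) — which is the point of the paragraph above.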

  What makes Wikipedia really extraordinary is that it improves over time, organically healing itself as if its huge and growing army of tenders were an immune system, ever vigilant and quick to respond to anything that threatens the organism. And like a biological system, it evolves, selecting for traits that help it stay one step ahead of the predators and pathogens in its ecosystem.

  The traditional process of creating an encyclopedia—professional editors, academic writers, and peer review—aims for perfection. It seldom gets there, but the pursuit of accuracy and clarity results in a work that is consistent and reliable, but also incredibly time-consuming and expensive to produce. Likewise for most other products of the professional publishing industry: One can expect that a book will, in fact, have printing on both sides of the pages where intended and will be spelled more or less correctly. There is a quality threshold, below which the work does not fall.

  With probabilistic systems, though, there is only a statistical level of quality, which is to say: Some things will be great, some things will be mediocre, and some things will be absolutely crappy. That’s just the nature of the beast. The mistake of many of the critics is to expect otherwise. Wikipedia is simply a different animal from Britannica. It’s a living community rather than a static reference work.

  The true miracle of Wikipedia is that this open system of amateur user contributions and edits doesn’t simply collapse into anarchy. Instead, it has somehow self-organized the most comprehensive encyclopedia in history. Reversing entropy’s arrow, Jimmy Wales’s catalytic moment—putting up a few initial entries and a mechanism for others to add to them—has actually created order from chaos.

  The result is a very different kind of encyclopedia, one completely unbounded by space and production constraints. It offers all the expected entries of any world-class reference work and then hundreds of thousands of unexpected ones, ranging from articles that go into textbook-like depth in fields such as quantum mechanics to biographical entries on comic book characters. Or, to put it another way, it’s got all the hits plus a huge number of niches.

  The classic model of the encyclopedia is a curated list of received cultural literacy. There is the basic canon, which must be recognized by authorities. Then, there are other entries of diminishing length until you get to that line at which the priests of Britannica decide “This is not worthy.” There, the classic encyclopedia ends. Wikipedia, on the other hand, just keeps going.

  In a sense, you can think of Wikipedia as equivalent to Rhapsody, the music site. There are the popular top 1,000, which can be found in any encyclopedia: Julius Caesar, World War II, Statistics, etc. These are like the hit songs. With these, Wikipedia is competing with professionals at their best, who produce well-written, authoritative entries that deploy facts with the easy comfort that comes with great scholarship. The main advantage of the user-created Wikipedia model for these entries is its ability to be up-to-date, have unlimited length and visual aids (such as photos and charts), include copious links to support material elsewhere, and perhaps, better represent alternate views and controversies.

  In the middle of the curve, from the 1,000th entry to where Britannica ends at 120,000, are the narrower subjects: Caesarian Section, Okinawa, Regression Analysis, etc. Here, the Wikipedia model begins to pull ahead of its professional competition. Unlimited space means that the Wikipedia entries tend to be longer and more comprehensive. While the average length of a Britannica entry was 678 words in 2006, more than 200,000 Wikipedia entries (more than two entire Britannicas) were longer than that. Meanwhile, the external links and updated information emerge as a key advantage as Wikipedia becomes a launching place for further research.

  Then there is the Tail, from 120,000 to 1 million. These are the entries that Wikipedia has that no other encyclopedia even attempts to include. Its articles on these subjects—Caesar Cipher, Canned Spam, Spearman’s Rank Correlation Coefficient—range from among the best in Wikipedia (those written by passionate experts) to the worst (self-promotion, score-settling, and pranks). While many critics focus on the worst entries, the really important thing about Wikipedia’s Tail is that there is nothing else like it anywhere. From hard-core science to up-to-the-minute politics, Wikipedia goes where no other encyclopedia—whether constrained by paper or DVD limitations—can. Britannica doesn’t have an entry about the Long Tail phenomenon (yet), but Wikipedia’s entry is not only well written and thorough, it’s also 1,500 words long (and none of it was written by me!).

  Wikipedia authors tend to be enthusiastically involved, liberated, and motivated by the opportunity to improve public understanding of some subject they know and love. In five short years, this population has grown a thousandfold with an invasion of empowered amateurs using the simple, newly democratized tools of encyclopedia production: a Web browser and an Internet connection.

  This is the world of “peer production,” the extraordinary Internet-enabled phenomenon of mass volunteerism and amateurism. We are at the dawn of an age where most producers in any domain are unpaid, and the main difference between them and their professional counterparts is simply the (shrinking) gap in the resources available to them to extend the ambition of their work. When the tools of production are available to everyone, everyone becomes a producer.

  THE REPUTATION ECONOMY

  Why do they do it? Why does anyone create something of value (from an encyclopedia entry to an astronomical observation) without a business plan or even the prospect of a paycheck? The question is a key one to understanding the Long Tail, partly because so much of what populates the curve does not start with commercial aim. More important, this question matters because it represents yet another example of where our presumptions about markets must be rethought. The motives to create are not the same in the head as they are in the tail. One economic model doesn’t fit all. You can think of the Long Tail starting as a traditional monetary economy at the head and ending in a non-monetary economy in the tail. In between the two, it’s a mixture of both.

  Up at the head, where products benefit from the powerful, but expensive, channels of mass-market distribution, business considerations rule. It’s the domain of professionals, and as much as they might love what they do, it’s a job, too. The costs of production and distribution are too high to let economics take a backseat to creativity. Money drives the process.

  Down in the tail, where distribution and production costs are low (thanks to the democratizing power of digital technologies), business considerations are often secondary. Instead, people create for a variety of other reasons—expression, fun, experimentation, and so on. The reason one might call it an economy at all is that there is a coin of the realm that can be every bit as motivating as money: reputation. Measured by the amount of attention a product attracts, reputation can be converted into other things of value: jobs, tenure, audiences, and lucrative offers of all sorts.

  Tim Wu, a Columbia University law professor, calls this the “exposure culture.” Using blogs as an example, he writes,

  The exposure culture reflects the philosophy of the Web, in which getting noticed is everything. Web authors link to each other, quote liberally, and sometimes annotate entire articles. E-mailing links to favorite articles and jokes has become as much a part of American work culture as the water cooler. The big sin in exposure culture is not copying, but instead, failure to properly attribute authorship. And at the center of this exposure culture is the almighty search engine. If your site is easy to find on Google, you don’t sue—you celebrate.

  Once you think of the curve as being populated with creators who have different incentives, it’s easy to extend that to their intellectual property interests as well. Disney and Metallica may be doing all they can to embrace and extend copyright, but there are plenty of other (maybe even more) artists and producers who see free peer-to-peer (“P2P”) distribution as low-cost marketing. Musicians can turn that into an audience for their live shows, indie filmmakers treat it as a viral resume, and academics treat free downloads of their papers as a way to increase their impact and audience.

  Each of these perspectives changes how the creators feel about copyright. At the top of the curve, the studios, major labels, and publishers defend their copyright fiercely. In the middle, the domain of independent labels and academic presses, it’s a gray area. Farther down the tail, more firmly in the noncommercial zone, an increasing number of content creators are choosing explicitly to give up some of their copyright protections. Since 2002, a nonprofit organization called Creative Commons has been issuing licenses of the same name to allow for a flexible use of certain copyrighted works for the sake of the greater value (for the content creators) of free distribution, remixing, and other peer-to-peer propagation of their ideas, interests, and fame. (Indeed, I’ve done that with my own blog, for all of the reasons above.)

  In short, some creators care about copyright and some don’t. Yet the law doesn’t distinguish between them—copyright is automatically granted and protected unless explicitly waived. As a result, the power of “free” is obscured by fears over piracy and is often viewed with suspicion, not least because it evokes unfortunate echoes of both communism and hippie sloganeering.

  Regardless, it’s something we’re starting to reconsider as the power of the “gift economy” becomes clear—in everything from the blogosphere to open source. In one part of my professional life (the 650,000-circulation magazine I edit), I’m near the head of the curve, and in another (my 30,000-reader blog) I’m in the tail. My decisions on intellectual property are different in each. Someday soon, I hope, marketplace and regulation will more accurately reflect this reality.

  SELF-PUBLISHING WITHOUT SHAME

  We think of books through a commercial lens, assuming that most authors want to write a best-seller and get rich. But the reality is that the vast majority of authors not only won’t become best-sellers, but also aren’t even trying to write a hugely popular book. Each year, nearly 200,000 books are published in English. Fewer than 20,000 will make it into the average book superstore. Most won’t sell.

  In 2004, 950,000 books out of the 1.2 million tracked by Nielsen BookScan sold fewer than ninety-nine copies. Another 200,000 sold fewer than 1,000 copies. Only 25,000 sold more than 5,000 copies. The average book in America sells about 500 copies. In other words, about 98 percent of books are noncommercial, whether they were intended that way or not.
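The 98 percent figure follows directly from the Nielsen BookScan numbers quoted above, taking "noncommercial" to mean any title that never cleared 5,000 copies:

```python
# 2004 figures cited above from Nielsen BookScan:
total_tracked = 1_200_000   # books tracked that year
over_5000 = 25_000          # books that sold more than 5,000 copies

# Everything that never cleared 5,000 copies counts as "noncommercial":
noncommercial_share = (total_tracked - over_5000) / total_tracked
print(f"{noncommercial_share:.0%}")  # 98%
```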

  The quest for mass-market acceptance requires compromise—a willingness to pick topics of broad rather than narrow interest, and to write in conversational rather than academic style. Most writers can’t do that and many others won’t. Instead, the vast majority of authors choose to follow their passions and assume they won’t make money. Many want no more than to be read by some group that matters to them—from their peers to like-minded souls.

  Such profitless publishing can be lucrative all the same. The book becomes not the product of value but the advertisement for the product of value—the authors themselves. Many such noncommercial books are best seen as marketing vehicles meant to enhance the academic reputation of their authors, market their consultancy, earn them speaking fees, or just leave their mark on the world. Seen that way, self-publishing is not a way to make money; it’s a way to distribute your message.

 
