That indifference registered again with Google Books. Beginning in 2002, Google scanned around twenty million books from university libraries, using its own high-speed scanning and page-turning machines. The books already out of copyright were not a problem, and they are still accessible on Google Books today. Google had hoped to strike a deal with publishers to make itself the broker for online sales of books still under copyright, but this was not to be. The Authors Guild filed suit against Google in 2005, an ambitious settlement was rejected in 2011, and the case was finally decided in Google’s favor in 2016, leaving Google free to use the books it scanned in its search results but not to sell them. Google does sell some ebooks as of this writing, but only a tiny percentage of the copyrighted works it has scanned.*6 In the absence of a significant financial incentive, Google Books remains a treasure trove, a vast compendium of works written before 1923, most of them obscure and forgotten: Victorian potboilers, antiquated reference works, chronicles of trends long past. It is also, however, a flawed archive. The character recognition is unreliable, the text formatting of the resulting e-books is irregular and sometimes unreadable (especially in the case of non-Latin characters), and I’ve sometimes stumbled on a scanned page that is folded in half or otherwise obscured, when the page-turning machine fell prey to wind or some other mechanical irregularity.
Google Books was, ultimately, less an organized archive than a heap of stuff. A great deal of the world’s information had been gathered in Google Books, but it was irregular, flawed, and incomplete. To Google it was a dataset, no more or less perfect than any other. Perhaps if some careful algorithmic analyses had been run over the books, Google’s machines could have found irregularities that could be conclusively identified as errors. But there was little opportunity for profit. Those mistakes were left to humans to discover.
Google nonetheless created the largest online library, even if it was dotted with random flaws and serendipitous accidents. Jorge Luis Borges’s classic story “The Library of Babel” describes a near-infinite library in which every possible text exists, by virtue of containing a book for every possible arrangement of the twenty-two letters of its alphabet, plus the comma, the period, and the space.
All that it is given to express, in all languages. Everything: the minutely detailed history of the future, the archangels’ autobiographies, the faithful catalogue of the Library, thousands and thousands of false catalogues, the demonstration of the fallacy of those catalogues, the demonstration of the fallacy of the true catalogue, the Gnostic gospel of Basilides, the commentary on that gospel, the commentary on the commentary on that gospel, the true story of your death, the translation of every book in all languages, the interpolations of every book in all books.
It is impossible to find anything there; no particular text can be located among the vast volumes. Finding the right book in the right language is a fool’s errand when it is surrounded by millions of wrong books in unknown languages and no index exists. (Borges points out that such an index does in fact exist…in some other book in the library, never to be located.) Computers can generate such libraries, pointless as they would be.
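Generating such a library is trivial, which is the point. Here is a minimal sketch in Python, assuming Borges’s specifications of twenty-five orthographic symbols and books of 410 pages of forty lines of eighty characters; the stand-in alphabet of the letters a through v is my own placeholder, since Borges never lists his letters. The sketch can churn out random pages forever and tally how many distinct books the Library would hold; what it cannot do is find the one you want.

```python
import math
import random
import string

# Borges specifies twenty-five orthographic symbols (twenty-two letters,
# the comma, the period, and the space) and books of 410 pages,
# each page forty lines of roughly eighty characters.
SYMBOLS = string.ascii_lowercase[:22] + ",. "  # stand-in letters; Borges never lists his
PAGES_PER_BOOK = 410
LINES_PER_PAGE = 40
CHARS_PER_LINE = 80


def random_page(rng: random.Random) -> str:
    """Produce one uniformly random page of the Library."""
    return "\n".join(
        "".join(rng.choice(SYMBOLS) for _ in range(CHARS_PER_LINE))
        for _ in range(LINES_PER_PAGE)
    )


if __name__ == "__main__":
    rng = random.Random(1941)  # arbitrary seed; any other yields an equally meaningless page
    print(random_page(rng)[:160] + " ...")

    # The futility, counted: the number of distinct books is
    # 25 ** (410 * 40 * 80), a figure with nearly two million digits.
    chars_per_book = PAGES_PER_BOOK * LINES_PER_PAGE * CHARS_PER_LINE
    digit_count = math.floor(chars_per_book * math.log10(len(SYMBOLS))) + 1
    print(f"distinct books in the Library: a {digit_count:,}-digit number")
```

Every run produces a page that, in a sense, was always already in the Library; no number of runs brings you any closer to the faithful catalogue.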
Google Books scanned, and mis-scanned, the library of human meaning. What it obtained, instead of a Library of Babel, was an accidental assemblage: a partial and somewhat arbitrary selection of works.
In another Borges story, “The Lottery in Babylon,” a society turns over the fate of its individuals and community to a shadowy Company that manages their fortunes through sacred, secret lotteries. The Company becomes more absolute and infects every aspect of life, until it fades into the chance operations of nature and life themselves, such that the very idea of the Company becomes tantamount to religion.
[The Company’s] silent functioning, comparable to God’s, gives rise to all sorts of conjectures. One abominably insinuates that the Company has not existed for centuries and that the sacred disorder of our lives is purely hereditary, traditional. Another judges it eternal and teaches that it will last until the last night, when the last god annihilates the world. Another declares that the Company is omnipotent, but that it only has influence in tiny things: in a bird’s call, in the shadings of rust and of dust, in the half dreams of dawn.
What Google possesses is not a Library of Babel but a Library of Babylon. Its mistakes in character recognition, page-turning, and cataloguing are random and unguided by human hands, subject to the caprices of physics and chaos, like the weather or the human body. Many predict the miracles of artificial intelligence, from perfect robot companionship to perfect economic management, but our ever-enlarging corporate networks are governed less by rationality than by a slapdash assemblage of heuristics and approximations that slouch toward the chance operations of the Lottery in Babylon. Such heaps of code work best when 75 to 80 percent success is good enough, as with search results or content filtering. When every mistake stands out, as with voice recognition or translation, these titanic, labyrinthine systems look far less impressive.*7 Google is a dumb god.
And often, a broken god. We are witness to the small failures of tech companies on a daily basis, whether it’s a crashing phone or a mistargeted advertisement. Some failures are more malignant: Google “disappears” a page from its index; Facebook suggests a doctor’s confidential patients to one another as friends. A person is often left wondering why the failure happened, with no answers forthcoming. There is an old saying known as Hanlon’s razor: never attribute to malice what can adequately be explained by stupidity. Goethe phrased the idea similarly in The Sorrows of Young Werther: “Misunderstandings and indolence cause more mishaps in this world than cunning and malice do.” The world of software is full of misunderstanding and indolence both. But I offer a variation for the world of data: “Never attribute to programmer intent what can adequately be explained by incomprehensible complexity.” We shouldn’t overestimate the degree of control programmers have over the algorithms they design. Bizarre or offensive behavior is far more likely to be an accident than a consequence of the algorithm’s actual design.*8
I’m asked sometimes why engineers at Google and Facebook are so arrogant as to think they can look at millions of people’s data and not feel they are violating their privacy. Normally, data analysis is run by computer, but engineers do sometimes peek when testing, developing, and debugging. I don’t think it is arrogance exactly; rather, engineers function at a priestly remove from the world. At companies like Google or Facebook, programmers engage with people’s personal information in such a way that they are indifferent to its implications. When a scandal erupts like the one around Facebook and the political marketing firm Cambridge Analytica, in which mass outrage greeted the revelation that Cambridge Analytica had obtained personal data of tens of millions of Facebook users, the true revelation is that Cambridge Analytica is just one of thousands of companies that partake of Facebook’s data—albeit one of the shadier ones. Such sporadic outrages belie the irreversible ease with which consumer profiling and analysis have permeated every aspect of our lives, creating a world of information that literally did not exist fifty years ago. Here, the issue is not algorithms but data collection itself. Data does not come with a “Use Only for Good” sign attached to it.
* * *
—
There is a paradox in the public debate around algorithms. Half of our cultural critics are saying, “Computers are inherently biased; that’s why we need humans!” The other half are saying, “Humans are inherently biased; that’s why we need computers!”*9 I have even seen a single critic make both statements in the same essay. In May 2016, facing anonymous allegations that it was biased against conservative media sources, Facebook released the internal guidelines it provided to contractors whose job it was to evaluate and summarize world events for its Trending Topics feature. The guidelines, it turned out, were quite vanilla, with the work being more to summarize and categorize news stories than to filter them. Faced with a barrage of ill-informed negative coverage from liberal and conservative sources alike, Facebook laid off the contractors who had been classifying stories and committed to making the feed more “algorithmic”—only for a new round of critics to say that in the absence of humans enforcing neutrality, the news algorithms were inevitably skewed by governmental and media manipulation. Facebook couldn’t win. The best solution was to do nothing, or at least appear to do nothing. Facebook pulled back and made as little comment as possible. But software companies do not have the luxury (yet) of fully fading into the background like the Company of the Lottery in Babylon. They will continue to attract ire for their increasingly powerful interventions, even if none of us know what is actually going on behind their doors.
Descent from the Sky
This fierce abridgement
Hath to it circumstantial branches which
Distinction should be rich in.
—WILLIAM SHAKESPEARE, Cymbeline
In my five years at Google, I worked at increasing levels of abstraction, as many engineers tend to do. Software engineers aim to automate algorithmic processes that will run countless times on standardized sets of data that vary within known, specified parameters. For Google, such problems included crawling web pages, providing the best available results for search queries, displaying maps of user-specified locations, and running an email service for hundreds of millions of users. None of these tasks is permanently soluble. There are always changing specifications, new requirements, new features to implement, and an ongoing need to improve the quality of the product.
A research-oriented computer scientist focuses primarily on the cutting edge, trying to find new methods at the very limits of what algorithms can do. Some areas of computer science, such as those dealing with compilers and operating systems, are considered more or less “solved problems” at this point. It’s not that improvements are no longer possible; rather, the space of the underlying problem has been thoroughly explored. Compared to areas like computer vision or quantum computing (currently more dream than reality), these mature areas see little cutting-edge innovation. They are, not coincidentally, the areas in which software engineering plies its trade; they are the fields that made large-scale software engineering possible. Microsoft, Apple, Amazon, and Google rely on solid and hyperefficient operating systems, compilers, and networking infrastructure. Artificial intelligence would be useful as well, but it is a far more daunting problem. The shape of software engineering formed itself around what was most easily achievable. “Low-hanging fruit” is what engineers call pretty much everything they accomplish rather than put off indefinitely. Google avoids trying to grasp the meaning of the web pages it retrieves. Apple revived its brand with new hardware rather than innovative software. Microsoft succeeded by slightly outperforming its competitors.
A software engineer chooses which problems to take on. By the time I left Google, I had programmed for over twenty years, twelve of them professionally. The methodology of software engineering had become very familiar to me. That’s not to say I had gained a serious familiarity with the entirety of Google, or with the many different types of software that Microsoft and Google made. But I knew what I liked: servers. There were many other flavors of servers I never worked on: file systems, build infrastructure, cluster management, security, monitoring, and more. I did not have expertise in them, but I did have a vague idea of the shape of the problems each area contained. I felt reasonably confident that were I to enter one of these areas, I could pick up the required knowledge and be an effective contributor. And I knew that there would be a great degree of overlap with what I had done before, because the majority of software engineering is not about building something totally new but about assembling existing pieces into a specific hybrid tailored to the task at hand. The bits of genuinely new work are the reward you get for your less groundbreaking work.
There were Google engineers who were more talented than I was. They planned out epic projects and came up with brilliant designs that would revolutionize Google’s products. This elite cadre contained some of the sharpest minds I’ve ever known, and many of them wrote code that was stunning in its elegance and utility. I wasn’t sure that I could ever be one of them. My career to that point hadn’t distinguished me as one of the best of the best. I ranked in the top 10 percent of engineers at Google, and I had been content with that. All the time I had spent in graduate classes and writing was time that could have been spent writing more and better code and trying to vault into that top 1 or 2 percent. I realized that I would have to double down on professional development if I were to avoid the encroaching sense of mundanity.
I was also distressed by the disconnect I felt between my work and reality. The god’s-eye view of the world’s data had numbed my relations to the world. Google, though it contained some of the most highly skilled of humanity, was, like all large software firms, committed by definition to serving mediocrity. By 2008, social networks were on the rise, and the average quality of content on the web was decreasing, flattened by large corporations into increasingly utilitarian and identical formats. There was still optimism in the air then. As of 2018, the consensus declares the unwashed internet to be a garbage dump of humanity’s rejects. Even in 2008 there was an increasing sense that we, the engineers, were in some significant way other than the people who used our work. It was no longer 1995, when engineers made up a large component of the internet community. Increasingly we became spectators of our own creations.
Mathematician Godfrey Harold Hardy discusses the split between the worldly and the theoretical in A Mathematician’s Apology. For him, a brilliant mathematician, the descent from the Olympus of theoretical mathematics to the mundane real world was ignominious. For Hardy, mathematics was an eternal Platonic realm. Mathematical truths, whether they are genuine truths or not, are the most enduring discoveries of history—cross-cultural and seemingly eternal.*10 But Google engineers did not strive for truth the way that mathematicians did. We built, only to rebuild again and again, sorting and reshaping the data into momentarily useful and profitable heaps. Only researchers of the purest computer science aim at mathematics-like truths. Software engineering is closer to applied mathematics, on which Hardy looked down.
But is not the position of an ordinary applied mathematician in some ways a little pathetic? If he wants to be useful, he must work in a humdrum way, and he cannot give full play to his fancy even when he wishes to rise to the heights. “Imaginary” universes are so much more beautiful than this stupidly constructed “real” one; and most of the finest products of an applied mathematician’s fancy must be rejected, as soon as they have been created, for the brutal but sufficient reason that they do not fit the facts.
Applied mathematics and software engineering promise more dominion over this “stupidly constructed” universe than pure mathematics can, but the form that dominion takes is always dubious. For Hardy, writing in 1940, that dominion was warfare: the research efforts that marshaled the most advanced mathematics and physics of the day to create the mushroom clouds of Trinity. Today, technology’s dominion is a combination of commercial and governmental interest. Threading the needle to “do good” is not easy, and Hardy’s decision to retreat into the pure mathematical world as a refuge from the horrible realities of his age remains a stoic temptation. I did not have that choice available to me. But Hardy’s statement of purpose was not wholly lost on me:
Judged by all practical standards, the value of my mathematical life is nil; and outside mathematics it is trivial anyhow. I have just one chance of escaping a verdict of complete triviality, that I may be judged to have created something worth creating.
What I could create at Google I didn’t find worth my creating. To feel ownership over my coded creations, I would need to work far more deeply and intelligently than I had, and that I lacked the incentive to do.
* * *
—
While at Google I studied James Joyce’s Finnegans Wake in a graduate class taught by Joyce scholar and polymath Edmund Epstein.*11 Finnegans Wake is a vast catalogue of the permutations of human existence, and it is overstuffed with meaning and structure. Interpretations stack atop one another, contradictions everywhere. Difficult to parse, it is also paradoxically democratic. No one can claim to understand Finnegans Wake definitively. Everyone is capable of bringing something of their own to the book. Literature professor Leslie L. Lewis created a unique visual representation of the book’s interlocking content and themes, superimposing the cyclical and circular over the linear and rectangular, just as the book does. Physicist Murray Gell-Mann decided that the constituent building blocks of protons and neutrons should be called “kworks,” and then found the phrase “Three quarks for Muster Mark” in Finnegans Wake. He contrived an explanation for keeping his pronunciation despite Joyce’s spelling:
I had been calling them quarks most of the year, but I supposed it was probably spelled k-w-o-r-k or something like that. It seemed the right sound for a new particle that was the fundamental constituent of nuclei and so on, but I didn’t know how to spell it. But then paging through Finnegans Wake, which I had done often since my brother had brought home the first American printing in 1939, I saw “Three quarks for Muster Mark!,” and of course it’s “quark,” it rhymes with a whole bunch of things: Mark, bark, and so on and so forth. But I wanted to pronounce it “kwork” so I invented an excuse for pronouncing it kwork—namely that Humphrey Chimpden Earwicker, whose dream is Finnegans Wake, the book, is a publican. And so a number of things in the book are calls for drinks at the bar. Of course, the words are multiply determined; they’re portmanteau words as in Alice [Through the Looking-Glass]. But one determinant is often calls for drinks at the bar: “Three poss of porter pease,” for example, has something to do with “Three pots of porter, please.” And so here I figured that one of the contributors to “Three quarks for Muster Mark!” might be “Three quarts for Mister Mark!” And it may in fact be true.