Programmers work doggedly toward correctness, but the sheer size and complexity of software ensures that bugs lurk within. A bug is, of course, a flaw or fault in a program that produces unexpected results. In 1986, the award-winning researcher and academic Jon Bentley published a book that is now widely regarded as a classic, Programming Pearls. One of the algorithms he implemented was for binary search, a method of finding a value in a sorted array, which was first published in 1946. In 2006, decades after the publication of Bentley’s book—by which time his particular implementation had been copied and used many thousands of times—one of his erstwhile students, Joshua Bloch, discovered that under certain conditions this technique could manifest a bug.16 Bloch published his finding under a justly panic-raising headline, “Extra, Extra—Read All About It: Nearly All Binary Searches and Mergesorts are Broken.” He wrote:
The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code.
Careful design is great. Testing is great. Formal methods are great. Code reviews are great. Static analysis is great. But none of these things alone are sufficient to eliminate bugs: They will always be with us. A bug can exist for half a century despite our best efforts to exterminate it.17
That software algorithms are now running our whole world means that software faults or errors can send us down the wrong highway, injure or kill people, and cause disasters. Every programmer is familiar with the most infamous bugs: the French Ariane 5 rocket that went off course and self-destructed forty seconds after lift-off because of an error in converting between representations of number values; the Therac-25 radiation therapy machine that reacted to a combination of operator input and a “counter overflow” by delivering doses of radiation a hundred times more intense than required, resulting in the agonizing deaths of five people and injuries to many others; the “Flash Crash” of 2010, when the Dow Jones suddenly plunged a thousand points and recovered just as suddenly, apparently as a result of automatic trading programs reacting to a single large trade.
These are the notorious bugs, but there are bugs in every piece of software that you use today. A professional “cyber warrior,” whose job it is to find and exploit bugs for the US government, recently estimated that “most of the software written in the world has a bug every three to five lines of code.”18 These bugs may not kill you, but they cause your system to freeze, they corrupt your data, and they expose your computers to hackers. The next great hope for more stable, bug-free software is functional programming, which is actually the oldest paradigm in computing—it draws on the algebraic techniques of function evaluation employed by the creators of the first computers. In functional programming, all computation is expressed as the evaluation of expressions; the radical simplicity of thinking about programming as only giving input to functions that produce outputs promotes legibility and predictability (a simplicity illustrated in the short sketch that follows the quotation below). There is again the same fervent proselytizing about functional programming that I remember from the early days of OOP, the same conviction that this time we’ve discovered the magic key to the kingdom. Functional languages like Clojure conjure up the clean symmetries of mathematics, and hold forth the promise of escape from all the jugaadu workarounds that turn so much code into a gunky, biological-seeming mess. In general, though, programmers are now skeptical of the notion that there’s any silver bullet for complexity. The programmer and popular blogger Steve Yegge, in his foreword to a book called The Joy of Clojure, describes the language as a “minor miracle” and “an astoundingly high-quality language … the best I’ve ever seen,” but he also notes that it is “fashionable,” and that
our industry, the global programming community, is fashion-driven to a degree that would embarrass haute couture designers from New York to Paris … Fashion dictates the programming languages people study in school, the languages employers hire for, the languages that get to be in books on shelves. A naive outsider might wonder if the quality of a language matters a little, just a teeny bit at least, but in the real world fashion trumps all.19
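Fashion aside, the paradigm itself can be shown in a few lines. What follows is my own minimal sketch, written in the C-family syntax used later in this chapter rather than in Clojure: a pure function, whose output depends only on its input, and which changes no state anywhere when it is evaluated.

using System;
using System.Linq;

class FunctionalSketch
{
    // A pure function: no assignments, no side effects; the result is
    // simply the value of an expression computed from the input.
    static int SumOfSquares(int[] numbers) => numbers.Select(n => n * n).Sum();

    static void Main()
    {
        // The same input always evaluates to the same output.
        Console.WriteLine(SumOfSquares(new[] { 3, 4 })); // prints 25
    }
}

That predictability, multiplied across a whole program, is what makes functional code easier to reason about, to test, and to trust.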
With respect to programming languages and techniques, the programming industry has now been through many cycles of faith and disillusionment, and many of its members have acquired a sharp, necessary cynicism. “Hype Cycle”—a phrase coined by the analysts at Gartner, Inc.—adroitly captures the up-and-down fortunes of many a tech fad.20
The tools and processes used to manage all this complexity engender another layer of complexity. All but the simplest programs must be written by teams of programmers, each working on a small portion of the system. Of course these people must be managed, housed, provided with equipment, but also their product—the code itself—must be distributed, shared, saved from overwriting or deletion, integrated, and tested.
Entire industries have grown around these necessities. Software tools for building software—particularly “Integrated Development Environments,” applications used to write applications—are some of the most complex programs being built today. They make the programmer’s job easier, but the programmer must learn how to use them, must educate herself in their idiosyncrasies and the workarounds for their faults. This is not a trivial task. For example, every programmer needs to use a revision control system to track changes and easily branch and merge versions of code. The best-regarded revision control system today is Git, created by Linus Torvalds (and named, incidentally, after his famous cantankerousness).21 Git’s interface is command-line driven and famously UNIX-y and complex, and for the newbie its inner workings are mysterious. In response to a blog post titled “Git Is Simpler Than You Think,” an irritated Reddit commenter remarked, “Yes, also a nuclear submarine is simpler than you think … once you learn how it works.”22 I myself made three separate attempts to learn how Git worked, gave up, was frustrated enough by other revision control systems to return, and finally had to read a 265-page book to acquire enough competence to use the thing. Git is immensely powerful and nimble, and I enjoy using it, but maneuvering it felt—at least initially—like a life achievement of sorts.
You may have to use a dozen tools and websites to handle the various logistical aspects of software development, and soon the triumph starts to wear a little thin. Add another dozen software libraries and frameworks that you may use internally in your programs—again, each one comes bristling with its own eccentricities, bugs, and books—and weariness sets in. Each tool and preconstructed library solves a problem that you must otherwise solve yourself, but each solution is a separate body of knowledge you must maintain. A user named jdietrich wrote in a discussion on Hacker News:
My biggest gripe with modern programming is the sheer volume of arbitrary stuff I need to know. My current project has so far required me to know about Python, Django, Google App Engine and its datastore, XHTML, CSS, JQuery, Javascript, JSON, and a clutch of XML schema, APIs and the like …
Back in ye olden days, most programming tasks I performed felt quite natural and painless, just a quiet little chat between me and the compiler. Sometimes longwinded, sometimes repetitive, but I just sat and thought and typed and software happened. The work I do these days feels more like being a dogsbody at the tower of babel. I just don’t seem to feel fluent in anything much any more.23
And every year, the new technologies arrive in a cloud of acronyms and cute names: MongoDB, HTML5, PaaS, CoffeeScript, TPL, Rx. One must keep up. On programmers.stackexchange.com, one hapless coder wrote:
I was humbled at a job interview yesterday almost to the point of a beat-down and realized that although I know what I know, my skills are pretty old and I’m getting to where I don’t know what I don’t know, which for a tech guy is a bad thing.
I don’t know if I can keep current just doing my day to day job, so I need to make sure I at least know what’s out there.
Are there well known blogs I should be keeping up with for software development?24
The best—or at least the most ambitious—programmers read blogs and books, attend conferences to keep up with the state of the art, learn a new language every year or two. When you begin programming, one of the attractions is the certainty that you will never run out of things to learn. But after a few years of working in a corporate cubicle under exploitive managers, after one deadline too many, after family and age and a tiring body, learning the ins and outs of the latest library can feel like another desperate sprint on a nonstop treadmill. There is a reigning cult of overwork in the industry—the legend of the rock-star programmer usually has him coding sixteen hours a day, while simultaneously contributing to open-source projects, blogging, conferencing, and somehow managing to run a start-up—and this ideal has led many an aspirant to burnout, complete with techie thousand-yard stare, clinical depression, outbursts of anger, and a total loss of interest in programming. This trough of disillusionment is so deep that for many, the only way to emerge from it is to leave an industry that rewards a few with fame and dazzling amounts of money, but treats the many as disposable cogs in its software production machine. The endless cycle of idea and action, endless invention, endless experiment, all this knowledge of motion takes its toll, leaves behind a trail of casualties.
Butler Lampson’s hope that millions of ordinary people would write “non-trivial programs” and thus become poets of logic has proved elusive. From the sixties onwards, numerous technologists have promised that their new programming languages would make programmers redundant, that “managers [could] now do their own programming; engineers [could] now do their own programming.”25 Advertisements touted the magical abilities of “automatic programming systems.” But Knuth’s “Software is hard” dictum still remains true, and business users have found that getting custom software out of IT departments requires large budgets and lots of patience. This is because programmers—at their best—try to build software out of elegant code that is modular, secure, and legible, which takes time and money. Instead of waiting, mere mortals often hack something together in the programs they already have available on their machines. This is why, according to the economics blogger James Kwak, “Microsoft Excel is one of the greatest, most powerful, most important software applications of all time.”26 Much of the planet’s business data is stored in Excel, and its intuitive interface allows non-programmers access to some very powerful capabilities. Executives and marketers and secretaries write formulae and macros to extract information when they need it, and are therefore able to take action in a timely fashion. The trouble is that in Excel there
is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets … The biggest problem is that anyone can create Excel spreadsheets—badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way.27
Sloppy Excel-wrangling can lead to some very bad decisions, as in the “London Whale” trading disaster of 2012, which caused the financial services firm JPMorgan Chase a loss of approximately six billion dollars; the company’s internal investigation listed as one of the contributing factors a financial modeling process which required cutting and pasting data through a series of spreadsheets. One of these spreadsheets contained a formula dividing by the sum of some numbers instead of their average.28
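The arithmetic behind that slip shows how much damage it could do: the average of two numbers is half their sum, so dividing by the sum where the average was intended yields a result only half as large as it should be. In this case the error reportedly muted the model’s measured volatility, making enormous trading positions look far safer than they were.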
The day that millions will dash off beautiful programs—as easily as with a pencil—still remains distant. The “lovely gems and brilliant coups” of coding remain hidden and largely incomprehensible to outsiders. But the beauty that programmers pursue leads to their own happiness, and—not incidentally—to the robustness of the systems they create, so the aesthetics of code impact your life more than you know.
For example, one of the problems that have always plagued programmers is the “maintenance of state.” Suppose you have a hospital that sends out invoices for services provided, accepts payments, and also sends out reminders for overdue payments. On Tuesday evening, Ted creates an invoice for a patient, but then leaves the office for an early dinner; there is now an “Invoice” object in the system. This object has its “InvoiceNumber” field set to 56847, and its “Status” field set to “Created.” All of these current settings together constitute this invoice’s “state.” The next morning, Ted comes in and adds a couple of line items to this invoice. Those inserted line items and a new “Status” setting of “Edited” along with all the other data fields are now the invoice’s state. After a coffee break, Ted deletes the second line item and adds two more. He has changed the invoice’s state again. Notice that we’ve already lost some information—from now on, we can’t ever work out that Ted once inserted and deleted a line item. If you wanted to track historical changes to the invoice, you would have to build a whole complex system to store various versions.
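In code, the traditional state-mutating approach looks something like the following sketch (the field names echo the example above; the class itself is my illustration, not any particular hospital system). Each assignment simply overwrites what was there before:

using System.Collections.Generic;

class Invoice
{
    public int InvoiceNumber { get; set; }
    public string Status { get; set; }
    public List<string> LineItems { get; } = new List<string>();
}

class StateDemo
{
    static void Main()
    {
        var invoice = new Invoice { InvoiceNumber = 56847, Status = "Created" };

        // Wednesday morning: Ted adds two line items.
        invoice.Status = "Edited";
        invoice.LineItems.Add("Consultation");
        invoice.LineItems.Add("X-ray");

        // After the coffee break: the second item is deleted and two
        // more are added. Nothing records that "X-ray" ever existed.
        invoice.LineItems.RemoveAt(1);
        invoice.LineItems.Add("Blood panel");
        invoice.LineItems.Add("Physical therapy");
    }
}

After the last line runs, the object holds only its current state; the history of Ted’s edits is gone.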
Things get even more complicated in our brave new world of networked systems. Ted and his colleagues can’t keep up with the work, so an offshored staff is hired to help, and the invoice records are now stored on a central server in Idaho. On Thursday afternoon, Ted begins to add more line items to invoice 56847, but then is called away by a supervisor. Now Ramesh in Hyderabad signs on and begins to work on the same invoice. How should the program deal with this? Should it allow Ramesh to make changes to invoice 56847? But maybe he’ll put in duplicate line items that Ted has already begun working on. He may overwrite information—change the “Status” field to “Sent”—and thereby introduce inconsistencies into the system. You could lock the entire invoice record for 56847 on a first come, first served basis, and tell Ramesh he can’t access this invoice because someone else is editing it. But what if Ted decides to go to lunch, leaving 56847 open on his terminal? Do you maintain the lock for two hours?
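A first-come, first-served lock of the kind just described is easy to sketch (again an illustration, with assumed names), and its weakness is visible right in the code: nothing here ever releases Ted’s lock if he simply walks away.

using System.Collections.Concurrent;

class InvoiceLocks
{
    // Maps an invoice number to the user currently editing it.
    private readonly ConcurrentDictionary<int, string> locks =
        new ConcurrentDictionary<int, string>();

    // First come, first served: succeeds only if no one holds the lock.
    public bool TryAcquire(int invoiceNumber, string user) =>
        locks.TryAdd(invoiceNumber, user);

    public void Release(int invoiceNumber, string user)
    {
        // Only the current holder may release the lock.
        if (locks.TryGetValue(invoiceNumber, out var holder) && holder == user)
            locks.TryRemove(invoiceNumber, out _);
    }
}

TryAcquire(56847, "Ted") returns true, so Ted may edit; Ramesh’s later TryAcquire(56847, "Ramesh") returns false, and he stays locked out until Ted calls Release, which nothing obliges him to do before going to lunch.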
Guarding against inconsistencies, deadlocks of resources by multiple users, and information loss has traditionally required reams of extremely complex code. If you’ve ever had a program or a website lose or mangle your data, there’s a good likelihood that object state was mismanaged somewhere in the code. A blogger named Jonathan Oliver describes working on a large system:
It was crazy—crazy big, crazy hard to debug, and crazy hard to figure out what was happening through the rat’s nest of dependencies. And this wasn’t even legacy code—we were in the middle of the project. Crazy. We were fighting an uphill battle and in a very real danger of losing despite us being a bunch of really smart guys.29
The solution that Oliver finally came to was event sourcing. With this technique, you never store the state of an object, only events that have happened to the object. So when Ted first creates invoice 56847 and leaves the office, what the program sends to CentralServer in Idaho are the events “InvoiceCreated” (which contains the new invoice number) and “InvoiceStatusChanged” (which contains the new status). When Ted comes back the next morning and wants to continue working on the invoice, the system will retrieve the events related to this invoice from CentralServer and do something like:
// Rebuild the invoice's current state by replaying its stored events.
Invoice newInvoice = new Invoice();
foreach (var singleEvent in listOfEventsFromCentralServer)
{
    newInvoice.Replay(singleEvent);
}
That is, you reconstitute the state of an object by creating a new object and then “replaying” events over it. Ted now has the most current version of invoice 56847, conjured up through a kind of temporally shifted rerun of events that have already happened. In this new system, history is never lost; when Ted adds a line item, a “LineItemAdded” event will be generated, and when he deletes one, a “LineItemDeleted” event will be stored. If, at some point in the future, you wanted to know what the invoice looked like on Wednesday morning, you would just fire off your “Replay” routine and tell it to cease replaying events once it got past 9 a.m. that day. You can stop locking resources: because events can be generated at a very fine level of granularity, it becomes much easier to write code that will cause CentralServer to reject events that would introduce inconsistencies, to resolve conflicts, and—if necessary—pop up messages on Ted’s and Ramesh’s screens. Events are typically small objects, inexpensive to transfer over the wire and store, and server space grows cheaper every day, so you don’t incur any substantial added costs by creating all these events.
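What might such a system look like inside? Here is a minimal sketch; the event names come from the text, but the classes and the Replay logic are my illustration, not Oliver’s actual code. Reconstituting the invoice “as of Wednesday morning” is just a matter of filtering events by timestamp before replaying them:

using System;
using System.Collections.Generic;
using System.Linq;

abstract class InvoiceEvent
{
    public DateTime OccurredAt { get; set; }
}

class LineItemAdded : InvoiceEvent
{
    public string Description { get; set; }
}

class LineItemDeleted : InvoiceEvent
{
    public int ItemIndex { get; set; }
}

class Invoice
{
    private readonly List<string> lineItems = new List<string>();

    // Apply one historical event to the invoice's in-memory state.
    public void Replay(InvoiceEvent singleEvent)
    {
        switch (singleEvent)
        {
            case LineItemAdded added:
                lineItems.Add(added.Description);
                break;
            case LineItemDeleted deleted:
                lineItems.RemoveAt(deleted.ItemIndex);
                break;
        }
    }

    // Reconstitute the invoice as it stood at a given moment by
    // replaying, in order, only the events before the cutoff.
    public static Invoice AsOf(IEnumerable<InvoiceEvent> history, DateTime cutoff)
    {
        var invoice = new Invoice();
        var eventsToReplay = history
            .Where(e => e.OccurredAt <= cutoff)
            .OrderBy(e => e.OccurredAt);
        foreach (var singleEvent in eventsToReplay)
        {
            invoice.Replay(singleEvent);
        }
        return invoice;
    }
}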
Oliver writes that when he discovered event sourcing
it was as if a light went on. I could literally see how adoption of event sourcing could shed a massive amount of incidental and technical complexity from my project … Fast forward to today. [I] now have a number of systems in production with several more that are only weeks away and I literally could not be happier. I have significantly more confidence in my software than I had in the past. The code is dramatically cleaner and infinitely more explicit than it would have been otherwise. But that’s only the starting point. Our ability to expand, adapt, and scale—to be agile from a business perspective—is infinitely greater than it ever has been, even with each application being significantly larger and each associated domain exponentially more complex than before—all with a smaller team.30
“Dramatically cleaner and infinitely more explicit” code is beautiful, and here, enhanced function follows from form. But letting go of object state and embracing events requires some effort and imagination. During another blog discussion about event-sourcing code, a user more familiar with the old-style methods of storing current state remarked, “Yes, this code is beautiful, really beautiful. And my … brain almost blew up when I tried to understand the process.”31