Algorithms to Live By

by Brian Christian


  Packet Switching

  What we now think of as “the Internet” is actually a collection of many protocols, but the chief among them (so much so that it’s often referred to more or less synonymously with the Internet) is what’s known as Transmission Control Protocol, or TCP. It was born from a 1973 talk and a 1974 paper by Vinton “Vint” Cerf and Robert “Bob” Kahn, who laid out a proposal for the language of—as they imagined calling it—an “internetwork.”

  TCP initially used telephone lines, but it’s more appropriately regarded as the evolution of the mail rather than the phone. Phone calls use what’s called “circuit switching”: the system opens a channel between the sender and the receiver, which supplies constant bandwidth between the parties in both directions as long as the call lasts. Circuit switching makes plenty of sense for human interaction, but as early as the 1960s it was clear that this paradigm wasn’t going to work for machine communications.

  As UCLA’s Leonard Kleinrock recalls,

  I knew that computers, when they talk, they don’t talk the way I am now—continuously. They go blast! and they’re quiet for a while. A little while later, they suddenly come up and blast again. And you can’t afford to dedicate a communications connection to something which is almost never talking, but when it wants to talk it wants immediate access. So we had to not use the telephone network, which was designed for continuous talking—the circuit switching network—but something else.

  The telephone companies, for their part, did not seem especially amenable to talk of a fundamental shift in their protocols. Moving away from circuit switching was considered lunatic—“utter heresy,” in the words of networking researcher Van Jacobson. Kleinrock reminisces about his own encounters with the telecommunications industry:

  I went to AT&T, the biggest network of the time, and I explained to them, you guys ought to give us good data communications. And their answer was, what are you talking about? The United States is a copper mine, it’s full of telephone wires, use that. I said no, no, you don’t understand. It takes 35 seconds to set up a call, you charge me a minimum of 3 minutes, and I want to send 100 milliseconds of data! And their answer was, “Little boy, go away.” So little boy went away and, with others, developed this technology which ate their lunch.

  The technology that ate circuit switching’s lunch would become known as packet switching. In a packet-switched network, rather than using a dedicated channel for each connection, senders and receivers atomize their messages into tiny shards known as “packets,” and merge them into the communal flow of data—a bit like postcards moving at the speed of light.

  In such a network, “what you might call a connection is a consensual illusion between the two endpoints,” explains Apple networking expert Stuart Cheshire. “There are no connections in the Internet. Talking about a connection in the Internet is like talking about a connection in the US Mail system. You write letters to people and each letter goes independently—and you may have a correspondence that goes back and forth and has some continuity to it, but the US Mail doesn’t need to know about that.… They just deliver the letters.”

  Efficient use of bandwidth wasn’t the only consideration driving research into packet switching in the 1960s; the other was nuclear war. Paul Baran at the RAND Corporation was trying to solve the problem of network robustness, so that military communications could survive a nuclear attack that took out a sizable fraction of the network. Inspired by algorithms developed in the 1950s for navigating mazes, Baran imagined a design in which every piece of information could independently make its own way to its destination, even as the network was changing dynamically—or being torn to tatters.

  This was the second demerit against circuit switching and its dedicated, stable connections: that very stability meant that a dropped call stayed dropped. Circuit switching just wasn’t flexible or adaptable enough to be robust. And here, too, packet switching could offer just what the times were calling for. In circuit-switched networks, a call fails if any one of its links gets disrupted—which means that reliability goes down exponentially as a network grows larger. In packet switching, on the other hand, the proliferation of paths in a growing network becomes a virtue: there are now that many more ways for data to flow, so the reliability of the network increases exponentially with its size.
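
  The arithmetic behind that contrast is straightforward. As a toy model, suppose each individual link works with probability 0.99: a circuit that must hold ten links at once succeeds about 90% of the time, a hundred-link circuit only about 37% of the time, while a packet that merely needs any one of several independent paths becomes more likely to get through as paths are added. The short sketch below, in Python, just evaluates those two expressions; the 0.99 figure and the independence assumption are illustrative, not measurements of any real network.

    # Toy comparison: a circuit needs every link; a packet needs any one path.
    link_ok = 0.99

    def circuit_reliability(links):
        return link_ok ** links            # all links must work at once

    def any_path_reliability(paths, links_per_path=10):
        path_ok = link_ok ** links_per_path
        return 1 - (1 - path_ok) ** paths  # at least one path works

    for n in [1, 10, 100]:
        print(n, "links:", round(circuit_reliability(n), 3),
              "| paths:", round(any_path_reliability(n), 3))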

  Still, as Van Jacobson tells it, even after packet switching was devised, the phone companies were unimpressed. “All the telco people said, with very loud voices, that’s not a network! That’s just a crummy way to use our network! You’re taking our wires, you’re sending on the paths that we create! And you’re putting a lot of extra gunk on it so that you use it really inefficiently.” But from a packet-switching point of view, the phone wires are just a means to an end; the sender and receiver don’t actually care how the packets get delivered. The ability to operate agnostically over any number of diverse media would be packet switching’s great virtue. After early networks in the late ’60s and early ’70s, such as the ARPANET, proved the viability of the concept, networks of all types began sprouting across the country, doing packet switching not only over copper phone wires, but over satellites and over radio. In 2001, a group of computer scientists in the Norwegian city of Bergen briefly even implemented a packet-switching network over “Avian Carriers”—that is, packets written down on paper and tied to pigeons’ feet.

  Of course, packet switching would not be without its own problems. For starters, one of the first questions for any protocol, human or machine, is, quite simply: how do you know your messages are getting through?

  Acknowledgment

  No transmission can be 100 percent reliable.

  —VINT CERF AND BOB KAHN

  “WHAT HATH GOD WROUGHT” wasn’t just the first long-distance telegraph message sent in the United States. It was also the second: Alfred Vail sent the quotation back to Morse in the Supreme Court chambers as a way of confirming receipt.

  Now, Vail’s reply could make Morse, and the US legislators gathered around him, confident that Morse’s message had been received—presuming, of course, that Vail hadn’t known the choice of message in advance. But what would make Vail confident that his confirmation had been received?

  Computer scientists know this concept as the “Two Generals Problem.” Imagine two generals, on opposite sides of a valley that contains their common enemy, attempting to coordinate an attack. Only by perfect synchronization will they succeed; for either to attack alone is suicide. What’s worse, any messages from one general to the other must be delivered by hand across the very terrain that contains the enemy, meaning there’s a chance that any given message will never arrive.

  The first general, say, suggests a time for the attack, but won’t dare go for it unless he knows for sure that his comrade is moving, too. The second general receives the orders and sends back a confirmation—but won’t dare attack unless he knows that the first general received that confirmation (since otherwise the first general won’t be going). The first general receives the confirmation—but won’t attack until he’s certain that the second general knows he did. Following this chain of logic requires an infinite series of messages, and obviously that won’t do. Communication is one of those delightful things that work only in practice; in theory it’s impossible.

  In most scenarios the consequences of communication lapses are rarely so dire, and the need for certainty rarely so absolute. In TCP, a failure generally leads to retransmission rather than death, so it’s considered enough for a session to begin with what’s called a “three-way handshake.” The visitor says hello, the server acknowledges the hello and says hello back, the visitor acknowledges that, and if the server receives this third message, then no further confirmation is needed and they’re off to the races. Even after this initial connection is made, however, there’s still a risk that some later packets may be damaged or lost in transit, or arrive out of order. In the postal mail, package delivery can be confirmed via return receipts; online, packet delivery is confirmed by what are called acknowledgment packets, or ACKs. These are critical to the functioning of the network.

  The way that ACKs work is both simple and clever. Behind the scenes of the three-way handshake, each machine provides the other with a kind of serial number (TCP calls it a “sequence number”)—and it’s understood that every packet sent after that will increment those serial numbers by one each time, like checks in a checkbook. For instance, if your computer initiates contact with a web server, it might send that server, say, the number 100. The ACK sent by the server will in turn specify the serial number at which the server’s own packets will begin (call it 5,000), and also will say “Ready for 101.” Your machine’s ACK will carry the number 101 and will convey in turn “Ready for 5,001.” (Note that these two numbering schemes are totally independent, and the number that begins each sequence is typically chosen at random.)

  This mechanism offers a ready way to pinpoint when packets have gone astray. If the server is expecting 101 but instead gets 102, it will send an ACK to packet 102 that still says “Ready for 101.” If it next gets packet 103, it will say, again, “Ready for 101.” Three such redundant ACKs in a row would signal to your machine that 101 isn’t just delayed but hopelessly gone, so it will resend that packet. At that point, the server (which has kept packets 102 and 103) will send an ACK saying “Ready for 104” to signal that the sequence has been restored.
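
  To make that bookkeeping concrete, here is a small sketch in Python of how a receiver might issue these cumulative acknowledgments, and how three redundant ACKs in a row prompt a resend. The function name, the packet numbers, and the set used as a holding buffer are illustrative assumptions for the sketch, not the actual TCP implementation.

    # Toy model of cumulative ACKs and of resending after three
    # redundant ACKs. Illustrative only; real TCP differs in detail.

    def receiver_ack(expected, buffer, seq):
        """Store the arriving packet and return the cumulative ACK:
        "ready for" the lowest number not yet received."""
        buffer.add(seq)
        while expected in buffer:
            expected += 1
        return expected, buffer

    expected = 101       # the server already said "Ready for 101"
    buffer = set()
    duplicate_acks = 0

    # Packet 101 is lost in transit; 102, 103, and 104 arrive anyway.
    for seq in [102, 103, 104]:
        expected, buffer = receiver_ack(expected, buffer, seq)
        print(f"packet {seq} arrives, ACK says: ready for {expected}")
        if expected == 101:
            duplicate_acks += 1
            if duplicate_acks == 3:
                print("three ACKs for 101 in a row: resend packet 101")
                expected, buffer = receiver_ack(expected, buffer, 101)
                print(f"packet 101 arrives, ACK says: ready for {expected}")

  Run as written, the receiver keeps answering “ready for 101” until the missing packet finally shows up, at which point the acknowledgment jumps straight past everything it had been holding.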

  All those acknowledgments can actually add up to a considerable amount of traffic. We think of, say, a large file transfer as a one-way operation, but in fact the recipient is sending hundreds of “control messages” back to the sender. A report from the second half of 2014 showed that almost 10% of upstream Internet traffic during peak hours was due to Netflix—which we tend to think of as sending data almost exclusively downstream, to users. But all that video generates an awful lot of ACKs.

  In the human sphere, the anxiety that the message is indeed being received similarly pervades conversation. A speaker might subconsciously append “You know?” to the end of every sentence, and a listener, for their part, can’t help but make a steady stream of nods, yeahs, aye-ayes, roger-thats, ten-fours, uh-huhs. We do this even face-to-face, but on a phone call sometimes it’s the only way to know the call is even still in progress. No wonder that the most successful twenty-first-century marketing campaign for a wireless carrier featured a network engineer’s quality-control catchphrase, repeated again and again: “Can you hear me now?”

  When something goes wrong in that back-and-forth, we’re often left with a question mark. As software blogger Tyler Treat says,

  In a distributed system, we try to guarantee the delivery of a message by waiting for an acknowledgement that it was received, but all sorts of things can go wrong. Did the message get dropped? Did the ack get dropped? Did the receiver crash? Are they just slow? Is the network slow? Am I slow?

  The issues faced by the two generals, as he reminds us, “are not design complexities, they are impossibility results.”

  Earlier networking research, Vint Cerf notes, had been founded “on the assumption that you could build a reliable underlying net.” On the other hand, “the Internet was based on the assumption that no network was necessarily reliable, and you had to do end-to-end retransmissions to recover.”

  Ironically, one of the few exceptions to this is in transmitting the human voice. Real-time voice communications, such as Skype, typically do not use TCP, which underlies most of the rest of the Internet. As researchers discovered in the early days of networking, using reliable, robust protocols—with all their ACKs and retransmission of lost packets—to transmit the human voice is overkill. The humans provide the robustness themselves. As Cerf explains, “In the case of voice, if you lose a packet, you just say, ‘Say that again, I missed something.’”

  For this reason, phone services that automatically reduce background noise to silence are doing their users a major disservice. Background static is a continual reassurance that the call is still connected and that any silence is a deliberate choice by the other party. Without it, one must constantly confront the possibility that the call has dropped, and constantly offer reassurances that it has not. This, too, is the anxiety of all packet-switching protocols, indeed of any medium rooted in asynchronous turn-taking—be it letter writing, texting, or the tentative back-and-forths of online dating. Every message could be the last, and there is often no telling the difference between someone taking their time to respond and someone who has long since ended the conversation.

  So how exactly should we handle a person—or a computer—that’s unreliable?

  The first question is how long a period of nonresponsiveness we should take to constitute a breakdown. Partly this depends on the nature of the network: we start to worry in a matter of seconds over the phone, days over email, and weeks over postal mail. The longer the round-trip time between sender and receiver, the longer it takes a silence to be significant—and the more information can be potentially “in flight” before the sender realizes there’s a problem. In networking, having the parties properly tune their expectations for the timeliness of acknowledgments is crucial to the system functioning correctly.

  The second question, of course, once we do recognize a breakdown, is what exactly we should do about it.

  Exponential Backoff: The Algorithm of Forgiveness

  The world’s most difficult word to translate has been identified as “ilunga,” from the Tshiluba language spoken in south-eastern DR Congo.… Ilunga means “a person who is ready to forgive any abuse for the first time, to tolerate it a second time, but never a third time.”

  —BBC NEWS

  If at first you don’t succeed, / Try, try again.

  —T. H. PALMER

  Today we expect our devices to communicate wirelessly even when wires would be easy—our keyboard and mouse, for instance, talking wirelessly with a computer sitting inches away. But wireless networking began as a matter of necessity, in a place where no wires could do the job: Hawaii. In the late ’60s and early ’70s, Norman Abramson at the University of Hawaii in Honolulu was trying to link together the university’s seven campuses and many research institutes, spread across four islands and hundreds of miles. He hit upon the idea of implementing packet switching via radio rather than the phone system, connecting the islands with a loose chain of transmitters and receivers. This system would come to be known as the ALOHAnet.

  The biggest hurdle that the ALOHAnet had to overcome was interference. Sometimes two stations would transmit at the same moment, inadvertently jamming one another’s signals. (This is, of course, a familiar feature in human conversation as well.) If both stations simply retransmitted right away to try to get their message across, they’d run the risk of getting stuck in perpetual interference forever. Clearly the ALOHAnet protocol was going to need to tell competing signals how to give each other space, how to yield and make way for one another.

  The first thing that the senders need to do here is what’s called “breaking symmetry.” As any sidewalk pedestrian knows, dodging right as an oncoming walker dodges left, and then having both of you simultaneously dodge back the other way, doesn’t solve anything. It’s the same story when two speakers both pause, make gestures of deference to the other, and then start to speak again at the same time; or when two cars at an intersection, each having stopped to yield to the other, try to accelerate in sync. This is an area where the use of randomness becomes essential—indeed, networking wouldn’t be possible without it.

  One straightforward solution is to have each station flip a coin. Heads, it retransmits; tails, it waits a turn, then retransmits. Surely one of them will get through uncontested before long. This works well enough when there are only two senders. But what if there are three simultaneous signals? Or four? It would take a one-in-four chance for the network to get even a single packet through at that point (after which you’d still have three conflicting stations left, and perhaps even more competing signals arriving meanwhile). As the number of conflicts increases further, the network’s throughput could simply fall off a cliff. A 1970 report on the ALOHAnet said that above a mere 18.6% average utilization of the airwaves, “the channel becomes unstable … and the average number of retransmissions becomes unbounded.” Not good.
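
  The collapse is easy to see in a toy simulation. The sketch below, in Python, estimates the chance that exactly one of several colliding stations flips “retransmit” while all the others wait, which is the only outcome that gets a packet through that turn. The fifty-fifty coin and the station counts are simply the setup described above, not a model of the real ALOHAnet radios.

    import random

    def chance_of_clear_turn(stations, trials=100_000):
        """Estimate the probability that exactly one of `stations`
        colliding senders transmits this turn while the rest wait."""
        clear = 0
        for _ in range(trials):
            transmitting = sum(random.random() < 0.5 for _ in range(stations))
            clear += (transmitting == 1)
        return clear / trials

    for n in [2, 3, 4, 8]:
        print(n, "stations:", round(chance_of_clear_turn(n), 3))

  With two stations the odds of a clean turn are even; with four they fall to about one in four, and with eight to roughly three in a hundred.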

  So, what to do? Is there a way to make a system that could avoid this fate?

  The breakthrough turned out to be increasing the average delay after every successive failure—specifically, doubling the potential delay before trying to transmit again. So after an initial failure, a sender would randomly retransmit either one or two turns later; after a second failure, it would try again anywhere from one to four turns later; a third failure in a row would mean waiting somewhere between one and eight turns, and so on. This elegant approach allows the network to accommodate potentially any number of competing signals. Since the maximum delay length (2, 4, 8, 16…) forms an exponential progression, it’s become known as Exponential Backoff.
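
  The rule itself fits in a few lines. Here is a sketch in Python of the delay calculation described above; the upper cap on the window is an assumption added for illustration, and the essential move is simply that the range of possible delays doubles with every consecutive failure.

    import random

    def backoff_turns(failures, cap=16):
        """After `failures` consecutive collisions, wait a random number
        of turns between 1 and 2**failures (capped) before retransmitting."""
        window = min(2 ** failures, 2 ** cap)
        return random.randint(1, window)

    for failures in range(1, 6):
        print(failures, "failure(s): wait", backoff_turns(failures), "turn(s)")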

  Exponential Backoff was a huge part of the successful functioning of the ALOHAnet beginning in 1971, and in the 1980s it was baked into TCP, becoming a critical part of the Internet. All these decades later, it still is. As one influential paper puts it, “For a transport endpoint embedded in a network of unknown topology and with an unknown, unknowable and constantly changing population of competing conversations, only one scheme has any hope of working—exponential backoff.”

  But it is the algorithm’s other uses that suggest something both more prescriptive and more profound. Beyond just collision avoidance, Exponential Backoff has become the default way of handling almost all cases of networking failure or unreliability. For instance, when your computer is trying to reach a website that appears to be down, it uses Exponential Backoff—trying again one second later, again a few seconds after that, and so forth. This is good for everyone: it prevents a host server that’s down from getting slammed with requests as soon as it comes back online, and it prevents your own machine from wasting too much effort trying to get blood from a stone. But interestingly, it also does not force (or allow) your machine to ever completely give up.
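
  In code, that patient retry loop might look like the sketch below, written in Python around a hypothetical fetch function that raises an exception whenever the host can’t be reached. The one-second starting delay, the cap, and the random “jitter” are illustrative choices, not any particular library’s behavior.

    import random
    import time

    def fetch_with_backoff(fetch, max_delay=64.0):
        """Keep retrying `fetch`, doubling the average delay after each
        failure. `fetch` is any zero-argument callable that raises an
        exception on failure; it stands in for a real network request."""
        delay = 1.0
        while True:
            try:
                return fetch()
            except Exception:
                # Wait a random fraction of the current window, then
                # double the window, up to max_delay. Never give up.
                time.sleep(random.uniform(0, delay))
                delay = min(delay * 2, max_delay)

  True to the paragraph above, the loop slows down indefinitely but never actually stops trying.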

 
