by Seth Lloyd
To put it in another way, ten bits correspond to about three digits, meaning a place in the “ones,” the “tens,” and the “hundreds” column as we traditionally count. Measuring amounts of information is simply a matter of counting. Counting in terms of bits is simpler, though less familiar to most, than counting in terms of digits. Counting digits from 0 to 9 is straightforward: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. At this point, though, you have run out of digits, so the next number is written 1 followed by 0, or 10. The number 10 has a 1 in the tens column and a 0 in the ones column. The next number, 11, has a 1 in the tens column and a 1 in the ones column. You can keep on counting in this vein up to 99. The next number is 100, with a 1 in the hundreds column, a 0 in the tens column, and a 0 in the ones column. (You can see why it wasn’t so easy to grasp this way of counting the first time around, when you were five or so.)
Counting with bits works in a similar fashion. Start counting: 0 = zero, 1 = one. So far, so good, but now we have run out of bits. The next combination of bits is 10, which equals two: that is, a 1 in the “twos” column and a 0 in the “ones” column. (The representation of two as “10” is the feature of binary arithmetic that causes the first-time user the most trouble, as in “There are 10 kinds of people: those who know binary, and those who don’t.”) The next combination is 11, which equals three: a 1 in the twos column and a 1 in the ones column. Now we’ve run out of two-bit numbers.
The next combination is 100, which equals four: a 1 in the fours column, a 0 in the twos column, and a 0 in the ones column. Then comes 101, which equals five (1 in the fours column plus 1 in the ones column), 110 = six, 111 = seven. Eight is represented by four bits: 1000, with a 1 in the eights column and 0s in the fours, twos, and ones columns. Because they have only two bits instead of ten digits, the binary numbers get longer more quickly than ordinary, digital numbers.
Just as the powers of ten (tens, hundreds, thousands, millions) are important numbers in the ordinary, decimal way of counting, the powers of two are important numbers for counting with bits: 1 = one = 20, 10 = two = 21, 100 = four = 22, 1000 = eight = 23, 10000 = sixteen = 24, 100000 = thirty-two = 25, 1000000 = sixty-four = 26, 10000000 = one hundred and twenty-eight = 27. These numbers should look familiar to cooks. The English system of weights and measures is a binary system: eight ounces in a cup, sixteen in a pint (the American pint, that is—as in “a pint’s a pound the world around”; the imperial pint is twenty ounces, and the troy pint is twelve ounces), thirty-two in a quart, sixty-four in a half gallon, and one hundred and twenty-eight in a gallon. Expressing numbers in binary notation is no more difficult than expressing measures in quarts, pints, and cups. One hundred and forty-six ounces, for example, is one gallon plus one pint plus one quarter cup: 128 + 16 + 2 = 146. Written in binary, 146 is equal to 10010010 with a one in the “gallons” column, a one in the “pints” column, a one in the “quarter cups” column, with zeros everywhere else. To translate a number into binary, just measure it out in teaspoons.
Figure 2. Counting in Binary
The American system of measuring volume is a binary system.
One hundred and forty-six ounces is one gallon plus one pint plus one quarter cup. In binary notation, 146 = 10010010.
Just as counting in binary is simple (though perhaps not easy for those of us who are new to it), so is binary arithmetic. The entire binary addition table consists of 0 + 0 = 0, 0 + 1 = 1, 1 + 1 = 10. The binary multiplication table is even simpler: 0 × 0 = 0, 0 × 1 = 0, 1 × 1 = 1. Binary is beautiful.
Binary is also practical. The compact nature of binary notation makes it easy to construct simple electronic circuits to do binary arithmetic. These circuits, in turn, are the basis for digital computers. We may not be able to define information, but we can certainly use it.
Precision
“What if there are an infinite number of alternatives?” a student asked. “For example, there are an infinite number of real numbers between 0 and 1.”
“If you have an infinite number of alternatives, then you have an infinite amount of information,” I replied. Pick a binary number: 1001001 0110110 0100000 1110100 1101000 1100101 0100000 1100010 1100101 1100111 1101001 1101110 1101110 1101001 1101110 1101111, for example. Under the common coding scheme known as ASCII (American Standard Code for Information Interchange), each letter or typewriter character is given a seven-bit codeword.
Interpreted in ASCII, this number corresponds to the characters I = 1001001, n = 1101110, (SPACE) = 0100000, t = 1110100, h = 1101000, e = 1100101, (SPACE) = 0100000, b = 1100010, e = 1100101, g = 1100111, i = 1101001, n = 1101110, n = 1101110, i = 1101001, n = 1101110, g = 1100111—that is, the beginning of the passage “In the beginning was the word . . .” By adding more bits, you can produce a number corresponding to the entire text of the Gospel according to John. By adding yet more, you can get the rest of the Bible, followed by the Koran, followed by the Lotus Sutra, followed by all the books in the Library of Congress, and so on. An infinite number of alternatives corresponds to an infinite number of digits or bits—in other words, to an infinite amount of information.
In practice, however, the number of alternatives in any finite system is finite, so the amount of information is finite, too. Normally, we think of quantities such as length, height, and weight as varying continuously: just as there are an infinite number of real numbers between 0 and 1, there are apparently an infinite number of possible lengths between zero meters and one meter. The reason that apparently continuous quantities such as the length of a metal rod can register only a finite amount of information is that these quantities are typically defined only to a finite level of precision. To see the trade-off between precision and information, think of measuring the length of that rod using a meterstick. The meterstick is made of wood. One hundred centimeters are marked and numbered on the stick. One thousand millimeters are marked, ten for each centimeter, but there is not enough room on the meterstick to number them legibly. You can use the meterstick to measure the length of the rod to the accuracy of about a millimeter. Below a millimeter, a meterstick does not measure distances well, simply because its physical characteristics give it a finite resolution. The total number of alternatives is 1,000, corresponding to three digits of accuracy, or about ten bits of information.
A particularly famous metal rod is the one made of an alloy of platinum and iridium that sits in the International Bureau of Weights and Measures in Paris and that for almost a century defined the length of a meter. A meter was originally conceived as being one ten-millionth of the distance from the North Pole to the equator as measured along the Paris meridian. Our meterstick would measure this rod to be one meter long, plus or minus half a millimeter.
With a more accurate measuring device than a meterstick, more bits of information about the rod’s length become available. Look at the rod under an optical microscope. Features can now be discerned down to a size on the order of the wavelength of visible light—slightly less than a micron, which is a millionth of a meter. A microscope applied to a meterstick could be used to measure the length of the rod to an accuracy of a micron. The microscope measures the rod to six digits of accuracy, corresponding to about twenty bits of information. A similar degree of accuracy can be obtained using an interferometer, a device that measures length in terms of the number of wavelengths of light. An interferometer that used light of wavelength one micron would measure the rod as one million wavelengths long.
More extreme means allow even greater degrees of accuracy. In principle one could take a device called an atomic force microscope, which images individual atoms on surfaces, and drag it along the rod, measuring the rod by the number of atoms along its length. The distance between atoms is on the order of one ten-billionth of a meter (10-10 meters), a distance known as an angstrom. Now we have ten digits of accuracy, or about thirty-three bits of information about the length of the rod.
Greater precision in measuring the length of a macroscopic object such as a rod is hard to get. In c
ertain cases, it is possible to measure distances to a much higher degree of precision—for example, in physicist Norman Ramsey’s experiments to measure charge separations as small as one billion billion billionth of a meter within a neutron. The total number of values a measurement device can distinguish is given by the range of values (e.g., a meter) the device can register divided by the highest precision to which the device can measure (e.g., a millimeter). Range divided by precision tells you how many distinguishable values the device can register. The amount of available information is given by the number of bits required to count the available values. A device that gets thirty-three bits of information (ten digits) about a continuous variable is doing very well indeed.
To get thirty-three bits of information about the length of our rod, we have to count that length in atoms: heroic amounts of effort are typically required to wring more than a few tens of bits of information out of a single continuous quantity such as the length of a rod. By contrast, if we use many individual quantities to register information, we can rapidly accumulate many bits. In a quantum computer, each atom registers a bit; to get thirty-three bits requires thirty-three atoms. Our rod contains something like a billion billion billion atoms. If each one registers a bit, then the atoms in the rod can register a billion billion billion bits, far more than the length of the rod on its own can register. In general, the best way to get more information is not to increase the precision of measurements on a continuous quantity, but rather to put together measurements on more and more quantities, each one of which may register only a few bits. This compiling of bits—or digital representation—is effective because the number of total alternatives described grows much faster than does the number of bits.
Recall the king in the folktale who foolishly agreed to reward the hero in grains of wheat: one grain for the first square of the chessboard, two grains for the second, four for the third, and so on up to two raised to the sixty-fourth power (264) grains for the last square. The total number of grains is thus 10 billion billion. If each was only a millimeter long, they would fill almost forty cubic kilometers. As this example illustrates, only a few bits are required to label one of a very large number of alternatives. To give each of the grains on the chessboard a unique barcode, for instance, would require only sixty-five bits, or sixty-five pieces of information. With only 300 bits, you could assign a unique barcode to each of the 1090 elementary particles in the universe. The astronomically huge number of possible genetic codes is the source of the incredible diversity of living things, but the information that produces those codes can be stored on a tiny chromosome.
Meaning
“But doesn’t information have to mean something?” asked a student, troubled.
“You’re right that when we think of information we normally associate it with meaning,” I answered. “But the meaning of ‘meaning’ is not clear.”
For thousands of years, philosophers have tried to determine what “meaning” means, with mixed success. The reason it is so hard to pin down is that the meaning of a piece of information depends very much on how the information is to be interpreted. If you don’t know how a message is to be interpreted, then you don’t know its meaning. For example, if I say to you, “Yes,” when you haven’t asked a question, then you don’t know what I mean by “Yes.” If you ask, “Can I have another piece of cake?” and I say, “Yes,” then you know what I mean. If you ask, “What is two plus two?” and I say, “Yes,” then you don’t know what I mean (though you might start to suspect that I have only one answer to any question). If you ask, “What is two plus two?” and I say “Four,” then you know what I mean. Meaning is a bit like pornography: you know it when you see it.
Consider the string of bits given earlier: 1001001 1101110 0100000 1110100 1101000 1100101 0100000 1100010 1100101 1100111 1101001 1101110 1101110 1101001 1101110 1100111. Interpreted as a message encoded in ASCII, this string means “In the beginning.” But taken on its own, with no specification of how it is to be interpreted, it means nothing other than itself. Meaning is defined only relative to a scheme of interpretation, as the following conversation between Alice and Humpty Dumpty reveals:
“I don’t know what you mean by ‘glory,’” Alice said.
Humpty Dumpty smiled contemptuously. “Of course you don’t—till I tell you. I meant ‘there’s a nice knock-down argument for you!’”
“But ‘glory’ doesn’t mean ‘a nice knock-down argument,’” Alice objected.
“When I use a word,” Humpty Dumpty said, in a rather scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master—that’s all.”
Lewis Carroll, the author of Alice’s Adventures in Wonderland and Through the Looking-Glass, was in real life Charles Dodgson, the nominalist philosopher. Dodgson was quite happy with the notion of words meaning what he chose them to mean.
A common way to express the dependence of meaning on interpretation is to use Ludwig Wittgenstein’s notion of a language game. This is a game in which words are assigned meanings in terms of the actions they elicit from the players. Wittgenstein’s Philosophical Investigations, for example, begins with a simple language game: A builder can ask her assistant for either a block, a pillar, a slab, or a beam. If she says “Block,” her assistant hands her a block. If she says “Slab,” her assistant hands her a slab. In this simplest of language games, we infer that the assistant knows what the builder means when she says “Block”: she means “Hand me the block.”
As language games become more complicated, the meaning of meaning becomes less easy to tease out of the dynamics of the game. Part of the problem is that natural human language is frequently ambiguous: a given statement can often have many possible meanings. Part of the problem is that we do not fully understand how the brain responds to language, so that even if we know that “Block” means “Hand me the block,” we don’t know the physical mechanism by which the brain of the listener arrives at this meaning. It would be useful to have an example of a situation in which a piece of information can be interpreted in only one way and in which the mechanism eliciting a response is completely known.
Computers supply one such mechanism. Computers respond to languages called computer languages (Java, C, Fortran, BASIC). Such languages consist of simple commands—such as PRINT or ADD—that can be strung together to instruct the computer to perform complicated tasks. If you adopt Wittgenstein’s perspective that the meaning of a piece of information is to be found in the action this information provokes, the meaning of a computer program written in a particular computer language is to be found in the actions the computer performs as it interprets that program. All the computer is doing is performing sequences of elementary logic operations, such as AND, NOT, and COPY (to be discussed further later). The computer program unambiguously instructs the computer to perform a particular sequence of those operations. The “meaning” of a computer program is thus universal, in the sense that two computers following the same instructions will perform the same set of information-processing operations and obtain the same result.
The unambiguous nature of a computer program means that one and only one meaning is assigned to each statement. If a statement in a computer language has more than one possible interpretation, an error message is the result: for computers, ambiguity is a bug. By comparison, human languages are rich in ambiguity: except in special circumstances, most statements in, for example, English, have a variety of potential meanings, and this is a key aspect of poetry, fiction, flirting, and plain everyday conversation. The ambiguity of human language is not a bug, it’s a bonus!
Although meaning is hard to define, it is one of the most powerful features of information. The basic idea of information is that one physical system—a digit, a letter, a word, a sentence—can be put into corresp
ondence with another physical system. The information stands for the thing. Two fingers can be used to represent two cows, two people, two mountains, two ideas. A word can stand for anything (anything for which we have a word): orange, cow, money, freedom. By putting words in sentences, one can express—well, anything that can be expressed in words. The words in sequence can stand for a complicated thought.
In the same way that words can represent ideas and things, so can bits. The word and the bit are means by which information is conveyed, though the interpreter must supply the meaning.
The Computer
“What is a computer?” I asked my class. No answer. This was strange seeing as I was pretty sure my students had been using them since their first birthdays.
I waited. Finally someone volunteered an answer: “A machine that manipulates data stored as 0’s and 1’s.” Another student disagreed, “You’re talking about a digital computer. What about an analog computer? They store information on continuous voltage signals.”
Eventually, everyone agreed on a significantly broader definition: a computer is a machine that processes information.
OK, I said, then what was the first computer? Now the class had warmed up: “The Mark 1.” “Babbage’s mechanical computer.” “The slide rule.” “The abacus.” “The brain.” “DNA.”
A raised hand wiggled its fingers: “Digits!”
Clearly, if you define a computer as a machine that processes information, then pretty much anything can compute.
“For the moment,” I said, “let’s just look at machines that people fabricate to process information and save the question of people as information-processing machines for later.”