In the Beginning Was Information


by Werner Gitt


  4. Spiritual matter: Except for the mass deficits occurring in nuclear processes, there is also a conservation law for matter. If, by way of analogy, we search for something permanent in our spiritual life, it will, according to the Bible, be found in the fruits of our labors for God. Heinrich Kemner always emphasized the difference between success and fruit. Natural man seeks success in life, but a spiritual person seeks fruit. Success depends mainly on our efforts, but fruit stems from grace, and it only grows when our life is linked with Jesus. He unlocked this secret in the parable of the vine: "No branch can bear fruit by itself; it must remain in the vine. Neither can you bear fruit unless you remain in me. I am the vine; you are the branches. If a man remains in me and I in him, he will bear much fruit; apart from me you can do nothing" (John 15:4–5). All our works will be revealed when God judges the world. Whatever we may regard as great successes in our life will be consumed in God’s testing fire; only fruit in Jesus will be conserved and earn rewards (1 Cor. 3:11–14). It is God’s declared will that we should build our life on the fruit (John 15:2; Rom. 1:13; Gal. 5:22; Phil. 4:17; Col. 1:10), for Jesus said, "I chose you … to go and bear fruit — fruit that will last" (John 15:16).

  Only one life, it will soon be past;

  Only what’s done for Christ, will last!

  Appendix

  Appendix A1

  The Statistical View of Information

  A1.1 Shannon’s Theory of Information

  Claude E. Shannon (born 1916), in his well-known paper A Mathematical Theory of Communication [S7, 1948], was the first person to formulate a mathematical definition of information. His measure of information, the "bit" (binary digit), had the advantage that quantitative properties of strings of symbols could be formulated. The disadvantage is just as plain: Shannon’s definition captures only one minor aspect of the nature of information, as we will discuss at length. The only value of this special aspect is for purposes of transmission and storage. The questions of meaning, comprehensibility, correctness, and worth or worthlessness are not considered at all. The important questions about the origin of the information (the sender) and about whom it is intended for (the recipient) are also ignored. For Shannon’s concept of information, it is completely immaterial whether a sequence of symbols represents an extremely important and meaningful text, or whether it was produced by a random process. It may sound paradoxical, but in this theory, a random sequence of symbols represents the maximum value of information content — the corresponding value or number for a meaningful text of the same length is smaller.

  Shannon’s concept: His definition of information is based on a communications problem, namely to determine the optimal transmission speed. For technical purposes, the meaning and import of a message are of no concern, so these aspects were not considered. Shannon restricted himself to information that expressed something new, so that, briefly, information content = measure of newness, where "newness" does not refer to a new idea, a new thought, or fresh news — which would have encompassed an aspect of meaning. It only concerns the surprise effect produced by a rarely occurring symbol. Shannon regarded a message as information only if it cannot be completely ascertained beforehand, so that information is a measure of the unlikeliness of an event. An extremely unlikely message is thus accorded a high information content. The news that a certain person out of two million participants has drawn the winning ticket is for him more "meaningful" than if every tenth person stood a chance, because the first event is much more improbable.
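  In figures, using the measure lb(1/p) derived below (a worked comparison added here purely for illustration): the winning-ticket news, whose probability is 1/2,000,000, carries lb 2,000,000 ≈ 20.9 bits, whereas news with a one-in-ten probability carries only lb 10 ≈ 3.3 bits.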

  Figure 31: Model of a discrete source for generating sequences of symbols. The source has a supply of N different symbols (e.g., an alphabet with 26 letters), of which a long sequence of n symbols is transmitted one after the other. The source could be a symbol generator which releases random sequences of symbols according to a given probability distribution, or it could be an unknown text stored on magnetic tape which is transmitted sequentially (i.e., one symbol at a time).

  Before a discrete source of symbols (NB: not an information source!) delivers one symbol (Figure 31), there is a certain doubt as to which symbol ai of the available set (e.g., an alphabet with N letters a1, a2, a3, …, aN) it will be. After it has been delivered, the previous uncertainty is resolved. Shannon’s measure can thus be described as the degree of uncertainty that is resolved when the next symbol arrives. When the next symbol is a "surprise," it is accorded a greater information value than when it is expected with a definite "certainty." The reader who is mathematically inclined may be interested in the derivation of some of Shannon’s basic formulas; this may contribute to a better understanding of his line of reasoning.

  1. The information content of a sequence of symbols: Shannon was only interested in the probability of the appearance of the various symbols, as should now become clearer. He thus only concerned himself with the statistical dimension of information, and reduced the information concept to something without any meaning. If one assumes that the appearances of the various symbols are independent of one another (e.g., "q" is not necessarily followed by "u") and that all N symbols appear with equal probability, then the probability of any chosen symbol xi arriving is given by pi = 1/N. Information content is then defined by Shannon in such a way that three conditions have to be met:

  i) If there are k independent messages[21] (symbols or sequences of symbols), then the total information content is given by Itot = I1 + I2 + … + Ik. This summation condition regards information as quantifiable.

  ii) The information content ascribed to a message increases when the element of surprise is greater. The surprise effect of the seldom-used "z" (low probability) is greater than for "e" which appears more frequently (high probability). It follows that the information value of a symbol xi increases when its probability pi decreases. This is expressed mathematically as an inverse proportion: I ~ 1/pi.

  iii) In the simplest symmetrical case where there are only two different symbols (e.g., "0" and "1") which occur equally frequently (p1 = 0.5 and p2 = 0.5), the information content I of such a symbol will be exactly one bit.

  According to the laws of probability, the probability of two independent events (e.g., throwing two dice) is equal to the product of the single probabilities:

  (1) p = p1 x p2

  The first requirement (i) I(p) = I(p1 x p2) = I(p1) + I(p2) is met mathematically when the logarithm of equation (1) is taken. The second requirement (ii) is satisfied when p1 and p2 are replaced by their reciprocals 1/p1 and 1/p2:

  (2) I(p1 x p2) = log(1/p1) + log(1/p2).

  The base b of the logarithms in equation (2) is still open; it determines the unit of measure and is fixed by the third requirement (iii):

  (3) I = logb (1/p) = logb (1/0.5) = logb 2 = 1 bit

  It follows from logb 2 = 1 that the base is b = 2, so we may regard it as a binary logarithm; as notation we use log2 x = lb x, which can be computed as lb x = (log x)/(log 2), where log x denotes the common logarithm to base 10 (log x = log10 x). We can now state the definition of the information content I of a single symbol appearing with probability p:

  (4) I(p) = lb(1/p) = - lb p ≥ 0.
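  As a brief illustration (a sketch added here, not part of the original text; the letter frequencies for "e" and "z" are rough assumed values), the following Python snippet checks numerically that the measure of equation (4) satisfies the three requirements listed above:

```python
import math

def I(p):
    """Information content in bits of a symbol with probability p: I(p) = lb(1/p)."""
    return math.log2(1 / p)

# (i) Additivity for independent events: I(p1 * p2) = I(p1) + I(p2)
p1, p2 = 0.2, 0.05
assert math.isclose(I(p1 * p2), I(p1) + I(p2))

# (ii) Rarer symbols carry more information: the seldom-used "z" versus the frequent "e"
p_e, p_z = 0.13, 0.0007   # rough, assumed English letter frequencies
assert I(p_z) > I(p_e)

# (iii) Normalization: an equiprobable binary symbol carries exactly 1 bit
assert I(0.5) == 1.0
```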

  According to Shannon’s definition, the information content of a single message (whether it is one symbol, one syllable, or one word) is a measure of the uncertainty of its reception. Probabilities can only have values ranging from 0 to 1 (0 ≤ p ≤ 1), and it thus follows from equation (4) that I(p) ≥ 0, meaning that the numerical value of information content is always positive. According to requirement (i), the information content of a number of messages (e.g., symbols) is then given by the sum of the values for the single messages:

  (5) Itot = I1 + I2 + … + In = ∑ (i = 1 to n) lb(1/pi)

  As shown in [G7], equation (5) can be reduced to the following mathematically equivalent relationship:

  (6) Itot = n x ∑ (i = 1 to N) p(xi) x lb(1/p(xi)) = n x H

  Note the difference between n and N used with the summation sign ∑. In equation (5) the summation is taken over all n members of the received sequence of symbols, whereas in (6) it is taken over the N different symbols of the available set.
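  A short sketch (my own example, using an arbitrary test string) showing that the symbol-by-symbol sum of equation (5) and the n x H form of equation (6) give the same total when the probabilities are taken to be the relative frequencies observed in the sequence itself:

```python
import math
from collections import Counter

sequence = "ABRACADABRA"     # arbitrary example string (n = 11 symbols)
n = len(sequence)
p = {s: c / n for s, c in Counter(sequence).items()}   # probabilities as relative frequencies

# Equation (5): sum over all n received symbols
I_tot_5 = sum(math.log2(1 / p[s]) for s in sequence)

# Equation (6): n times the sum over the N different symbols, i.e., n * H
H = sum(p_i * math.log2(1 / p_i) for p_i in p.values())
I_tot_6 = n * H

assert math.isclose(I_tot_5, I_tot_6)
print(round(I_tot_5, 2), "bits")     # about 22.4 bits for this example
```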

  Explanation of the variables used in the formulas:

  n = the number of symbols in a given (long) sequence (e.g., the total number of letters in a book)

  N = number of different symbols available

  (e.g.: N = 2 for the binary symbols 0 and 1, and for the Morse code symbols · and –

  N = 26 for the Latin alphabet: A, B, C, …, Z

  N = 26 x 26 = 676 for bigrams using the Latin alphabet: AA, AB, AC, …, ZZ

  N = 4 for the genetic code: A, C, G, T

  xi (i = 1 to N) = the N different symbols of the available set

  Itot = information content of an entire sequence of symbols

  H = the average information content of one symbol (or of a bigram, or trigram; see Table 4); the average value of the information content of one single symbol taken over a long sequence or even over the entire language (counted for many books from various types of literature).

  Shannon’s equations (6) and (8), which are used to find the total (statistical!) information content of a sequence of symbols (e.g., a sentence, a chapter, or a book), consist of two essentially different parts:

  a) the factor n, which indicates that the information content is directly proportional to the number of symbols used. This is totally inadequate for describing real information. If, for example, somebody uses a spate of words without really saying anything, Shannon would rate the information content as very large because of the great number of letters employed. On the other hand, if an expert expresses the essential meaning concisely, his "message" is accorded a very small information content.

  b) the variable H, expressed in equation (6) as a summation over the available set of elementary symbols. H refers to the different frequency distributions of the letters and thus describes a general characteristic of the language being used. If two languages A and B use the same alphabet (e.g., the Latin alphabet), then H will be larger for A when the letters are more evenly distributed, i.e., are closer to an equal distribution. When all symbols occur with exactly the same frequency, then H = lb N will be a maximum.
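  For instance (a worked illustration of this point): for N = 2 symbols occurring with probabilities 0.5 and 0.5, H = lb 2 = 1 bit, the maximum; if the same two symbols occur with probabilities 0.9 and 0.1, then H = 0.9 x lb(1/0.9) + 0.1 x lb(1/0.1) ≈ 0.47 bits.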

  An equal distribution is an exceptional case: We consider the case where all symbols occur with equal probability, e.g., when zeros and ones appear with the same frequency, as for random binary signals. The probability that two given symbols (e.g., G, G) appear directly one after the other is p x p = p²; but the information content I is doubled because of the logarithmic relationship. The information content of an arbitrarily long sequence of n symbols from an available supply (e.g., the alphabet), when the probabilities of all symbols are identical, i.e.,

  p1 = p2 = … = pN = p, is found from equation (5) to be:

  (7) Itot = n x lb(1/p)

  If all N symbols may occur with the same frequency, then the probability is p = 1/N. If this value is substituted in equation (7), we have the important equation:

  (8) Itot = n x lb N = n x H.
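  As a quick numerical check (my own example): a sequence of n = 1,000 symbols drawn with equal probability from an alphabet of N = 8 symbols has Itot = 1,000 x lb 8 = 1,000 x 3 = 3,000 bits.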

  2. The average information content of one single symbol in a sequence: If the symbols of a long sequence occur with differing probabilities (e.g., the sequence of letters in an English text), then we are interested in the average information content of each symbol in this sequence, or in the average for the language itself. In other words: what, in this case, is the average information content, i.e., the average uncertainty, of a single symbol?

  To compute the average information content per symbol Iave, we have to divide the number given by equation (6) by the number of symbols concerned:

  (9) Iave(x) = Itot/n = ∑ (i = 1 to N) p(xi) x lb(1/p(xi))

  When equation (9) is evaluated for the frequencies of the letters occurring in English, the values shown in Table 1 are obtained. The average information content of one letter is Iave = 4.04577 bits. The corresponding value for German is Iave = 4.11295 bits.
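  For comparison: if all 26 letters occurred with exactly the same frequency, the average would reach its maximum possible value of lb 26 ≈ 4.70 bits; the lower value for actual English text reflects the uneven frequencies of its letters.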

  The average Iave(x) which can be computed from equation (9) is thus the arithmetic mean of all the single values I(xi). The average information content of every symbol is given in Table 1 for two different symbol systems (the English and German alphabets); for the sake of simplicity, i is used instead of Iave. The average information content for each symbol, Iave(x) ≡ i, is the same as the expectation value[22] of the information content of one symbol in a long sequence. This quantity is also known as the entropy[23] H of the source of the message or of the employed language (Iave ≡ i ≡ H). Equation (9) is a fundamental expression in Shannon’s theory. It can be interpreted in various ways:

  a) Information content of each symbol: H is the average information content Iave(x) of a symbol xi in a long sequence of n symbols. H thus is a characteristic of a language when n is large enough. Because of the different letter frequencies in various languages, H has a specific value for every language (e.g., H1 = 4.04577 for English and 4.11295 for German).

  b) Expectation value of the information content of a symbol: H can also be regarded as the expectation value of the information content of a symbol arriving from a continuously transmitting source.

  c) The mean decision content per symbol: H can also be regarded as the mean decision content of a symbol. It is always possible to encode the symbols transmitted by a source of messages into a sequence of binary symbols (0 and 1). If we regard the binary code of one symbol as a binary word, then H can also be interpreted as follows (note that binary words do not necessarily have the same length): It is the average word length of the code required for the source of the messages. If, for instance, we want to encode the four letters of the genetic code for a computer investigation and the storage requirements have to be minimized, then H will be lb 4 = 2 binary positions (e.g., 00 = A, 01 = C, 10 = G, and 11 = T).
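  A minimal Python sketch of the 2-bits-per-letter code assignment mentioned above (00 = A, 01 = C, 10 = G, 11 = T); the example sequence is an arbitrary choice of mine:

```python
# 2-bit code for the four genetic-code letters, using the assignment given in the text
ENCODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
DECODE = {bits: letter for letter, bits in ENCODE.items()}

def encode(seq: str) -> str:
    """Pack a nucleotide sequence into a string of binary digits (2 bits per letter)."""
    return "".join(ENCODE[letter] for letter in seq)

def decode(bits: str) -> str:
    """Recover the nucleotide sequence from the packed binary string."""
    return "".join(DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

message = "GATTACA"                 # arbitrary example sequence
packed = encode(message)            # '10001111000100': 7 letters x 2 bits = 14 bits
assert decode(packed) == message
```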

  d) The exceptional case of symbols having equal probabilities: This is an important case, namely that all N symbols of the alphabet or some other set of elements occur with the same probability p(xi) = 1/N. To find the mean information content of a single symbol, we have to divide the right side of equation (8) by n:

  (10) H ≡ Iave(x) ≡ i = lb N

  We now formulate this statement as a special theorem:

  Theorem A1: In the case of sequences whose symbols occur with equal probability (e.g., the digits generated by a random number generator), the average information content of a symbol is equal to the information content of each and every individual symbol.
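  For example, each digit produced by a fair random decimal-digit generator carries lb 10 ≈ 3.32 bits, and this is at the same time the average information content per digit.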

  A1.2 Mathematical Description of Statistical Information

  A1.2.1 The Bit: Statistical Unit of Information

  One of the chief concerns in science and technology is to express results as far as possible in a numerical form or in a formula. Quantitative measures play an important part in these endeavors. They comprise two parts: the relevant number or magnitude, and the unit of measure. The latter is a predetermined unit of comparison (e.g., meter, second, watt) which can be used to express other similarly measurable quantities.

  The bit (abbreviated from binary digit) is the unit for measuring information content. The number of bits is the same as the number of binary symbols. In data processing systems, information is represented and processed in the form of electrical, optical, or mechanical signals. For this purpose, it is technically extremely advantageous, and therefore customary, to employ only two defined (binary) states and signals. Binary states have the property that only one of the two binary symbols can be involved at a certain moment. One state is designated as binary one (1), and the other as binary nought (0). It is also possible to have different pairs of binary symbols like 0 and L, YES and NO, TRUE and FALSE, and 12 V and 2 V. In computer technology, a bit also refers to the binary position in a machine word. The bit is also the smallest unit of information that can be represented in a digital computer. When text is entered in a computer, it is transformed into a predetermined binary code and also stored in this form. One letter usually requires 8 binary storage positions, known as a byte. The information content (= storage requirement) of a text is then described in terms of the number of bits required. Different pieces of text are thus accorded the same information content, regardless of sense and meaning. The number of bits only measures the statistical quantity of the information, with no regard to meaningfulness.
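  To illustrate this limitation (a sketch with an arbitrary example, assuming the 8-bit one-byte-per-character storage just described), the following snippet assigns exactly the same number of bits to a meaningful sentence and to a nonsense string of the same length:

```python
def bits_required(text: str) -> int:
    """Statistical storage requirement at 8 bits (one byte) per character."""
    return 8 * len(text)

meaningful = "IN THE BEGINNING WAS INFORMATION"
nonsense   = "QXZJVKWPGHYTRBNMDLCFSAUEOIXZQJVK"   # same length, but meaningless

assert len(meaningful) == len(nonsense)
print(bits_required(meaningful))    # 256 bits
print(bits_required(nonsense))      # 256 bits: identical, despite the difference in meaning
```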

  Two computer examples will now illustrate the advantages (e.g., to help determine the amount of storage space) and the disadvantages (e.g., ignoring the semantic aspects) of Shannon’s definition of information:

  Example 1: Storage of biological information: The human DNA molecule (body cell) is about 79 inches (2 m) long when fully stretched, and it contains approximately 6 x 10⁹ nucleotides (the chemical letters adenine, cytosine, guanine, and thymine). How much statistical information is this according to Shannon’s definition? The N = 4 chemical letters A, C, G, and T occur nearly equally frequently; their mean information content is H = lb 4 = (log 4)/(log 2) = 2 bits. The entire DNA thus has an information content of Itot = 6 x 10⁹ nucleotides x 2 bits/nucleotide = 1.2 x 10¹⁰ bits according to equations (10) and (8). This is equal to the information contained in 750,000 typed A4 pages each containing 2,000 characters.
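  (The page count follows from the storage convention described above: 1.2 x 10¹⁰ bits at 8 bits per character correspond to 1.5 x 10⁹ characters, which at 2,000 characters per page amounts to 750,000 pages.)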

  Example 2: The statistical information content of the Bible: The King James Version of the English Bible consists of 3,566,480 letters and 783,137 words [D1]. When the spaces between words are also counted, then n = 3,566,480 + 783,137 - 1 = 4,349,616 symbols. The average information content of a single letter (also known as entropy) thus amounts to H = 4.046 bits (see Table 1). The total information content of the Bible is then given by Itot = 4,349,616 x 4.046 = 17.6 million bits. Since the German Bible contains more letters than the English one, its information content is then larger in terms of Shannon’s theory, although the actual contents are the same as regards their meaning. This difference is carried to extremes when we consider the Shipipo language of Peru which is made up of 147 letters (see Figure 32 and Table 2). The Shipipo Bible then contains about 5.2 (= 994/191) times as much information as the English Bible. It is clear that Shannon’s definition of information is inadequate and problematic. Even when the meaning of the contents is exactly the same (as in the case of the Bible), Shannon’s theory results in appreciable differences. Its inadequacy resides in the fact that the quantity of information only depends on the number of letters, apart from the language-specific factor H in equation (6). If meaning is considered, the unit of information should result in equal numbers in the above case, independent of the language.

 
