
Bayesian Statistics (4th ed)


by Peter M Lee


  where .

  Assume that and and that α, and are unknown parameters to be estimated.

  Describe a reversible jump MCMC algorithm including discussion of the acceptance probability, to move between the four competing models:

  1. ;

  2. ;

  3. ;

  4. .

  Note that if z is a random variable with probability density function f given by

  then [due to P. Neal].

  1. Often denoted $D_{\mathrm{KL}}(q\|p)$ or $\mathrm{KL}(q\|p)$.

  2. Those with a background in statistical physics sometimes refer to as the (negative) variational free energy because it can be expressed as an ‘energy’

  plus the entropy

  but it is not necessary to know about the reasons for this.

  3. In that subsection, we wrote S where we will now write SS, we wrote where we will now write , and we wrote θ0 where we will now write .

  Appendix A: Common statistical distributions

  Some facts are given about various common statistical distributions. In the case of continuous distributions, the (probability) density (function) p(x) equals the derivative of the (cumulative) distribution function $F(x) = \mathsf{P}(X \le x)$. In the case of discrete distributions, the (probability) density (function) p(x) equals the probability that the random variable X takes the value x.

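  The mean or expectation is defined by

  $$\mathsf{E}X = \sum_x x\,p(x) \quad\text{or}\quad \mathsf{E}X = \int x\,p(x)\,\mathrm{d}x$$

  depending on whether the random variable is discrete or continuous. The variance is defined as

  $$\mathcal{V}X = \sum_x (x - \mathsf{E}X)^2\,p(x) \quad\text{or}\quad \mathcal{V}X = \int (x - \mathsf{E}X)^2\,p(x)\,\mathrm{d}x$$

  depending on whether the random variable is discrete or continuous. A mode is any value for which p(x) is a maximum; most common distributions have only one mode and so are called unimodal. A median is any value m such that both

  $$\mathsf{P}(X \le m) \ge \tfrac{1}{2} \quad\text{and}\quad \mathsf{P}(X \ge m) \ge \tfrac{1}{2}.$$

  In the case of most continuous distributions, there is a unique median m and

  $$F(m) = \mathsf{P}(X \le m) = \tfrac{1}{2}.$$

  There is a well-known empirical relationship that

  $$\text{mean} - \text{mode} \cong 3\,(\text{mean} - \text{median})$$

  or equivalently

  $$\text{median} \cong (2\,\text{mean} + \text{mode})/3.$$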

  Some theoretical grounds for this relationship based on Gram–Charlier or Edgeworth expansions can be found in Lee (1991) or Kendall, Stewart and Ord (1987, Section 2.11).
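  As a quick numerical illustration of the empirical relationship mean - mode ≅ 3 (mean - median), the following Python sketch compares the median predicted by (2 mean + mode)/3 with the exact median for two distributions. The two examples (a unit-mean exponential and a log-normal with log-scale standard deviation 0.3) and the function name predicted_median are chosen here for illustration and are not taken from the text.

```python
import math

def predicted_median(mean, mode):
    """Median implied by the empirical rule mean - mode = 3 * (mean - median)."""
    return (2.0 * mean + mode) / 3.0

# Exponential distribution with unit mean: mean 1, mode 0, exact median log 2.
exp_exact = math.log(2.0)
exp_rule = predicted_median(1.0, 0.0)

# Log-normal with log-mean 0 and log-sd sigma (an illustrative choice):
# mean exp(sigma^2 / 2), mode exp(-sigma^2), exact median 1.
sigma = 0.3
lognormal_exact = 1.0
lognormal_rule = predicted_median(math.exp(sigma**2 / 2.0), math.exp(-sigma**2))

print("exponential: rule", exp_rule, "exact", exp_exact)
print("log-normal:  rule", lognormal_rule, "exact", lognormal_exact)
```

  The rule is noticeably rougher for the strongly skewed exponential than for the mildly skewed log-normal, which is consistent with its status as an approximation.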

  Further material can be found in Rothschild and Logothetis (1986) or Evans, Hastings and Peacock (1993), with a more detailed account in Johnson et al. (2005), Johnson et al. (1994–1995), Balakrishnan et al. (2012) and Fang, Kotz and Wang (1989).

  A.1 Normal distribution

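  X is normal with mean θ and variance $\phi$, denoted

  $$X \sim \mathrm{N}(\theta, \phi),$$

  if it has density

  $$p(x) = (2\pi\phi)^{-1/2} \exp\{-\tfrac{1}{2}(x - \theta)^2/\phi\} \qquad (-\infty < x < \infty).$$

  The mean and variance are

  $$\mathsf{E}X = \theta, \qquad \mathcal{V}X = \phi.$$

  Because the distribution is symmetrical and unimodal, the median and mode both equal the mean, that is,

  $$\text{median}(X) = \text{mode}(X) = \theta.$$

  If $\theta = 0$ and $\phi = 1$, that is, $X \sim \mathrm{N}(0, 1)$, X is said to have a standard normal distribution.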

  A.2 Chi-squared distribution

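  X has a chi-squared distribution on ν degrees of freedom, denoted

  $$X \sim \chi^2_\nu,$$

  if it has the same distribution as

  $$Z_1^2 + Z_2^2 + \dots + Z_\nu^2,$$

  where $Z_1, Z_2, \dots, Z_\nu$ are independent standard normal variates, or equivalently if it has density

  $$p(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} \exp(-\tfrac{1}{2}x) \qquad (0 < x < \infty).$$

  If Y=X/S, where S is a constant, then Y is a chi-squared variate on ν degrees of freedom divided by S, denoted

  $$Y \sim S^{-1}\chi^2_\nu,$$

  and it has density

  $$p(y) = \frac{S^{\nu/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\, y^{\nu/2 - 1} \exp(-\tfrac{1}{2}Sy) \qquad (0 < y < \infty).$$

  The mean and variance are

  $$\mathsf{E}Y = \nu/S, \qquad \mathcal{V}Y = 2\nu/S^2.$$

  The mode is

  $$\text{mode}(Y) = (\nu - 2)/S \qquad (\nu \ge 2),$$

  and the approximate relationship between mean, mode and median implies that the median is approximately

  $$\text{median}(Y) \cong (\nu - \tfrac{2}{3})/S$$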

  at least for reasonably large ν, say .
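  The approximate median can be checked numerically. The sketch below takes S = 1, evaluates the chi-squared distribution function through the standard power series for the regularized incomplete gamma function, and finds the median by bisection. The function names are hypothetical, and the comparison value ν - 2/3 is simply what the mean/mode/median rule gives from mean ν and mode ν - 2.

```python
import math

def chi2_cdf(x, nu):
    """P(X <= x) for X chi-squared on nu degrees of freedom (series for P(nu/2, x/2))."""
    if x <= 0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    # Sum z^n / (a (a+1) ... (a+n)); all terms are positive, so no cancellation.
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_median(nu):
    """Median of chi-squared on nu degrees of freedom, by bisection on the cdf."""
    lo, hi = 0.0, 10.0 * nu + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for nu in (5, 10, 30):
    print(nu, chi2_median(nu), nu - 2.0 / 3.0)
```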

  A.3 Normal approximation to chi-squared

  If $X \sim \chi^2_\nu$ then for large ν we have that approximately

  $$\sqrt{2X} - \sqrt{2\nu - 1}$$

  has a standard normal distribution.

  A.4 Gamma distribution

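  X has a (one-parameter) gamma distribution with parameter α, denoted

  $$X \sim \mathrm{G}(\alpha),$$

  if it has density

  $$p(x) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha - 1} \exp(-x) \qquad (0 < x < \infty).$$

  This is simply another name for the distribution we refer to as

  $$\tfrac{1}{2}\chi^2_{2\alpha}.$$

  If $Y = \beta X$, then Y has a two-parameter gamma distribution with parameters α and β, denoted

  $$Y \sim \mathrm{G}(\alpha, \beta),$$

  and it has density

  $$p(y) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, y^{\alpha - 1} \exp(-y/\beta) \qquad (0 < y < \infty),$$

  so that its mean and variance are

  $$\mathsf{E}Y = \alpha\beta, \qquad \mathcal{V}Y = \alpha\beta^2.$$

  This is simply another name for the distribution we refer to as

  $$\tfrac{1}{2}\beta\,\chi^2_{2\alpha}.$$

  If $\beta = 1$ we recover the one-parameter gamma distribution; if $\alpha = 1$, so that the density is

  $$p(y) = \beta^{-1} \exp(-y/\beta),$$

  we obtain another special case sometimes called the (negative) exponential distribution and denoted

  $$Y \sim \mathrm{E}(\beta).$$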

  The distribution function of any variable with a gamma distribution is easily found in terms of the incomplete gamma function

  or in terms of Karl Pearson’s incomplete gamma function

  Extensive tables can be found in Pearson (1924).
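  A simulation gives a rough check on the moments of the two-parameter gamma distribution. The sketch below assumes that β acts as a scale parameter (so that Y = βX with X one-parameter gamma, mean αβ and variance αβ²); this reading is an assumption about the notation. It uses Python's random.gammavariate, which is documented in terms of a shape alpha and a scale beta. The helper name sample_gamma and the values α = 3, β = 2 are illustrative.

```python
import random

def sample_gamma(alpha, beta, n, seed=0):
    """Draw n variates with shape alpha and scale beta (scale is an assumed convention)."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, beta) for _ in range(n)]

alpha, beta = 3.0, 2.0
ys = sample_gamma(alpha, beta, 200_000)

sample_mean = sum(ys) / len(ys)
sample_var = sum((y - sample_mean) ** 2 for y in ys) / len(ys)

print("mean:    ", sample_mean, "vs alpha * beta    =", alpha * beta)
print("variance:", sample_var, "vs alpha * beta**2 =", alpha * beta**2)
```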

  A.5 Inverse chi-squared distribution

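  X has an inverse chi-squared distribution on ν degrees of freedom, denoted

  $$X \sim \chi^{-2}_\nu,$$

  if $1/X \sim \chi^2_\nu$, or equivalently if it has density

  $$p(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{-\nu/2 - 1} \exp(-\tfrac{1}{2}x^{-1}) \qquad (0 < x < \infty).$$

  If $Y = SX$, so that $1/Y \sim S^{-1}\chi^2_\nu$, then Y is S times an inverse chi-squared distribution on ν degrees of freedom, denoted

  $$Y \sim S\chi^{-2}_\nu,$$

  and it has density

  $$p(y) = \frac{S^{\nu/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\, y^{-\nu/2 - 1} \exp(-\tfrac{1}{2}S/y) \qquad (0 < y < \infty).$$

  The mean and variance are

  $$\mathsf{E}Y = \frac{S}{\nu - 2} \quad (\nu > 2), \qquad \mathcal{V}Y = \frac{2S^2}{(\nu - 2)^2(\nu - 4)} \quad (\nu > 4).$$

  The mode is

  $$\text{mode}(Y) = S/(\nu + 2)$$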

  and the median is in the range

  provided , with the upper limit approached closely when [see Novick and Jackson (1974, Section 7.5)].

  A.6 Inverse chi distribution

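  X has an inverse chi distribution on ν degrees of freedom, denoted

  $$X \sim \chi^{-1}_\nu,$$

  if $X^{-2} \sim \chi^2_\nu$, or equivalently if it has density

  $$p(x) = \frac{1}{2^{\nu/2 - 1}\,\Gamma(\nu/2)}\, x^{-\nu - 1} \exp(-\tfrac{1}{2}x^{-2}) \qquad (0 < x < \infty).$$

  If $Y = S^{1/2}X$, so that $Y^{-2} \sim S^{-1}\chi^2_\nu$, then Y is $S^{1/2}$ times an inverse chi distribution on ν degrees of freedom, denoted

  $$Y \sim S^{1/2}\chi^{-1}_\nu,$$

  and it has density

  $$p(y) = \frac{S^{\nu/2}}{2^{\nu/2 - 1}\,\Gamma(\nu/2)}\, y^{-\nu - 1} \exp(-\tfrac{1}{2}S/y^2) \qquad (0 < y < \infty).$$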

  The mean and variance do not greatly simplify. They are

  but very good approximations, at least if , are

  [see Novick and Jackson (1974, Section 7.3)]. The mode is exactly

  $$\text{mode}(Y) = S^{1/2}/\sqrt{\nu + 1}$$

  and a good approximation to the median at least if is (ibid.)

  A.7 Log chi-squared distribution

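  X has a log chi-squared distribution on ν degrees of freedom, denoted

  $$X \sim \log\chi^2_\nu,$$

  if X=log W where $W \sim \chi^2_\nu$, or equivalently if X has density

  $$p(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} \exp\{\tfrac{1}{2}\nu x - \tfrac{1}{2}\exp(x)\} \qquad (-\infty < x < \infty)$$

  (note that unlike $\chi^2_\nu$ itself this is a distribution over the whole line).

  Because the logarithm of an $S^{-1}\chi^2_\nu$ variable differs from a log chi-squared variable simply by an additive constant, it is not necessary to consider such variables in any detail.

  By considering the tth moment of a $\chi^2_\nu$ variable, it is easily shown that the moment generating function of a log chi-squared variable is

  $$\mathsf{E}\exp(tX) = 2^t\,\Gamma(t + \nu/2)/\Gamma(\nu/2).$$

  Writing

  $$\psi(z) = \frac{\mathrm{d}}{\mathrm{d}z}\log\Gamma(z) = \frac{\Gamma'(z)}{\Gamma(z)}$$

  for the so-called digamma function, it follows that the mean and variance are

  $$\mathsf{E}X = \log 2 + \psi(\nu/2), \qquad \mathcal{V}X = \psi'(\nu/2)$$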

  or (using Stirling’s approximation and its derivatives) approximately

  The mode is

  $$\text{mode}(X) = \log\nu.$$
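  The link between the moment generating function and the mean of a log chi-squared variable can be illustrated numerically. The sketch below assumes the moment generating function 2^t Γ(t + ν/2)/Γ(ν/2), which is the tth moment of a chi-squared variable, and approximates the digamma function by a central difference of math.lgamma because the standard library has no digamma. The comparison value log ν - 1/ν comes from the leading Stirling terms ψ(z) ≈ log z - 1/(2z); all function names are hypothetical.

```python
import math

def digamma(z, h=1e-5):
    """Central-difference approximation to d/dz log Gamma(z)."""
    return (math.lgamma(z + h) - math.lgamma(z - h)) / (2.0 * h)

def log_chi2_mgf(t, nu):
    """E exp(tX) for X = log W with W chi-squared on nu degrees of freedom."""
    return 2.0**t * math.exp(math.lgamma(t + nu / 2.0) - math.lgamma(nu / 2.0))

def log_chi2_mean(nu):
    """Mean log 2 + psi(nu / 2) of a log chi-squared variable."""
    return math.log(2.0) + digamma(nu / 2.0)

def mean_from_mgf(nu, h=1e-5):
    """Derivative of the moment generating function at t = 0."""
    return (log_chi2_mgf(h, nu) - log_chi2_mgf(-h, nu)) / (2.0 * h)

nu = 20
print(log_chi2_mean(nu), mean_from_mgf(nu), math.log(nu) - 1.0 / nu)
```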

  A.8 Student’s t distribution

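  X has a Student’s t distribution on ν degrees of freedom, denoted

  $$X \sim t_\nu,$$

  if it has the same distribution as

  $$\frac{Z}{\sqrt{W/\nu}},$$

  where $Z \sim \mathrm{N}(0, 1)$ and $W \sim \chi^2_\nu$ are independent, or equivalently if X has density

  $$p(x) = \frac{\Gamma((\nu + 1)/2)}{\sqrt{\pi\nu}\;\Gamma(\nu/2)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2} \qquad (-\infty < x < \infty).$$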

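  It follows that if $X_1, X_2, \dots, X_n$ are independently $\mathrm{N}(\mu, \sigma^2)$ and

  $$\bar{X} = \sum X_i / n, \qquad s^2 = \sum (X_i - \bar{X})^2 / (n - 1),$$

  then

  $$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}.$$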

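  The mean and variance are

  $$\mathsf{E}X = 0 \quad (\nu > 1), \qquad \mathcal{V}X = \nu/(\nu - 2) \quad (\nu > 2).$$

  Because the distribution is symmetrical and unimodal, the median and mode both equal the mean, that is

  $$\text{median}(X) = \text{mode}(X) = 0.$$

  As $\nu \to \infty$ the distribution approaches the standard normal form.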

  It may be noted that Student’s t distribution on one degree of freedom is the standard Cauchy distribution C(0, 1).
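  Both limiting cases can be checked directly from the density. The sketch below evaluates the Student's t density in its standard gamma-function form and compares it with the standard Cauchy density 1/{π(1 + x²)} at ν = 1 and with the standard normal density for a very large ν. The function names and the test points are illustrative.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t on nu degrees of freedom, computed on the log scale."""
    log_norm = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2.0 * math.log1p(x * x / nu))

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution C(0, 1)."""
    return 1.0 / (math.pi * (1.0 + x * x))

def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

for x in (0.0, 0.7, 2.5):
    print(x, t_pdf(x, 1), cauchy_pdf(x), t_pdf(x, 1e6), std_normal_pdf(x))
```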

  A.9 Normal/chi-squared distribution

  The ordered pair (X, Y) has a normal/chi-squared distribution if

  for some S and ν and, conditional on Y,

  for some μ and n. An equivalent condition is that the joint density function (for and ) is

  where

  If we define

  then the marginal distribution of X is given by the fact that

  The marginal distribution of Y is of course

  Approximate methods of constructing two-dimensional highest density regions for this distribution are described in Box and Tiao (1992, Section 2.4).

  A.10 Beta distribution

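  X has a beta distribution with parameters α and β, denoted

  $$X \sim \mathrm{Be}(\alpha, \beta),$$

  if it has density

  $$p(x) = \frac{1}{\mathrm{B}(\alpha, \beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1} \qquad (0 < x < 1),$$

  where the beta function is defined by

  $$\mathrm{B}(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}.$$

  The mean and variance are

  $$\mathsf{E}X = \frac{\alpha}{\alpha + \beta}, \qquad \mathcal{V}X = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$

  The mode is

  $$\text{mode}(X) = \frac{\alpha - 1}{\alpha + \beta - 2} \qquad (\alpha > 1,\ \beta > 1),$$

  and the approximate relationship between mean, mode and median can be used to find an approximate median.

  The distribution function of any variable with a beta distribution is easily found in terms of the incomplete beta function

  $$I_x(\alpha, \beta) = \int_0^x \frac{1}{\mathrm{B}(\alpha, \beta)}\, \xi^{\alpha - 1}(1 - \xi)^{\beta - 1}\,\mathrm{d}\xi.$$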

  Extensive tables can be found in Pearson (1968) or in Pearson and Hartley (1966, Table 17).

  A.11 Binomial distribution

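  X has a binomial distribution of index n and parameter π, denoted

  $$X \sim \mathrm{B}(n, \pi),$$

  if it has a discrete distribution with density

  $$p(x) = \binom{n}{x} \pi^x (1 - \pi)^{n - x} \qquad (x = 0, 1, 2, \dots, n).$$

  The mean and variance are

  $$\mathsf{E}X = n\pi, \qquad \mathcal{V}X = n\pi(1 - \pi).$$

  Because

  $$\frac{p(X + 1)}{p(X)} = \frac{(n - X)\,\pi}{(X + 1)(1 - \pi)},$$

  we see that p(X+1)> p(X) if and only if

  $$X < (n + 1)\pi - 1,$$

  and hence that a mode occurs at

  $$\text{mode}(X) = [(n + 1)\pi],$$

  the square brackets denoting ‘integer part of’, and this mode is unique unless $(n + 1)\pi$ is an integer.

  Integration by parts shows that the distribution function is expressible in terms of the incomplete beta function $I_x(\alpha, \beta) = \mathrm{B}(\alpha, \beta)^{-1}\int_0^x \xi^{\alpha - 1}(1 - \xi)^{\beta - 1}\,\mathrm{d}\xi$, namely,

  $$\mathsf{P}(X \le x) = 1 - I_{\pi}(x + 1, n - x) = I_{1 - \pi}(n - x, x + 1);$$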

  see, for example, Kendall, Stewart and Ord (1987, Section 5.7).
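  The mode formula can be confirmed by brute force. The sketch below locates the largest binomial probability directly and compares its position with the integer part of (n + 1)π, for a few illustrative pairs (n, π) with 0 < π < 1 in which (n + 1)π is not an integer, so that the mode is unique. The function names are hypothetical.

```python
import math

def binomial_pmf(x, n, pi):
    """P(X = x) for X binomial of index n and parameter pi."""
    return math.comb(n, x) * pi**x * (1.0 - pi) ** (n - x)

def binomial_mode(n, pi):
    """Position of the largest probability, found by direct search over 0..n."""
    return max(range(n + 1), key=lambda x: binomial_pmf(x, n, pi))

def mode_formula(n, pi):
    """Integer part of (n + 1) * pi."""
    return math.floor((n + 1) * pi)

for n, pi in [(10, 0.3), (20, 0.5), (7, 0.9), (50, 0.13)]:
    print(n, pi, binomial_mode(n, pi), mode_formula(n, pi))
```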

  A.12 Poisson distribution

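  X has a Poisson distribution of mean λ, denoted

  $$X \sim \mathrm{P}(\lambda),$$

  if it has a discrete distribution with density

  $$p(x) = \frac{\lambda^x}{x!} \exp(-\lambda) \qquad (x = 0, 1, 2, \dots).$$

  The mean and variance are

  $$\mathsf{E}X = \lambda, \qquad \mathcal{V}X = \lambda.$$

  Because

  $$\frac{p(X + 1)}{p(X)} = \frac{\lambda}{X + 1},$$

  we see that p(X+1)> p(X) if and only if

  $$X < \lambda - 1,$$

  and hence that a mode occurs at

  $$\text{mode}(X) = [\lambda],$$

  the square brackets denoting ‘integer part of’, and this mode is unique unless λ is an integer.

  Integrating by parts shows that the distribution function is expressible in terms of the incomplete gamma function $\gamma(\alpha, x) = \int_0^x \xi^{\alpha - 1}\exp(-\xi)\,\mathrm{d}\xi$, namely,

  $$\mathsf{P}(X \le x) = 1 - \frac{\gamma(x + 1, \lambda)}{\Gamma(x + 1)};$$

  see Kendall, Stewart and Ord (1987, Section 5.9).

  The Poisson distribution often occurs as the limit of the binomial as

  $$n \to \infty, \qquad \pi \to 0, \qquad n\pi \to \lambda.$$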
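  The convergence of the binomial to the Poisson can be seen numerically. The sketch below assumes the usual form of the limit, in which n grows while nπ = λ is held fixed, and reports the largest difference between the two sets of probabilities over x = 0, 1, ..., 20. The function names and the choice λ = 3 are illustrative.

```python
import math

def binomial_pmf(x, n, pi):
    """P(X = x) for X binomial of index n and parameter pi (zero when x > n)."""
    return math.comb(n, x) * pi**x * (1.0 - pi) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X Poisson of mean lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def max_abs_diff(n, lam, x_max=20):
    """Largest gap between B(n, lam / n) and P(lam) probabilities for x = 0..x_max."""
    pi = lam / n
    return max(abs(binomial_pmf(x, n, pi) - poisson_pmf(x, lam))
               for x in range(x_max + 1))

for n in (10, 100, 10_000):
    print(n, max_abs_diff(n, 3.0))
```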

  A.13 Negative binomial distribution

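  X has a negative binomial distribution of index n and parameter π, denoted

  $$X \sim \mathrm{NB}(n, \pi),$$

  if it has a discrete distribution with density

  $$p(x) = \binom{n + x - 1}{x} \pi^n (1 - \pi)^x \qquad (x = 0, 1, 2, \dots).$$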

  Because

  we sometimes use the notation

  The mean and variance are

  $$\mathsf{E}X = n(1 - \pi)/\pi, \qquad \mathcal{V}X = n(1 - \pi)/\pi^2.$$

  Because

  $$\frac{p(X + 1)}{p(X)} = \frac{(n + X)(1 - \pi)}{X + 1},$$

  we see that p(X+1)> p(X) if and only if

  $$X < \{n(1 - \pi) - 1\}/\pi,$$

  and hence that a mode occurs at

  $$\text{mode}(X) = [(n - 1)(1 - \pi)/\pi],$$

  the square brackets denoting ‘integer part of’, and this mode is unique unless $(n - 1)(1 - \pi)/\pi$ is an integer.

  It can be shown that the distribution function can be found in terms of that of the binomial distribution, or equivalently in terms of the incomplete beta function; for details see Balakrishnan et al. (1992, Chapter 5, Section 6). Just as the Poisson distribution can arise as a limit of the binomial distribution, so it can as a limit of the negative binomial, but in this case as

  $$n \to \infty, \qquad \pi \to 1, \qquad n(1 - \pi) \to \lambda.$$

  The particular case where n=1, so that $p(x) = \pi(1 - \pi)^x$, is sometimes referred to as the geometric distribution.

  A.14 Hypergeometric distribution

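  X has a hypergeometric distribution of population size N, index n and parameter π, denoted

  $$X \sim \mathrm{H}(N, n, \pi),$$

  if it has a discrete distribution with density

  $$p(x) = \binom{N\pi}{x}\binom{N(1 - \pi)}{n - x}\bigg/\binom{N}{n}.$$

  The mean and variance are

  $$\mathsf{E}X = n\pi, \qquad \mathcal{V}X = n\pi(1 - \pi)\,\frac{N - n}{N - 1}.$$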

  Because

  we see that p(X+1)> p(X) if and only if

  and hence that if, as is usually the case, N is fairly large, if and only if

  Hence, the mode occurs very close to the binomial value

  $$\text{mode}(X) = [(n + 1)\pi].$$

  As $N \to \infty$ this distribution approaches the binomial distribution $\mathrm{B}(n, \pi)$.

  Tables of it can be found in Lieberman and Owen (1961).

  A.15 Uniform distribution

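  X has a uniform distribution on the interval (a, b), denoted

  $$X \sim \mathrm{U}(a, b),$$

  if it has density

  $$p(x) = (b - a)^{-1} I_{(a, b)}(x),$$

  where

  $$I_{(a, b)}(x) = \begin{cases} 1 & (x \in (a, b)) \\ 0 & \text{(otherwise)} \end{cases}$$

  is the indicator function of the set (a, b). The mean and variance are

  $$\mathsf{E}X = \tfrac{1}{2}(a + b), \qquad \mathcal{V}X = \tfrac{1}{12}(b - a)^2.$$

  There is no unique mode, but the distribution is symmetrical, and hence

  $$\text{median}(X) = \tfrac{1}{2}(a + b).$$

  Sometimes we have occasion to refer to a discrete version; Y has a discrete uniform distribution on the interval [a, b], denoted

  $$Y \sim \mathrm{UD}(a, b),$$

  if it has a discrete distribution with density

  $$p(y) = (b - a + 1)^{-1} \qquad (y = a, a + 1, \dots, b).$$

  The mean and variance are

  $$\mathsf{E}Y = \tfrac{1}{2}(a + b), \qquad \mathcal{V}Y = \tfrac{1}{12}(b - a)(b - a + 2),$$

  using formulae for the sum and sum of squares of the first n natural numbers [the variance is best found by noting that the variance of UD(a, b) equals that of UD(1, n) where n=b–a+1]. Again, there is no unique mode, but the distribution is symmetrical, and hence

  $$\text{median}(Y) = \tfrac{1}{2}(a + b).$$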
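  The remark about the variance can be verified exactly with rational arithmetic. The sketch below computes the mean and variance of a discrete uniform distribution on the integers a, a + 1, ..., b by direct summation and compares the variance with (n² - 1)/12, the variance of UD(1, n) for n = b - a + 1. The function name and the values a = 3, b = 10 are illustrative.

```python
from fractions import Fraction

def discrete_uniform_moments(a, b):
    """Exact mean and variance of the uniform distribution on the integers a..b."""
    values = range(a, b + 1)
    n = len(values)
    mean = Fraction(sum(values), n)
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance

a, b = 3, 10
n = b - a + 1
mean, variance = discrete_uniform_moments(a, b)

print("mean:", mean, " variance:", variance)
print("(n^2 - 1)/12:", Fraction(n * n - 1, 12))
print("variance of UD(1, n):", discrete_uniform_moments(1, n)[1])
```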

  A.16 Pareto distribution

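  X has a Pareto distribution with parameters ξ and γ, denoted

  $$X \sim \mathrm{Pa}(\xi, \gamma),$$

  if it has density

  $$p(x) = \gamma\,\xi^{\gamma}\, x^{-\gamma - 1} I_{(\xi, \infty)}(x),$$

  where

  $$I_{(\xi, \infty)}(x) = \begin{cases} 1 & (x > \xi) \\ 0 & \text{(otherwise)} \end{cases}$$

  is the indicator function of the set $(\xi, \infty)$. The mean and variance are

  $$\mathsf{E}X = \frac{\gamma\xi}{\gamma - 1} \quad (\gamma > 1), \qquad \mathcal{V}X = \frac{\gamma\xi^2}{(\gamma - 1)^2(\gamma - 2)} \quad (\gamma > 2).$$

  The distribution function is

  $$F(x) = \{1 - (\xi/x)^{\gamma}\}\, I_{(\xi, \infty)}(x),$$

  and in particular the median is

  $$\text{median}(X) = 2^{1/\gamma}\,\xi.$$

  The mode, of course, occurs at

  $$\text{mode}(X) = \xi.$$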

  The ordered pair (Y, Z) has a bilateral bivariate Pareto distribution with parameters , η and γ, denoted

  if it has joint density function

  The means and variances are

  and the correlation coefficient between Y and Z is

  It is also sometimes useful that

  The marginal distribution function of Y is

  and in particular the median is

  The distribution function of Z is similar, and in particular the median is

  The modes, of course, occur at

  The distribution is discussed in DeGroot (1970, Sections 4.11, 5.7 and 9.7).

  A.17 Circular normal distribution

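  X has a circular normal or von Mises’ distribution with mean μ and concentration parameter κ, denoted

  $$X \sim \mathrm{M}(\mu, \kappa),$$

  if it has density

  $$p(x) = \{2\pi I_0(\kappa)\}^{-1} \exp\{\kappa\cos(x - \mu)\},$$

  where X is any angle, so $0 \le x < 2\pi$, and $I_0(\kappa)$ is a constant called the modified Bessel function of the first kind and order zero (besselI(κ,0) in R). It turns out that

  $$I_0(\kappa) = \sum_{r=0}^{\infty} \frac{1}{(r!)^2} \left(\tfrac{1}{2}\kappa\right)^{2r}$$

  and that asymptotically for large κ

  $$I_0(\kappa) \sim \frac{\exp(\kappa)}{\sqrt{2\pi\kappa}}.$$

  For large κ, we have approximately

  $$X \sim \mathrm{N}(\mu, 1/\kappa),$$

  while for small κ, we have approximately

  $$p(x) = (2\pi)^{-1}\{1 + \kappa\cos(x - \mu)\},$$

  which density is sometimes referred to as a cardioid distribution. The circular normal distribution is discussed by Mardia (1972), Mardia and Jupp (2001) and Batschelet (1981).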
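  The normalizing constant can be computed from its power series without special libraries. The sketch below assumes the standard von Mises density exp{κ cos(x - μ)}/{2π I₀(κ)} on an interval of length 2π, sums the series I₀(κ) = Σ (κ/2)^{2r}/(r!)², checks that the density integrates to one, and compares I₀(κ) with the large-κ form exp(κ)/√(2πκ). The function names and parameter values are illustrative.

```python
import math

def bessel_i0(kappa, terms=200):
    """Modified Bessel function of the first kind, order zero, by its power series."""
    total, term = 0.0, 1.0
    for r in range(terms):
        total += term
        # Ratio of successive terms is (kappa / 2)^2 / (r + 1)^2.
        term *= (kappa / 2.0) ** 2 / (r + 1) ** 2
    return total

def von_mises_pdf(x, mu, kappa):
    """Assumed standard form of the circular normal density."""
    return math.exp(kappa * math.cos(x - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def integrate_pdf(mu, kappa, n=2000):
    """Midpoint rule over one full period of length 2 pi."""
    h = 2.0 * math.pi / n
    return h * sum(von_mises_pdf((i + 0.5) * h, mu, kappa) for i in range(n))

kappa = 20.0
ratio = bessel_i0(kappa) / (math.exp(kappa) / math.sqrt(2.0 * math.pi * kappa))
print("integral of density:", integrate_pdf(1.0, 2.5))
print("I0(kappa) / asymptotic form:", ratio)
```

  The ratio exceeds one slightly at κ = 20, which is in keeping with the asymptotic form being a leading-order approximation.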

  One point related to this distribution arises in a Bayesian context in connection with the reference prior

  when we have observations such that

  The only sensible estimator of μ on the basis of the posterior distribution is .

  The mode of the posterior distribution of κ is approximately , where

  (both of which are approximately 1.87 when ) according to Schmitt (1969, Section 10.2). Because of the skewness of the distribution of κ, its posterior mean is greater than its posterior mode.

  A.18 Behrens’ distribution

  X is said to have Behrens’ (or Behrens–Fisher or Fisher–Behrens) distribution with degrees of freedom and and angle , denoted

  if X has the same distribution as

  where T1 and T2 are independent and

  Equivalently, X has density

  where

  over the whole real line, where

  This distribution naturally arises as the posterior distribution of

  when we have samples of size from and of size from and neither nor is known, and conventional priors are adopted. In this case, in a fairly obvious notation

  An approximation to this distribution due to Patil (1965) is as follows.

  Define

  Then, approximately,

  Obviously b is usually not an integer, and consequently this approximation requires interpolation in the t tables.

  Clearly Behrens’ distribution has mean and variance

  using the mean and variance of t distributions and the independence of T1 and T2. The distribution is symmetrical and unimodal and hence the mean, mode and median are all equal, so

  A.19 Snedecor’s F distribution

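  X has an F distribution on $\nu_1$ and $\nu_2$ degrees of freedom, denoted

  $$X \sim \mathrm{F}_{\nu_1, \nu_2},$$

  if X has the same distribution as

  $$\frac{W_1/\nu_1}{W_2/\nu_2},$$

  where W1 and W2 are independent and

  $$W_1 \sim \chi^2_{\nu_1}, \qquad W_2 \sim \chi^2_{\nu_2}.$$

  Equivalently, X has density

  $$p(x) = \frac{\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{\mathrm{B}(\tfrac{1}{2}\nu_1, \tfrac{1}{2}\nu_2)}\; \frac{x^{\nu_1/2 - 1}}{(\nu_2 + \nu_1 x)^{(\nu_1 + \nu_2)/2}} \qquad (0 < x < \infty).$$

  The mean and variance are

  $$\mathsf{E}X = \frac{\nu_2}{\nu_2 - 2} \quad (\nu_2 > 2), \qquad \mathcal{V}X = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)} \quad (\nu_2 > 4).$$

  The mode is

  $$\text{mode}(X) = \frac{\nu_2}{\nu_2 + 2}\cdot\frac{\nu_1 - 2}{\nu_1} \qquad (\nu_1 \ge 2).$$

  If $X \sim \mathrm{F}_{\nu_1, \nu_2}$, then

  $$\frac{\nu_1 X}{\nu_2 + \nu_1 X} \sim \mathrm{Be}(\tfrac{1}{2}\nu_1, \tfrac{1}{2}\nu_2).$$

  Conversely, if $Y \sim \mathrm{Be}(\alpha, \beta)$ then

  $$\frac{\beta Y}{\alpha(1 - Y)} \sim \mathrm{F}_{2\alpha, 2\beta}.$$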
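  The relationship between the F and beta distributions can be illustrated by simulation. The sketch below builds F variates as a ratio of independent chi-squared variates, each divided by its degrees of freedom, and applies the standard transformation ν₁X/(ν₂ + ν₁X), which should then behave like a beta variate with parameters ν₁/2 and ν₂/2 and hence have mean ν₁/(ν₁ + ν₂). Chi-squared variates are drawn as gamma variates with shape ν/2 and scale 2; the function name, seed and degrees of freedom are illustrative.

```python
import random

def sample_f(nu1, nu2, rng):
    """One F variate as a ratio of scaled independent chi-squared variates."""
    w1 = rng.gammavariate(nu1 / 2.0, 2.0)  # chi-squared on nu1 degrees of freedom
    w2 = rng.gammavariate(nu2 / 2.0, 2.0)  # chi-squared on nu2 degrees of freedom
    return (w1 / nu1) / (w2 / nu2)

rng = random.Random(1)
nu1, nu2 = 4, 6
xs = [sample_f(nu1, nu2, rng) for _ in range(100_000)]
ys = [nu1 * x / (nu2 + nu1 * x) for x in xs]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

print("mean of F variates:          ", mean_x, "vs", nu2 / (nu2 - 2))
print("mean of transformed variates:", mean_y, "vs", nu1 / (nu1 + nu2))
```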

  A.20 Fisher’s z distribution

 
