Bayesian Statistics (4th ed)


by Peter M Lee

where .

Assume that and and that α, and are unknown parameters to be estimated.

Describe a reversible jump MCMC algorithm, including a discussion of the acceptance probability, to move between the four competing models:

1. ;

2. ;

3. ;

4. .

Note that if z is a random variable with probability density function f given by

then [due to P. Neal].

1. Often denoted $D_{\mathrm{KL}}(q\|p)$ or $\mathrm{KL}(q\|p)$.

2. Those with a background in statistical physics sometimes refer to as the (negative) variational free energy because it can be expressed as an ‘energy’

plus the entropy

but it is not necessary to know about the reasons for this.

3. In that subsection, we wrote S where we will now write SS, we wrote where we will now write , and we wrote θ0 where we will now write .

Appendix A: Common statistical distributions

Some facts are given about various common statistical distributions. In the case of continuous distributions, the (probability) density (function) p(x) equals the derivative of the (cumulative) distribution function $F(x) = P(X \le x)$. In the case of discrete distributions, the (probability) density (function) p(x) equals the probability that the random variable X takes the value x.

The mean or expectation is defined by

$$\mathrm{E}X = \sum x\,p(x) \qquad\text{or}\qquad \mathrm{E}X = \int x\,p(x)\,dx$$

depending on whether the random variable is discrete or continuous. The variance is defined as

$$\mathrm{V}X = \sum (x - \mathrm{E}X)^2\,p(x) \qquad\text{or}\qquad \mathrm{V}X = \int (x - \mathrm{E}X)^2\,p(x)\,dx$$

depending on whether the random variable is discrete or continuous. A mode is any value for which p(x) is a maximum; most common distributions have only one mode and so are called unimodal. A median is any value m such that both

$$P(X \le m) \ge \tfrac12 \qquad\text{and}\qquad P(X \ge m) \ge \tfrac12.$$

In the case of most continuous distributions, there is a unique median m and

$$F(m) = P(X \le m) = \tfrac12.$$

There is a well-known empirical relationship that

$$\text{mean} - \text{mode} \cong 3(\text{mean} - \text{median})$$

or equivalently

$$\text{median} \cong (2\,\text{mean} + \text{mode})/3.$$

Some theoretical grounds for this relationship based on Gram–Charlier or Edgeworth expansions can be found in Lee (1991) or Kendall, Stuart and Ord (1987, Section 2.11).

Further material can be found in Rothschild and Logothetis (1986) or Evans, Hastings and Peacock (1993), with a more detailed account in Johnson et al. (2005), Johnson et al. (1994–1995), Balakrishnan et al. (2012) and Fang, Kotz and Wang (1989).

A.1 Normal distribution

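X is normal with mean θ and variance φ, denoted

$$X \sim \mathrm{N}(\theta, \phi)$$

if it has density

$$p(x) = (2\pi\phi)^{-1/2}\exp\{-\tfrac12(x - \theta)^2/\phi\} \qquad (-\infty < x < \infty).$$

The mean and variance are

$$\mathrm{E}X = \theta, \qquad \mathrm{V}X = \phi.$$

Because the distribution is symmetrical and unimodal, the median and mode both equal the mean, that is,

$$\operatorname{median}(X) = \operatorname{mode}(X) = \mathrm{E}X = \theta.$$

If θ = 0 and φ = 1, that is, $X \sim \mathrm{N}(0, 1)$, X is said to have a standard normal distribution.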
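The parameterisation here uses the variance φ. Python's `statistics.NormalDist`, by contrast, takes the standard deviation `sigma`. A quick cross-check helps avoid mixing the two up. This is a minimal sketch, and `normal_pdf` is a hypothetical helper rather than a library function.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, theta: float, phi: float) -> float:
    """Density of N(theta, phi); phi is the variance, not the standard deviation."""
    return math.exp(-0.5 * (x - theta) ** 2 / phi) / math.sqrt(2.0 * math.pi * phi)

theta, phi = 1.0, 4.0
# NormalDist is parameterised by the standard deviation, so pass sqrt(phi).
reference = NormalDist(mu=theta, sigma=math.sqrt(phi))

for x in (-2.0, 0.0, 1.0, 3.5):
    assert math.isclose(normal_pdf(x, theta, phi), reference.pdf(x), rel_tol=1e-12)

print(reference.mean, reference.median, reference.variance)  # 1.0 1.0 4.0
```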

A.2 Chi-squared distribution

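X has a chi-squared distribution on ν degrees of freedom, denoted

$$X \sim \chi^2_\nu$$

if it has the same distribution as

$$Z_1^2 + Z_2^2 + \dots + Z_\nu^2$$

where $Z_1, Z_2, \dots, Z_\nu$ are independent standard normal variates, or equivalently if it has density

$$p(x) = \{2^{\nu/2}\Gamma(\nu/2)\}^{-1} x^{\nu/2-1}\exp(-\tfrac12 x) \qquad (0 < x < \infty).$$

If Y = X/S, where S is a constant, then Y is a chi-squared variate on ν degrees of freedom divided by S, denoted

$$Y \sim S^{-1}\chi^2_\nu$$

and it has density

$$p(y) = S^{\nu/2}\{2^{\nu/2}\Gamma(\nu/2)\}^{-1} y^{\nu/2-1}\exp(-\tfrac12 S y) \qquad (0 < y < \infty).$$

The mean and variance are

$$\mathrm{E}Y = \nu/S, \qquad \mathrm{V}Y = 2\nu/S^2.$$

The mode is

$$\operatorname{mode}(Y) = (\nu - 2)/S \qquad (\nu \ge 2)$$

and the approximate relationship between mean, mode and median implies that the median is approximately

$$\operatorname{median}(Y) \cong (\nu - 2/3)/S$$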

at least for reasonably large ν, say .
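The approximation (ν − 2/3)/S comes from substituting the mean ν/S and the mode (ν − 2)/S into the empirical relationship median ≅ (2 mean + mode)/3. The sketch below computes the exact median so the two can be compared. It uses only the standard library, and its helper names are hypothetical. It evaluates P(Y ≤ y) = P(ν/2, Sy/2), where P is the regularised lower incomplete gamma function summed as a power series, and then inverts this by bisection.

```python
import math

def reg_lower_gamma(a: float, x: float) -> float:
    """Regularised lower incomplete gamma P(a, x) = gamma(a, x) / Gamma(a), via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total

def scaled_chi2_cdf(y: float, nu: float, S: float) -> float:
    """P(Y <= y) for Y ~ S^{-1} chi^2_nu, i.e. Y = X / S with X ~ chi^2_nu."""
    return reg_lower_gamma(nu / 2.0, S * y / 2.0)

def scaled_chi2_median(nu: float, S: float) -> float:
    """Exact median, found by bisection on the distribution function."""
    lo, hi = 0.0, 10.0 * (nu + 10.0) / S  # generous upper bracket
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if scaled_chi2_cdf(mid, nu, S) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for nu in (5, 10, 20, 50):
    S = 2.0
    exact = scaled_chi2_median(nu, S)
    approx = (nu - 2.0 / 3.0) / S
    print(f"nu={nu:3d}  exact={exact:.4f}  approx={approx:.4f}")
```

The relative gap between the two columns shrinks steadily as ν grows.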

A.3 Normal approximation to chi-squared

If $X \sim \chi^2_\nu$, then for large ν we have that approximately

$$\sqrt{2X} - \sqrt{2\nu - 1}$$

has a standard normal distribution.
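The sketch below compares the approximation with the exact distribution function over a grid of x values. It takes the normal approximation to be Fisher's, in which √(2X) − √(2ν − 1) is treated as N(0, 1). The exact values come from the regularised incomplete gamma function. Helper names are hypothetical.

```python
import math
from statistics import NormalDist

def reg_lower_gamma(a: float, x: float) -> float:
    """Regularised lower incomplete gamma P(a, x), summed as a power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total

def chi2_cdf_exact(x: float, nu: float) -> float:
    return reg_lower_gamma(nu / 2.0, x / 2.0)

def chi2_cdf_fisher(x: float, nu: float) -> float:
    """Approximate P(X <= x) by treating sqrt(2X) - sqrt(2 nu - 1) as N(0, 1)."""
    return NormalDist().cdf(math.sqrt(2.0 * x) - math.sqrt(2.0 * nu - 1.0))

for nu in (10, 30, 100):
    grid = [nu * k / 20.0 for k in range(1, 61)]  # x from nu/20 up to 3 nu
    worst = max(abs(chi2_cdf_exact(x, nu) - chi2_cdf_fisher(x, nu)) for x in grid)
    print(f"nu={nu:3d}  largest absolute error over the grid = {worst:.4f}")
```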

A.4 Gamma distribution

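X has a (one-parameter) gamma distribution with parameter α, denoted

$$X \sim \mathrm{G}(\alpha)$$

if it has density

$$p(x) = \frac{1}{\Gamma(\alpha)}\,x^{\alpha-1}\exp(-x) \qquad (0 < x < \infty).$$

This is simply another name for the distribution we refer to as

$$\tfrac12\chi^2_{2\alpha}.$$

If Y = βX, then Y has a two-parameter gamma distribution with parameters α and β, denoted

$$Y \sim \mathrm{G}(\alpha, \beta)$$

and it has density

$$p(y) = \frac{1}{\beta^\alpha\Gamma(\alpha)}\,y^{\alpha-1}\exp(-y/\beta) \qquad (0 < y < \infty)$$

so that its mean and variance are

$$\mathrm{E}Y = \alpha\beta, \qquad \mathrm{V}Y = \alpha\beta^2.$$

This is simply another name for the distribution we refer to as

$$\tfrac12\beta\,\chi^2_{2\alpha}.$$

If β = 1 we recover the one-parameter gamma distribution; if α = 1, so that the density is

$$p(y) = \beta^{-1}\exp(-y/\beta) \qquad (0 < y < \infty),$$

we obtain another special case sometimes called the (negative) exponential distribution and denoted

$$Y \sim \mathrm{E}(\beta).$$

The distribution function of any variable with a gamma distribution is easily found in terms of the incomplete gamma function

$$\gamma(\alpha, x) = \int_0^x t^{\alpha-1}\exp(-t)\,dt$$

or in terms of Karl Pearson’s incomplete gamma function

$$I(u, p) = \frac{1}{\Gamma(p+1)}\int_0^{u\sqrt{p+1}} t^p\exp(-t)\,dt.$$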

Extensive tables can be found in Pearson (1924).
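Write P(a, x) = γ(a, x)/Γ(a) for the regularised incomplete gamma function, and take β as a scale parameter, so that the mean is αβ. Then the distribution function of a two-parameter gamma variable is P(α, y/β), and Pearson's I(u, p) equals P(p + 1, u√(p + 1)). The sketch below uses only the standard library, and its helper names are hypothetical. It computes both quantities and checks the first against simulation with `random.gammavariate`. Like β here, the second argument of `random.gammavariate` is a scale parameter.

```python
import math
import random

def reg_lower_gamma(a: float, x: float) -> float:
    """Regularised lower incomplete gamma P(a, x), summed as a power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total

def gamma_cdf(y: float, alpha: float, beta: float) -> float:
    """P(Y <= y) for a gamma variable with shape alpha and scale beta (mean alpha * beta)."""
    return reg_lower_gamma(alpha, y / beta)

def pearson_I(u: float, p: float) -> float:
    """Karl Pearson's incomplete gamma function I(u, p)."""
    return reg_lower_gamma(p + 1.0, u * math.sqrt(p + 1.0))

# Monte Carlo check of the distribution function.
rng = random.Random(1)
alpha, beta, y = 2.5, 1.5, 4.0
draws = [rng.gammavariate(alpha, beta) for _ in range(100_000)]
empirical = sum(d <= y for d in draws) / len(draws)
print(f"exact {gamma_cdf(y, alpha, beta):.4f}  simulated {empirical:.4f}")
```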

A.5 Inverse chi-squared distribution

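X has an inverse chi-squared distribution on ν degrees of freedom, denoted

$$X \sim \chi^{-2}_\nu$$

if $1/X \sim \chi^2_\nu$, or equivalently if it has density

$$p(x) = \{2^{\nu/2}\Gamma(\nu/2)\}^{-1} x^{-\nu/2-1}\exp(-\tfrac12 x^{-1}) \qquad (0 < x < \infty).$$

If Y = SX, so that $1/Y \sim S^{-1}\chi^2_\nu$, then Y is S times an inverse chi-squared distribution on ν degrees of freedom, denoted

$$Y \sim S\chi^{-2}_\nu$$

and it has density

$$p(y) = S^{\nu/2}\{2^{\nu/2}\Gamma(\nu/2)\}^{-1} y^{-\nu/2-1}\exp(-\tfrac12 S y^{-1}) \qquad (0 < y < \infty).$$

The mean and variance are

$$\mathrm{E}Y = \frac{S}{\nu - 2} \quad (\nu > 2), \qquad \mathrm{V}Y = \frac{2S^2}{(\nu - 2)^2(\nu - 4)} \quad (\nu > 4).$$

The mode is

$$\operatorname{mode}(Y) = S/(\nu + 2)$$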

and the median is in the range

provided , with the upper limit approached closely when [see Novick and Jackson (1974, Section 7.5)].
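Consider the scaled form Y = S/W with W ~ χ²_ν, so that 1/Y is a chi-squared variate divided by S. Its mean is S/(ν − 2) for ν > 2 and its variance is 2S²/{(ν − 2)²(ν − 4)} for ν > 4. A quick simulation is an easy way to check formulas like these. The sketch uses only the standard library, and the function name is hypothetical.

```python
import random

def sample_scaled_inv_chi2(nu: float, S: float, rng: random.Random) -> float:
    """One draw of Y = S / W, where W ~ chi^2_nu is generated as 2 * Gamma(nu/2, scale 1)."""
    w = 2.0 * rng.gammavariate(nu / 2.0, 1.0)
    return S / w

rng = random.Random(42)
nu, S = 10, 8.0
draws = [sample_scaled_inv_chi2(nu, S, rng) for _ in range(100_000)]
n = len(draws)
mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / (n - 1)
print(f"mean {mean:.4f} vs S/(nu-2) = {S / (nu - 2):.4f}")
print(f"variance {var:.4f} vs 2S^2/((nu-2)^2 (nu-4)) = {2 * S**2 / ((nu - 2)**2 * (nu - 4)):.4f}")
```

The map y ↦ S/y is decreasing, so the exact median of Y is S divided by the median of χ²_ν. That gives one direct way to locate the median numerically.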

A.6 Inverse chi distribution

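X has an inverse chi distribution on ν degrees of freedom, denoted

$$X \sim \chi^{-1}_\nu$$

if $1/X^2 \sim \chi^2_\nu$, or equivalently if it has density

$$p(x) = \{2^{\nu/2-1}\Gamma(\nu/2)\}^{-1} x^{-\nu-1}\exp(-\tfrac12 x^{-2}) \qquad (0 < x < \infty).$$

If $Y = S^{1/2}X$, so that $1/Y^2 \sim S^{-1}\chi^2_\nu$, then Y is $S^{1/2}$ times an inverse chi distribution on ν degrees of freedom, denoted

$$Y \sim S^{1/2}\chi^{-1}_\nu$$

and it has density

$$p(y) = S^{\nu/2}\{2^{\nu/2-1}\Gamma(\nu/2)\}^{-1} y^{-\nu-1}\exp(-\tfrac12 S y^{-2}) \qquad (0 < y < \infty).$$

The mean and variance do not greatly simplify. They are

$$\mathrm{E}Y = S^{1/2}\,\frac{\Gamma\{(\nu-1)/2\}}{\sqrt{2}\,\Gamma(\nu/2)} \quad (\nu > 1), \qquad \mathrm{V}Y = \frac{S}{\nu - 2} - (\mathrm{E}Y)^2 \quad (\nu > 2)$$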

but very good approximations, at least if , are

[see Novick and Jackson (1974, Section 7.3)]. The mode is exactly

$$\operatorname{mode}(Y) = S^{1/2}/\sqrt{\nu + 1}$$

and a good approximation to the median at least if is (ibid.)
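Take Y = S^{1/2}X with 1/X² ~ χ²_ν. The exact mean √S Γ{(ν − 1)/2}/{√2 Γ(ν/2)} follows from E W^{−1/2} for W ~ χ²_ν, and the mode is √{S/(ν + 1)}. The sketch below evaluates both quantities and checks the mean by simulation. It also confirms that the density is larger at the mode than on either side of it. Helper names are hypothetical.

```python
import math
import random

def inv_chi_pdf(y: float, nu: float, S: float) -> float:
    """Density of Y = S^{1/2} X with 1/X^2 ~ chi^2_nu (y > 0), computed on the log scale."""
    log_const = (nu / 2.0) * math.log(S) - (nu / 2.0 - 1.0) * math.log(2.0) - math.lgamma(nu / 2.0)
    return math.exp(log_const - (nu + 1.0) * math.log(y) - 0.5 * S / y ** 2)

def inv_chi_mean(nu: float, S: float) -> float:
    """Exact mean sqrt(S) * Gamma((nu-1)/2) / (sqrt(2) * Gamma(nu/2)), valid for nu > 1."""
    return math.sqrt(S / 2.0) * math.exp(math.lgamma((nu - 1.0) / 2.0) - math.lgamma(nu / 2.0))

def inv_chi_mode(nu: float, S: float) -> float:
    return math.sqrt(S / (nu + 1.0))

nu, S = 10, 4.0
rng = random.Random(7)
# Y = sqrt(S / W) with W ~ chi^2_nu generated as 2 * Gamma(nu/2, scale 1).
draws = [math.sqrt(S / (2.0 * rng.gammavariate(nu / 2.0, 1.0))) for _ in range(100_000)]
print(f"exact mean {inv_chi_mean(nu, S):.4f}  simulated {sum(draws) / len(draws):.4f}")
print(f"mode {inv_chi_mode(nu, S):.4f}")
```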

A.7 Log chi-squared distribution

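X has a log chi-squared distribution on ν degrees of freedom, denoted

$$X \sim \log\chi^2_\nu$$

if X = log W where $W \sim \chi^2_\nu$, or equivalently if X has density

$$p(x) = \{2^{\nu/2}\Gamma(\nu/2)\}^{-1}\exp(\tfrac12\nu x - \tfrac12 e^x) \qquad (-\infty < x < \infty)$$

(note that unlike $\chi^2_\nu$ itself this is a distribution over the whole line).

Because the logarithm of an $S^{-1}\chi^2_\nu$ variable differs from a log chi-squared variable simply by an additive constant, it is not necessary to consider such variables in any detail.

By considering the tth moment of a $\chi^2_\nu$ variable, it is easily shown that the moment generating function of a log chi-squared variable is

$$M_X(t) = \mathrm{E}\,e^{tX} = \mathrm{E}\,W^t = 2^t\,\frac{\Gamma(\nu/2 + t)}{\Gamma(\nu/2)}.$$

Writing

$$\psi(z) = \frac{d}{dz}\log\Gamma(z)$$

for the so-called digamma function, it follows that the mean and variance are

$$\mathrm{E}X = \log 2 + \psi(\nu/2), \qquad \mathrm{V}X = \psi'(\nu/2)$$

or (using Stirling’s approximation and its derivatives) approximately

$$\mathrm{E}X \cong \log\nu - \nu^{-1}, \qquad \mathrm{V}X \cong 2/\nu.$$

The mode is

$$\operatorname{mode}(X) = \log\nu.$$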
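Python's standard library has no digamma function. This sketch therefore approximates ψ and ψ′ by central differences of `math.lgamma`, on the assumption that this is accurate enough here; it is not a library routine, and `scipy.special.digamma` and `scipy.special.polygamma` are alternatives. The sketch compares the exact mean log 2 + ψ(ν/2) and variance ψ′(ν/2) of a log chi-squared variable with the approximations log ν − 1/ν and 2/ν. It also checks that the log density peaks at log ν. Helper names are hypothetical.

```python
import math

def digamma(z: float, h: float = 1e-5) -> float:
    """Numerical digamma: central difference of math.lgamma."""
    return (math.lgamma(z + h) - math.lgamma(z - h)) / (2.0 * h)

def trigamma(z: float, h: float = 1e-4) -> float:
    """Numerical trigamma: second central difference of math.lgamma."""
    return (math.lgamma(z + h) - 2.0 * math.lgamma(z) + math.lgamma(z - h)) / h ** 2

def log_chi2_logpdf(x: float, nu: float) -> float:
    """Log density of X = log W with W ~ chi^2_nu."""
    return 0.5 * nu * x - 0.5 * math.exp(x) - (0.5 * nu * math.log(2.0) + math.lgamma(0.5 * nu))

for nu in (4, 10, 20, 100):
    exact_mean = math.log(2.0) + digamma(nu / 2.0)
    exact_var = trigamma(nu / 2.0)
    print(f"nu={nu:3d}  mean {exact_mean:.4f} (approx {math.log(nu) - 1 / nu:.4f})"
          f"  var {exact_var:.4f} (approx {2 / nu:.4f})  mode {math.log(nu):.4f}")
```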

A.8 Student’s t distribution

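X has a Student’s t distribution on ν degrees of freedom, denoted

$$X \sim t_\nu$$

if it has the same distribution as

$$\frac{Z}{\sqrt{W/\nu}}$$

where $Z \sim \mathrm{N}(0, 1)$ and $W \sim \chi^2_\nu$ are independent, or equivalently if X has density

$$p(x) = \frac{\Gamma\{(\nu+1)/2\}}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2} \qquad (-\infty < x < \infty).$$

It follows that if $x_1, x_2, \dots, x_n$ are independently $\mathrm{N}(\mu, \phi)$ and

$$\bar x = \sum x_i/n, \qquad s^2 = \sum (x_i - \bar x)^2/(n - 1)$$

then

$$\frac{\bar x - \mu}{s/\sqrt{n}} \sim t_{n-1}.$$

The mean and variance are

$$\mathrm{E}X = 0 \quad (\nu > 1), \qquad \mathrm{V}X = \frac{\nu}{\nu - 2} \quad (\nu > 2).$$

Because the distribution is symmetrical and unimodal, the median and mode both equal the mean, that is

$$\operatorname{median}(X) = \operatorname{mode}(X) = 0.$$

As ν → ∞ the distribution approaches the standard normal form.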

It may be noted that Student’s t distribution on one degree of freedom is the standard Cauchy distribution C(0, 1).
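Here is a minimal sketch of the t density, computed on the log scale with `math.lgamma` to avoid overflow for large ν. It checks the two facts just stated: ν = 1 reproduces the standard Cauchy density 1/{π(1 + x²)}, and for large ν the density approaches the standard normal one. `t_pdf` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, nu: float) -> float:
    """Density of Student's t on nu degrees of freedom."""
    log_const = math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_const - 0.5 * (nu + 1.0) * math.log1p(x * x / nu))

# nu = 1 gives the standard Cauchy density 1 / (pi (1 + x^2)).
for x in (0.0, 0.5, 2.0):
    print(x, t_pdf(x, 1), 1.0 / (math.pi * (1.0 + x * x)))

# For large nu the density is close to the standard normal density.
for nu in (5, 30, 1000):
    print(nu, t_pdf(1.0, nu), NormalDist().pdf(1.0))
```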

A.9 Normal/chi-squared distribution

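The ordered pair (X, Y) has a normal/chi-squared distribution if

$$Y \sim S\chi^{-2}_\nu$$

for some S and ν and, conditional on Y,

$$X \sim \mathrm{N}(\mu, Y/n)$$

for some μ and n. An equivalent condition is that the joint density function (for −∞ < x < ∞ and 0 < y < ∞) is

$$p(x, y) = C\,y^{-(\nu+1)/2-1}\exp[-\tfrac12\{S + n(x - \mu)^2\}/y]$$

where

$$C = \left(\frac{n}{2\pi}\right)^{1/2}\frac{S^{\nu/2}}{2^{\nu/2}\Gamma(\nu/2)}.$$

If we define

$$s^2 = S/\nu$$

then the marginal distribution of X is given by the fact that

$$\frac{X - \mu}{s/\sqrt{n}} \sim t_\nu.$$

The marginal distribution of Y is of course

$$Y \sim S\chi^{-2}_\nu.$$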

Approximate methods of constructing two-dimensional highest density regions for this distribution are described in Box and Tiao (1992, Section 2.4).

A.10 Beta distribution

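X has a beta distribution with parameters α and β, denoted

$$X \sim \mathrm{Be}(\alpha, \beta)$$

if it has density

$$p(x) = \frac{x^{\alpha-1}(1 - x)^{\beta-1}}{B(\alpha, \beta)} \qquad (0 < x < 1)$$

where the beta function is defined by

$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1 - t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}.$$

The mean and variance are

$$\mathrm{E}X = \frac{\alpha}{\alpha + \beta}, \qquad \mathrm{V}X = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$

The mode is

$$\operatorname{mode}(X) = \frac{\alpha - 1}{\alpha + \beta - 2} \qquad (\alpha > 1,\ \beta > 1)$$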

and the approximate relationship between mean, mode and median can be used to find an approximate median.

The distribution function of any variable with a beta distribution is easily found in terms of the incomplete beta function

$$I_x(\alpha, \beta) = \frac{1}{B(\alpha, \beta)}\int_0^x t^{\alpha-1}(1 - t)^{\beta-1}\,dt.$$

Extensive tables can be found in Pearson (1968) or in Pearson and Hartley (1966, Table 17).
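This sketch shows how well the empirical relationship median ≅ (2 mean + mode)/3 works for the beta distribution. It compares the approximation with the exact median, which it finds by integrating the density with Simpson's rule and then bisecting. It assumes α, β > 1, so that the density is bounded and the mode (α − 1)/(α + β − 2) is interior. The mean used is α/(α + β), and all helper names are hypothetical.

```python
import math

def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density; returns 0 at or beyond the endpoints, which is correct for a, b > 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_B)

def beta_cdf(x: float, a: float, b: float, steps: int = 1000) -> float:
    """Distribution function by composite Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    s = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return s * h / 3.0

def beta_median_exact(a: float, b: float) -> float:
    """Median by bisection on the numerically integrated distribution function."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta_median_approx(a: float, b: float) -> float:
    """(2 * mean + mode) / 3, from the empirical mean-mode-median relationship."""
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2)
    return (2 * mean + mode) / 3

for a, b in [(2, 3), (3, 5), (4, 4), (10, 3)]:
    print(a, b, round(beta_median_exact(a, b), 4), round(beta_median_approx(a, b), 4))
```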

A.11 Binomial distribution

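X has a binomial distribution of index n and parameter π, denoted

$$X \sim \mathrm{B}(n, \pi)$$

if it has a discrete distribution with density

$$p(x) = \binom{n}{x}\pi^x(1 - \pi)^{n-x} \qquad (x = 0, 1, \dots, n).$$

The mean and variance are

$$\mathrm{E}X = n\pi, \qquad \mathrm{V}X = n\pi(1 - \pi).$$

Because

$$\frac{p(x+1)}{p(x)} = \frac{(n - x)\pi}{(x + 1)(1 - \pi)}$$

we see that p(x+1) > p(x) if and only if

$$x < (n + 1)\pi - 1$$

and hence that a mode occurs at

$$\operatorname{mode}(X) = [(n + 1)\pi]$$

the square brackets denoting ‘integer part of’, and this mode is unique unless (n + 1)π is an integer.

Integration by parts shows that the distribution function is expressible in terms of the incomplete beta function, namely,

$$P(X \le x) = I_{1-\pi}(n - x, x + 1) \qquad (x = 0, 1, \dots, n - 1)$$

where $I_z(a, b)$ denotes the incomplete beta function ratio; see, for example, Kendall, Stuart and Ord (1987, Section 5.7).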
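The mode argument can be checked by brute force. The sketch finds the largest binomial probability directly and compares its position with [(n + 1)π]. It uses cases where (n + 1)π is not an integer, so that the mode is unique. It also confirms that the mean equals nπ. Helper names are hypothetical.

```python
import math

def binomial_pmf(x: int, n: int, pi: float) -> float:
    return math.comb(n, x) * pi ** x * (1.0 - pi) ** (n - x)

def binomial_mode_formula(n: int, pi: float) -> int:
    """Integer part of (n + 1) * pi; the mode is unique when (n + 1) * pi is not an integer."""
    return math.floor((n + 1) * pi)

for n, pi in [(10, 0.3), (20, 0.17), (7, 0.55), (50, 0.9)]:
    brute = max(range(n + 1), key=lambda x: binomial_pmf(x, n, pi))
    mean = sum(x * binomial_pmf(x, n, pi) for x in range(n + 1))
    print(n, pi, brute, binomial_mode_formula(n, pi), round(mean, 10), n * pi)
```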

A.12 Poisson distribution

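X has a Poisson distribution of mean λ, denoted

$$X \sim \mathrm{P}(\lambda)$$

if it has a discrete distribution with density

$$p(x) = \frac{\lambda^x}{x!}\exp(-\lambda) \qquad (x = 0, 1, 2, \dots).$$

The mean and variance are

$$\mathrm{E}X = \lambda, \qquad \mathrm{V}X = \lambda.$$

Because

$$\frac{p(x+1)}{p(x)} = \frac{\lambda}{x + 1}$$

we see that p(x+1) > p(x) if and only if

$$x < \lambda - 1$$

and hence that a mode occurs at

$$\operatorname{mode}(X) = [\lambda]$$

the square brackets denoting ‘integer part of’, and this mode is unique unless λ is an integer.

Integrating by parts shows that the distribution function is expressible in terms of the incomplete gamma function, namely,

$$P(X \le x) = 1 - \frac{1}{x!}\int_0^\lambda t^x e^{-t}\,dt;$$

see Kendall, Stuart and Ord (1987, Section 5.9).

The Poisson distribution often occurs as the limit of the binomial as

$$n \to \infty, \qquad \pi \to 0, \qquad n\pi \to \lambda.$$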
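The incomplete gamma form of the Poisson distribution function is P(X ≤ x) = 1 − γ(x + 1, λ)/x!, where γ is the lower incomplete gamma function. It can be checked numerically against the direct sum of probabilities. This sketch uses only the standard library and has hypothetical helper names. The regularised incomplete gamma function is summed as a power series, and the mode is checked against [λ].

```python
import math

def reg_lower_gamma(a: float, x: float) -> float:
    """Regularised lower incomplete gamma P(a, x), summed as a power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total

def poisson_cdf_direct(x: int, lam: float) -> float:
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(x + 1))

def poisson_cdf_via_gamma(x: int, lam: float) -> float:
    """P(X <= x) = 1 - gamma(x + 1, lam) / x!, written with the regularised function."""
    return 1.0 - reg_lower_gamma(x + 1.0, lam)

lam = 3.7
for x in range(8):
    print(x, round(poisson_cdf_direct(x, lam), 6), round(poisson_cdf_via_gamma(x, lam), 6))

pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(20)]
print("mode:", max(range(20), key=pmf.__getitem__), " integer part of lambda:", math.floor(lam))
```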

A.13 Negative binomial distribution

X has a negative binomial distribution of index n and parameter π, denoted

if it has a discrete distribution with density

Because

we sometimes use the notation

The mean and variance are

Because

we see that p(x+1) > p(x) if and only if

and hence that a mode occurs at

the square brackets denoting ‘integer part of’, and this mode is unique unless is an integer.

It can be shown that the distribution function can be found in terms of that of the binomial distribution, or equivalently in terms of the incomplete beta function; for details see Balakrishnan et al. (1992, Chapter 5, Section 6). Just as the Poisson distribution can arise as a limit of the binomial distribution, so it can as a limit of the negative binomial, but in this case as

The particular case where n=1, so that , is sometimes referred to as the geometric distribution.

A.14 Hypergeometric distribution

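X has a hypergeometric distribution of population size N, index n and parameter π, denoted

$$X \sim \mathrm{H}(N, n, \pi)$$

if it has a discrete distribution with density

$$p(x) = \binom{N\pi}{x}\binom{N(1 - \pi)}{n - x}\bigg/\binom{N}{n} \qquad (x = 0, 1, \dots, n).$$

The mean and variance are

$$\mathrm{E}X = n\pi, \qquad \mathrm{V}X = n\pi(1 - \pi)\,\frac{N - n}{N - 1}.$$

Because

$$\frac{p(x+1)}{p(x)} = \frac{(N\pi - x)(n - x)}{(x + 1)\{N(1 - \pi) - n + x + 1\}}$$

we see that p(x+1) > p(x) if and only if

$$x < \frac{(n + 1)(N\pi + 1)}{N + 2} - 1$$

and hence that if, as is usually the case, N is fairly large, p(x+1) > p(x) if and only if

$$x < (n + 1)\pi - 1.$$

Hence the mode occurs very close to the binomial value

$$[(n + 1)\pi].$$

As N → ∞ this distribution approaches the binomial distribution B(n, π).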

Tables of it can be found in Lieberman and Owen (1961).
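Write K = Nπ for the number of "successes" in the population, and assume it is an integer. The exact condition for p(x + 1) > p(x) then reduces to x < (n + 1)(K + 1)/(N + 2) − 1, so a mode is the integer part of (n + 1)(K + 1)/(N + 2). The sketch checks this against a brute-force search and shows how close the binomial value [(n + 1)π] comes. Helper names are hypothetical.

```python
import math

def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(X = x) for n draws without replacement from N items, K of them 'successes'."""
    return math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n)  # comb is 0 out of range

def hypergeom_mode_exact(N: int, K: int, n: int) -> int:
    # p(x+1) > p(x) iff x < (n+1)(K+1)/(N+2) - 1, so a mode is the integer part of (n+1)(K+1)/(N+2).
    return (n + 1) * (K + 1) // (N + 2)

for N, K, n in [(50, 20, 10), (30, 9, 12), (1000, 300, 40)]:
    pi = K / N
    brute = max(range(n + 1), key=lambda x: hypergeom_pmf(x, N, K, n))
    print(N, K, n, brute, hypergeom_mode_exact(N, K, n), math.floor((n + 1) * pi))
```

In the case (30, 9, 12) the exact mode is 4 while the binomial value is 3. The two are close but not always equal.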

A.15 Uniform distribution

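X has a uniform distribution on the interval (a, b) denoted

$$X \sim \mathrm{U}(a, b)$$

if it has density

$$p(x) = (b - a)^{-1} I_{(a,b)}(x)$$

where

$$I_{(a,b)}(x) = \begin{cases} 1 & (a < x < b) \\ 0 & \text{otherwise} \end{cases}$$

is the indicator function of the set (a, b). The mean and variance are

$$\mathrm{E}X = \tfrac12(a + b), \qquad \mathrm{V}X = (b - a)^2/12.$$

There is no unique mode, but the distribution is symmetrical, and hence

$$\operatorname{median}(X) = \mathrm{E}X = \tfrac12(a + b).$$

Sometimes we have occasion to refer to a discrete version; Y has a discrete uniform distribution on the interval [a, b] denoted

$$Y \sim \mathrm{UD}(a, b)$$

if it has a discrete distribution with density

$$p(y) = (b - a + 1)^{-1} \qquad (y = a, a + 1, \dots, b).$$

The mean and variance are

$$\mathrm{E}Y = \tfrac12(a + b), \qquad \mathrm{V}Y = \{(b - a + 1)^2 - 1\}/12$$

using formulae for the sum and sum of squares of the first n natural numbers [the variance is best found by noting that the variance of UD(a, b) equals that of UD(1, n) where $n = b - a + 1$]. Again, there is no unique mode, but the distribution is symmetrical, and hence

$$\operatorname{median}(Y) = \mathrm{E}Y = \tfrac12(a + b).$$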
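The discrete uniform mean and variance are easy to confirm by enumeration with exact rational arithmetic. The sketch below compares the enumerated values with (a + b)/2 and {(b − a + 1)² − 1}/12. The function name is hypothetical.

```python
from fractions import Fraction

def ud_mean_var(a: int, b: int):
    """Exact mean and variance of the discrete uniform on a, a+1, ..., b, by enumeration."""
    values = range(a, b + 1)
    m = len(values)
    mean = Fraction(sum(values), m)
    var = sum((v - mean) ** 2 for v in values) / m
    return mean, var

for a, b in [(1, 6), (3, 10), (-4, 4)]:
    mean, var = ud_mean_var(a, b)
    n = b - a + 1
    print(a, b, mean, var, Fraction(a + b, 2), Fraction(n * n - 1, 12))
```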

A.16 Pareto distribution

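X has a Pareto distribution with parameters ξ and γ, denoted

$$X \sim \mathrm{Pa}(\xi, \gamma)$$

if it has density

$$p(x) = \gamma\xi^\gamma x^{-\gamma-1} I_{(\xi,\infty)}(x)$$

where

$$I_{(\xi,\infty)}(x) = \begin{cases} 1 & (x > \xi) \\ 0 & \text{otherwise} \end{cases}$$

is the indicator function of the set (ξ, ∞). The mean and variance are

$$\mathrm{E}X = \frac{\gamma\xi}{\gamma - 1} \quad (\gamma > 1), \qquad \mathrm{V}X = \frac{\gamma\xi^2}{(\gamma - 1)^2(\gamma - 2)} \quad (\gamma > 2).$$

The distribution function is

$$F(x) = \{1 - (\xi/x)^\gamma\}\,I_{(\xi,\infty)}(x)$$

and in particular the median is

$$\operatorname{median}(X) = 2^{1/\gamma}\xi.$$

The mode, of course, occurs at

$$\operatorname{mode}(X) = \xi.$$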
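The Pareto distribution function F(x) = 1 − (ξ/x)^γ (x > ξ) inverts in closed form, so inverse-CDF sampling is a natural way to check its properties. The sketch writes ξ for the scale, which is the lower end of the support, and γ for the shape. It checks that F equals 1/2 at 2^{1/γ}ξ and compares a simulated mean with γξ/(γ − 1). Helper names are hypothetical. Python's `random.paretovariate(gamma)` gives the ξ = 1 case directly.

```python
import random

def pareto_cdf(x: float, xi: float, gamma: float) -> float:
    """F(x) = 1 - (xi/x)^gamma for x > xi, and 0 otherwise."""
    return 1.0 - (xi / x) ** gamma if x > xi else 0.0

def pareto_median(xi: float, gamma: float) -> float:
    return xi * 2.0 ** (1.0 / gamma)

def pareto_sample(xi: float, gamma: float, rng: random.Random) -> float:
    """Inverse-CDF draw: solving F(x) = u gives x = xi * (1 - u)^(-1/gamma)."""
    u = rng.random()
    return xi * (1.0 - u) ** (-1.0 / gamma)

xi, gamma = 2.0, 4.0
rng = random.Random(11)
draws = [pareto_sample(xi, gamma, rng) for _ in range(100_000)]
print(f"F(median) = {pareto_cdf(pareto_median(xi, gamma), xi, gamma):.6f}")
print(f"simulated mean {sum(draws) / len(draws):.4f} vs gamma*xi/(gamma-1) = {gamma * xi / (gamma - 1):.4f}")
print(f"smallest draw {min(draws):.4f} (the mode xi is the lower end of the support)")
```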

The ordered pair (Y, Z) has a bilateral bivariate Pareto distribution with parameters , η and γ, denoted

if it has joint density function

The means and variances are

and the correlation coefficient between Y and Z is

It is also sometimes useful that

The marginal distribution function of Y is

and in particular the median is

The distribution function of Z is similar, and in particular the median is

The modes, of course, occur at

The distribution is discussed in DeGroot (1970, Sections 4.11, 5.7 and 9.7).

A.17 Circular normal distribution

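X has a circular normal or von Mises’ distribution with mean μ and concentration parameter κ, denoted

$$X \sim \mathrm{M}(\mu, \kappa)$$

if it has density

$$p(x) = \{2\pi I_0(\kappa)\}^{-1}\exp\{\kappa\cos(x - \mu)\}$$

where X is any angle, so that 0 ≤ x < 2π, and $I_0(\kappa)$ is a constant called the modified Bessel function of the first kind and order zero (besselI(κ,0) in R). It turns out that

$$I_0(\kappa) = \sum_{r=0}^{\infty}\frac{(\kappa/2)^{2r}}{(r!)^2}$$

and that asymptotically for large κ

$$I_0(\kappa) \sim \frac{e^\kappa}{\sqrt{2\pi\kappa}}.$$

For large κ, we have approximately

$$X \sim \mathrm{N}(\mu, 1/\kappa)$$

while for small κ, we have approximately

$$p(x) \cong (2\pi)^{-1}\{1 + \kappa\cos(x - \mu)\}$$

which density is sometimes referred to as a cardioid distribution. The circular normal distribution is discussed by Mardia (1972), Mardia and Jupp (2001) and Batschelet (1981).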
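The sketch below evaluates I₀(κ) from the power series Σ (κ/2)^{2r}/(r!)². Python's standard library has no Bessel functions, and `scipy.special.i0` is one alternative. The sketch checks that the von Mises density integrates to 1 over a full turn. As κ grows, it also compares I₀(κ) with e^κ/√(2πκ), and the density with that of N(μ, 1/κ). Helper names are hypothetical.

```python
import math
from statistics import NormalDist

def bessel_i0(kappa: float, terms: int = 300) -> float:
    """Modified Bessel function I_0 via its power series sum_r (kappa/2)^(2r) / (r!)^2."""
    total = term = 1.0
    for r in range(1, terms):
        term *= (kappa / (2.0 * r)) ** 2  # ratio of successive series terms
        total += term
    return total

def von_mises_pdf(x: float, mu: float, kappa: float) -> float:
    return math.exp(kappa * math.cos(x - mu)) / (2.0 * math.pi * bessel_i0(kappa))

# The density should integrate to 1 over one full turn (midpoint rule, very accurate for periodic integrands).
mu, kappa, m = 1.0, 3.0, 2000
total = sum(von_mises_pdf((k + 0.5) * 2 * math.pi / m, mu, kappa) for k in range(m)) * 2 * math.pi / m
print(f"integral over [0, 2pi): {total:.10f}")

# Large-kappa checks: I_0 against its asymptotic form, and the density against N(mu, 1/kappa).
for kappa in (10.0, 50.0, 100.0):
    ratio = bessel_i0(kappa) / (math.exp(kappa) / math.sqrt(2 * math.pi * kappa))
    normal = NormalDist(mu, math.sqrt(1 / kappa)).pdf(mu + 0.1)
    print(kappa, round(ratio, 5), round(von_mises_pdf(mu + 0.1, mu, kappa) / normal, 5))
```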

One point related to this distribution arises in a Bayesian context in connection with the reference prior

when we have observations such that

The only sensible estimator of μ on the basis of the posterior distribution is .

The mode of the posterior distribution of κ is approximately , where

(both of which are approximately 1.87 when ) according to Schmitt (1969, Section 10.2). Because of the skewness of the distribution of κ, its posterior mean is greater than its posterior mode.

A.18 Behrens’ distribution

X is said to have Behrens’ (or Behrens–Fisher or Fisher–Behrens) distribution with degrees of freedom and and angle , denoted

if X has the same distribution as

where T1 and T2 are independent and

Equivalently, X has density

where

over the whole real line, where

This distribution naturally arises as the posterior distribution of

when we have samples of size from and of size from and neither nor is known, and conventional priors are adopted. In this case, in a fairly obvious notation

An approximation to this distribution due to Patil (1965) is as follows.

Define

Then, approximately,

Obviously b is usually not an integer, and consequently this approximation requires interpolation in the t tables.

Clearly Behrens’ distribution has mean and variance

using the mean and variance of t distributions and the independence of T1 and T2. The distribution is symmetrical and unimodal and hence the mean, mode and median are all equal, so

A.19 Snedecor’s F distribution

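X has an F distribution on $\nu_1$ and $\nu_2$ degrees of freedom, denoted

$$X \sim \mathrm{F}_{\nu_1,\nu_2}$$

if X has the same distribution as

$$\frac{W_1/\nu_1}{W_2/\nu_2}$$

where W1 and W2 are independent and

$$W_1 \sim \chi^2_{\nu_1}, \qquad W_2 \sim \chi^2_{\nu_2}.$$

Equivalently, X has density

$$p(x) = \frac{\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{B(\nu_1/2, \nu_2/2)}\,\frac{x^{\nu_1/2-1}}{(\nu_2 + \nu_1 x)^{(\nu_1+\nu_2)/2}} \qquad (0 < x < \infty).$$

The mean and variance are

$$\mathrm{E}X = \frac{\nu_2}{\nu_2 - 2} \quad (\nu_2 > 2), \qquad \mathrm{V}X = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)} \quad (\nu_2 > 4).$$

The mode is

$$\operatorname{mode}(X) = \frac{\nu_2}{\nu_2 + 2}\cdot\frac{\nu_1 - 2}{\nu_1} \qquad (\nu_1 \ge 2).$$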

If , then

Conversely, if then
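Here is a minimal sketch that checks the F density and its moments. It samples (W₁/ν₁)/(W₂/ν₂) from independent chi-squared variates and compares the sample mean with ν₂/(ν₂ − 2). It also confirms that the density peaks at ν₂(ν₁ − 2)/{ν₁(ν₂ + 2)} and integrates to about 1. Helper names are hypothetical.

```python
import math
import random

def f_pdf(x: float, n1: float, n2: float) -> float:
    """Density of F on n1 and n2 degrees of freedom (x > 0), computed on the log scale."""
    log_beta = math.lgamma(n1 / 2) + math.lgamma(n2 / 2) - math.lgamma((n1 + n2) / 2)
    log_const = 0.5 * n1 * math.log(n1) + 0.5 * n2 * math.log(n2) - log_beta
    return math.exp(log_const + (0.5 * n1 - 1) * math.log(x) - 0.5 * (n1 + n2) * math.log(n2 + n1 * x))

def f_sample(n1: float, n2: float, rng: random.Random) -> float:
    """(W1/n1) / (W2/n2) with independent chi-squared W1, W2 drawn as 2 * Gamma(n/2, scale 1)."""
    w1 = 2.0 * rng.gammavariate(n1 / 2, 1.0)
    w2 = 2.0 * rng.gammavariate(n2 / 2, 1.0)
    return (w1 / n1) / (w2 / n2)

n1, n2 = 5, 12
rng = random.Random(3)
draws = [f_sample(n1, n2, rng) for _ in range(100_000)]
mode = n2 * (n1 - 2) / (n1 * (n2 + 2))
print(f"simulated mean {sum(draws) / len(draws):.4f} vs n2/(n2-2) = {n2 / (n2 - 2):.4f}")
print(f"mode {mode:.4f}; density there {f_pdf(mode, n1, n2):.4f}")
```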

A.20 Fisher’s z distribution
