by Peter M Lee
where .
Assume that and and that α, and are unknown parameters to be estimated.
Describe a reversible jump MCMC algorithm including discussion of the acceptance probability, to move between the four competing models:
1. ;
2. ;
3. ;
4. .
Note that if z is a random variable with probability density function f given by
then [due to P. Neal].
1. Often denoted DKL(q||p) or KL(q||p).
2. Those with a background in statistical physics sometimes refer to the variational lower bound as the (negative) variational free energy because it can be expressed as an ‘energy’ $\mathrm{E}_q \log p(x, z)$ plus the entropy $-\mathrm{E}_q \log q(z)$, but it is not necessary to know about the reasons for this.
3. In that subsection, we wrote S where we will now write SS, we wrote where we will now write , and we wrote θ0 where we will now write .
Appendix A: Common statistical distributions
Some facts are given about various common statistical distributions. In the case of continuous distributions, the (probability) density (function) p(x) equals the derivative of the (cumulative) distribution function $F(x) = P(X \leqslant x)$. In the case of discrete distributions, the (probability) density (function) p(x) equals the probability that the random variable X takes the value x.
The mean or expectation is defined by
$$\mathrm{E}X = \sum_x x\,p(x) \qquad\text{or}\qquad \mathrm{E}X = \int x\,p(x)\,\mathrm{d}x,$$
depending on whether the random variable is discrete or continuous. The variance is defined as
$$\mathcal{V}X = \sum_x (x - \mathrm{E}X)^2\,p(x) \qquad\text{or}\qquad \mathcal{V}X = \int (x - \mathrm{E}X)^2\,p(x)\,\mathrm{d}x,$$
depending on whether the random variable is discrete or continuous. A mode is any value for which p(x) is a maximum; most common distributions have only one mode and so are called unimodal. A median is any value m such that both
$$P(X \leqslant m) \geqslant \tfrac{1}{2} \qquad\text{and}\qquad P(X \geqslant m) \geqslant \tfrac{1}{2}.$$
In the case of most continuous distributions, there is a unique median m and
$$P(X \leqslant m) = P(X \geqslant m) = \tfrac{1}{2}.$$
There is a well-known empirical relationship that
$$\text{mean} - \text{mode} \cong 3(\text{mean} - \text{median}),$$
or equivalently
$$\text{median} \cong \tfrac{1}{3}(2 \times \text{mean} + \text{mode}).$$
Some theoretical grounds for this relationship based on Gram–Charlier or Edgeworth expansions can be found in Lee (1991) or Kendall, Stewart and Ord (1987, Section 2.11).
Further material can be found in Rothschild and Logothetis (1986) or Evans, Hastings and Peacock (1993), with a more detailed account in Johnson et al. (2005), Johnson et al. (1994–1995), Balakrishnan et al. (2012) and Fang, Kotz and Wang (1989).
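As a quick numerical illustration (a sketch using scipy, not part of the original text), the empirical mean–mode–median relationship can be checked for the chi-squared distribution, using the standard facts that $\chi^2_\nu$ has mean ν and mode ν − 2:

```python
# Numerical illustration of the empirical relationship
#   mean - mode ~= 3 * (mean - median)
# for the chi-squared distribution, which has mean nu and mode nu - 2,
# so the relationship predicts a median of about nu - 2/3.
from scipy import stats

nu = 20
mean, mode = nu, nu - 2
predicted_median = mean - (mean - mode) / 3   # = nu - 2/3
actual_median = stats.chi2.median(nu)

assert abs(predicted_median - actual_median) < 0.05
```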
A.1 Normal distribution
X is normal with mean θ and variance φ, denoted
$$X \sim \mathrm{N}(\theta, \phi),$$
if it has density
$$p(x) = (2\pi\phi)^{-1/2} \exp\{-\tfrac{1}{2}(x - \theta)^2/\phi\}.$$
The mean and variance are
$$\mathrm{E}X = \theta, \qquad \mathcal{V}X = \phi.$$
Because the distribution is symmetrical and unimodal, the median and mode both equal the mean, that is,
$$\text{median} = \text{mode} = \mathrm{E}X = \theta.$$
If $\theta = 0$ and $\phi = 1$, that is, $X \sim \mathrm{N}(0, 1)$, X is said to have a standard normal distribution.
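These facts are easy to verify numerically; the following sketch (an illustration, not part of the text) uses scipy.stats, which parametrizes the normal by its standard deviation $\sqrt{\phi}$ rather than its variance φ:

```python
# Check the N(theta, phi) facts: mean theta, variance phi, and
# median = mode = mean by symmetry. scipy's norm takes the standard
# deviation as its scale parameter, so phi enters as sqrt(phi).
from math import sqrt

from scipy import stats

theta, phi = 2.0, 9.0
X = stats.norm(loc=theta, scale=sqrt(phi))

assert abs(X.mean() - theta) < 1e-12
assert abs(X.var() - phi) < 1e-12
assert abs(X.median() - theta) < 1e-12
```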
A.2 Chi-squared distribution
X has a chi-squared distribution on ν degrees of freedom, denoted
$$X \sim \chi^2_\nu,$$
if it has the same distribution as
$$Z_1^2 + Z_2^2 + \dots + Z_\nu^2,$$
where $Z_1, Z_2, \dots, Z_\nu$ are independent standard normal variates, or equivalently if it has density
$$p(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} \mathrm{e}^{-x/2} \qquad (x > 0).$$
If Y = X/S, where S is a constant, then Y is a chi-squared variate on ν degrees of freedom divided by S, denoted
$$Y \sim \chi^2_\nu / S,$$
and it has density
$$p(y) = \frac{(S/2)^{\nu/2}}{\Gamma(\nu/2)}\, y^{\nu/2 - 1} \mathrm{e}^{-Sy/2} \qquad (y > 0).$$
The mean and variance are
$$\mathrm{E}Y = \nu/S, \qquad \mathcal{V}Y = 2\nu/S^2$$
(so that for $X \sim \chi^2_\nu$ itself, $\mathrm{E}X = \nu$ and $\mathcal{V}X = 2\nu$). The mode is
$$(\nu - 2)/S \qquad (\nu > 2),$$
and the approximate relationship between mean, mode and median implies that the median is approximately
$$(\nu - \tfrac{2}{3})/S,$$
at least for reasonably large ν.
A.3 Normal approximation to chi-squared
If $W \sim \chi^2_\nu$, then for large ν we have that approximately
$$\sqrt{2W} - \sqrt{2\nu - 1}$$
has a standard normal distribution.
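A quick way to see how good this approximation is (an illustrative sketch using scipy, with ν = 100 as an assumed example value) is to compare quantiles of the transformed chi-squared variable with standard normal quantiles:

```python
# Compare quantiles of sqrt(2W) - sqrt(2 nu - 1), W ~ chi^2_nu, with
# standard normal quantiles for a moderately large nu.
from math import sqrt

from scipy import stats

nu = 100
for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    w = stats.chi2.ppf(q, df=nu)
    z = sqrt(2 * w) - sqrt(2 * nu - 1)
    assert abs(z - stats.norm.ppf(q)) < 0.05   # agreement to about 2 decimals
```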
A.4 Gamma distribution
X has a (one-parameter) gamma distribution with parameter α, denoted
$$X \sim \mathrm{G}(\alpha),$$
if it has density
$$p(x) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha - 1} \mathrm{e}^{-x} \qquad (x > 0).$$
This is simply another name for the distribution we refer to as $\chi^2_{2\alpha}/2$.
If $Y = \beta X$, then Y has a two-parameter gamma distribution with parameters α and β, denoted
$$Y \sim \mathrm{G}(\alpha, \beta),$$
and it has density
$$p(y) = \frac{1}{\Gamma(\alpha)\,\beta^\alpha}\, y^{\alpha - 1} \mathrm{e}^{-y/\beta} \qquad (y > 0),$$
so that its mean and variance are
$$\mathrm{E}Y = \alpha\beta, \qquad \mathcal{V}Y = \alpha\beta^2.$$
This is simply another name for the distribution we refer to as $\beta\chi^2_{2\alpha}/2$.
If $\beta = 1$ we recover the one-parameter gamma distribution; if $\alpha = 1$, so that the density is
$$p(y) = \beta^{-1} \mathrm{e}^{-y/\beta} \qquad (y > 0),$$
we obtain another special case sometimes called the (negative) exponential distribution and denoted
$$Y \sim \mathrm{E}(\beta).$$
The distribution function of any variable with a gamma distribution is easily found in terms of the incomplete gamma function
$$\gamma(\alpha, x) = \int_0^x t^{\alpha - 1} \mathrm{e}^{-t}\,\mathrm{d}t,$$
or in terms of Karl Pearson’s incomplete gamma function
$$I(u, p) = \frac{1}{\Gamma(p + 1)} \int_0^{u\sqrt{p + 1}} t^p \mathrm{e}^{-t}\,\mathrm{d}t.$$
Extensive tables can be found in Pearson (1924).
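In scipy both the two-parameter gamma (with scale β) and the regularized incomplete gamma function are available, so the relationship between the distribution function and the incomplete gamma function can be checked directly (an illustrative sketch, not part of the text):

```python
# For Y ~ G(alpha, beta), P(Y <= y) equals the regularized incomplete
# gamma function gammainc(alpha, y / beta); scipy normalizes by
# Gamma(alpha) already.
from scipy import special, stats

alpha, beta = 3.0, 2.0
for y in (0.5, 2.0, 10.0):
    cdf = stats.gamma.cdf(y, a=alpha, scale=beta)
    assert abs(cdf - special.gammainc(alpha, y / beta)) < 1e-12

# mean and variance: alpha * beta and alpha * beta^2
assert abs(stats.gamma.mean(a=alpha, scale=beta) - alpha * beta) < 1e-12
assert abs(stats.gamma.var(a=alpha, scale=beta) - alpha * beta**2) < 1e-12
```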
A.5 Inverse chi-squared distribution
X has an inverse chi-squared distribution on ν degrees of freedom, denoted
$$X \sim \chi_\nu^{-2},$$
if $1/X \sim \chi^2_\nu$, or equivalently if it has density
$$p(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{-\nu/2 - 1} \mathrm{e}^{-1/(2x)} \qquad (x > 0).$$
If $Y = SX$, so that $1/Y \sim \chi^2_\nu / S$, then Y is S times an inverse chi-squared variate on ν degrees of freedom, denoted
$$Y \sim S\chi_\nu^{-2},$$
and it has density
$$p(y) = \frac{(S/2)^{\nu/2}}{\Gamma(\nu/2)}\, y^{-\nu/2 - 1} \mathrm{e}^{-S/(2y)} \qquad (y > 0).$$
The mean and variance are
$$\mathrm{E}Y = \frac{S}{\nu - 2} \quad (\nu > 2), \qquad \mathcal{V}Y = \frac{2S^2}{(\nu - 2)^2(\nu - 4)} \quad (\nu > 4).$$
The mode is
$$\frac{S}{\nu + 2},$$
and the median lies between the mode $S/(\nu + 2)$ and the mean $S/(\nu - 2)$, provided $\nu > 2$ [see Novick and Jackson (1974, Section 7.5)].
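Since $S\chi_\nu^{-2}$ coincides with an inverse-gamma distribution of shape ν/2 and scale S/2, scipy.stats.invgamma gives a direct numerical check of these formulae (an illustration; the identification with invgamma is an observation about scipy's parametrization, not part of the text):

```python
# S chi^{-2}_nu = inverse-gamma(shape = nu/2, scale = S/2); check the
# mean, variance and mode formulae above.
from scipy import stats

nu, S = 10.0, 4.0
Y = stats.invgamma(a=nu / 2, scale=S / 2)

assert abs(Y.mean() - S / (nu - 2)) < 1e-12                          # nu > 2
assert abs(Y.var() - 2 * S**2 / ((nu - 2) ** 2 * (nu - 4))) < 1e-12  # nu > 4

mode = S / (nu + 2)  # the density should peak here
assert Y.pdf(mode) > Y.pdf(1.01 * mode) and Y.pdf(mode) > Y.pdf(0.99 * mode)
```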
A.6 Inverse chi distribution
X has an inverse chi distribution on ν degrees of freedom, denoted
$$X \sim \chi_\nu^{-1},$$
if $1/X^2 \sim \chi^2_\nu$, or equivalently if it has density
$$p(x) = \frac{2^{1 - \nu/2}}{\Gamma(\nu/2)}\, x^{-\nu - 1} \mathrm{e}^{-1/(2x^2)} \qquad (x > 0).$$
If $Y = S^{1/2}X$, so that $S/Y^2 \sim \chi^2_\nu$, then Y is $S^{1/2}$ times an inverse chi variate on ν degrees of freedom, denoted
$$Y \sim S^{1/2}\chi_\nu^{-1},$$
and it has density
$$p(y) = \frac{2^{1 - \nu/2}\, S^{\nu/2}}{\Gamma(\nu/2)}\, y^{-\nu - 1} \mathrm{e}^{-S/(2y^2)} \qquad (y > 0).$$
The mean and variance do not greatly simplify. They are
$$\mathrm{E}Y = \sqrt{\frac{S}{2}}\,\frac{\Gamma((\nu - 1)/2)}{\Gamma(\nu/2)}, \qquad \mathcal{V}Y = \frac{S}{\nu - 2} - (\mathrm{E}Y)^2,$$
but very good approximations, at least for moderately large ν, are
$$\mathrm{E}Y \cong \sqrt{\frac{S}{\nu - \tfrac{3}{2}}}, \qquad \mathcal{V}Y \cong \frac{S}{2\nu^2}$$
[see Novick and Jackson (1974, Section 7.3)]. The mode is exactly
$$\sqrt{\frac{S}{\nu + 1}},$$
and a good approximation to the median, at least for reasonably large ν, is $\sqrt{S/(\nu - \tfrac{2}{3})}$ (ibid.).
A.7 Log chi-squared distribution
X has a log chi-squared distribution on ν degrees of freedom, denoted
$$X \sim \log\chi^2_\nu,$$
if X = log W where $W \sim \chi^2_\nu$, or equivalently if X has density
$$p(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, \mathrm{e}^{\nu x/2}\, \mathrm{e}^{-\mathrm{e}^x/2} \qquad (-\infty < x < \infty)$$
(note that unlike $\chi^2_\nu$ itself this is a distribution over the whole real line).
Because the logarithm of a $\chi^2_\nu/S$ variable differs from a log chi-squared variable simply by the additive constant $-\log S$, it is not necessary to consider such variables in any detail.
By considering the tth moment of a $\chi^2_\nu$ variable, it is easily shown that the moment generating function of a log chi-squared variable is
$$\mathrm{E}\,\mathrm{e}^{tX} = \mathrm{E}\,W^t = \frac{2^t\,\Gamma(\nu/2 + t)}{\Gamma(\nu/2)}.$$
Writing
$$\psi(x) = \frac{\mathrm{d}\log\Gamma(x)}{\mathrm{d}x} = \frac{\Gamma'(x)}{\Gamma(x)}$$
for the so-called digamma function, it follows that the mean and variance are
$$\mathrm{E}X = \log 2 + \psi(\nu/2), \qquad \mathcal{V}X = \psi'(\nu/2),$$
or (using Stirling’s approximation and its derivatives) approximately
$$\mathrm{E}X \cong \log(\nu - 1), \qquad \mathcal{V}X \cong \frac{2}{\nu - 1}.$$
The mode is $\log\nu$.
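The exact and approximate moments can be compared numerically via scipy's digamma and polygamma functions (an illustration, with ν = 30 as an assumed example value):

```python
# Mean log 2 + psi(nu/2) and variance psi'(nu/2) of log chi-squared,
# compared with the Stirling-based approximations log(nu - 1) and
# 2/(nu - 1).
from math import log

from scipy import special

nu = 30.0
mean = log(2) + special.digamma(nu / 2)
var = float(special.polygamma(1, nu / 2))

assert abs(mean - log(nu - 1)) < 0.01
assert abs(var - 2 / (nu - 1)) < 0.01
```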
A.8 Student’s t distribution
X has a Student’s t distribution on ν degrees of freedom, denoted
$$X \sim t_\nu,$$
if it has the same distribution as
$$\frac{Z}{\sqrt{W/\nu}},$$
where $Z \sim \mathrm{N}(0, 1)$ and $W \sim \chi^2_\nu$ are independent, or equivalently if X has density
$$p(x) = \frac{\Gamma((\nu + 1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2}.$$
It follows that if $x_1, x_2, \dots, x_n$ are independently $\mathrm{N}(\mu, \sigma^2)$ and
$$\overline{x} = \frac{1}{n}\sum x_i, \qquad s^2 = \frac{1}{n - 1}\sum (x_i - \overline{x})^2,$$
then
$$\frac{\overline{x} - \mu}{s/\sqrt{n}} \sim t_{n - 1}.$$
The mean and variance are
$$\mathrm{E}X = 0 \quad (\nu > 1), \qquad \mathcal{V}X = \frac{\nu}{\nu - 2} \quad (\nu > 2).$$
Because the distribution is symmetrical and unimodal, the median and mode both equal the mean, that is,
$$\text{median} = \text{mode} = \mathrm{E}X = 0.$$
As $\nu \to \infty$, the distribution approaches the standard normal form.
It may be noted that Student’s t distribution on one degree of freedom is the standard Cauchy distribution C(0, 1).
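A few of these facts, checked numerically with scipy.stats (an illustrative sketch only):

```python
# t_nu: variance nu/(nu - 2), symmetry about 0, convergence to N(0, 1)
# as nu grows, and the t_1 = Cauchy(0, 1) special case.
from scipy import stats

nu = 10
T = stats.t(df=nu)
assert abs(T.var() - nu / (nu - 2)) < 1e-12
assert abs(T.median()) < 1e-12

# for very large nu the quantiles are close to standard normal ones
assert abs(stats.t.ppf(0.975, df=10_000) - stats.norm.ppf(0.975)) < 1e-3

# one degree of freedom gives the standard Cauchy density
assert abs(stats.t.pdf(0.5, df=1) - stats.cauchy.pdf(0.5)) < 1e-12
```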
A.9 Normal/chi-squared distribution
The ordered pair (X, Y) has a normal/chi-squared distribution if
$$Y \sim S\chi_\nu^{-2}$$
for some S and ν and, conditional on Y,
$$X \sim \mathrm{N}(\mu, Y/n)$$
for some μ and n. An equivalent condition is that the joint density function (for $-\infty < x < \infty$ and $y > 0$) is
$$p(x, y) \propto y^{-(\nu + 3)/2} \exp\{-Q/(2y)\},$$
where
$$Q = S + n(x - \mu)^2.$$
If we define
$$s^2 = S/\nu,$$
then the marginal distribution of X is given by the fact that
$$\frac{X - \mu}{s/\sqrt{n}} \sim t_\nu.$$
The marginal distribution of Y is of course $S\chi_\nu^{-2}$.
Approximate methods of constructing two-dimensional highest density regions for this distribution are described in Box and Tiao (1992, Section 2.4).
A.10 Beta distribution
X has a beta distribution with parameters α and β, denoted
$$X \sim \mathrm{Be}(\alpha, \beta),$$
if it has density
$$p(x) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1} \qquad (0 \leqslant x \leqslant 1),$$
where the beta function is defined by
$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1}(1 - t)^{\beta - 1}\,\mathrm{d}t = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}.$$
The mean and variance are
$$\mathrm{E}X = \frac{\alpha}{\alpha + \beta}, \qquad \mathcal{V}X = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$
The mode is
$$\frac{\alpha - 1}{\alpha + \beta - 2},$$
and the approximate relationship between mean, mode and median can be used to find an approximate median.
The distribution function of any variable with a beta distribution is easily found in terms of the incomplete beta function
$$I_x(\alpha, \beta) = \frac{1}{B(\alpha, \beta)} \int_0^x t^{\alpha - 1}(1 - t)^{\beta - 1}\,\mathrm{d}t.$$
Extensive tables can be found in Pearson (1968) or in Pearson and Hartley (1966, Table 17).
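The moments, the mode and the link with the incomplete beta function can all be checked with scipy (an illustrative sketch with assumed example values α = 3, β = 5):

```python
# Be(alpha, beta): mean, variance, mode, and cdf = regularized
# incomplete beta function I_x(alpha, beta).
from scipy import special, stats

a, b = 3.0, 5.0
X = stats.beta(a, b)

assert abs(X.mean() - a / (a + b)) < 1e-12
assert abs(X.var() - a * b / ((a + b) ** 2 * (a + b + 1))) < 1e-12

mode = (a - 1) / (a + b - 2)  # density should peak here
assert X.pdf(mode) > X.pdf(mode + 0.01) and X.pdf(mode) > X.pdf(mode - 0.01)

assert abs(X.cdf(0.3) - special.betainc(a, b, 0.3)) < 1e-12
```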
A.11 Binomial distribution
X has a binomial distribution of index n and parameter π, denoted
$$X \sim \mathrm{B}(n, \pi),$$
if it has a discrete distribution with density
$$p(x) = \binom{n}{x} \pi^x (1 - \pi)^{n - x} \qquad (x = 0, 1, \dots, n).$$
The mean and variance are
$$\mathrm{E}X = n\pi, \qquad \mathcal{V}X = n\pi(1 - \pi).$$
Because
$$\frac{p(x + 1)}{p(x)} = \frac{n - x}{x + 1}\,\frac{\pi}{1 - \pi},$$
we see that p(X+1) > p(X) if and only if
$$x < (n + 1)\pi - 1,$$
and hence that a mode occurs at
$$x = [(n + 1)\pi],$$
the square brackets denoting ‘integer part of’, and this mode is unique unless $(n + 1)\pi$ is an integer.
Integration by parts shows that the distribution function is expressible in terms of the incomplete beta function, namely,
$$P(X \geqslant x) = I_\pi(x, n - x + 1);$$
see, for example, Kendall, Stewart and Ord (1987, Section 5.7).
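Both the mode formula and the incomplete beta identity can be verified numerically (an illustration, with n = 20 and π = 0.3 assumed):

```python
# B(n, pi): mode at [(n + 1) pi] and the tail identity
# P(X >= x) = I_pi(x, n - x + 1).
from math import floor

from scipy import special, stats

n, pi = 20, 0.3
X = stats.binom(n, pi)

mode = floor((n + 1) * pi)  # integer part of (n + 1) pi
assert all(X.pmf(mode) >= X.pmf(k) for k in range(n + 1))

for x in (1, 5, 10):
    tail = 1 - X.cdf(x - 1)  # P(X >= x)
    assert abs(tail - special.betainc(x, n - x + 1, pi)) < 1e-10
```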
A.12 Poisson distribution
X has a Poisson distribution of mean λ, denoted
$$X \sim \mathrm{P}(\lambda),$$
if it has a discrete distribution with density
$$p(x) = \frac{\lambda^x \mathrm{e}^{-\lambda}}{x!} \qquad (x = 0, 1, 2, \dots).$$
The mean and variance are
$$\mathrm{E}X = \mathcal{V}X = \lambda.$$
Because
$$\frac{p(x + 1)}{p(x)} = \frac{\lambda}{x + 1},$$
we see that p(X+1) > p(X) if and only if
$$x < \lambda - 1,$$
and hence that a mode occurs at
$$x = [\lambda],$$
the square brackets denoting ‘integer part of’, and this mode is unique unless λ is an integer.
Integrating by parts shows that the distribution function is expressible in terms of the incomplete gamma function, namely,
$$P(X \geqslant x) = \frac{\gamma(x, \lambda)}{\Gamma(x)};$$
see Kendall, Stewart and Ord (1987, Section 5.9).
The Poisson distribution often occurs as the limit of the binomial as $n \to \infty$ and $\pi \to 0$ with $n\pi = \lambda$ held fixed.
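The mode and the incomplete gamma identity can be verified numerically (an illustration, with λ = 4.7 assumed):

```python
# P(lambda): mean = variance = lambda, mode at [lambda], and the tail
# identity P(X >= x) = gamma(x, lambda)/Gamma(x), scipy's gammainc.
from math import floor

from scipy import special, stats

lam = 4.7
X = stats.poisson(lam)

assert abs(X.mean() - lam) < 1e-12 and abs(X.var() - lam) < 1e-12

mode = floor(lam)
assert X.pmf(mode) >= X.pmf(mode + 1) and X.pmf(mode) >= X.pmf(mode - 1)

for x in (1, 3, 8):
    assert abs((1 - X.cdf(x - 1)) - special.gammainc(x, lam)) < 1e-10
```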
A.13 Negative binomial distribution
X has a negative binomial distribution of index n and parameter π, denoted
$$X \sim \mathrm{NB}(n, \pi),$$
if it has a discrete distribution with density
$$p(x) = \binom{n + x - 1}{x} \pi^n (1 - \pi)^x \qquad (x = 0, 1, 2, \dots).$$
Because
$$\binom{n + x - 1}{x} = (-1)^x \binom{-n}{x},$$
we sometimes use the notation
$$p(x) = \binom{-n}{x} \pi^n (\pi - 1)^x.$$
The mean and variance are
$$\mathrm{E}X = \frac{n(1 - \pi)}{\pi}, \qquad \mathcal{V}X = \frac{n(1 - \pi)}{\pi^2}.$$
Because
$$\frac{p(x + 1)}{p(x)} = \frac{n + x}{x + 1}\,(1 - \pi),$$
we see that p(X+1) > p(X) if and only if
$$x < \frac{n(1 - \pi) - 1}{\pi},$$
and hence that a mode occurs at
$$x = \left[\frac{(n - 1)(1 - \pi)}{\pi}\right],$$
the square brackets denoting ‘integer part of’, and this mode is unique unless $(n - 1)(1 - \pi)/\pi$ is an integer.
It can be shown that the distribution function can be found in terms of that of the binomial distribution, or equivalently in terms of the incomplete beta function; for details see Balakrishnan et al. (1992, Chapter 5, Section 6). Just as the Poisson distribution can arise as a limit of the binomial distribution, so it can as a limit of the negative binomial, but in this case as $n \to \infty$ and $\pi \to 1$ with $n(1 - \pi) = \lambda$ held fixed.
The particular case where n = 1, so that $p(x) = \pi(1 - \pi)^x$, is sometimes referred to as the geometric distribution.
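scipy's nbinom uses the same pmf, so the moment and mode formulae can be checked directly (an illustration, with n = 5 and π = 0.3 assumed):

```python
# NB(n, pi): mean n(1 - pi)/pi, variance n(1 - pi)/pi^2, mode at
# [(n - 1)(1 - pi)/pi], and the geometric special case n = 1.
from math import floor

from scipy import stats

n, pi = 5, 0.3
X = stats.nbinom(n, pi)

assert abs(X.mean() - n * (1 - pi) / pi) < 1e-9
assert abs(X.var() - n * (1 - pi) / pi**2) < 1e-9

mode = floor((n - 1) * (1 - pi) / pi)
assert X.pmf(mode) >= X.pmf(mode + 1) and X.pmf(mode) >= X.pmf(mode - 1)

# n = 1: geometric distribution p(x) = pi (1 - pi)^x
assert abs(stats.nbinom.pmf(3, 1, pi) - pi * (1 - pi) ** 3) < 1e-12
```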
A.14 Hypergeometric distribution
X has a hypergeometric distribution of population size N, index n and parameter π (where Nπ is an integer), denoted
$$X \sim \mathrm{H}(N, n, \pi),$$
if it has a discrete distribution with density
$$p(x) = \frac{\dbinom{N\pi}{x}\dbinom{N(1 - \pi)}{n - x}}{\dbinom{N}{n}} \qquad (x = 0, 1, \dots, n).$$
The mean and variance are
$$\mathrm{E}X = n\pi, \qquad \mathcal{V}X = n\pi(1 - \pi)\,\frac{N - n}{N - 1}.$$
Because
$$\frac{p(x + 1)}{p(x)} = \frac{(N\pi - x)(n - x)}{(x + 1)(N(1 - \pi) - n + x + 1)},$$
we see that p(X+1) > p(X) if and only if
$$(N\pi - x)(n - x) > (x + 1)(N(1 - \pi) - n + x + 1),$$
and hence that if, as is usually the case, N is fairly large, p(X+1) > p(X) if and only if (approximately)
$$x < (n + 1)\pi - 1.$$
Hence, the mode occurs very close to the binomial value
$$x = [(n + 1)\pi].$$
As $N \to \infty$, this distribution approaches the binomial distribution $\mathrm{B}(n, \pi)$.
Tables of it can be found in Lieberman and Owen (1961).
A.15 Uniform distribution
X has a uniform distribution on the interval (a, b), denoted
$$X \sim \mathrm{U}(a, b),$$
if it has density
$$p(x) = \frac{1}{b - a}\, I_{(a,b)}(x),$$
where
$$I_{(a,b)}(x) = \begin{cases} 1 & \text{if } x \in (a, b) \\ 0 & \text{otherwise} \end{cases}$$
is the indicator function of the set (a, b). The mean and variance are
$$\mathrm{E}X = \tfrac{1}{2}(a + b), \qquad \mathcal{V}X = \tfrac{1}{12}(b - a)^2.$$
There is no unique mode, but the distribution is symmetrical, and hence the median equals the mean, $\tfrac{1}{2}(a + b)$.
Sometimes we have occasion to refer to a discrete version; Y has a discrete uniform distribution on the interval [a, b], denoted
$$Y \sim \mathrm{UD}(a, b),$$
if it has a discrete distribution with density
$$p(y) = \frac{1}{n} \qquad (y = a, a + 1, \dots, b;\ n = b - a + 1).$$
The mean and variance are
$$\mathrm{E}Y = \tfrac{1}{2}(a + b), \qquad \mathcal{V}Y = \tfrac{1}{12}(n^2 - 1),$$
using formulae for the sum and sum of squares of the first n natural numbers [the variance is best found by noting that the variance of UD(a, b) equals that of UD(1, n) where n = b − a + 1]. Again, there is no unique mode, but the distribution is symmetrical, and hence the median equals the mean, $\tfrac{1}{2}(a + b)$.
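The discrete case can be checked against scipy's randint, whose support runs from its first argument up to one below its second (an illustrative sketch):

```python
# UD(a, b): with n = b - a + 1 points, mean (a + b)/2 and variance
# (n^2 - 1)/12. scipy's randint(a, b + 1) has support {a, ..., b}.
from scipy import stats

a, b = 3, 12
n = b - a + 1
Y = stats.randint(a, b + 1)

assert abs(Y.mean() - (a + b) / 2) < 1e-12
assert abs(Y.var() - (n**2 - 1) / 12) < 1e-12
```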
A.16 Pareto distribution
X has a Pareto distribution with parameters ξ and γ, denoted
$$X \sim \mathrm{Pa}(\xi, \gamma),$$
if it has density
$$p(x) = \gamma\,\xi^\gamma x^{-\gamma - 1}\, I_{(\xi,\infty)}(x),$$
where
$$I_{(\xi,\infty)}(x) = \begin{cases} 1 & \text{if } x > \xi \\ 0 & \text{otherwise} \end{cases}$$
is the indicator function of the set $(\xi, \infty)$. The mean and variance are
$$\mathrm{E}X = \frac{\gamma\xi}{\gamma - 1} \quad (\gamma > 1), \qquad \mathcal{V}X = \frac{\gamma\xi^2}{(\gamma - 1)^2(\gamma - 2)} \quad (\gamma > 2).$$
The distribution function is
$$F(x) = 1 - (\xi/x)^\gamma \qquad (x \geqslant \xi),$$
and in particular the median is
$$\xi\,2^{1/\gamma}.$$
The mode, of course, occurs at ξ.
The ordered pair (Y, Z) has a bilateral bivariate Pareto distribution with parameters ξ, η and γ, denoted
$$(Y, Z) \sim \mathrm{Pabb}(\xi, \eta, \gamma),$$
if it has joint density function
$$p(y, z) = \gamma(\gamma + 1)(\eta - \xi)^\gamma (z - y)^{-\gamma - 2} \qquad (y \leqslant \xi,\ z \geqslant \eta).$$
The means and variances are
$$\mathrm{E}Y = \frac{\gamma\xi - \eta}{\gamma - 1}, \qquad \mathrm{E}Z = \frac{\gamma\eta - \xi}{\gamma - 1}, \qquad \mathcal{V}Y = \mathcal{V}Z = \frac{\gamma(\eta - \xi)^2}{(\gamma - 1)^2(\gamma - 2)} \quad (\gamma > 2),$$
and the correlation coefficient between Y and Z is
$$\rho(Y, Z) = -\frac{1}{\gamma}.$$
It is also sometimes useful that
$$\mathrm{E}(Z - Y) = \frac{(\gamma + 1)(\eta - \xi)}{\gamma - 1}.$$
The marginal distribution function of Y is
$$P(Y \leqslant y) = \left(\frac{\eta - \xi}{\eta - y}\right)^{\!\gamma} \qquad (y \leqslant \xi),$$
and in particular the median is
$$\eta - 2^{1/\gamma}(\eta - \xi).$$
The distribution function of Z is similar, namely
$$P(Z \leqslant z) = 1 - \left(\frac{\eta - \xi}{z - \xi}\right)^{\!\gamma} \qquad (z \geqslant \eta),$$
and in particular the median is
$$\xi + 2^{1/\gamma}(\eta - \xi).$$
The modes, of course, occur at $(\xi, \eta)$.
The distribution is discussed in DeGroot (1970, Sections 4.11, 5.7 and 9.7).
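For the univariate Pareto, the mean, distribution function and median can be checked with scipy.stats.pareto, which matches Pa(ξ, γ) when γ is passed as the shape b and ξ as the scale (an illustrative sketch):

```python
# Pa(xi, gamma) via scipy: pareto(b=gamma, scale=xi) has
# F(x) = 1 - (xi/x)^gamma on x > xi; check the mean and the
# median xi * 2^(1/gamma).
from scipy import stats

xi, gamma = 2.0, 3.0
X = stats.pareto(b=gamma, scale=xi)

assert abs(X.mean() - gamma * xi / (gamma - 1)) < 1e-9        # gamma > 1
assert abs(X.cdf(5.0) - (1 - (xi / 5.0) ** gamma)) < 1e-12
assert abs(X.median() - xi * 2 ** (1 / gamma)) < 1e-9
```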
A.17 Circular normal distribution
X has a circular normal or von Mises’ distribution with mean μ and concentration parameter κ, denoted
$$X \sim \mathrm{M}(\mu, \kappa),$$
if it has density
$$p(x) = \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa\cos(x - \mu)\},$$
where X is any angle (so that x may be taken to lie in any interval of length 2π) and $I_0(\kappa)$ is a constant called the modified Bessel function of the first kind and order zero (besselI(κ,0) in R). It turns out that
$$I_0(\kappa) = \sum_{r=0}^{\infty} \frac{(\kappa/2)^{2r}}{(r!)^2}$$
and that asymptotically for large κ
$$I_0(\kappa) \cong \frac{\mathrm{e}^\kappa}{\sqrt{2\pi\kappa}}.$$
For large κ, we have approximately
$$X \sim \mathrm{N}(\mu, 1/\kappa),$$
while for small κ, we have approximately
$$p(x) \cong \frac{1}{2\pi}\{1 + \kappa\cos(x - \mu)\},$$
which density is sometimes referred to as a cardioid distribution. The circular normal distribution is discussed by Mardia (1972), Mardia and Jupp (2001) and Batschelet (1981).
One point related to this distribution arises in a Bayesian context in connection with the reference prior
when we have observations such that
The only sensible estimator of μ on the basis of the posterior distribution is .
The mode of the posterior distribution of κ is approximately , where
(both of which are approximately 1.87 when ) according to Schmitt (1969, Section 10.2). Because of the skewness of the distribution of κ, its posterior mean is greater than its posterior mode.
A.18 Behrens’ distribution
X is said to have Behrens’ (or Behrens–Fisher or Fisher–Behrens) distribution with degrees of freedom $\nu_1$ and $\nu_2$ and angle θ, denoted
$$X \sim \mathrm{BF}(\nu_1, \nu_2, \theta),$$
if X has the same distribution as
$$T_1\sin\theta + T_2\cos\theta,$$
where T1 and T2 are independent and
$$T_1 \sim t_{\nu_1}, \qquad T_2 \sim t_{\nu_2}.$$
Equivalently, X has density
$$p(x) = \frac{1}{|\cos\theta|} \int_{-\infty}^{\infty} f_{\nu_1}(t)\, f_{\nu_2}\!\left(\frac{x - t\sin\theta}{\cos\theta}\right) \mathrm{d}t$$
over the whole real line, where $f_\nu$ denotes the density of a $t_\nu$ variable; this is just the convolution of the densities of $T_1\sin\theta$ and $T_2\cos\theta$, and it does not simplify usefully.
This distribution naturally arises as the posterior distribution of
$$\frac{\delta - (\overline{x}_1 - \overline{x}_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \qquad (\delta = \theta_1 - \theta_2)$$
when we have samples of size $n_1$ from $\mathrm{N}(\theta_1, \phi_1)$ and of size $n_2$ from $\mathrm{N}(\theta_2, \phi_2)$, neither $\phi_1$ nor $\phi_2$ is known, and conventional priors are adopted. In this case, in a fairly obvious notation,
$$\nu_i = n_i - 1, \qquad \tan\theta = \frac{s_1/\sqrt{n_1}}{s_2/\sqrt{n_2}}.$$
An approximation to this distribution due to Patil (1965) is as follows.
Define
$$f_1 = \frac{\nu_1}{\nu_1 - 2}\sin^2\theta + \frac{\nu_2}{\nu_2 - 2}\cos^2\theta, \qquad
f_2 = \frac{\nu_1^2\sin^4\theta}{(\nu_1 - 2)^2(\nu_1 - 4)} + \frac{\nu_2^2\cos^4\theta}{(\nu_2 - 2)^2(\nu_2 - 4)},$$
and set
$$b = 4 + \frac{f_1^2}{f_2}, \qquad a = \sqrt{\frac{f_1(b - 2)}{b}}.$$
Then, approximately,
$$X/a \sim t_b.$$
Obviously b is usually not an integer, and consequently this approximation requires interpolation in the t tables.
Clearly Behrens’ distribution has mean and variance
$$\mathrm{E}X = 0, \qquad \mathcal{V}X = \frac{\nu_1}{\nu_1 - 2}\sin^2\theta + \frac{\nu_2}{\nu_2 - 2}\cos^2\theta,$$
using the mean and variance of t distributions and the independence of T1 and T2. The distribution is symmetrical and unimodal, and hence the mean, mode and median are all equal, so
$$\text{median} = \text{mode} = \mathrm{E}X = 0.$$
A.19 Snedecor’s F distribution
X has an F distribution on $\nu_1$ and $\nu_2$ degrees of freedom, denoted
$$X \sim \mathrm{F}_{\nu_1,\nu_2},$$
if X has the same distribution as
$$\frac{W_1/\nu_1}{W_2/\nu_2},$$
where W1 and W2 are independent and
$$W_1 \sim \chi^2_{\nu_1}, \qquad W_2 \sim \chi^2_{\nu_2}.$$
Equivalently, X has density
$$p(x) = \frac{(\nu_1/\nu_2)^{\nu_1/2}}{B(\nu_1/2, \nu_2/2)}\, x^{\nu_1/2 - 1}\left(1 + \frac{\nu_1 x}{\nu_2}\right)^{-(\nu_1 + \nu_2)/2} \qquad (x > 0).$$
The mean and variance are
$$\mathrm{E}X = \frac{\nu_2}{\nu_2 - 2} \quad (\nu_2 > 2), \qquad \mathcal{V}X = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)} \quad (\nu_2 > 4).$$
The mode is
$$\frac{\nu_2(\nu_1 - 2)}{\nu_1(\nu_2 + 2)}.$$
If $X \sim \mathrm{F}_{\nu_1,\nu_2}$, then
$$\frac{\nu_1 X}{\nu_2 + \nu_1 X} \sim \mathrm{Be}(\tfrac{1}{2}\nu_1, \tfrac{1}{2}\nu_2).$$
Conversely, if $Y \sim \mathrm{Be}(\alpha, \beta)$ then
$$\frac{\beta Y}{\alpha(1 - Y)} \sim \mathrm{F}_{2\alpha,2\beta}.$$
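The moments and the beta relationship can be verified with scipy (an illustration, with $\nu_1 = 4$, $\nu_2 = 12$ assumed):

```python
# F_{nu1, nu2}: mean nu2/(nu2 - 2), and nu1 X/(nu2 + nu1 X) ~ Be(nu1/2, nu2/2),
# checked by comparing distribution functions at a few points.
from scipy import stats

nu1, nu2 = 4, 12
X = stats.f(nu1, nu2)

assert abs(X.mean() - nu2 / (nu2 - 2)) < 1e-9   # needs nu2 > 2

for x in (0.5, 1.0, 2.5):
    y = nu1 * x / (nu2 + nu1 * x)
    assert abs(X.cdf(x) - stats.beta.cdf(y, nu1 / 2, nu2 / 2)) < 1e-10
```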
A.20 Fisher’s z distribution