2.2.1 The fixed effects model with one independent variable: one-way ANOVA
2.2.2 The fixed effects model with two independent variables: two-way ANOVA
2.3 A general description of the mixed model
2.3.1 The mixed model description based on individual observations
2.4 Variance component estimation and BLUP
2.4.1 Defining variance components
2.5 Statistical inference for fixed effects
2.5.1 Estimation of fixed effects and their variance
2.5.2 Estimation of linear combinations of fixed effects and their variance
In this chapter the theoretical basis of the mixed model is explained. The fixed effects model is first introduced in Section 2.2 so that it can be compared in subsequent sections with the mixed model. The analysis of data in the fixed effects model framework is demonstrated with one independent variable (one-way ANOVA) and with two independent variables (two-way ANOVA). Two alternative models, the 'cell means model' and the 'overparameterised model', are described. It is further explained how particular hypotheses can be tested and specific linear combinations of parameters can be estimated in the framework of the two different models. Next, the problems that may occur when using the fixed effects model for the analysis of data which also contain random effects factors are studied.
Good textbooks on the analysis of linear models, and of fixed effects models in particular, include Neter et al. (1990) and Hocking (1996). A more theoretical approach, with the emphasis on the linear algebraic model formulation, is given by Searle (1971, 1987).
In Section 2.3 a general description of the mixed model is given. The different data structures of the examples included in Chapter 1 are described in the mixed model framework.
In Section 2.4 it is shown how variance components, which correspond to the variation associated with the random effects factor, can be estimated in the mixed model framework by restricted maximum likelihood (REML), and how the individual effect of a level of a random effects factor can be predicted by the method of best linear unbiased prediction (BLUP).
In Section 2.5 estimation of fixed effects parameters and linear combinations thereof is explained and the distributional assumptions of these estimates are derived and used to obtain confidence intervals.
In Section 2.6, it is shown how hypotheses of interest can be tested in the mixed model framework.
There are two alternative formulations for the fixed effects model, the cell means model and the overparameterised model. As the statistical theory for the cell means model is easier, we explain this model first. Most statistical packages, however, use the overparameterised model, and this will be introduced after describing the cell means model.
The cell means model for a fixed effects model with one independent variable with a levels and nk observations per level k is given by
Ykl = µk + ekl, k = 1, ..., a; l = 1, ..., nk
where µk represents the kth individual cell mean, and the different cells represent different treatments.
Written out in full in matrix notation, this model is

Y = Xß + e

with Y the (N x 1) vector of observations, ß = (µ1, ..., µa)T the vector of cell means, e the (N x 1) vector of random errors, and N = n1 + ... + na the total number of observations.
X is called the design matrix; the zero and one entries are called dummy variables.
We further require the classical ANOVA assumptions, i.e., the random error variables ekl are independent with distribution N(0, σ²).
Under these assumptions we have
E(ekl) = 0
Var(ekl) = σ² (homoscedasticity)
The expected value of the vector e can be rewritten in matrix notation as
E(e) = 0
This expression is a shorthand notation for the statement that every element of the vector e has expected value zero.
Furthermore, the variance and covariance of the vector e can be rewritten as
D(e) = σ²IN
where IN is the (N x N) identity matrix (see Appendix C). This expression is a shorthand notation for Var(ekl) = σ² and Cov(ekl, ek'l') = 0 for every pair of different errors.
D(e) is an (N x N) matrix. E(e) is called the mean vector of the random vector e; D(e) is called the dispersion matrix or variance-covariance matrix of the random vector e.
A short way to write that the random errors ekl are independent with distribution N(0, σ²) is

e ~ MVN(0, σ²IN)

In words: the random vector e is multivariate normal with zero as mean vector and the diagonal matrix σ²IN as variance-covariance matrix (see Appendix D).
The overparameterised model for the same fixed effects model is given by
Ykl = µ + tk + ekl, k = 1, ..., a; l = 1, ..., nk
This model is advantageous if the primary interest is in differences between cell means, e.g., µk - µa, k = 1, ..., a - 1, or in differences between individual cell means and a reference or baseline value.
Note that since the model is overparameterised (in the sense that the model has a + 1 parameters compared with a parameters in the cell means model), each choice of the reference value leads to some constraint (e.g. ta = 0 if µ = µa).
Therefore, µ can have different meanings depending on the implied restrictions. Its meaning can be made clear by defining µ in terms of the cell means.
Thus, putting µ = µa we have

tk = µk - µa, k = 1, ..., a

so that, in particular, ta = 0.
Example 2.2.1.1 The cell means model
We revisit Example 1.2.1. There are three different cell means, the mean PCV for animals that have been treated with a low dose (µ1), a medium dose (µ2) and a high dose (µ3) of the trypanocidal drug. There are 5 animals treated with the low dose (n1 = 5), 4 with the medium dose (n2 = 4) and 5 animals with the high dose (n3 = 5).
This leads to the following model description in matrix form
with YT = (Y11, Y12, Y13, Y14, Y15, Y21, Y22, Y23, Y24, Y31, Y32, Y33, Y34, Y35)
Example 2.2.1.2 The overparameterised model
We can rewrite the data structure of Example 1.2.1 as an overparameterised model
where t1 and t2 are interpreted as the differences between the low and the high dose (µ1 - µ3) and between the medium and the high dose (µ2 - µ3), respectively, if the constraint t3 = 0 is used, and tk is equal to µk - µ (k = 1, 2, 3) if the constraint t1 + t2 + t3 = 0 is used.
To obtain estimators for the elements of ß, which in the cell means model is the vector of cell means ß = (µ1, ..., µa)T, we try to solve the equation
Y=Xß
But in this form the set of N equations in a unknowns generally has no exact solution, since the number of equations N exceeds the number of parameters a. We therefore obtain a solution based on the ordinary least squares criterion, i.e.,
obtain ß so that (Y - Xß)T(Y - Xß) is minimal
Minimising this expression with respect to ß means that we estimate the cell means so that the sum of squared differences between the observations and the cell means is minimal; in other words, the estimators for the cell means are on average as close (in the least squares sense) to the observations as possible.
One can show that minimisation of least squares in this way is equivalent to solving the set of equations
XT Xß = XTY
This set of equations is called the set of normal equations. Now we have a equations and a parameters.
Example 2.2.1.3 Estimating parameters in the cell means model
For Example 1.2.1, we have
Note that rank (X)=3 (full rank), and that XTX is also of full rank. The inverse of XTX is
Thus, the cell means µk (k = 1, 2, 3) are estimated by the sample means Ȳk. of the observations Ykl (l = 1, ..., nk).
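To make the least squares machinery concrete, the following sketch in Python/NumPy (the book's own programs are written in SAS and collected in Appendix B) builds the cell means design matrix for a one-way layout and solves the normal equations; the PCV differences used here are made-up illustrative values, not the data of Example 1.2.1.

```python
import numpy as np

# Hypothetical PCV differences for 5 + 4 + 5 animals (not the real data of Example 1.2.1)
y = np.array([2.1, 1.5, 2.8, 1.9, 1.7,   # low dose
              3.9, 3.5, 4.2, 3.6,        # medium dose
              7.2, 8.1, 7.9, 7.0, 7.8])  # high dose
groups = np.repeat([0, 1, 2], [5, 4, 5])

# Cell means design matrix: one dummy (0/1) column per dose level
X = np.zeros((len(y), 3))
X[np.arange(len(y)), groups] = 1.0

# Solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The solution equals the sample mean of each dose group
group_means = [y[groups == k].mean() for k in range(3)]
print(beta_hat)      # estimated cell means
print(group_means)   # identical to the group means
```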
Parameter estimation in the overparameterised model is more complex. This is due to the fact that the (N x (a + 1)) design matrix X has rank a. Therefore, the design matrix is not of full rank. This is a direct consequence of the fact that the model is overparameterised. Therefore there is not a unique solution for the inverse of XTX and hence not a unique solution for the set of normal equations.
One solution is to impose a constraint on the parameters, e.g. t1 + t2 + ... + ta = 0, and to reduce the matrix to full rank by substituting

ta = -t1 - t2 - ... - ta-1

say. This method is often applied in statistical programs. However, here we will consider the general case of using a generalised inverse of XTX.
This is a matrix G that satisfies
XTXGXTX = XTX
The general notation for the generalised inverse of a matrix A is A-. However, as a shorthand notation we use G to denote a generalised inverse of XTX. Such a generalised inverse is not unique. For each G we obtain a different solution of the set of normal equations. This solution takes the form

ß̂ = GXTY
There are different ways of obtaining a generalised inverse as is now shown by example.
Example 2.2.1.4 Estimating parameters in the overparameterised model
The matrix product XTX for the data set of Example 1.2.1 is given by
XTX is a (4 x 4) matrix and rank (XTX) = 3. Therefore the simple inverse of XTX does not exist and there is not a unique solution for the set of normal equations.
We discuss two ways to obtain a generalised inverse.
Method 1:
Assume that A is a square matrix of dimension (r x r) and that rank (A)=a (a < r). Partition A as
so that A11 is an (a x a) matrix of full rank.
Then one solution for A- in the equation AA-A = A is given by
Note that each 0 in the expression for A- is a matrix.
Applying this generalised inverse to the normal equations gives one possible solution; it corresponds to the situation with µ = µ3 and t3 = 0.
Method 2:
Partition A as in Method 1 but in such a way that A22, not A11, is an (a x a) matrix of full rank.
Then a solution of the set of normal equations is obtained in the same way. This solution corresponds to that of the cell means model, with parameters µ1, µ2 and µ3 (the constraint now being µ = 0).
Estimation of linear combinations of parameters
In many applications interest is not in the parameters ßi themselves but in linear combinations of these parameters. A linear combination is defined as

L = cTß

with c a vector of known constants.
An estimate of a linear combination L can easily be obtained, once parameter estimates are available, by applying the same linear combination to the estimated parameters.
In the case of the cell means model estimation of any linear combination is straightforward.
Assume, for instance, in our example, that we want to estimate the difference between the low dose and the average of the medium and high doses. The linear combination cTß is then given by
An estimate for this linear combination is given by
In the case of the overparameterised model, however, we have seen that different solutions for ß can be obtained depending on the choice of the generalised inverse. In general, each solution gives a different value for the linear combination cTß̂. Linear combinations for which cTß̂ is invariant with respect to the choice of the generalised inverse are called estimable functions. Based on our example we further explain what we mean by estimable functions.
Example 2.2.1.5 Estimable functions
Consider the one-way overparameterised model
Y=Xß + e
with ßT = (µ, t1, t2, t3). Note that XTX is a (4 x 4) matrix with rank(XTX) = 3. With G a generalised inverse of XTX,

ß̂ = GXTY

is a solution of the set of normal equations

XTXß = XTY
It is easy to show, for example, that G1 and G2 are generalised inverses for XTX, where
From this example it is clear that the 'estimators' for µ, t1, t2 and t3 depend on which generalised inverse has been chosen. The following table gives, for five choices of G (not given explicitly), the actual values for the estimators of the parameters and of certain linear functions of the parameters.
Table 2.2.1.1 Solutions for parameters and linear combinations of parameters based on different generalised inverses
| Parameter | Solution 1 | Solution 2 | Solution 3 | Solution 4 | Solution 5 |
| µ | 0 | 4.46 | 2.0 | 7.6 | 150 |
| t1 | 2.0 | -2.46 | 0 | -5.6 | -148.0 |
| t2 | 3.8 | -0.66 | 1.8 | -3.8 | -146.2 |
| t3 | 7.6 | 3.14 | 5.6 | 0 | -142.4 |
| Linear combination | | | | | |
| t1 + t2 | 5.8 | -3.12 | 1.8 | -9.4 | -294.2 |
| (t1 + t2 + t3)/3 | 4.46 | 0.02 | 2.46 | -3.13 | -145.53 |
| t1 - t2 | -1.8 | -1.8 | -1.8 | -1.8 | -1.8 |
| µ + t1 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| µ + t3 | 7.6 | 7.6 | 7.6 | 7.6 | 7.6 |
We see that the actual values of the estimators of the parameters change with the choice of G, as already discussed, but we also see that the value of some linear combinations of µ, t1, t2 and t3 is the same whatever G is chosen. Such linear combinations are called estimable functions. It turns out that the linear combinations of parameters that are relevant in practice (e.g. t1 - t2, µ + t1, µ + t2) are estimable. Furthermore, estimable functions can always be written as functions of the cell means. For instance, if the restriction t3 = 0 is used, then it follows that µk = µ + tk, and thus, as an example, t1 - t2 = µ1 - µ2. On the other hand, no such relationship exists for t1 + t2.
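The invariance of estimable functions is easy to verify numerically. The sketch below (again Python/NumPy with the same hypothetical data as before, not the book's SAS code) constructs two different generalised inverses of XTX — the Moore-Penrose inverse and a Method 2 partition — and shows that the parameter solutions differ while estimable functions such as t1 - t2 and µ + t1 do not.

```python
import numpy as np

# Same hypothetical data as before
y = np.array([2.1, 1.5, 2.8, 1.9, 1.7,
              3.9, 3.5, 4.2, 3.6,
              7.2, 8.1, 7.9, 7.0, 7.8])
groups = np.repeat([0, 1, 2], [5, 4, 5])
N = len(y)

# Overparameterised design matrix: intercept plus one dummy per dose
X = np.zeros((N, 4))
X[:, 0] = 1.0
X[np.arange(N), groups + 1] = 1.0

A = X.T @ X          # rank 3, so no ordinary inverse exists

def solution(G):
    """Solution of the normal equations for a given generalised inverse G."""
    return G @ X.T @ y

# Two different generalised inverses of X'X
G1 = np.linalg.pinv(A)                     # Moore-Penrose inverse
G2 = np.zeros((4, 4))
G2[1:, 1:] = np.linalg.inv(A[1:, 1:])      # Method 2 partition (A22 of full rank)

b1, b2 = solution(G1), solution(G2)
print(b1)   # one solution for (mu, t1, t2, t3)
print(b2)   # a different solution

# Estimable functions are invariant to the choice of G
c_diff = np.array([0, 1, -1, 0])   # t1 - t2
c_cell = np.array([1, 1, 0, 0])    # mu + t1
print(c_diff @ b1, c_diff @ b2)    # identical
print(c_cell @ b1, c_cell @ b2)    # identical

# A non-estimable combination such as t1 + t2 differs between solutions
c_bad = np.array([0, 1, 1, 0])
print(c_bad @ b1, c_bad @ b2)
```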
Variance of linear combinations of parameters
In the previous subsections we studied the ordinary least squares estimator ß̂ of ß. Inference on cTß also needs an explicit expression for the variance-covariance matrix of ß̂. The variance-covariance matrix of ß̂ in the cell means model is given by
D(ß̂) = D((XTX)-1XTY)
      = (XTX)-1XT D(Y) X(XTX)-1
      = (XTX)-1XT σ²IN X(XTX)-1
      = σ² (XTX)-1XTX(XTX)-1
      = σ² (XTX)-1
whereas in the overparameterised model it is given by
D(ß̂) = D(GXTY)
      = GXT D(Y) XGT
      = σ² GXTXGT
Note that the latter expression is not invariant with respect to G. This means that the variance-covariance matrix of ß̂ depends on the particular choice of the generalised inverse. In the previous section we defined estimable functions as those linear combinations of ß (i.e. L = cTß) for which the corresponding estimator L̂ = cTß̂ is invariant with respect to the choice of G.
One can show that cTß is estimable if and only if cT = tTX, where t is an (N x 1) vector of real numbers. Based on this fact it can be shown (as described in Appendix E) that the variance of L̂ = cTß̂ is invariant with respect to the choice of G and is given by

Var(cTß̂) = σ² tTXGXTt
For the cell means model we simply obtain
Var(cTß̂) = σ² cT(XTX)-1c
The variance-covariance expression requires estimation of σ². The least squares estimator s² is based on the sum of squared differences between the observations and the estimated cell means µ̂k = Ȳk., and is given by

s² = Σk Σl (Ykl - Ȳk.)² / (N - a)

It can easily be shown that this is an unbiased estimator of σ², whereas the maximum likelihood estimator, given by

Σk Σl (Ykl - Ȳk.)² / N

is a biased estimator of σ². We will come back to this issue in more detail in Section 2.4.
The value of s² can then be used to estimate Var(cTß̂), and the (1 - α)100% confidence interval for cTß is given by

cTß̂ ± tN-a;1-α/2 √Vâr(cTß̂)

Note that the t-distribution with N - a degrees of freedom is used here, and not the standard normal distribution, because the estimate s² of σ² was used to calculate the variance of the estimated parameter; this introduces additional uncertainty, which is taken care of by the t-distribution.
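A minimal numerical sketch of such a confidence interval, again in Python with the hypothetical data used earlier (the actual computations for Example 1.2.1 are produced by the SAS programs in Appendix B), is given below for the contrast comparing the low dose with the average of the medium and high doses.

```python
import numpy as np
from scipy import stats

# Same hypothetical data as before
y = np.array([2.1, 1.5, 2.8, 1.9, 1.7,
              3.9, 3.5, 4.2, 3.6,
              7.2, 8.1, 7.9, 7.0, 7.8])
groups = np.repeat([0, 1, 2], [5, 4, 5])
N, a = len(y), 3

# Cell means design matrix and least squares estimates
X = np.zeros((N, a))
X[np.arange(N), groups] = 1.0
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y

# Unbiased estimate of sigma^2: SSE / (N - a)
resid = y - X @ beta
s2 = resid @ resid / (N - a)

# Contrast: low dose minus the average of the medium and high doses
c = np.array([1.0, -0.5, -0.5])
est = c @ beta
se = np.sqrt(s2 * c @ XtX_inv @ c)

# 95% confidence interval based on t with N - a degrees of freedom
t_crit = stats.t.ppf(0.975, df=N - a)
print(est, (est - t_crit * se, est + t_crit * se))
```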
Example 2.2.1.6 Confidence intervals
If the cell means model is used for the data set of Example 1.2.1, the variance-covariance matrix of ß̂ is given by
Therefore, the variance of the estimate µ̂1, for instance, is given by
An unbiased estimator of σ² is given by
and an estimate of the variance of the estimate µ̂1 is
A 95% confidence interval for µ1 is then given by
We can also use the overparameterised model to find the same estimates and confidence intervals.
Now the vector of fixed effects is given by
ßT = (µ, t1, t2, t3)
and suppose the linear combination of interest is given by
cTß = (1 1 0 0 )ß
Assume that the generalised inverse G1 is used to find a solution ß̂G1 for ß. The variance-covariance matrix of ß̂G1 can be estimated by

D̂(ß̂G1) = s² G1XTXG1T

Both cTß̂G1 and the estimated variance Vâr(cTß̂G1) are thus the same as for the cell means model. This is due to the fact that cTß is an estimable function, so that both its estimate and its estimated variance are independent of the particular choice of the generalised inverse.
To show that cTß is estimable, we need to show that cT can be written as tTX:
The aim of hypothesis testing is to determine, based on the available observations, whether or not a hypothesised relation between parameters is plausible. The hypothesised relation is called the null hypothesis (H0). The alternative hypothesis (Ha) is that the claim is false. For one-way ANOVA it is customary to start by determining whether or not the cell means µ1, ..., µa are equal. This leads to the null hypothesis
H0 :µ1 = ... = µa
We test this hypothesis against the alternative
Ha : not all µk equal (k = 1, ..., a)
An example of a more specific hypothesis involving a function of the parameters might be
H0 : µ1 = (µ2 + µ3) /2
Ha : µ1 ≠ (µ2 + µ3)/2
In order to construct the test statistic to be used for choosing between the alternatives H0 : µl = ... = µa and Ha : not all µk equal, the total sum of squares of all the observations
in the data set is decomposed into different terms in the following table.
Table 2.2.1.2 Decomposition of total sum of squares

| Source of variation | Sum of squares | df |
| Mean | N Ȳ..² | 1 |
| Among treatments | SSTR = Σk nk (Ȳk. - Ȳ..)² | a - 1 |
| Error | SSE = Σk Σl (Ykl - Ȳk.)² | N - a |
| Total | Σk Σl Ykl² | N |
The total number of degrees of freedom is equal to the total number of observations, and can be subdivided in a similar way as the sum of squares. The among-treatments sum of squares (SSTR) has a - 1 degrees of freedom: there are a estimated deviations Ȳk. - Ȳ.., but one degree of freedom is lost because the deviations are not independent, since Σk nk(Ȳk. - Ȳ..) = 0. The error sum of squares (SSE) has N - a degrees of freedom: for each treatment there are nk - 1 independent deviations Ykl - Ȳk. in the sum, and, with a such levels, this gives Σk (nk - 1) = N - a degrees of freedom.
The mean squares can be obtained by dividing their respective sums of squares by their associated degrees of freedom:
MSTR = SSTR/(a - 1) and MSE = SSE/(N - a)
The first important point to note is that the expected values of MSTR and MSE are both equal to σ² under the null hypothesis, whereas the expected value of MSTR is greater than σ² under the alternative hypothesis.
Indeed, the expected values can be shown to be as follows (see Appendix G)

E(MSE) = σ²

E(MSTR) = σ² + Σk nk (µk - µ̄)² / (a - 1)

where µ̄ = Σk nk µk / N.
Thus, a test can be based on the comparison between these two mean squares.
Another important point to note is that SSTR and SSE, each divided by σ², follow Chi-squared distributions under the null hypothesis: SSTR/σ² follows a Chi-squared distribution with a - 1 degrees of freedom and SSE/σ² a Chi-squared distribution with N - a degrees of freedom.
Owing to the fact that SSE/σ² and SSTR/σ² are Chi-squared distributed and independent of each other, the ratio

F = MSTR/MSE

has an F-distribution with a - 1 and N - a degrees of freedom. Thus, under the null hypothesis,

F ~ Fa-1,N-a
We can therefore use the F-statistic to test the general hypothesis.
More specific hypotheses can be tested in the same way if an appropriate sum of squares can be found for which the expected value is equal to E(MSE) under the null hypothesis, and is greater than E(MSE) under the alternative hypothesis.
In this way all possible linear combinations of parameters can be tested.
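The decomposition and the resulting F-test are straightforward to reproduce numerically. The sketch below (Python/SciPy, using the same hypothetical data as in the earlier sketches rather than the data of Example 1.2.1) computes SSTR, SSE, the mean squares and the P-value of the overall F-test.

```python
import numpy as np
from scipy import stats

# Same hypothetical data as before
y = np.array([2.1, 1.5, 2.8, 1.9, 1.7,
              3.9, 3.5, 4.2, 3.6,
              7.2, 8.1, 7.9, 7.0, 7.8])
groups = np.repeat([0, 1, 2], [5, 4, 5])
N, a = len(y), 3

grand_mean = y.mean()
group_means = np.array([y[groups == k].mean() for k in range(a)])
n_k = np.array([(groups == k).sum() for k in range(a)])

# Among-treatments and error sums of squares
SSTR = np.sum(n_k * (group_means - grand_mean) ** 2)
SSE = np.sum((y - group_means[groups]) ** 2)

MSTR = SSTR / (a - 1)
MSE = SSE / (N - a)
F = MSTR / MSE

# P-value: probability that an F(a-1, N-a) variable exceeds the observed F
p_value = stats.f.sf(F, a - 1, N - a)
print(F, p_value)
```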
Example 2.2.1.7 Testing hypotheses
The total sum of squares and the degrees of freedom of the observations of the data set shown in Example 1.2.1 can be divided as shown in the following table.
Table 2.2.1.3 Decomposition of total sum of squares for Example 1.2.1
| Source of variation | Sum of squares | df | Mean square |
| Mean | 289.25 | 1 | 289.25 |
| Among doses | 80.73 | 2 | 40.36 |
| Error | 30.91 | 11 | 2.81 |
| Total | 400.86 | 14 | |
The value of the F-statistic is

F = MSTR/MSE = 40.36/2.81 = 14.36
and can be used to test the general hypothesis that the three doses give the same mean response in PCV.
Under this null hypothesis the F-statistic is distributed as
F ~ F2,11
Therefore, the P-value is given by
Prob (F2,11 ≥ 14.36) = 0.0009
So even at the 0.1% significance level the null hypothesis can be rejected.
Now consider the more specific hypothesis testing problem
H0 : µ1 = 0.5(µ2 + µ3)
Ha : µ1 ≠ 0.5(µ2 + µ3)
The distributional properties of the estimate of µ1 - 0.5(µ2 + µ3) can be used to derive the appropriate F-statistic. An estimate for µ1 - 0.5(µ2 + µ3) is given by Ȳ1. - 0.5(Ȳ2. + Ȳ3.).
Under the null hypothesis, it has been shown that
As an F-distribution with one numerator degree of freedom can be obtained by squaring a t-distributed variable, the following distributional property holds
Thus the null hypothesis is rejected (P< 0.01).
The decomposition of the sum of squares and degrees of freedom and application of the statistical tests can easily be obtained from the GLM procedure in SAS. The program that generates the output discussed above is shown in Appendix B.2.1.
The following table is extracted from the generated output
Dependent Variable: PCV

| Source | df | Sum of Squares | Mean Square | F value | Pr > F |
| Model | 2 | 80.73 | 40.38 | 14.37 | 0.0009 |
| Error | 11 | 30.91 | 2.81 | | |
| Corrected Total | 13 | 111.63 | | | |

| Source | df | Sum of Squares | Mean Square | F value | Pr > F |
| DOSE | 2 | 80.73 | 40.36 | 14.37 | 0.0009 |

| Contrast | df | Sum of Squares | Mean Square | F value | Pr > F |
| 1 vs mh | 1 | 43.39 | 43.39 | 15.44 | 0.0024 |

| Parameter | Estimate | T for H0: Parameter=0 | Pr > |t| | Standard error of estimate |
| 1 vs mh | -3.68 | -3.93 | 0.0024 | 0.937 |
The corrected total sum of squares is the total sum of squares minus the sum of squares for the mean described earlier, and represents the sum of squares of deviations of observations about the overall mean
The description of one-way ANOVA models extends easily to two-way and multiway ANOVA.
Instead of one independent variable, the model now contains two independent variables, A and B, each having a number of levels a and b respectively.
The cell means model can be written as
Yijk = µij + eijk i= 1, .. , a; j = 1, .. , b; k = 1, .. , nij
In practice, one can work again with the overparameterised model
Yijk = µ + ai + ßj + (aß)ij + eijk,   i = 1, ..., a; j = 1, ..., b; k = 1, ..., nij
with
µ the overall mean (or in general the reference or baseline value)
ai parameters for levels of A
ßj parameters for levels of B
(aß)ij parameters describing interactions between levels of A and B
eijk independent homoscedastic normal errors (with zero mean).
Example 2.2.2.1 Two-way ANOVA model specification
Assuming that there are two observations for each treatment combination (nij = 2 for each i and j) and that each of the two independent variables has two levels, the cell means model in matrix notation is given by
X is the design matrix, rank(X)= 4. The dimension of X is (N x 4), with N=8 the total number of observations.
The design matrix X in the overparameterised model becomes
and ßT = (µ, a1, a2, ß1, ß2, (aß)11, (aß)12, (aß)21, (aß)22)
The vectors Y and e are the same as before.
Note that the dimension of X is (8 x 9) but rank(X)= 4. So X is not a full rank matrix.
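The rank deficiency of the overparameterised two-way design matrix is easy to check numerically. The sketch below (Python/NumPy; the layout mirrors Example 2.2.2.1 but is otherwise an illustration of ours) builds both design matrices and prints their ranks.

```python
import numpy as np
from itertools import product

# 2 x 2 layout with two observations per cell (as in Example 2.2.2.1)
levels_A, levels_B, reps = 2, 2, 2
cells = [(i, j) for i, j in product(range(levels_A), range(levels_B)) for _ in range(reps)]
N = len(cells)   # 8 observations

# Cell means design matrix: one column per (i, j) cell
X_cell = np.zeros((N, levels_A * levels_B))
for row, (i, j) in enumerate(cells):
    X_cell[row, i * levels_B + j] = 1.0

# Overparameterised design matrix: mean, A effects, B effects, interactions
X_over = np.zeros((N, 1 + levels_A + levels_B + levels_A * levels_B))
for row, (i, j) in enumerate(cells):
    X_over[row, 0] = 1.0                                              # mu
    X_over[row, 1 + i] = 1.0                                          # a_i
    X_over[row, 1 + levels_A + j] = 1.0                               # b_j
    X_over[row, 1 + levels_A + levels_B + i * levels_B + j] = 1.0     # (ab)_ij

print(np.linalg.matrix_rank(X_cell))                  # 4: full column rank
print(X_over.shape, np.linalg.matrix_rank(X_over))    # (8, 9) but rank 4
```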
Mixed models and fixed effects models study the relationship between a response and one or more independent variables or factors. For a fixed effects factor we assume that there is only a finite set of levels, all of which can be represented in the study, and that inferences are to be made only about the levels defined for the study. For a random effects factor, however, we assume that there is an 'infinite' set of levels (a population of levels), and we think of the levels present in the study as a sample from that population. In Section 2.3.1 it is explained how a data set can be described in the mixed model framework by using a notation for each separate observation. All the examples introduced in Chapter 1 are revisited in Section 2.3.2 and statistical models are defined in the mixed model notation. In Section 2.3.3 a general matrix form for the mixed model is introduced and the matrix form of some of the examples is discussed in detail.
We introduced in Section 2.2.1 the following overparameterised fixed effects model
Ykl = µ + tk + ekl, k = 1, ..., a; l = 1, ..., nk
where µ and the tk s are unknown parameters.
Assume now that we have again only one factor, but this time it is a random effects factor. The model description is similar
Ykl = µ + tk + ekl, k = 1,...,a;l = 1, ...,nk
but t1, ..., ta are now considered to be a random sample from a normally distributed population with mean zero and variance σt², i.e. tk ~ N(0, σt²), and t1, ..., ta are independent. Furthermore it is assumed that every ekl is independent of every tk.
This model is called a random effects model; it only contains an overall mean parameter µ as a fixed effect. Thus a model with only one factor in addition to the mean can be either a fixed effects model or a random effects model.
When there are two or more factors, one may be a random effects and the other a fixed effects factor; the model is then called a mixed model. We now revisit the examples discussed in Chapter 1 and for each of them give a model description.
Comparing three doses of a drug: a completely randomised design
The data structure of Example 1.2.1 can be described by the following statistical model
Ykl = µ + tk + ekl
where, for k = 1, 2, 3; l = 1, ..., nk,
Ykl is the difference in PCV (=PCVafter - PCVbefore)
µ is the overall mean
tk is the effect of the low (t1), medium (t2) or high (t3) dose of Berenil
ekl is the random error of animal l given dose k
Both µ and tk are fixed effects. For the random effects ekl we assume independence and
ekl ~ N(0, σ²)
In words: the random error ekl describes the deviation of the observation Ykl from the expected value µk = µ + tk; these errors are assumed to be independent and normally distributed with mean 0 and variance σ². The model is a fixed effects model.
Comparing two doses of a drug: a randomised block design

The data structure of Example 1.2.2 can be described by the following statistical model
Yikl = µ + hi + tk + eikl
where, for i =1, 2, 3; k =1, 2; l =1, ..., nik,
Yikl is the difference in PCV (=PCVafter - PCVbefore)
µ is the overall mean
hi is the effect of herd i
tk is the effect of the high (t1) or low (t2) dose of Berenil
eikl is the random error of animal l given dose k in herd i
Both µ and tk are fixed effects. For hi, however, we have a choice; we can either assume it to be a fixed effect so that inference can apply only to the herds in question; alternatively, we can assume that the herds have been selected from a population of herds. In this example we shall assume the latter. Thus, both hi and eikl are random effects.
The assumptions are
the hi are iid N(0, σh²)
the eikl are iid N(0, σ²)
the hi and eikl are independent
Thus, the random effect hi describes the effect of a particular herd on the response variable. The effect of a particular herd is assumed to belong to a normal distribution with mean zero and variance σh². In this example interest is thus not in the particular herds; rather, conclusions should be valid for the whole population of herds.
Comparing two drugs: a completely randomised design with repeated observations
The data structure of Example 1.2.3 can be described by the following statistical model
Yijl = µ + di + hij + eijl
where, for i = 1, 2; j = 1, 2; l = 1, ..., nij,
Yijl is the difference in PCV (=PCVafter - PCVbefore)
µ is the overall mean
di is the effect of the Berenil (d1) or Samorin (d2)
hij is the effect of herd j which receives drug i
eijl is the random error of animal l in herd j to which drug i has been assigned.
Both µ and di, are fixed effects, whereas hij and eijl are random effects; the assumptions are
the hij are iid N(0, σh²)
the eijl are iid N(0, σ²)
the hij and eijl are independent
Thus the random effect hij describes the effect of a particular herd on the response variable, and the particular herds are assumed to be a random sample from a normal distribution with variance σh². Although the model is similar to the model of the previous example, there is a substantial difference. In this case the treatment is assigned to a herd as a whole, and thus to all animals in that herd. For this reason we use the notation hij, which means that the herd factor is 'nested' in the drug factor, so that each animal in herd j nested in drug i receives drug i.
Comparing two drugs at two different doses: two sizes of experimental units
The data structure of Example 1.2.4 can be described by the following statistical model
Yijkl = µ + di + hij + tk + (dt)ik + eijkl
where, for i = 1, 2; j = 1, 2; k = 1, 2; l = 1, ..., nij,
Yijkl is the difference in PCV (=PCVafter - PCVbefore)
µ is the overall mean
di is the effect of the Berenil (d1) or Samorin (d2)
hij is the effect of herd j which receives drug i
tk is the effect of the high (t1) or low (t2) dose
(dt)ik is the interaction effect between drug and dose
eijkl is the random error of animal l at dose k in herd j to which drug i has been assigned
Both µ, di, tk and (dt)ik are fixed effects, whereas hij and eijkl are random effects.
The assumptions are
the hij are iid N(0, σh²)
the eijkl are iid N(0, σ²)
the hij and eijkl are independent
Thus, the random effect hij describes again the effect of a particular herd on the response variable.
Comparing two drugs at two different doses: a split plot design
The data structure of Example 1.2.5 can be described by the following statistical model
Yijk = µ + di + rj + hij + Tk + (dT)ik + eijk
where, for i = 1, 2; j = 1, ..., 6; k = 1, 2,
Yijk is the difference in PCV (=PCVafter - PCVbefore)
µ is the overall mean
di is the effect of the Berenil (d1) or Samorin (d2)
rj is the random effect of region j
hij is the random effect of the herd within region j to which level i of the drug has been assigned
Tk is the effect of the high (T1) or low (T2) dose
(dT)ik is the interaction effect between drug and dose
eijk is the random error of the average response of animals from region j to which the ik drug-dose treatment has been applied
Both µ, di, Tk and (dT)ik are fixed effects, whereas rj, hij and eijk are random effects; the assumptions are
the rj are iid N(0, σr²)
the hij are iid N(0, σh²)
the eijk are iid N(0, σ²)
the rj, hij and eijk are independent
Thus the random effect rj describes the effect of a particular region on the response variable. The random effect hij describes the effect of herd within a region on the response variable.
Variation in ELISA data
The data structure of Example 1.2.6 can be described by the following statistical model
Yijk = µ + ai + tj + eijk
where, for i =1, ..., 4; j = 1, ...,11; k = 1, 2,
Yijk is the OD value
µ is the overall mean
ai is the effect of animal i
tj is the effect of time j
eijk is the random error of measurement k at time j on animal i
In this case ai is assumed to be a random effect; provided there is no systematic pattern of changes in OD with time, we can also assume that tj is a random effect. Thus, for the random effects ai, tj and eijk we assume
the ai are iid N(0, σa²)
the tj are iid N(0, σt²)
the eijk are iid N(0, σ²)
the ai, tj and eijk are independent
Since the overall mean is the only fixed effect, this is a random effects model. The interest of the investigator is in the magnitudes of the different sources of variability on the OD value.
Helminth-resistance in sheep
The data structure of Example 1.2.7 can be described by the following statistical model
Yij = µ + si + eij
where, for i = 1, ..., 8; j = 1, ..., ni,
Yij is the weaning weight
µ is the overall mean
si is the effect of the sire i
eij is the random error of lamb j of sire i
In this case, there is only one fixed effect, the overall mean µ. The other effects si and eij are random effects.
The assumptions are
the si are iid N(0, σs²)
the eij are iid N(0, σ²)
the si and eij are independent
The model is a random effects model. For this example, interest is in the sources of variability as well as in actual responses of the specific sires. It will be shown later how these two goals can be met together.
Breed differences in susceptibility to trypanosomosis
The data structure of Example 1.2.8 can be described by the following statistical model
Yikl = µ + ßi0 + ßi1 t + aik + eikl
where, for i = 1, 2; k = 1, ..., 6; l = 1, ..., 14,
Yikl is the PCV
µ is the overall mean
ßi0 is the intercept of the regression line (of PCV on time t) for breed i
ßi1 is the slope of the regression line for breed i
aik is the effect of animal k belonging to breed i
eikl is the random error of measurement l on animal k belonging to the breed i
There are thus five fixed effects, the overall mean and the intercepts and the slopes for the two breeds. Both aik and eikl are random effects. The assumptions for the random effects are
the aik are iid N(0, σa²)
the eikl are iid N(0, σ²)
the aik and eikl are independent
The mixed model examples given in Section 2.3.2 can also be written in matrix notation. The matrix form of a mixed model collects the fixed effects in a vector ß, the random effects in a vector u, and finally the random error terms, which also constitute a random effects factor, in a vector e. A formal definition is
Y = Xß+Zu + e
with X the known design matrix for the fixed effects and Z the known design matrix for the random effects.
The matrix representation for Example 1.2.2 is as follows
The example contains the mean, the dose factor, which is a fixed effects factor at two levels, the herd factor, which is a random effects factor at three levels, and finally the random error term. The overall mean µ and the two fixed dose effects, t1 and t2, are contained in the ß vector, whereas the three random herd effects h1, h2 and h3 are contained in the u vector.
Example 1.2.4 contains the dose factor and the drug factor, which are both fixed effects factors at two levels, the herd factor, which is a random effects factor at four levels, and finally the random error term. The overall mean µ, the two fixed dose effects, t1 and t2, the two fixed drug effects, d1 and d2, and the interaction effects (dt)11, (dt)12, (dt)21 and (dt)22 are contained in the ß vector, whereas the four random herd effects h11, h12, h21 and h22 are contained in the u vector.
This data structure can again be put in matrix form Y = Xß + Zu + e. The different parts are shown on the two next pages.
The X matrix can be divided into four parts. The first part is the vector Xµ, consisting of 1s. The second part is the Xd matrix, containing the two columns corresponding to d1 and d2. The third part is the Xt matrix, containing the two columns corresponding to t1 and t2. The last part is the Xdt matrix, containing the four columns corresponding to (dt)11, (dt)12, (dt)21 and (dt)22.
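A small sketch of how such design matrices can be assembled is given below in Python/NumPy. It mimics the randomised block structure of Example 1.2.2 but assumes, purely for illustration, a balanced layout with two animals per herd-dose cell; the real data need not be balanced.

```python
import numpy as np

# Sketch of mixed model design matrices for a randomised block layout:
# 3 herds x 2 doses, here with 2 animals per herd-dose cell (illustrative only).
herds, doses, n_per_cell = 3, 2, 2
rows = [(h, d) for h in range(herds) for d in range(doses) for _ in range(n_per_cell)]
N = len(rows)

# Fixed effects design matrix X: overall mean plus one dummy per dose
X = np.zeros((N, 1 + doses))
for r, (h, d) in enumerate(rows):
    X[r, 0] = 1.0
    X[r, 1 + d] = 1.0

# Random effects design matrix Z: one dummy per herd
Z = np.zeros((N, herds))
for r, (h, d) in enumerate(rows):
    Z[r, h] = 1.0

print(X.shape, Z.shape)   # (12, 3) and (12, 3)
```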
It has been demonstrated how different variance components describe, at different levels, the variability in observations with a mixed model structure. The variance-covariance matrix of Y, where
Y = Xß + Zu + e
is obtained as follows.
The first part of the model, Xß, does not contribute to the variance of Y, as it only contains fixed effects. Furthermore, it is assumed that each element of the vector u is a random effect drawn from a normal distribution with mean 0 and the appropriate variance component, and that the elements of u are independent of each other, so that the covariances among the elements of u are zero. The variance-covariance matrix of u, D(u), is therefore a diagonal matrix. All the elements of the vector u are assumed to be independent of the elements of e. We further have that the elements of e are normally distributed with mean 0 and variance σ² and are independent of each other. Given these assumptions, the variance-covariance matrix of Y is given by

D(Y) = D(Zu) + D(e) = ZD(u)ZT + σ²IN
We revisit Example 1.2.5. The vector u consists of 18 random effects, 6 random effects for the regions (rl .... r6), 12 random effects for the herds (h11, h21, h12, h22, ..., h16, h26) whereas the Z matrix is a (24 x 18) matrix.
Note that the matrix product Zu can also be written as the sum of two different matrix products. The first 6 entries of the u vector and the first 6 columns of the Z matrix can be defined as ur and Zr, respectively, and the remaining entries of u and Z as uh and Zh, respectively. Thus, ur and Zr contain all the information with regard to the regions, whereas uh and Zh contain all the information with regard to the herds within regions. Thus
For the vector of observations sorted by region, given by
YT = (Y111, Y112, Y211, Y212, ..., Y161, Y162, Y261, Y262), the matrix product Zu is shown below.
As it is assumed that ur and uh are independent of each other, the variance of Zu can be obtained as

D(Zu) = D(Zrur) + D(Zhuh) = ZrD(ur)ZrT + ZhD(uh)ZhT
Given that the different entries in ur are independent of each other, the variance-covariance matrix of ur is given by

D(ur) = σr²I6

In a similar way, the variance-covariance matrix of uh can be obtained as

D(uh) = σh²I12
Therefore, the variance-covariance matrix of Y is given by
Both Zr and Zh can be written in a simpler form, using Kronecker products, as (Appendix C)

Zr = I6 ⊗ 14 and Zh = I12 ⊗ 12

where 1n denotes an (n x 1) vector of ones.
Define Yj as the vector containing the four observations from region j, YjT = (Y1j1, Y1j2, Y2j1, Y2j2).
For D(Yj) = W, the variance-covariance matrix of Yj, we have
Thus, the variance of an observation corresponds to σ² + σh² + σr², the covariance between two observations from the same herd equals σh² + σr², and finally the covariance between two observations from the same region, but from different herds, corresponds to σr².
The full variance-covariance matrix is then equal to
Thus, the different variance components appear throughout the variance-covariance matrix of the vector Y. In fact, the only unknown elements in the matrix V are the different variance components. In order to derive an estimate of the matrix V, estimates of the variance components are needed for insertion into V. In the next subsection it is discussed how estimates of variance components can be obtained.
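The structure of V can be verified numerically. The following Python/NumPy sketch builds V = ZrD(ur)ZrT + ZhD(uh)ZhT + σ²IN for the split-plot layout of Example 1.2.5 (observations sorted by region, with the two herds of a region consecutive), using for illustration the REML estimates reported later in Table 2.4.2.1, and prints the 4 x 4 block belonging to one region.

```python
import numpy as np

# Variance-covariance matrix V for the split-plot structure of Example 1.2.5:
# 6 regions, 2 herds (= drugs) per region, 2 observations (doses) per herd.
sigma2_r, sigma2_h, sigma2_e = 5.151, 0.387, 2.096   # illustrative (REML) values
n_regions, herds_per_region, obs_per_herd = 6, 2, 2
N = n_regions * herds_per_region * obs_per_herd      # 24

# Z_r and Z_h as Kronecker products, with observations sorted by region
Z_r = np.kron(np.eye(n_regions), np.ones((herds_per_region * obs_per_herd, 1)))
Z_h = np.kron(np.eye(n_regions * herds_per_region), np.ones((obs_per_herd, 1)))

V = (sigma2_r * Z_r @ Z_r.T
     + sigma2_h * Z_h @ Z_h.T
     + sigma2_e * np.eye(N))

# The 4 x 4 block for one region shows the three distinct entries:
# the variance, the same-herd covariance and the same-region covariance.
print(V[:4, :4])
```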
There exist different methods to estimate variance components. Only likelihood based estimators are considered here (Harville, 1977). An excellent monograph on likelihood is written by Edwards (1972). Estimation methods for variance components based on other criteria are well described by Searle et al. (1992).
We assume the following distributional assumptions for the observation vector Y
Y ~ MVN(Xß, V)
The likelihood of Y is then given by
where |V| denotes the determinant of the matrix V (see Appendix C).
The only unknowns in V are the variance components σ1², ..., σd², with d the number of variance components.
Thus the log-likelihood can be written as

lY(ß, V) = lY(ß, σ1², ..., σd²)
Maximum likelihood estimators can then be obtained by maximising this expression with respect to ß and σ1², ..., σd². Differentiating the log-likelihood with respect to the different parameters and setting the results equal to 0 leads to the maximum likelihood estimator for each parameter.
Differentiating the log-likelihood with respect to ß gives

XTV-1Xß = XTV-1Y
Thus a solution for ß can be obtained as a function of the unknown variance components. This solution can then be inserted into the remaining differential equations in order to lead to a solution for the variance components. In some cases, however, the value that maximises the log-likelihood turns out to be negative (Searle et al., 1992). Although sometimes a negative variance component makes sense (Henderson, 1953; Nelder, 1954), e.g., if the variance component corresponds to a correlation (see Example 3.3), in most cases the parameter space for the variance components is restricted to the interval containing only positive numbers and zero. If a variance component becomes negative, it is usually set to zero and it can be shown (Searle et al., 1992) that, within the restricted region, the zero estimate corresponds to the maximum likelihood estimate. SAS does this by default (SAS Institute Inc. Cary, NC, 1996; Littell et al., 1996). In what follows, the term maximum likelihood estimators is used for the maximum likelihood estimators in the restricted parameter space.
The maximum likelihood estimators of the variance components are biased. The estimate will on average be smaller than the population value. The reason for this is that in the estimation procedure the existence of fixed effects is ignored, and the degrees of freedom used in deriving the estimators do not adjust for this.
The most straightforward example is the estimator of the variance based on a sample YT = (Y1, ..., YN) from a univariate normal distribution N(µ, σ²). The unbiased estimator of σ² is given by

s² = Σi (Yi - Ȳ)² / (N - 1)

If, however, the maximum likelihood criterion is used, the estimator of σ² is given by

σ̂² = Σi (Yi - Ȳ)² / N

Thus, this estimator underestimates σ². For the unbiased estimator, one degree of freedom is subtracted from N to account for the degree of freedom used in calculating the sample mean Ȳ as the estimate of µ.
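The downward bias of the maximum likelihood estimator is easy to demonstrate by simulation. The following sketch (Python/NumPy, with arbitrarily chosen µ, σ² and sample size) compares the average of the divisor-N and divisor-(N - 1) estimators over many simulated samples.

```python
import numpy as np

# Simulation illustrating the downward bias of the ML variance estimator
# (divisor N) compared with the unbiased estimator (divisor N - 1).
rng = np.random.default_rng(1)
mu, sigma2, N, n_sim = 10.0, 4.0, 8, 20000

ml, unbiased = [], []
for _ in range(n_sim):
    y = rng.normal(mu, np.sqrt(sigma2), size=N)
    ss = np.sum((y - y.mean()) ** 2)
    ml.append(ss / N)              # maximum likelihood estimator
    unbiased.append(ss / (N - 1))  # unbiased (REML-type) estimator

print(np.mean(ml))        # close to sigma2 * (N-1)/N = 3.5
print(np.mean(unbiased))  # close to sigma2 = 4.0
```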
Patterson and Thompson (1971) developed a method to derive unbiased estimates of variance components based on the maximum likelihood principle; it is called restricted maximum likelihood estimation. The restricted maximum likelihood method is based on the likelihood of a vector whose components are independent linear combinations of the observations. The basic idea is to end up with a random vector that contains all the information on the variance components but no longer contains information on the fixed effects parameters. We now give a technical description of this method.
Since X is the design matrix for the overparameterised model, it is not of full rank. The rank of X is the number of cell means in the model (the total degrees of freedom associated with the fixed effects). We call this rank r, i.e., rank(X)=r.
A set of linear combinations, denoted KY, is derived, where K is an ((N - r) x N) matrix of full rank (see Appendix C) which obeys the equation

E(KY) = 0

As E(KY) = KXß, it follows that each individual linear combination, i.e. each row kjT of K, satisfies

kjTX = 0
The individual linear combinations kjTY are called error contrasts (Harville, 1974; Harville, 1977).
As Y is assumed to be normally distributed, MVN(Xß, V), it follows that
KY ~ MVN(0, KVKT)
The restricted maximum likelihood criterion is based on the likelihood of KY. This likelihood does not contain fixed effects and furthermore contains fewer elements (N - r) than the likelihood of the observation vector Y. Maximising the likelihood of KY yields the restricted maximum likelihood estimators for the variance components.
Assume that a sample is taken from a large herd to obtain an estimate of the average body weight of the animals in that herd. Apart from the estimate of the mean body weight, its variability is also of interest. The weight of an animal can be assumed to be normally distributed, N(µ, σ²). Denote the sample as Y = (Y1, ..., YN)T; the observations are assumed to be independent. Therefore the variance-covariance matrix of Y is
V = σ²IN
and
Y ~ MVN(µ1N, σ²IN)
The log-likelihood of Y is given by
Differentiating this expression first with respect to µ, we obtain

∂lY/∂µ = (1/σ²)(Y - µ1N)T1N

and setting it equal to 0 gives

(Y - µ1N)T1N = 0

leading to

µ̂ = Ȳ = Σi Yi / N
Differentiating the log-likelihood of Y with respect to σ², we obtain
When using the restricted maximum likelihood criterion, a matrix K must be found that satisfies the aforementioned requirements. One such K, of dimension ((N - 1) x N), is given by

K = ( IN-1 - (1/N)JN-1   -(1/N)1N-1 )

It can easily be shown that the N - 1 rows are linearly independent, and that each row satisfies kjTX = kjT1N = 0, so that the restricted log-likelihood of KY can be written down.
Differentiating this log-likelihood expression with respect to σ² and setting it equal to 0 gives
It can be shown (Searle et al., 1992) that, in general,
KT(KKT)-1K = IN - X(XTX)-1XT
for any appropriate choice of K.
In our example for body weight, the X matrix consists simply of a vector of 1's of length N, 1N.
Thus
XTX = N and (XTX)-1 = 1/N
and furthermore
XXT = JN
Therefore

KT(KKT)-1K = IN - (1/N)JN

Substituting this in the formula for σ̂² gives

σ̂² = Σi (Yi - Ȳ)² / (N - 1)
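Both the matrix identity and the resulting REML estimator can be checked numerically. The sketch below (Python/NumPy, with hypothetical body weights) builds the error-contrast matrix K given above, verifies that KT(KKT)-1K equals IN - (1/N)JN, and shows that the REML criterion reproduces the divisor N - 1.

```python
import numpy as np

# Numerical check of the error-contrast matrix K for the body weight example:
# each row sums to zero, and K'(KK')^{-1}K = I_N - J_N/N.
N = 6
K = np.hstack([np.eye(N - 1) - np.ones((N - 1, N - 1)) / N,
               -np.ones((N - 1, 1)) / N])

print(np.allclose(K @ np.ones(N), 0))                    # KX = 0 with X = 1_N

P = K.T @ np.linalg.inv(K @ K.T) @ K
print(np.allclose(P, np.eye(N) - np.ones((N, N)) / N))   # identity from Searle et al.

# REML estimate of sigma^2 for hypothetical body weights: divisor N - 1
y = np.array([212.0, 198.0, 225.0, 217.0, 204.0, 220.0])
Ky = K @ y
sigma2_reml = Ky @ np.linalg.inv(K @ K.T) @ Ky / (N - 1)
print(sigma2_reml, np.var(y, ddof=1))                    # the two agree
```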
We revisit Example 1.2.5. There are three different variance components: a first variance component, σr², associated with the variability among regions; a second variance component, σh², associated with the variability among herds within regions; and a third variance component, σ², which models the variability of the random error component.
Estimates of these three variance components can be obtained by maximum likelihood (ML) or by restricted maximum likelihood (REML). The table below shows the ML-estimates and the REML-estimates; it provides an illustration of the underestimation of the variance components by the maximum likelihood method. The ML- and the REML-estimates can be obtained by the SAS program shown in Appendix B.2.2.
Table 2.4.2.1 Variance components estimates for Example 1.2.5
| Variance component | ML-estimate | REML-estimate |
| σr² | 4.292 | 5.151 |
| σh² | 0.322 | 0.387 |
| σ² | 1.747 | 2.096 |
2.4.3 Best linear unbiased prediction of random effects
In some experiments, the vector of random effects itself is of interest, as well as the variance component based on this vector. Example 1.2.7 on the inheritance of helminth-resistance in sheep can be used to illustrate this. In the initial analysis, estimation of the variance components σs² and σ² is of main interest: the larger σs² is relative to σ², the greater is the evidence for genetic inheritance.
Subsequently, we need to compare sires. We are thus interested in prediction of the values of the random effects si, i = 1, ..., 8.
There are two different sources of information contributing to the random effect si: the observed values for the lambs of sire i, Yi1, ..., Yini, and the distributional assumption si ~ N(0, σs²). In order to use both sources of information, a random effect is predicted by its expected value, conditional on the mean of the relevant observations.
The random effects model of Example 1.2.7 was given by
Yij = µ + si + eij
If the mean response value of the offspring of sire i is denoted by Ȳi., then the expected value of the random effect, conditional on the relevant observations, is given by
The joint distribution of si and Ȳi. is given by
It can be shown (see Appendix D), given this joint distribution, that

E(si | Ȳi.) = [ni σs² / (ni σs² + σ²)] (Ȳi. - µ)

As σ² tends to zero, or as ni σs² tends to infinity, this expression tends to (Ȳi. - µ). Otherwise, the expression requires the REML-estimates of the variance components and the ML-estimate of µ in order to obtain a prediction ŝi. As will be shown in the next section, the ML-estimator for µ can be obtained by
The random effects model presented above is fitted to the data of Example 1.2.7. The REML-estimates of the variance components are given by σ̂s² = 3.685 and σ̂² = 3.542. The ML-estimate of the mean µ is given by
The best linear unbiased prediction for the random effect of sire 4 is thus given by
The best linear unbiased predictions for the 8 sires in the data set can be obtained by the SAS program shown in Appendix B.2.3
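A minimal sketch of this shrinkage prediction is given below in Python. It applies the conditional-expectation formula above with the REML estimates quoted in the text; the numbers of lambs per sire, the sire means and the value of µ̂ are hypothetical placeholders, not the actual quantities of Example 1.2.7 (those are produced by the SAS program in Appendix B.2.3).

```python
import numpy as np

# BLUP (shrinkage) prediction of sire effects, using the REML estimates
# quoted in the text; offspring counts, sire means and mu_hat are hypothetical.
sigma2_s, sigma2_e = 3.685, 3.542
mu_hat = 15.0                                  # placeholder for the ML estimate of mu

n_i = np.array([12, 8, 15, 10])                # hypothetical numbers of lambs per sire
ybar_i = np.array([16.2, 13.9, 15.4, 17.1])    # hypothetical sire means

# s_i_hat = [n_i * sigma_s^2 / (n_i * sigma_s^2 + sigma^2)] * (ybar_i - mu_hat)
shrinkage = n_i * sigma2_s / (n_i * sigma2_s + sigma2_e)
s_hat = shrinkage * (ybar_i - mu_hat)

print(shrinkage)   # close to 1 for large n_i, so little shrinkage
print(s_hat)       # predicted sire effects, pulled towards zero
```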
Estimates of the fixed effects can be obtained by maximising the log-likelihood of Y
Differentiating the log-likelihood with respect to ß and setting the derivative equal to 0 gives

ß̂ = (XTV-1X)-XTV-1Y
If the variance components contained in V are known and can be substituted into the expression, this estimator is called the generalised least squares estimator.
The variance-covariance matrix of ß̂ is given by

D(ß̂) = (XTV-1X)-XTV-1X(XTV-1X)-
However, the variance components contained in V are usually not known. Therefore, the REML-estimates are substituted, giving the estimate V̂, and the estimate of ß based on the maximum likelihood criterion can then be obtained from the expression

ß̂ = (XTV̂-1X)-XTV̂-1Y
Furthermore, D(ß̂) is estimated by (XTV̂-1X)-XTV̂-1X(XTV̂-1X)-. This estimator is biased downwards (i.e. it is on average smaller than the population value (XTV-1X)-XTV-1X(XTV-1X)-), since the uncertainty introduced by using the REML-estimates of the variance components in V̂ is not taken into account in the approximation (Kackar and Harville, 1984).
We revisit Example 1.2.5. The mixed model is given by
Yijk = µ + di + rj + hij + Tk + (dT)ik +eijk
Since the model is overparameterised, the solution for ß is not unique; there exist an infinite number of solutions (see Section 2.2). This can also be seen by the fact that the matrix XTV . 1X is not of full rank and that we thus have to use generalised inverses. One solution is to set some of the parameter levels equal to 0, thereby reducing the number of parameter levels. In this example, the following restrictions are used
d2 = 0; T2 = 0; (dT)12 = 0; (dT)21 = 0; (dT)22 = 0
Therefore, a maximum likelihood estimate of the parameter vector
ßT = (µ, d1, d2, T1, T2, (dT)11, (dT)12, (dT)21, (dT)22)
can be shown, following substitution of V̂ for V, to be

ß̂T = (17.13, 7.15, 0, 4.35, 0, -3.18, 0, 0, 0)
and the variance-covariance matrix
These results can be obtained by the SAS program shown in Appendix B.2.4.
Interest is not in the individual parameters of an overparameterised model (as they are not estimable), but in estimable functions of those parameters cTß so that the solution is independent of the generalised inverse. A simple example of such an estimable function is the mean of one level of a parameter (e.g. a treatment) averaged over other parameter levels. Both the estimate and the variance of the estimate of a linear combination of the parameters can easily be obtained if the vector of parameter estimates and its estimated variance-covariance matrix are available.
Assume that cTß is an estimable function. It is easy to show that the maximum likelihood estimator of cTß is given by cTß̂; the estimated variance (see Appendix E) is

Vâr(cTß̂) = cT(XTV̂-1X)-c
As already mentioned, the cell means, being linear combinations of the parameters, are relevant and estimable functions. In Example 1.2.5 the expected value of the first cell mean, µ11, is given by the following linear combination

c1Tß = (1 1 0 1 0 1 0 0 0)ß

It represents the mean response of animals treated with a high dose of Berenil.
With c2T = (1 0 1 0 1 0 0 0 1) we obtain the cell mean µ22 = µ + d2 + T2 + (dT)22.
This cell mean represents the mean response of animals treated with the low dose of Samorin.
Estimates of the cell means are then obtained by substituting the parameter estimates obtained in Example 2.5.1.1.
The variance of a cell mean can also be obtained. For instance, the estimated variance of µ̂11 is given by

Vâr(c1Tß̂) = c1T(XTV̂-1X)-c1 = 1.27
This shows that the estimates for the cell means model can easily be obtained from the analysis of the overparameterised model. The SAS program that derives the cell mean estimates with their estimated variances is shown in Appendix B.2.5.
We often want to determine upper and lower confidence limits for an estimable function cTß. Confidence intervals provide an appropriate probabilistic tool for this: they lead to a lower and an upper confidence limit for the parameter function under consideration. The distributional properties of a specific linear combination, needed to derive confidence intervals, follow easily from the fact that the estimators are linear combinations of the observations Y. Since Y is multivariate normal, the estimator cTß̂ also follows a normal distribution,

cTß̂ ~ N(cTß, cT(XTV-1X)-c)

In general, however, the variance components in V must be estimated. Therefore cTß̂ is assumed to follow a t-distribution with v degrees of freedom, where v corresponds to the degrees of freedom available for estimating the variance cT(XTV-1X)-c.
In some situations, the calculated variance corresponds to the expected mean square of one of the random effects factors. In this case, the degrees of freedom are the degrees of freedom of the mean square itself, which can easily be determined (see Example 2.5.3.1).
In general in a mixed model, however, the variance of a linear combination is a complicated expression of different expected mean squares, and the degrees of freedom can only be approximated.
The Satterthwaite procedure (Satterthwaite, 1946), based on quadratic forms (mean squares typically are quadratic forms), provides a way to obtain approximate degrees of freedom and is given by

v = 2 [cT(XTV̂-1X)-c]² / Vâr[cT(XTV̂-1X)-c]
In this expression, the variance term in the denominator must be estimated. An approximation for this variance term has been proposed by Giesbrecht and Burns (1985) and McLean and Sanders (1988). The derivation of both the expression for the degrees of freedom and the approximation for the variance is discussed in detail in Appendix F.
If cT(XTV-1X)-c can be written as a linear combination of mean squares, i.e. as a1MS1 + ... + alMSl with a1, ..., al constants and MSj the mean square due to the jth source of variation in the analysis of variance, then the Satterthwaite procedure simplifies to

v = (a1MS1 + ... + alMSl)² / Σj (ajMSj)²/dfj

with dfj the degrees of freedom for MSj.
The (1 - α)100% confidence interval for cTß is then given by

cTß̂ ± tv;1-α/2 √Vâr(cTß̂)
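The simplified Satterthwaite formula is easy to encode. The following sketch (Python/NumPy; the helper function satterthwaite_df is ours, not a library routine) computes the approximate degrees of freedom for a variance expressed as a weighted sum of mean squares, illustrated with the region and herd mean squares of Example 1.2.5 (23.47 on 5 df and 2.87 on 5 df) combined with weights 0.5 and 0.5, as used for the mean response to Samorin discussed in the example below.

```python
import numpy as np

def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate degrees of freedom for a variance estimated as
    a1*MS1 + ... + al*MSl (the simplified Satterthwaite formula)."""
    coefs, mean_squares, dfs = map(np.asarray, (coefs, mean_squares, dfs))
    num = (coefs * mean_squares).sum() ** 2
    den = ((coefs * mean_squares) ** 2 / dfs).sum()
    return num / den

# Region and herd mean squares from Example 1.2.5, combined with weights 0.5, 0.5
print(satterthwaite_df([0.5, 0.5], [23.47, 2.87], [5, 5]))
```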
Example 2.5.3.1 Confidence intervals
In the analysis of the data of Example 1.2.5, different linear combinations might be of interest. Two linear combinations, c1Tß and c2Tß, have already been shown in Section 2.5.2.
If the parameter vector is given by
ßT = ( µ d1 d2 T1 T2 (dT)11 (dT)12 (dT)21 (dT)22 )
some other examples of linear combinations are
1. The mean response for animals treated with Samorin, µ2., given by

c3Tß = (1 0 1 0.5 0.5 0 0 0.5 0.5)ß

2. The mean difference between animals treated with Samorin and Berenil, µ2. - µ1., given by

c4Tß = (0 -1 1 0 0 -0.5 -0.5 0.5 0.5)ß

3. The mean response for animals treated with a high dose of Samorin, µ21, given by

c5Tß = (1 0 1 1 0 0 0 1 0)ß

4. The mean difference between animals treated with a high versus a low dose of Samorin, µ21 - µ22, given by

c6Tß = (0 0 0 1 -1 0 0 1 -1)ß
It can easily be shown that an estimate of c3Tß is given by c3Tß̂ = Ȳ2.., with Ȳ2.. the mean of the 12 observations on Samorin.
The corresponding variance is

Var(c3Tß̂) = (2σr² + 2σh² + σ²)/12
Similarly, variances for the other linear combinations can be obtained and are given by

Var(c4Tß̂) = (2σh² + σ²)/6
Var(c5Tß̂) = (σr² + σh² + σ²)/6
Var(c6Tß̂) = 2σ²/6
By substituting the REML-estimates of the variance components (σ̂r² = 5.15; σ̂h² = 0.387; σ̂² = 2.096) into these expressions, the following estimates of the variances of the linear combinations are obtained

Vâr(c3Tß̂) = 1.0975
Vâr(c4Tß̂) = 0.478
Vâr(c5Tß̂) = 1.272
Vâr(c6Tß̂) = 0.698
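As a small arithmetic check, the sketch below reproduces these four values from the REML variance components, assuming the balanced-design variance expressions given above.

```python
# Reproducing the four variance estimates from the REML variance components
sigma2_r, sigma2_h, sigma2_e = 5.151, 0.387, 2.096

var_c3 = (2 * sigma2_r + 2 * sigma2_h + sigma2_e) / 12   # mean response for Samorin
var_c4 = (2 * sigma2_h + sigma2_e) / 6                   # Samorin - Berenil difference
var_c5 = (sigma2_r + sigma2_h + sigma2_e) / 6            # high dose of Samorin
var_c6 = 2 * sigma2_e / 6                                # high - low dose of Samorin

print(var_c3, var_c4, var_c5, var_c6)   # approximately 1.10, 0.48, 1.27 and 0.70
```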
In order to derive confidence intervals for these linear combinations, based on the t-distribution, the degrees of freedom associated with the variance term need to be calculated.
An appropriate tool to derive these degrees of freedom is the table of the different mean squares with their corresponding degrees of freedom and expected values, which is given below for Example 1.2.5.
Table 2.5.3.1 Table of mean squares with their degrees of freedom and expected values for Example 1.2.5
| Factor | df | Mean square | Expected mean square |
| Region | 5 | 23.47 | 4σr² + 2σh² + σ² |
| Drug | 1 | 185.37 | 2σh² + σ² + f(drug) |
| Herd (=region*drug) | 5 | 2.87 | 2σh² + σ² |
| Dose | 1 | 45.65 | σ² + f(dose) |
| Drug*dose | 1 | 15.20 | σ² + f(drug*dose) |
| Error | 10 | 2.096 | σ² |
where f (factor) is a term in the expected mean squares due to an individual fixed effect. The expected mean squares for this example can be obtained by the program given in Appendix B.2.6.
The derivation of the degrees of freedom is only straightforward for c4Tß and c6Tß. They have 5 and 10 degrees of freedom, respectively, as their variances depend only on the expected mean square for herd (5 df) and for error (10 df).
The variances of the other linear combinations can be obtained by combining the expected mean squares of the different factors in the following way.
The variance of c3Tß̂ = Ȳ2.. is (2σr² + 2σh² + σ²)/12. The numerator can be obtained as 0.5E(MSregion) + 0.5E(MSherd). Using the simplified Satterthwaite formula, with a1 = a2 = 0.5, we then obtain the approximate degrees of freedom for this variance.
For c5Tß̂ the variance is (σr² + σh² + σ²)/6; the numerator can be obtained from 0.25E(MSregion) + 0.25E(MSherd) + 0.5E(MSerror), and the approximate degrees of freedom follow in the same way.
Thus, 95% confidence intervals are given by
From these confidence intervals it can be concluded that there is a large and significant difference between animals treated with Samorin and Berenil (c4Tß). There is, however, no difference between the high and the low dose of Samorin (c6Tß). These estimates and their standard errors can be obtained by the SAS program given in Appendix B.2.7.
From the example in the previous section it is clear that the variance of cTß̂, with cTß estimable, is a linear combination of the variance components present in the mixed model. In some situations it is meaningful to narrow the scope of the study and to think of one or more of the random effects as fixed effects. As a consequence, the corresponding variance components will not appear in the expression for the variance of the estimator cTß̂. If we think of one or more of the random effects as fixed effects, we say that we consider a narrow inference space. If we assume all effects to be random, we talk about the broad inference space. Based on examples we show that the confidence limits for cTß can be quite different for broad and narrow inference spaces. If we consider all factors as fixed effects, the only remaining variance component is σ², the error term variance. It typically follows that the narrower the inference space, the smaller the confidence interval will be.
It has been shown for the data of Example 1.2.5 that the variance of the mean response of animals treated with Samorin, µ2, is given by
The variance of Ȳ2.., the estimate of µ2. in the fixed effects model, i.e., the model with region and herd as fixed effects, is σ²/12. This variance is much smaller than that calculated previously for c3Tß̂, when rj and h2j are assumed random, and therefore the width of the confidence interval for µ2. is much narrower.
The estimated standard errors for the linear combination µ2. can be obtained by the SAS program given in Appendix B.2.8, for both the fixed effects model and the mixed model.
Testing general linear hypotheses for fixed effects by an appropriate F-test is much more complex in mixed models than in fixed effects models. As has already been shown, there is only one source of random variation in fixed effects models. The F-statistic consists of a ratio of two mean squares, the denominator being the error mean square. For that reason the derivation of the degrees of freedom of the denominator of the F-test is straightforward, as shown before. In the mixed model, however, there are different sources of random variation, and the denominator of the F-test consists in most cases of a combination of these different sources of random variation. The derivation of the appropriate degrees of freedom for the denominator in the F-test is therefore not straightforward, and can, in most practical cases, only be approximated by the Satterthwaite procedure as described in Appendix F.
General linear hypotheses for the fixed effects take the form
H0 : CTß = 0 versus Ha : CTß ≠ 0
with ß a (b x 1) vector and C a (b x q) matrix whose columns define v1 independent estimable functions (thus rank(C) = v1).
To estimate CTß we use CTß̂, and since

D(CTß̂) = CTD(ß̂)C = CT(XTV-1X)-C

we use CT(XTV̂-1X)-C as the estimated variance-covariance matrix.
It can be shown that the statistic

F = (CTß̂)T [CT(XTV̂-1X)-C]-1 (CTß̂) / v1

approximately follows an F-distribution under the null hypothesis, with v1 numerator degrees of freedom.
As can be seen from this expression, F is no longer a ratio of two mean squares as in the fixed effects model. The role of the denominator is taken over by (XTV̂-1X)-. The denominator degrees of freedom, v2, thus correspond to the degrees of freedom associated with (XTV̂-1X)- and can be approximated by the Satterthwaite procedure in a similar way as shown for the t-test. Further details on these approximate F-tests for general linear hypotheses are given in Fai and Cornelius (1996).
The P-value for testing the null hypothesis is then given by the probability that Fv1, v2 is larger than the corresponding value of the F-distribution, i.e.
Prob (Fv1,v2 ≥ F)
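A minimal sketch of this Wald-type F statistic is given below in Python/SciPy; the helper function wald_F is ours, not a SAS or library routine, and the single-contrast illustration simply reuses the estimate and variance quoted for Example 1.2.5 in the next example.

```python
import numpy as np
from scipy import stats

def wald_F(beta_hat, cov_beta, C, ddf):
    """Approximate F-test of H0: C'beta = 0, with C a (b x q) matrix of
    estimable functions, cov_beta the estimated covariance of beta_hat,
    and ddf the (Satterthwaite-type) denominator degrees of freedom."""
    Cb = C.T @ beta_hat
    V_Cb = C.T @ cov_beta @ C
    q = C.shape[1]                          # numerator degrees of freedom
    F = Cb @ np.linalg.solve(V_Cb, Cb) / q
    p = stats.f.sf(F, q, ddf)
    return F, p

# Single-contrast illustration with the numbers quoted for Example 1.2.5:
# estimate -5.55 with estimated variance 0.478 and 5 denominator df.
F, p = wald_F(np.array([-5.55]), np.array([[0.478]]), np.eye(1), ddf=5)
print(F, p)   # F is about 64; the drug difference is clearly significant
```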
The null hypothesis and alternative hypothesis of no difference between Samorin and Berenil in Example 1.2.5 are given by
H0 : µ2. = µ1. and Ha : µ2. ≠ µ1.
Thus in this simple case, where C is a vector (q = 1), the null hypothesis CTß = 0 reduces to a single contrast, and the test statistic

F = (CTß̂)² / Vâr(CTß̂)

follows an F-distribution with 1 and 5 degrees of freedom (with 5 being the degrees of freedom associated with the herd (= region*drug) stratum, as calculated before for c4Tß). The estimate of CTß equals -5.55 and the estimate of its variance is given by 0.478.
Therefore, the P-value for the null hypothesis is given by
In testing a more complex hypothesis in which C is a matrix rather than a vector, assume that, for Example 1.2.5, interest is in the following hypothesis
H0 : µ11 = µ12 = µ21 = µ22 versus Ha : not all µij equal
We can rewrite this hypothesis by building up three simultaneous independent estimable functions and testing them together
The first estimable function tests whether there is a difference between drugs (µ1. - µ2.), the second whether there is a difference between doses (µ.1 - µ.2), and the third whether there is an interaction between dose and drug ((µ11 - µ12) - (µ21 - µ22)).
It can be shown that the value of the F-statistic is 31.2 and that the denominator degrees of freedom equal 7.14. The rank of C, and thus the numerator degrees of freedom, equals 3. Therefore, the P-value is Prob(F3,7.14 ≥ 31.2) = 0.0002. The null hypothesis is clearly rejected. The corresponding SAS program is given in Appendix B.2.9.