[R-sig-ME] Data sheet notation and model structure for GLMM with 3 non-factorial factors

Raldo Kruger raldo.kruger at gmail.com
Sat Sep 26 10:11:40 CEST 2009


Hi Douglas,

Many thanks for the input. I've run two analyses on the same dataset
using 1) indicator columns and the 2) a single 'factor / treatment'
column for the non-factorial design described in my previous e-mail,
and the results were identical (great!).

However, I did the same for a dataset with a factorial design (N, G,
N*G, i.e. there were plots with N, plots with G, and plots with both N
and G), and the results for the main effects are identical, but the
estimates for the interaction effects (N*G) are different between the
two analyses (see below). Could you help me make sense of that please
(i.e. which one is correct?) !

Thanks,
Raldo

With expanded treatment notation-
Fixed effects:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    2.92060    0.23834  12.254  < 2e-16 ***
N              0.03766    0.03486   1.080   0.2801
G              0.14929    0.03395   4.397 1.10e-05 ***
Yearthree     -2.85449    0.10664 -26.768  < 2e-16 ***
Yeartwo       -1.88175    0.06844 -27.494  < 2e-16 ***
N:G           -0.31633    0.04953  -6.386 1.70e-10 ***
N:Yearthree    0.15710    0.14428   1.089   0.2762
N:Yeartwo      0.14736    0.09305   1.584   0.1133
G:Yearthree   -0.25107    0.15430  -1.627   0.1037
G:Yeartwo      0.07550    0.09200   0.821   0.4118
N:G:Yearthree  0.36353    0.20810   1.747   0.0807 .
N:G:Yeartwo   -0.01158    0.12996  -0.089   0.9290

With single column treatment notation-
Fixed effects:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         2.92057    0.23836  12.253  < 2e-16 ***
TreatG              0.14928    0.03395   4.397 1.10e-05 ***
TreatN              0.03767    0.03486   1.080 0.279928
TreatNG            -0.12938    0.03639  -3.556 0.000377 ***
Yearthree          -2.85448    0.10664 -26.768  < 2e-16 ***
Yeartwo            -1.88175    0.06844 -27.494  < 2e-16 ***
TreatG:Yearthree   -0.25109    0.15430  -1.627 0.103693
TreatN:Yearthree    0.15711    0.14428   1.089 0.276199
TreatNG :Yearthree  0.26959    0.14636   1.842 0.065483 .
TreatG:Yeartwo      0.07549    0.09200   0.820 0.411941
TreatN:Yeartwo      0.14735    0.09305   1.583 0.113308
TreatNG :Yeartwo    0.21118    0.09558   2.210 0.027139 *


On Thu, Sep 24, 2009 at 2:10 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Thu, Sep 24, 2009 at 1:22 AM, Raldo Kruger <raldo.kruger at gmail.com> wrote:
>> Hi R users,
>>
>> I have 3 factors in a non-factorial design (G, K and N), as well as
>> two time periods (Year) and a random factor (Site), with Plant numbers
>> as the response variable.
>>
>> My 1st question relates to the the notation of the treatments in the
>> data frame. Is it appropriate to use an expanded treatment notation,
>> such as this, when using glmer{lme4}:
>>
>> Site    Year    Plant   G       K       N
>> A       1       5       0       0       0
>> A       1       4       1       0       0
>> A       1       7       0       1       0
>> A       1       10      0       0       1
>> A       2       3       0       0       0
>> A       2       4       1       0       0
>> A       2       8       0       1       0
>> A       2       12      0       0       1
>> B       1       7       0       0       0
>> B       1       3       1       0       0
>> B       1       7       0       1       0
>> B       1       12      0       0       1
>> B       2       4       0       0       0
>> B       2       5       1       0       0
>> B       2       6       0       1       0
>> B       2       11      0       0       1
>>
>> With the model
>>
>> m1<-glmer(Plant~G+K+N+Year+(1|Site), ...)
>>
>> Or is it better to use a single column for the treatments, like this:
>>
>> Site    Year    Plant   Treatment
>> A       1       5       C
>> A       1       4       G
>> A       1       7       K
>> A       1       10      N
>> A       2       3       C
>> A       2       4       G
>> A       2       8       K
>> A       2       12      N
>> B       1       7       C
>> B       1       3       G
>> B       1       7       K
>> B       1       12      N
>> B       2       4       C
>> B       2       5       G
>> B       2       6       K
>> B       2       11      N
>>
>> With the following model:
>> m1<-glmer(Plants~Treatment+Year+(1|Site), ...)
>
> The latter is preferred.  R will generate the indicator columns for
> the levels of the Treatment factor (the 0/1 columns shown in the first
> form) and, when appropriate, reduce them to a set of 2 "contrasts" in
> the model.  (The reason for quoting the word "contrasts" is that there
> is a formal mathematical definition of a contrast but the linear
> combinations generated by R do not always satisfy this definition.
> The method and results are correct, it is just the name that is
> inaccurate.)
>
> The reason that the latter is preferred is that it is easier to
> maintain the data in a consistent form (factors maintain consistency
> and are easy to check in the output from str() or summary(), whereas
> indicator columns have inter-column dependencies that must be checked
> separately) and the "when appropriate" clause above.  Determining a
> useful parameterization of a linear model incorporating factors is
> subtle and a lot of code in the R function model.matrix is devoted to
> a symbolic analysis designed to get this right.  Also, you can, if you
> wish, change the parameterization (see ?contrasts).
>



-- 
Raldo




More information about the R-sig-mixed-models mailing list