[R-sig-ME] Help with lme modelling

Wed Apr 1 20:44:29 CEST 2009

I would like to apologize in advance if this is not the right forum
for posting this message, since I am using nlme instead of lme4
package, but it seamed to me that this is the best place for Mixed
Effects Models (MEM) questions.

I've recently started using MEM for the differential expression
analysis of genes in cases where I want to analyze compatible datasets
from different experiments/labs (modelling labs as random effects). I
was pretty happy with the results since MEM allows for direct use of
raw expression data, unlike some other meta-analysis approaches and
thus seem to be more sensitive.

However, I discovered several cases where MEM failed to make correct
predictions and I would like to find a way to automatically detect
these in the future (I am doing the analysis on thousands of genes so
it is not feasible to graphically inspect the quality of a model fit
for each one separately).

Here is an example:

#data frame

> data
          expr.val   lab treatment   age
X1        4.751217     A       sal   old
X2        4.660610     A       sal   old
X3        4.810334     A       sal   old
X4        5.235475     A        KA   old
X5        5.414665     A        KA   old
X6        4.451858     A        KA   old
X7        4.812522     A       sal young
X8        6.346419     A       sal young
X9        5.267596     A       sal young
X10       3.823031     A        KA young
X11       3.619814     A        KA young
X12       3.539568     A        KA young
X13       5.671991     B       sal   old
X14       5.715370     B       sal   old
X15       5.694324     B       sal   old
X16       5.457560     B        KA   old
X17       5.454112     B        KA   old
X18       5.430781     B        KA   old

As you can see the data comes from two labs, A and B, there are two
treatments, KA and sal, and the effect of these is my main interest
and there are also two age groups.

Fitting linear MEM (I expect some interaction between two fixed
effects, treatment and age):

> data.lme<- lme(expr.val~treatment*age,data=data,random=~1|lab)

> summary(data.lme)
Linear mixed-effects model fit by REML
 Data: m
       AIC      BIC    logLik
  33.76913 37.60347 -10.88456

Random effects:
 Formula: ~1 | Study
        (Intercept)  Residual
StdDev:   0.4553264 0.3960901

Fixed effects: E ~ Treatment * Age
                          Value Std.Error DF   t-value p-value
(Intercept)            5.240742 0.3602901 13 14.545894  0.0000
Treatmentsal          -0.023434 0.2286827 13 -0.102475  0.9199
Ageyoung              -1.276539 0.3000890 13 -4.253867  0.0009
Treatmentsal:Ageyoung  1.838143 0.3960901 13  4.640719  0.0005
 Correlation:
                      (Intr) Trtmnt Ageyng
Treatmentsal          -0.317
Ageyoung              -0.242  0.381
Treatmentsal:Ageyoung  0.183 -0.577 -0.660

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max
-1.6738372 -0.3845786 -0.2229425  0.4311380  2.1987599

Number of Observations: 18
Number of Groups: 2

> anova(data.lme)
              numDF denDF   F-value p-value
(Intercept)       1    13 232.76945  <.0001
Treatment         1    13   9.96020  0.0076
Age               1    13   2.51385  0.1369
Treatment:Age     1    13  21.53627  0.0005

What I do when automatically analyzing all the genes is to look at the
p-value from anova function to assess the significance of Treatment
effect and at the sign of estimated parameter (Treatmentsal) from the
summary function to decide on the direction of change (is it more
expressed after sal or KA treatment). In the above example the results
indicate that Treatment has significant effect and that there is more
expression for KA effect (Treatmentsal=-0.023434). However, when I
plot the raw data and superimpose the fitted values it is obvious that
the fit is not good for KA-old datapoints and this is probably caused
by contradictory data wrt to treatment effect between the two labs
(sal<KA for A and sal>KA for B).

My questions are:

1. How is the F-value in  anova function computed? I would like to
understand why I got such a small value when data is contradictory.

2. The Treatmentsal value seems to correspond only to the age=old
(change from a baseline treatment=KA, age=old). How do I compute the
Treatmentsal value for both age groups?

3. In general, what functions/parameters can I used in automated way
to decide if there is a significant effect of, for example, treatment
KA?

Sorry for the lengthy post and superficial understanding of how MEM
work (not much background in statistics). I would very much appreciate
any help on these issues.

Cheers,
Sanja Rogic