[BioC] Limma, model with several factors

Fri Aug 9 06:46:49 CEST 2013

Dear Jenny,

First, let me comment on your experiment.  I think that you are 
over-analysing to some extent.  There are strong apriori reasons to expect 
repeated measures on the same subject to be correlated, so I think you 
should include this in your model even though the correlation appears to 
be low for your data.  I don't agree with trying to estimate a repeated 
measures correlation for some treatment levels and not others, unless 
there are very special aspects to your experiment that I am not aware of. 
I view this as cherry picking.

My advice: just use duplicateCorrelation as usual and be done with it. 
The correlation is small, so it will only make a modest difference 
compared to ignoring it, but it is the right thing to do in principle.

I would not try to treat patients as fixed effects with your data, not 
because it cannot be done, but because it would be overly heavy-handed 
considering the weak intra-patient correlations.

In saying this, I am assuming that you have done the usual quality 
assessment plots like plotMDS, have looked for outlier samples and have 
considered the need for array quality weights and so on.

There are many designs for which the patient effect can be treated as 
either a fixed or a random.  You give a link to a post from me from March 
2013.  You claim I indicated that Subject can't be treated as a fixed 
effect, but I did not say that.  I said that treating patient as random is 
one solution.  I did not say or imply that it cannot be treated as fixed.

Now let me turn to Ingrid Dahlman's experiment.  You are correct that the 
design matrix I suggested to Ingrid was singular.  Given that design 
matrix, limma would have automatically fixed the singularity by removing 
the design column for Subject4 and hence treated Subject1 and Subject4 as 
if they were the same person.  That analysis is obviously not quite 
correct, although it probably would have given reasonable results in 
practice.

Ingrid said that she wanted to compare (F.A-F.B)-(P.A-P.B) and that she 
wanted to block for subject.  I took that to mean that she wanted compare 
before (B) to after (A) using within-subject information only.  That 
implies a fixed effect model, so I did not consider the random effects 
model in my reply to her.

Subject center treatment timepoint
1            1         F         B
1            1         F         A
2            1         P         B
2            1         P         A
3            2         F         B
3            2         F         A
4            2         P         B
4            2         P         A

The problem with my suggested model formula was that it is not possible to 
compare F to P within subject, because each subject receives only one of 
the treatments.  However it is possible to estimate the contrast asked for 
with a custom design matrix:

   F.AvsB <- as.numeric( treatment=="F" & timepoint=="A")
   P.AvsB <- as.numeric( treatment=="P" & timepoint=="A")
   design <- model.matrix(~Subject+F.AvsB+P.AvsB)

This is a full rank matrix with 6 columns.  The comparison that Ingrid 
wanted is available as the contrast F.AvsB-P.AvsB.

Finally, I am not sure what you want me to say about the aov() command 
with an Error() term in the model.  That is a neat way of fitting a very 
special random effects model.  It fits a fixed effects model but 
interprets it as partly random.  It works for balanced univariate designs 
for which the moment estimators from the ANOVA table coincide with the 
REML estimators of the variance components.  It is not applicable to 
Ingrid's design.  The duplicateCorrelation() approach in limma fits a 
similar model, with added assumptions about consistency across genes, but 
without the need for a balanced design.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Thu, 8 Aug 2013, Zadeh, Jenny Drnevich wrote:

> Hi Gordon,
>
> I'm confused by your response to Ingrid last May as to how to construct 
> a design matrix for her 2 factor experiment with repeated measures on 1 
> factor. I came across this post in searching the mailing list because I 
> have a similar experimental design, but I was getting the dreaded 
> "Coefficients not estimable" warning message with what I thought was the 
> same design matrix you suggested to Ingrid. Your example below actually 
> produces a singular design matrix:
>
> #reproducible example:
>> Subject <- gl(4,2)
>> treatment <- gl(2,2,8)
>> timepoint <- gl(2,1,8)
>> Treat.Time <- factor(paste(treatment,timepoint,sep="."))
>> design <- model.matrix(~0+Treat.Time+Subject)
>> colnames(design)[1:4] <- levels(Treat.Time)
>> design
>  1.1 1.2 2.1 2.2 Subject2 Subject3 Subject4
> 1   1   0   0   0        0        0        0
> 2   0   1   0   0        0        0        0
> 3   0   0   1   0        1        0        0
> 4   0   0   0   1        1        0        0
> 5   1   0   0   0        0        1        0
> 6   0   1   0   0        0        1        0
> 7   0   0   1   0        0        0        1
> 8   0   0   0   1        0        0        1
> attr(,"assign")
> [1] 1 1 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$Treat.Time
> [1] "contr.treatment"
>
> attr(,"contrasts")$Subject
> [1] "contr.treatment"
>
>>
>> is.fullrank(design)
> [1] FALSE
>>
>> nonEstimable(design)
> [1] "Subject4"
>
> My searches of the mailing list led to a previous question also about 
> the same experimental design, in which your response seems to indicate 
> that you can't treat Subject as a fixed effect:

> https://stat.ethz.ch/pipermail/bioconductor/2010-March/032523.html
>
> Is this really true? I found this page 
> (http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html) 
> about modeling repeated measures factors in R, and it seems you could 
> model a single gene like this:
>
> #not-reproducible
>> aov.out = aov(expression.level ~ treatment * timepoint + Error(Subject/timepoint), data=one.gene)
>
> But this model formulation doesn't work with model.matrix(). So I'm 
> confused about the best way to model my experiment: I have 11 subjects 
> divided into 4 treatments with 3 timepoint measurements per subject and 
> we're interested in treatment effects, time effects and interaction 
> effects. I'm worried about simply treating Subject as a random effect 
> because when I look separately within each treatment, only 1 of the 4 
> treatments actually has any substantial Subject effect! (Subject 
> correlation from duplicateCorrelation() for the entire experiment is 
> 0.038, but within each treatment the values are -0.036, -0.042, -0.041 
> and 0.110). If none of them had an observable Subject effect, I could 
> just ignore it and do a standard 2-factor ANOVA design, but because only 
> one treatment has an effect I'm worried the overall correlation value 
> over-estimates the Subject effect in 3 treatments and under-estimates in 
> the 1 treatment. Can I possibly drop some of the Subject coefficients 
> from model.matrix(~0+Treat.Time+Subject) and only keep those pertaining 
> to the treatment that does show Subject effects?
>
> I would greatly appreciate any advice or suggestions!
> Jenny

> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Gordon K Smyth
> Sent: Monday, May 20, 2013 7:29 PM
> To: Ingrid Dahlman [guest]
> Cc: Bioconductor mailing list
> Subject: [BioC] Limma, model with several factors
>
> Dear Ingrid,
>
> If you block on subject (as in Section 8.4 of the User's Guide), then you have automatically blocked on center as well, because the subjects are different in the two centers.
>
> You experiment can be analyzed as suggested in Section 8.4.2 of the User's Guide.
>
> First combine treatment and timepoint into one factor (following Section
> 8.5.2):
>
>   Treat.Time <- factor(paste(treatment,timepoint,sep="."))
>
> Then make a design matrix including the blocking factor:
>
>   design <- model.matrix(~0+Treat.Time+Subject)
>   colnames(design)[1:4] <- levels(Treat.Time)
>   cont.matrix <- makeContrasts((F.A-F.B)-(P.A-P.B),levels=design)
>
> etc.
>
> The rule is that the experimental factors of interest are combined into the treatment factors, whereas the nuisance factors are blocked on.
>
> Best wishes
> Gordon
>
>> Date: Mon, 20 May 2013 01:01:08 -0700 (PDT)
>> From: "Ingrid Dahlman [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, ingrid.dahlman at ki.se
>> Subject: [BioC] Limma, model with several factors
>>
>>
>> I have carefully read the Limma users guide but still have not sorted
>> out how I design the contrasts for the following Project.
>>
>> In want to compare the effect (After vs before) of two different
>> treatments, F and P. The study was carried our in two different centers.
>> This can be illustrated as follows:
>
>> Subject center treatment timepoint
>> 1            1         F          B
>> 1            1         F          A
>> 2            1         P          B
>> 2            1         P          A
>> 3            2         F          B
>> 3            2         F          A
>> 4            2         P          B
>> 4            2         P          A
>
>> I want to compare (F.A-F.B)-(P.A-P.B), blocking for subject. However,
>> in addition, I would like to block for center. I.e. the center is like
>> a batch effect.
>
>> Is it possible to block for two factors, suject and center, in the
>> same test in Limma?
>
>> /Ingrid
>>
>> -- output of sessionInfo():
>>
>> See section 8.7 in the user guide.

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}