[BioC] Limma, model with several factors
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Aug 9 06:46:49 CEST 2013
Dear Jenny,
First, let me comment on your experiment. I think that you are
over-analysing to some extent. There are strong apriori reasons to expect
repeated measures on the same subject to be correlated, so I think you
should include this in your model even though the correlation appears to
be low for your data. I don't agree with trying to estimate a repeated
measures correlation for some treatment levels and not others, unless
there are very special aspects to your experiment that I am not aware of.
I view this as cherry picking.
My advice: just use duplicateCorrelation as usual and be done with it.
The correlation is small, so it will only make a modest difference
compared to ignoring it, but it is the right thing to do in principle.
I would not try to treat patients as fixed effects with your data, not
because it cannot be done, but because it would be overly heavy-handed
considering the weak intra-patient correlations.
In saying this, I am assuming that you have done the usual quality
assessment plots like plotMDS, have looked for outlier samples and have
considered the need for array quality weights and so on.
There are many designs for which the patient effect can be treated as
either a fixed or a random. You give a link to a post from me from March
2013. You claim I indicated that Subject can't be treated as a fixed
effect, but I did not say that. I said that treating patient as random is
one solution. I did not say or imply that it cannot be treated as fixed.
Now let me turn to Ingrid Dahlman's experiment. You are correct that the
design matrix I suggested to Ingrid was singular. Given that design
matrix, limma would have automatically fixed the singularity by removing
the design column for Subject4 and hence treated Subject1 and Subject4 as
if they were the same person. That analysis is obviously not quite
correct, although it probably would have given reasonable results in
practice.
Ingrid said that she wanted to compare (F.A-F.B)-(P.A-P.B) and that she
wanted to block for subject. I took that to mean that she wanted compare
before (B) to after (A) using within-subject information only. That
implies a fixed effect model, so I did not consider the random effects
model in my reply to her.
Subject center treatment timepoint
1 1 F B
1 1 F A
2 1 P B
2 1 P A
3 2 F B
3 2 F A
4 2 P B
4 2 P A
The problem with my suggested model formula was that it is not possible to
compare F to P within subject, because each subject receives only one of
the treatments. However it is possible to estimate the contrast asked for
with a custom design matrix:
F.AvsB <- as.numeric( treatment=="F" & timepoint=="A")
P.AvsB <- as.numeric( treatment=="P" & timepoint=="A")
design <- model.matrix(~Subject+F.AvsB+P.AvsB)
This is a full rank matrix with 6 columns. The comparison that Ingrid
wanted is available as the contrast F.AvsB-P.AvsB.
Finally, I am not sure what you want me to say about the aov() command
with an Error() term in the model. That is a neat way of fitting a very
special random effects model. It fits a fixed effects model but
interprets it as partly random. It works for balanced univariate designs
for which the moment estimators from the ANOVA table coincide with the
REML estimators of the variance components. It is not applicable to
Ingrid's design. The duplicateCorrelation() approach in limma fits a
similar model, with added assumptions about consistency across genes, but
without the need for a balanced design.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Thu, 8 Aug 2013, Zadeh, Jenny Drnevich wrote:
> Hi Gordon,
>
> I'm confused by your response to Ingrid last May as to how to construct
> a design matrix for her 2 factor experiment with repeated measures on 1
> factor. I came across this post in searching the mailing list because I
> have a similar experimental design, but I was getting the dreaded
> "Coefficients not estimable" warning message with what I thought was the
> same design matrix you suggested to Ingrid. Your example below actually
> produces a singular design matrix:
>
> #reproducible example:
>> Subject <- gl(4,2)
>> treatment <- gl(2,2,8)
>> timepoint <- gl(2,1,8)
>> Treat.Time <- factor(paste(treatment,timepoint,sep="."))
>> design <- model.matrix(~0+Treat.Time+Subject)
>> colnames(design)[1:4] <- levels(Treat.Time)
>> design
> 1.1 1.2 2.1 2.2 Subject2 Subject3 Subject4
> 1 1 0 0 0 0 0 0
> 2 0 1 0 0 0 0 0
> 3 0 0 1 0 1 0 0
> 4 0 0 0 1 1 0 0
> 5 1 0 0 0 0 1 0
> 6 0 1 0 0 0 1 0
> 7 0 0 1 0 0 0 1
> 8 0 0 0 1 0 0 1
> attr(,"assign")
> [1] 1 1 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$Treat.Time
> [1] "contr.treatment"
>
> attr(,"contrasts")$Subject
> [1] "contr.treatment"
>
>>
>> is.fullrank(design)
> [1] FALSE
>>
>> nonEstimable(design)
> [1] "Subject4"
>
> My searches of the mailing list led to a previous question also about
> the same experimental design, in which your response seems to indicate
> that you can't treat Subject as a fixed effect:
> https://stat.ethz.ch/pipermail/bioconductor/2010-March/032523.html
>
> Is this really true? I found this page
> (http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html)
> about modeling repeated measures factors in R, and it seems you could
> model a single gene like this:
>
> #not-reproducible
>> aov.out = aov(expression.level ~ treatment * timepoint + Error(Subject/timepoint), data=one.gene)
>
> But this model formulation doesn't work with model.matrix(). So I'm
> confused about the best way to model my experiment: I have 11 subjects
> divided into 4 treatments with 3 timepoint measurements per subject and
> we're interested in treatment effects, time effects and interaction
> effects. I'm worried about simply treating Subject as a random effect
> because when I look separately within each treatment, only 1 of the 4
> treatments actually has any substantial Subject effect! (Subject
> correlation from duplicateCorrelation() for the entire experiment is
> 0.038, but within each treatment the values are -0.036, -0.042, -0.041
> and 0.110). If none of them had an observable Subject effect, I could
> just ignore it and do a standard 2-factor ANOVA design, but because only
> one treatment has an effect I'm worried the overall correlation value
> over-estimates the Subject effect in 3 treatments and under-estimates in
> the 1 treatment. Can I possibly drop some of the Subject coefficients
> from model.matrix(~0+Treat.Time+Subject) and only keep those pertaining
> to the treatment that does show Subject effects?
>
> I would greatly appreciate any advice or suggestions!
> Jenny
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Gordon K Smyth
> Sent: Monday, May 20, 2013 7:29 PM
> To: Ingrid Dahlman [guest]
> Cc: Bioconductor mailing list
> Subject: [BioC] Limma, model with several factors
>
> Dear Ingrid,
>
> If you block on subject (as in Section 8.4 of the User's Guide), then you have automatically blocked on center as well, because the subjects are different in the two centers.
>
> You experiment can be analyzed as suggested in Section 8.4.2 of the User's Guide.
>
> First combine treatment and timepoint into one factor (following Section
> 8.5.2):
>
> Treat.Time <- factor(paste(treatment,timepoint,sep="."))
>
> Then make a design matrix including the blocking factor:
>
> design <- model.matrix(~0+Treat.Time+Subject)
> colnames(design)[1:4] <- levels(Treat.Time)
> cont.matrix <- makeContrasts((F.A-F.B)-(P.A-P.B),levels=design)
>
> etc.
>
> The rule is that the experimental factors of interest are combined into the treatment factors, whereas the nuisance factors are blocked on.
>
> Best wishes
> Gordon
>
>> Date: Mon, 20 May 2013 01:01:08 -0700 (PDT)
>> From: "Ingrid Dahlman [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, ingrid.dahlman at ki.se
>> Subject: [BioC] Limma, model with several factors
>>
>>
>> I have carefully read the Limma users guide but still have not sorted
>> out how I design the contrasts for the following Project.
>>
>> In want to compare the effect (After vs before) of two different
>> treatments, F and P. The study was carried our in two different centers.
>> This can be illustrated as follows:
>
>> Subject center treatment timepoint
>> 1 1 F B
>> 1 1 F A
>> 2 1 P B
>> 2 1 P A
>> 3 2 F B
>> 3 2 F A
>> 4 2 P B
>> 4 2 P A
>
>> I want to compare (F.A-F.B)-(P.A-P.B), blocking for subject. However,
>> in addition, I would like to block for center. I.e. the center is like
>> a batch effect.
>
>> Is it possible to block for two factors, suject and center, in the
>> same test in Limma?
>
>> /Ingrid
>>
>> -- output of sessionInfo():
>>
>> See section 8.7 in the user guide.
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list