[BioC] limma's eBayes error: No residual degrees of freedom in linear model

Gordon K Smyth smyth at wehi.EDU.AU
Sat Nov 19 14:22:40 CET 2005


>[BioC] limma's eBayes error: No residual degrees of freedom in linear
>model
>Li,Qinghong,ST.LOUIS,Molecular Biology Qinghong.Li at rdmo.nestle.com
>Thu Nov 17 21:40:38 CET 2005
>
>Dear Gordon,
>
>I would like to thank you for pointing out the problem. This is the
>first time I tried to use Limma. The main reference materials I used
>is the Ch. 23 of Book Bioinformatics and Comp. Biol. solutions Using R
>and BioC. and the lab notes from microarray short course @ IBC 2004.
>In particular, the example I was following was the 23.10 in the book,
>factorial designs where we have five chips, 2 for WT and 3 for
>Mutants. In each genotype, there are unstimulated and stimulated. I
>thought that resembled the experimental designs in my case (target
>file):
>
>FileName Sib Sex Treatment
>anim1     1   M         C
>anim2     2   F         C
>anim3     3   M         C
>anim4     3   F         C
>anim5     1   M         R
>anim6     2   F         R
>anim7     3   M         R
>anim8     3   F         R
>
>Where Sib indicates sibling pairs: anim1 and anim5 are siblings and so
>forth. My question is quite simple: I would like to know if there is
>any difference between C and R in treatment for now. Although I might
>be interested in the gender effect (and/or gender*treatment) in a
>later time.
>
>First, I read in all CEL files and normalized the chips using Limma
>package and it looked quite good in diagnostic plots. Say I called
>that file "eset", which is an exprSet file. I used the following
>scripts to create the design matrix:
>
>> TBS<-paste(target$Treatment, target$Sex, target$Sib, sep=".")
>> TBS<-factor(TBS, levels=unique(TBS))
>> design<-model.matrix(~0+TBS)
>> colnames(design)<-levels(TBS)
>
>>cont.matrix<-makeContrasts(diff=
>(C.M.1+C.F.2+C.M.3+C.F.3)-(R.M.1+R.F.2+R.M.3+R.F.3))
>
>model fitting:
>>fit1<-lmFit(eset,design)
>>fit2<-contrasts.fit(fit1, cont.matrix)
>>fit3<-eBayes(fit2) (this is where I got the error message)
>
>Best wishes,
>Johnny
>
>-----Original Message-----
>From: Gordon Smyth [mailto:smyth at wehi.edu.au]
>Sent: Wednesday, November 16, 2005 6:51 PM
>To: Li at wehi.edu.au; Qinghong at wehi.edu.au; ST.LOUIS at
>wehi.edu.au;
>Li,Qinghong,ST.LOUIS,Molecular Biology
>Cc: BioC Mailing List
>Subject: [BioC] limma's eBayes error: No residual degrees of freedom
>in
>linear model
>
>
>
>>[BioC] limma's eBayes error: No residual degrees of freedom in linear
>model
>>Li,Qinghong,ST.LOUIS,Molecular Biology Qinghong.Li at rdmo.nestle.com
>>Tue Nov 15 22:09:13 CET 2005
>>
>>Hi BioC,
>>
>>I was runing eBayes and got the above error. I searched the old
>archives
>>of BioC, and has found similar problem poseted by Ken Ninh:
>>http://files.protsuggest.org/biocond/html/4652.html
>>
>>I checked the summary(fit$df.residual), all zero's. But the
>>fit1<-lmFit(normData, design) and fit2<-contrasts.fit(fit1,
>cont.matrix)
>>ran properly. I checked normData with boxplots, and they looked fine
>and
>>well normalized. Here is my design matrix:
>>  > design
>>       C.M.1 C.F.2 C.M.3 C.F.3 R.M.1 R.F.2 R.M.3 R.F.3 (C/R:
>> control/treatment; F/M: male/female; 1,2,3 are sibling pairs)
>>1     1     0     0     0     0     0     0     0
>>2     0     1     0     0     0     0     0     0
>>3     0     0     1     0     0     0     0     0
>>4     0     0     0     1     0     0     0     0
>>5     0     0     0     0     1     0     0     0
>>6     0     0     0     0     0     1     0     0
>>7     0     0     0     0     0     0     1     0
>>8     0     0     0     0     0     0     0     1
>>attr(,"assign")
>>[1] 1 1 1 1 1 1 1 1
>>attr(,"contrasts")
>>attr(,"contrasts")$TBS
>>[1] "contr.treatment"
>>
>>contrast matrix
>>
>> > cont.matrix
>>       Diff
>>C.M.1   -1
>>C.F.2   -1
>>C.M.3   -1
>>C.F.3   -1
>>R.M.1    1
>>R.F.2    1
>>R.M.3    1
>>R.F.3    1
>>
>>What could be the possible reasons for the error and how to fix that?
>>
>>Thanks
>>Johnny
>
>Dear Johnny,
>
>I have to tell you that what you're doing, i.e., the design matrix
>you've created, is not very sensible statistically. Hence the
>non-useful results you are getting from limma. Here are some steps
>that you can take to do something about it:
>
>1. Consult someone with statistical experience at your organization
>who can tell you about replication and degrees of freedom for error.
>
>2. To get meaningful help from this list, you need to explain a little
>more about your experiment. In particular you need to explain what you
>are hoping to learn scientifically from your data and what comparisons
>are of interest to you.
>
>3. Explain what documentation you have read and what examples you are
>attempting to follow here. That would help us understand what you need
>to know, and may also help us to improve the documentation.
>
>Best wishes
>Gordon


Dear Johnny,

Thanks for supplying the extra information about your experiment and
about the documentation you are trying to follow.  I'm sorry though
that you're still not using statistical advice from your own group.

I note that Naomi Altman has already given you sensible advice on your experiment:
https://www.stat.math.ethz.ch/pipermail/bioconductor/2005-November/010773.html
Have you tried to follow her advice?

Following Naomi's suggestion, you could use the block argument in lmFit()
as in Section 8.2 of the Limma User's Guide:

 design <- model.matrix(~factor(Treatment), data=targets)
 corfit <- duplicateCorrelation(eset,design,block=targets$Sib)
 corfit$consensus
 fit <- lmFit(eset,design,block=targets$Sib,correlation=corfit$consensus)
 fit <- eBayes(fit)
 topTable(fit,coef=2)

This tests for a treatment difference while treating the Sib pairs as random effects.  A somewhat
simpler approach would be compute a moderated paired t-test:

 design <- model.matrix(~factor(Sib)+factor(Treatment), data=targets)
 fit <- lmFit(set, design)
 fit <- eBayes(fit)
 topTable(fit,coef=4)

Thanks for pointing out that you tried to follow the factorial example.
This does resemble your example in a way, but there is a critical
difference.  In the factorial design, the interaction terms is of
principle interest.  In your example, the Sib pairs are a blocking
variable, not a treatment factor, and you want to ignore the
interaction.

Also, you didn't actually follow the factorial example.  The factorial
example explains how to fit a model with four coefficients, because
there are four unique treatment combinations.  There are more than four
arrays, so this leaves some degrees of freedom over to estimate the
standard deviation.  You however created a distinct treatment name for
every single array and hence fitted a model with a coefficient for
every array, leaving no residual degrees freedom.  If you don't
understand why this is wrong, you need to consult a statistician in your
own group.  Are you asking your statistician the right questions?  Most
statisticians would be happy to oblige if you asked them a well posed
question like "I've fitted a linear model with no residual degrees of
freedom--what does this mean" or "how do I use a linear model program
to calculate a paired t-test".  They don't need to know anything about
microarrays to help you with things like this.

Hope this helps
Gordon



More information about the Bioconductor mailing list