[BioC] A question about Limma

Gordon Smyth smyth at wehi.edu.au
Fri Jan 7 01:42:17 CET 2005


40% sounds to me like a *lot* of genes. I would keep it in. Not even the
strongest effect will be significant for *every* gene. And non-significance 
doesn't mean the effect is zero.

Whether you keep a nuisance effect in also depends on the size of the 
experiment. With many arrays, definitely keep it in. With very few arrays 
the cost of estimating a nuisance parameter is relatively greater. Where's 
the cutoff? Don't know. Only experience will tell. I am currently thinking 
the cutoff for a dye-effect with two-color replicated dye-swap data is 
around 4 arrays, depending obviously on the technology.

Gordon
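
A minimal sketch of the degrees-of-freedom cost described above, assuming a
replicated dye-swap experiment analysed on log-ratios, where the dye-effect is
modelled as an intercept column next to a treatment column that flips sign on
the swapped arrays. Each extra coefficient costs one residual degree of
freedom per gene. The helper names (matrix_rank, residual_df,
dye_swap_design) are hypothetical and are not limma functions; the sketch is
plain Python and assumes the arrays alternate between the two dye
orientations.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def residual_df(design):
    """Residual degrees of freedom: number of arrays minus design rank."""
    return len(design) - matrix_rank(design)


def dye_swap_design(n_arrays, with_dye_effect):
    """Assumed design: treatment is +1 on even arrays, -1 on dye-swapped ones.

    When with_dye_effect is True, a leading column of ones (an intercept on
    the log-ratios) stands in for the dye-effect.
    """
    rows = []
    for i in range(n_arrays):
        treatment = 1 if i % 2 == 0 else -1
        rows.append([1, treatment] if with_dye_effect else [treatment])
    return rows


for n in (4, 8, 16):
    df_without = residual_df(dye_swap_design(n, with_dye_effect=False))
    df_with = residual_df(dye_swap_design(n, with_dye_effect=True))
    print(f"{n} arrays: residual df {df_without} without dye-effect, "
          f"{df_with} with dye-effect")
```

Under these assumptions, the dye-effect takes the residual df from 3 to 2
with 4 arrays, a loss of one third, but only from 15 to 14 with 16 arrays,
which is one way to see why the cost is relatively greater in small
experiments.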

At 11:31 AM 7/01/2005, Fangxin Hong wrote:
>Is it possible that the dye-effect still tests as significant for some
>genes, let's say 40% of genes? Do we remove or keep this effect for all
>genes?
>I ran into this problem: the factor origin (e.g. different laboratories) was
>significant for > 50% of genes, and what I did was keep it in the model for
>all genes.
>However, I can't figure out a nice explanation for this, such as why the dye
>effect is only significant for 40% of genes and what this tells us about
>the effect.
>
>Thanks.
>Fangxin
> > I agree.  In my reply to Fangxin I should have added that I would remove a
> > non-essential effect like a dye-effect if it appeared non-significant, but
> > I'd remove it for all the genes.
> >
> > Gordon
> >
> > On Tue, January 4, 2005 1:18 am, Naomi Altman said:
> >> Reducing the model based on removing nonsignificant effects is called
> >> "pre-test estimation".  It is known to increase the false-positive rate,
> >> even in the classical setting.  In the microarray setting, there is no
> >> compelling reason to use pre-test estimators that differ from gene to
> >> gene.
> >>
> >> --Naomi Altman
> >>
> >> At 10:57 PM 1/3/2005 +1100, Gordon K Smyth wrote:
> >>> > Date: Sun, 2 Jan 2005 14:05:15 -0800 (PST)
> >>> > From: "Fangxin Hong" <fhong at salk.edu>
> >>> > Subject: [BioC] A question about Limma
> >>> > To: bioconductor at stat.math.ethz.ch
> >>> > Message-ID: <1867.66.75.240.64.1104703515.squirrel at 66.75.240.64>
> >>> > Content-Type: text/plain;charset=iso-8859-1
> >>> >
> >>> > Hi Bioconductor users;
> > >>> > I have a general question about the limma model.
> > >>> > In the limma package, usually one linear model applies to all genes,
> > >>> > and error variances from all genes are modified simultaneously. What
> > >>> > if some factor, for example one main effect, is only significant for
> > >>> > some genes, and we want to identify genes based on the significance
> > >>> > of another main effect (of interest)? What is the best way to do it?
> > >>> > Currently I just leave this factor in the model which is applied to
> > >>> > all genes,
> >>>
> > >>>That's what I do, leave all terms in the models for all the genes.  I
> > >>>don't see a strong case for doing a separate model selection process for
> > >>>every gene.
> >>>
> >>> > but this
> >>> > might under-estimate the total number of genes on which the effect of
> >>> > interest is significant.
> >>>
> > >>>Why do you think so?  The only disadvantage of keeping a non-significant
> > >>>term in the model is a reduction in residual degrees of freedom, with some
> > >>>consequential loss of power, but this disadvantage is mitigated by the
> > >>>empirical Bayes moderation process.
> >>>
> > >>>Perhaps someday someone will work out a model selection theory for
> > >>>massively parallel regression situations like microarray experiments, but
> > >>>there isn't such a theory now.  It seems safer to me to have the same model
> > >>>for every gene, keeping all the 'a priori' important predictors in the
> > >>>model.
> >>>
> >>>Gordon
> >>>
> > >>> > I am sorry if this question has been asked/answered here before; I
> > >>> > couldn't find it by searching the archive. Any comment, suggestion or
> > >>> > experience is appreciated.
> >>> >
> >>> > Fangxin
> >>> > --
> >>> > Fangxin Hong, Ph.D.
> >>> > Plant Biology Laboratory
> >>> > The Salk Institute
> >>> > 10010 N. Torrey Pines Rd.
> >>> > La Jolla, CA 92037
> >>> > E-mail: fhong at salk.edu
> >>>
> >>>_______________________________________________
> >>>Bioconductor mailing list
> >>>Bioconductor at stat.math.ethz.ch
> >>>https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>
> >> Naomi S. Altman                                814-865-3791 (voice)
> >> Associate Professor
> >> Bioinformatics Consulting Center
> >> Dept. of Statistics                              814-863-7114 (fax)
> >> Penn State University                         814-865-1348 (Statistics)
> >> University Park, PA 16802-2111
> >>
> >
> >
> >
>
>
>--
>Fangxin Hong, Ph.D.
>Plant Biology Laboratory
>The Salk Institute
>10010 N. Torrey Pines Rd.
>La Jolla, CA 92037
>E-mail: fhong at salk.edu


