[BioC] A question about Limma

Fangxin Hong fhong at salk.edu
Fri Jan 7 01:31:25 CET 2005


Is it possible for the dye effect to test as significant for only some
genes, say 40% of them? Should we then remove this effect, or keep it for
all genes?
I ran into this problem: the factor "origin" (e.g. different laboratories)
was significant for more than 50% of genes, and what I did was keep it in
the model for all genes.
However, I can't find a good explanation for this: why would a dye effect
be significant for only 40% of genes, and what does that tell us about the
effect?
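Keeping a factor in the model for every gene means fitting one design matrix to all genes at once and letting the per-gene test of that factor come out however it does. The sketch below illustrates this in Python/NumPy rather than R, so `fit_common_model` is a hypothetical helper and not a limma function. It assumes two-color log-ratios from a hypothetical 8-array dye-swap layout, with the dye effect coded as a column of ones (one common coding), and simulated data in which a dye bias affects 40% of genes by construction.

```python
import numpy as np

def fit_common_model(Y, X):
    """Fit the same linear model to every gene by ordinary least squares.

    Y is a genes x arrays matrix of log-ratios and X an arrays x coefficients
    design matrix. Returns per-gene coefficients (genes x p), residual
    variances (genes,), the residual df shared by all genes, and ordinary
    (unmoderated) t-statistics (genes x p).
    """
    n = X.shape[0]
    # A single solve handles all genes: each column of Y.T is one gene.
    beta, _, rank, _ = np.linalg.lstsq(X, Y.T, rcond=None)
    resid = Y.T - X @ beta
    d = n - rank                        # same residual df for every gene
    s2 = (resid ** 2).sum(axis=0) / d   # gene-wise residual variance
    # Square roots of diag((X'X)^-1), analogous to limma's stdev.unscaled.
    stdev_unscaled = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
    t = beta.T / (np.sqrt(s2)[:, None] * stdev_unscaled[None, :])
    return beta.T, s2, d, t

# Hypothetical dye-swap layout with 8 arrays: +1/-1 gives the orientation
# of treatment vs. reference; the column of ones estimates the dye effect.
orientation = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
X = np.column_stack([orientation, np.ones(8)])

# Simulated data: 100 treatment-responsive genes, dye bias in 40% of genes.
rng = np.random.default_rng(0)
G = 1000
true_trt = np.zeros(G)
true_trt[:100] = 1.0
true_dye = np.zeros(G)
true_dye[:400] = 0.5
Y = (true_trt[:, None] * orientation + true_dye[:, None]
     + rng.normal(scale=0.3, size=(G, 8)))

beta, s2, d, t = fit_common_model(Y, X)
crit = 2.447  # approx. two-sided 5% critical value of t on 6 df
print("residual df:", d)
print("fraction of genes with |t_dye| > crit:", np.mean(np.abs(t[:, 1]) > crit))
```

Every gene gets the same model and the same residual df (6 here). Whether the dye term looks "significant" is then a per-gene outcome, while the model itself is never changed gene by gene.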

Thanks.
Fangxin
> I agree. In my reply to Fangxin I should have added that I would remove a
> non-essential effect like a dye effect if it appeared non-significant,
> but I'd remove it for all the genes.
>
> Gordon
>
> On Tue, January 4, 2005 1:18 am, Naomi Altman said:
>> Reducing the model based on removing nonsignificant effects is called
>> "pre-test estimation".  It is known to increase the false-positive rate,
>> even in the classical setting.  In the microarray setting, there is no
>> compelling reason to use pre-test estimators that differ from gene to
>> gene.
>>
>> --Naomi Altman
>>
>> At 10:57 PM 1/3/2005 +1100, Gordon K Smyth wrote:
>>> > Date: Sun, 2 Jan 2005 14:05:15 -0800 (PST)
>>> > From: "Fangxin Hong" <fhong at salk.edu>
>>> > Subject: [BioC] A question about Limma
>>> > To: bioconductor at stat.math.ethz.ch
>>> >
>>> > Hi Bioconductor users,
>>> > I have a general question about the limma model.
>>> > In the limma package, one linear model is usually applied to all genes,
>>> > and the error variances from all genes are moderated simultaneously.
>>> > What if some factor, for example one main effect, is significant for
>>> > only some genes, and we want to identify genes based on the significance
>>> > of another main effect (the one of interest)? What is the best way to do
>>> > it? Currently I just leave this factor in the model, which is applied to
>>> > all genes,
>>>
>>>That's what I do: leave all terms in the model for all the genes. I
>>>don't see a strong case for doing a separate model selection process
>>>for every gene.
>>>
>>> > but this might underestimate the total number of genes on which the
>>> > effect of interest is significant.
>>>
>>>Why do you think so? The only disadvantage of keeping a non-significant
>>>term in the model is a reduction in residual degrees of freedom, with
>>>some consequential loss of power, but this disadvantage is mitigated by
>>>the empirical Bayes moderation process.
>>>
>>>Perhaps someday someone will work out a model selection theory for
>>>massively parallel regression situations like microarray experiments,
>>>but there isn't such a theory now. It seems safer to me to have the same
>>>model for every gene, keeping all the 'a priori' important predictors
>>>in the model.
>>>
>>>Gordon
>>>
>>> > I am sorry if this question has been asked or answered here before; I
>>> > couldn't find it by searching the archive. Any comments, suggestions or
>>> > experience would be appreciated.
>>> >
>>> > Fangxin
>>> > --
>>> > Fangxin Hong, Ph.D.
>>> > Plant Biology Laboratory
>>> > The Salk Institute
>>> > 10010 N. Torrey Pines Rd.
>>> > La Jolla, CA 92037
>>> > E-mail: fhong at salk.edu
>>>
>>>_______________________________________________
>>>Bioconductor mailing list
>>>Bioconductor at stat.math.ethz.ch
>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>
>> Naomi S. Altman                                814-865-3791 (voice)
>> Associate Professor
>> Bioinformatics Consulting Center
>> Dept. of Statistics                              814-863-7114 (fax)
>> Penn State University                         814-865-1348 (Statistics)
>> University Park, PA 16802-2111
>>
>
>
>

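Gordon's argument, that a non-significant term only costs residual degrees of freedom and that empirical Bayes moderation softens the cost, can be made concrete with the variance-shrinkage formula used in limma's empirical Bayes step: the posterior variance is (d0·s0² + d·s²)/(d0 + d), and the moderated t-statistic has d0 + d degrees of freedom. The Python sketch below assumes hand-picked prior values d0 = 4 and s0² = 0.05 purely for illustration (limma estimates both from all genes), a hypothetical 8-array dye-swap experiment in which keeping a dye term lowers the residual df from 7 to 6, and a gene whose residual variance is held at 0.1 in both fits to isolate the df effect (in a real fit, dropping the term would also change s²).

```python
import numpy as np

def moderate_variances(s2, d, s0_sq, d0):
    """Shrink gene-wise residual variances s2 (on d df) toward a prior s0_sq
    (worth d0 df). Returns the posterior variances and their total df."""
    s2 = np.asarray(s2, dtype=float)
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    return s2_post, d0 + d

def moderated_t(beta, stdev_unscaled, s2_post):
    """Moderated t: coefficient divided by its standard error from s2_post."""
    return np.asarray(beta, dtype=float) / (np.sqrt(s2_post) * stdev_unscaled)

# Hypothetical prior, chosen by hand for illustration only.
d0, s0_sq = 4.0, 0.05
s2_gene = 0.1                     # hypothetical residual variance, one gene
beta_trt = 0.5                    # hypothetical treatment coefficient
stdev_unscaled = np.sqrt(1 / 8)   # treatment column in an 8-array dye swap

# 8 arrays: residual df is 7 without the dye term, 6 with it.
for label, d in [("without dye term", 7), ("with dye term", 6)]:
    s2_post, df_total = moderate_variances(s2_gene, d, s0_sq, d0)
    t = moderated_t(beta_trt, stdev_unscaled, s2_post)
    print(f"{label}: residual df = {d}, moderated df = {df_total:g}, "
          f"posterior variance = {float(s2_post):.4f}, t = {float(t):.2f}")
# without dye term: residual df = 7, moderated df = 11, posterior variance = 0.0818, t = 4.94
# with dye term: residual df = 6, moderated df = 10, posterior variance = 0.0800, t = 5.00
```

Dropping one coefficient changes the residual df by 1 out of 7, but the df used for the moderated test only moves from 11 to 10, which is one way to read "mitigated by the empirical Bayes moderation process".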



