[BioC] Question about amount of diff expressed genes in fit using a design matrix with intercept
James W. MacDonald
jmacdon at uw.edu
Sat Nov 24 16:00:13 CET 2012
Hi Belisa,
On 11/24/2012 9:23 AM, Belisa Santos [guest] wrote:
> Hello everybody,
>
> I have used a design matrix with a control as intercept. When I fit the matrix with my RMA normalized expression set, do eBayes, and then ask for a topTable:
> topTable(fit, adjust="BH", p.value=1e-5, lfc=1, number=Inf)
> I get the same number of genes as I have input (54675)... and the lowest adjusted p-value is e-29... Now it cannot be the case where ALL my genes are differentially expressed between control and treatments....
But that isn't what you are testing with your call to topTable(). You
are asking if *any* coefficient is not equal to zero, and your intercept
will almost always (or in your case, always) be different from zero.
>
> I am missing something? The intercept functions as a "contrast", right? Is this common? Could I be doing something wrong? Please help me...
No, the intercept functions as a baseline, to which all other samples
are compared. The intercept is estimating the mean expression of the
control samples, and all the other coefficients are estimating the
difference between a given sample and the control (intercept).
I am not sure what you expected to get from calling topTable() without
specifying a coefficient, but what it does is compute an F-test for all
coefficients. Which is testing the null hypothesis that all coefficients
= 0. This will almost never be true of an intercept, as it is measuring
the mean expression of the controls, which will usually be larger than
zero (hence all significant in your topTable).
You might want to know which genes are significant in one or more of
your coefficients, in which case you would do
topTable(fit, 2:17)
Otherwise, you usually want to specify just one or a few coefficients to
test at one time, because knowing that a gene is significant in one or
more of 16 different comparisons is not really that helpful.
Best,
Jim
>
> Thank you in advance for you kind help. All the best,
>
> Belisa
>
>
> -- output of sessionInfo():
>
> # Code I am using,
> cel<-ReadAffy
>
> expression_set<- rma(cel)
>
> fit<- lmFit(expression_set, design.ctrl_intercept)
> fit<- eBayes(fit)
> topTable(fit, adjust.method="BH", p.value=1e-5, lfc=1, number=Inf)
>
> # My design matrix
>
> ctrl0 B1.ctrl BT1.ctrl BI1.ctrl BTI1.ctrl B2.ctrl BT2.ctrl BI2.ctrl BTI2.ctrl B3.ctrl BT3.ctrl BI3.ctrl BTI3.ctrl B7.ctrl BT7.ctrl BI7.ctrl BTI7.ctrl
> 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 6 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 7 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 8 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 9 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 10 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 11 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 12 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 13 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
> 14 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
> 15 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
> (...)
> 52 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
> 53 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
> 54 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list