[BioC] limma small vs large number of samples
James W. MacDonald
jmacdon at uw.edu
Tue Apr 29 16:20:01 CEST 2014
Hi Giovanni,
On 4/28/2014 8:54 PM, Giovanni Bucci wrote:
> Hello everybody,
>
> I have 32 samples, 4 factors with 2 levels each. Each level has 2
> replicates.
>
>> str(gxexprs)
> num [1:15584, 1:32] 7.94 6.67 9.93 9.62 12.19 ...
>
>> Group
> [1] R52VQ R52VQ R52VE R52VE R52EQ R52EQ R52EE R52EE R95VQ R95VQ R95VQ R95VE
> [13] R95VE R95VE R95EQ R95EQ R95EQ R95EE R95EE R95EE R97VQ R97VQ R97VQ R97VE
> [25] R97VE R97VE R97EQ R97EQ R97EQ R97EE R97EE R97EE
> 16 Levels: R52VQ R52VE R52EQ R52EE R95VQ R95VE R95EQ R95EE R97VQ ... R97EE
>
>
> design <- model.matrix(~0+Group)
> fit <- lmFit(gxexprs, design)
>
> contrast.matrix <- makeContrasts(contrasts="R52VQ - R52VE",levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
>
> TTable = topTable(fit2)
>
> global_p_val = TTable$P.Val
>
>
> gxexprs = gxexprs[, 1:4]
>
>
> ## same code as above but the expression matrix has only the first 4
> columns which represent the contrast tested above
>
> design <- model.matrix(~0+Group)
> fit <- lmFit(gxexprs, design)
>
> contrast.matrix <- makeContrasts(contrasts="R52VQ - R52VE",levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
>
> TTable = topTable(fit2)
>
> local_p_val = TTable$P.Val
>
> local_p_val has much greater values than global_p_val even though they
> represent the same comparison.
>
> What is the explanation for this?
The denominator of your t-statistic is based on the mean square error of
the model, which pools the within-group variance across all groups.
When the other groups are in the model, more observations go into that
variance estimate, so you get more residual degrees of freedom for your
test and a more accurate variance estimate. In general that gives you
smaller p-values.
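
To make the degrees-of-freedom point concrete, here is a minimal base R
sketch for a single simulated gene. It is not your data: the group names
(G1..G12), the group sizes, and the helper contrast_test() are all
assumptions made up for illustration, and it uses an ordinary linear
model rather than limma's moderated statistics.

```r
set.seed(1)

# Hypothetical layout loosely mirroring the post: 4 groups with 2
# replicates and 8 groups with 3 replicates (32 samples, 12 groups).
sizes <- c(rep(2, 4), rep(3, 8))
group <- factor(rep(paste0("G", seq_along(sizes)), times = sizes),
                levels = paste0("G", seq_along(sizes)))

# One simulated gene: G2 is shifted by 1 relative to every other group.
mu <- ifelse(group == "G2", 1, 0)
y  <- rnorm(length(group), mean = 8 + mu, sd = 0.5)

# Ordinary t-test for G1 - G2 using the residual variance of a one-way
# model fitted to whatever samples are passed in.
contrast_test <- function(y, group) {
  group <- droplevels(group)
  fit   <- lm(y ~ 0 + group)
  cvec  <- rep(0, nlevels(group))   # contrast vector for G1 - G2
  cvec[1:2] <- c(1, -1)
  est <- sum(cvec * coef(fit))
  se  <- sqrt(drop(t(cvec) %*% vcov(fit) %*% cvec))
  df  <- df.residual(fit)
  c(estimate = est, se = se, df = df,
    p = 2 * pt(-abs(est / se), df))
}

full  <- contrast_test(y, group)            # all 32 samples
local <- contrast_test(y[1:4], group[1:4])  # only the two groups compared
print(rbind(full = full, local = local))
```

The estimate of G1 - G2 is the same in both fits, because it is just the
difference of the two group means. What changes is the standard error and
the residual degrees of freedom: 32 - 12 = 20 for the full model versus
4 - 2 = 2 when only the four samples are used.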
Best,
Jim
>
> Can you point to some diagnostic functions that will show the difference?
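
One possible way to see the difference, sketched here on simulated data
rather than on gxexprs, is to compare the variance-related components
that lmFit() and eBayes() store in the fitted object (df.residual,
df.prior, s2.prior, df.total). The data, the group names G1..G12, and
the helper run_fit() below are assumptions for illustration only. Note
also that topTable() by default returns only the top 10 genes, sorted,
so the sketch asks for all genes in their original order before
comparing p-values.

```r
library(limma)

set.seed(1)
# Hypothetical stand-in data: 1000 genes, 12 groups, 32 samples.
sizes <- c(rep(2, 4), rep(3, 8))
Group <- factor(rep(paste0("G", seq_along(sizes)), times = sizes),
                levels = paste0("G", seq_along(sizes)))
exprs <- matrix(rnorm(1000 * length(Group), mean = 8, sd = 0.5),
                nrow = 1000,
                dimnames = list(paste0("gene", 1:1000), NULL))

# Fit the group-means model and test G1 - G2 on the samples supplied.
run_fit <- function(exprs, Group) {
  Group  <- droplevels(Group)            # keep only groups that are present
  design <- model.matrix(~ 0 + Group)
  colnames(design) <- levels(Group)      # "GroupG1" -> "G1", etc.
  fit <- lmFit(exprs, design)
  cm  <- makeContrasts(contrasts = "G1 - G2", levels = design)
  eBayes(contrasts.fit(fit, cm))
}

global <- run_fit(exprs, Group)               # all 32 samples
local  <- run_fit(exprs[, 1:4], Group[1:4])   # first 4 samples only

# Degrees of freedom and prior variance used by each fit.
diag <- data.frame(
  fit         = c("global", "local"),
  df.residual = c(global$df.residual[1], local$df.residual[1]),
  df.prior    = c(global$df.prior, local$df.prior),
  s2.prior    = c(global$s2.prior, local$s2.prior),
  df.total    = c(global$df.total[1], local$df.total[1])
)
print(diag)

# P-values for every gene, kept in the same row order in both tables.
p <- data.frame(
  global = topTable(global, coef = 1, number = Inf, sort.by = "none")$P.Value,
  local  = topTable(local,  coef = 1, number = Inf, sort.by = "none")$P.Value
)
print(summary(p))
```

limma also has plotSA(), which plots the residual standard deviation
against average expression for a fit and may help when looking at how
the variances are being estimated.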
>
> Thank you,
>
> Giovanni
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099