[BioC] limma vs t-test
Giovanni Bucci [guest]
guest at bioconductor.org
Fri May 23 00:09:30 CEST 2014
Hi everybody,
I have a question regarding comparing limma to the t-test. I compared the p values obtained from both algorithms using 100 samples in each condition. Given the large number of observations my expectation was to see high correlation between the p values. As shown below, I ran the same code in three different conditions for the mean and standard deviation. The mean and standard deviation loaded from the google docs files are from a real micro array experiment.
1. mean and std from microarray exeperiment: no correlation between p values
2. constant fold change and std from microarray exeperiment: very small correlation
3. constant fold change and uniform std from 0.01 to 0.2: high correlation
I understand that limma uses information across genes, but shouldn't this information be weighed with the number of observations for each condition?
Thank you,
Giovanni
library(RCurl)
library(genefilter)
library(limma)
setwd(tempdir())
##https://drive.google.com/file/d/0B__nP63GoFhMd042V09GcHFvVFE/edit?usp=sharing
x = getBinaryURL("https://docs.google.com/uc?export=download&id=0B__nP63GoFhMd042V09GcHFvVFE", followlocation = TRUE, ssl.verifypeer = FALSE)
writeBin(x, "std_var.txt", useBytes = TRUE)
std_var = as.matrix(read.table("std_var.txt",
header = FALSE, sep = "t", quote=""", comment.char = ""
))
##https://drive.google.com/file/d/0B__nP63GoFhMc2tFT3hXVW52Q0E/edit?usp=sharing
x = getBinaryURL("https://docs.google.com/uc?export=download&id=0B__nP63GoFhMc2tFT3hXVW52Q0E", followlocation = TRUE, ssl.verifypeer = FALSE)
writeBin(x, "mean_var.txt", useBytes = TRUE)
mean_var = as.matrix(read.table("mean_var.txt",
header = FALSE, sep = "t", quote=""", comment.char = ""
))
NO_OF_GROUPS = 2
NO_OF_SAMP_PER_GROUP_GROUND_TRUTH = 100
NO_OF_GENES = NROW(mean_var)
gxexprs_GroundTruth = matrix(0, NO_OF_GENES, NO_OF_GROUPS*NO_OF_SAMP_PER_GROUP_GROUND_TRUTH)
std_var_list = list(std_var, std_var,seq(.01,.2, length.out= NO_OF_GENES))
simulated_mean = matrix(c(8,9),NO_OF_GENES,NO_OF_GROUPS, byrow=TRUE)
mean_var_list = list(mean_var, simulated_mean, simulated_mean)
main_list = list("micorarray mean and std", "means 8 and 9 and micorarray std", "means 8 and 9 and uniform std from 0.01 to 0.2")
for(count_i in 1:3)
{
mean_var = mean_var_list[[count_i]]
std_var = std_var_list[[count_i]]
for(i in 1:NO_OF_GENES)
{
if ((i%%1000)==1)
{
cat(i, "n")
}
for(j in 1:NO_OF_GROUPS)
{
gxexprs_GroundTruth[i, NO_OF_SAMP_PER_GROUP_GROUND_TRUTH*(j-1) + 1:NO_OF_SAMP_PER_GROUP_GROUND_TRUTH]=
rnorm(NO_OF_SAMP_PER_GROUP_GROUND_TRUTH, mean_var[i,j], std_var[i])
}
}
final_choice = c(1,2)
Group = factor(sprintf("S%02d",
rep(final_choice,each=NO_OF_SAMP_PER_GROUP_GROUND_TRUTH)
))
t_test_pvals = rowttests(gxexprs_GroundTruth, fac=Group)$p.value
design <- model.matrix(~0+Group)
colnames(design) <- gsub("Group","",colnames(design))
contrastsX = sprintf("S01 - S%02d",2)
#fit <- lmFit(gxexprs_GroundTruth[, final_col_dest], design)
fit <- lmFit(gxexprs_GroundTruth, design)
contrast.matrix <- makeContrasts(contrasts=contrastsX,levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
TTable = topTable(fit2, coef=1, adjust="fdr", number = nrow(gxexprs_GroundTruth))
limma_pvals = TTable$P.Value[as.numeric(rownames(TTable))]
dev.new()
plot(-log10(limma_pvals), -log10(t_test_pvals), main=main_list[[count_i]])
}
-- output of sessionInfo():
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grDevices datasets tcltk splines graphics utils stats
[8] grid methods base
other attached packages:
[1] limma_3.14.4 genefilter_1.40.0 Biobase_2.18.0 BiocGenerics_0.4.0
[5] RCurl_1.95-4.1 bitops_1.0-6 reshape2_1.2.2 Hmisc_3.14-3
[9] Formula_1.1-1 survival_2.37-7 lattice_0.20-29
loaded via a namespace (and not attached):
[1] annotate_1.36.0 AnnotationDbi_1.20.7 cluster_1.15.2
[4] DBI_0.2-7 IRanges_1.16.6 latticeExtra_0.6-26
[7] parallel_2.15.2 plyr_1.8 RColorBrewer_1.0-5
[10] RSQLite_0.11.4 stats4_2.15.2 stringr_0.6.2
[13] XML_3.98-1.1 xtable_1.7-3
--
Sent via the guest posting facility at bioconductor.org.
More information about the Bioconductor
mailing list