[BioC] t-test vs limma

Giovanni Bucci [guest] guest at bioconductor.org
Fri May 23 00:16:38 CEST 2014


Hi everybody,

I have a question regarding comparing limma to the t-test. I compared the p values obtained from both algorithms using 100 samples in each condition. Given the large number of observations my expectation was to see high correlation between the p values. As shown below, I ran the same code in three different conditions for the mean and standard deviation. The mean and standard deviation loaded from the google docs files are from a real micro array experiment.

1. mean and std from microarray exeperiment: no correlation between p values

2. constant fold change and std from microarray exeperiment: very small correlation

3. constant fold change and uniform std from 0.01 to 0.2: high correlation

I understand that limma uses information across genes, but shouldn't this information be weighed with the number of observations for each condition?

Thank you,

Giovanni

library(RCurl)
library(genefilter)
library(limma)

setwd(tempdir())


##https://drive.google.com/file/d/0B__nP63GoFhMd042V09GcHFvVFE/edit?usp=sharing
x = getBinaryURL("https://docs.google.com/uc?export=download&id=0B__nP63GoFhMd042V09GcHFvVFE", followlocation = TRUE, ssl.verifypeer = FALSE)
writeBin(x, "std_var.txt", useBytes = TRUE)
std_var =  as.matrix(read.table("std_var.txt",
header = FALSE, sep = "t", quote=""", comment.char = ""
))

##https://drive.google.com/file/d/0B__nP63GoFhMc2tFT3hXVW52Q0E/edit?usp=sharing
x = getBinaryURL("https://docs.google.com/uc?export=download&id=0B__nP63GoFhMc2tFT3hXVW52Q0E", followlocation = TRUE, ssl.verifypeer = FALSE)
writeBin(x, "mean_var.txt", useBytes = TRUE)
mean_var = as.matrix(read.table("mean_var.txt",
header = FALSE, sep = "t", quote=""", comment.char = ""
))


NO_OF_GROUPS = 2

NO_OF_SAMP_PER_GROUP_GROUND_TRUTH = 100

NO_OF_GENES = NROW(mean_var)

gxexprs_GroundTruth = matrix(0, NO_OF_GENES, NO_OF_GROUPS*NO_OF_SAMP_PER_GROUP_GROUND_TRUTH)


std_var_list = list(std_var, std_var,seq(.01,.2, length.out= NO_OF_GENES))

simulated_mean = matrix(c(8,9),NO_OF_GENES,NO_OF_GROUPS, byrow=TRUE)

mean_var_list = list(mean_var, simulated_mean, simulated_mean)

main_list = list("micorarray mean and std", "means 8 and 9 and micorarray std",  "means 8 and 9 and uniform std from 0.01 to 0.2")

for(count_i in 1:3)
{

mean_var = mean_var_list[[count_i]]
std_var = std_var_list[[count_i]]


for(i in 1:NO_OF_GENES)
{
	if ((i%%1000)==1)
	{
		cat(i, "n")
	}
	for(j in 1:NO_OF_GROUPS)
	{
		gxexprs_GroundTruth[i, NO_OF_SAMP_PER_GROUP_GROUND_TRUTH*(j-1) + 1:NO_OF_SAMP_PER_GROUP_GROUND_TRUTH]=
		rnorm(NO_OF_SAMP_PER_GROUP_GROUND_TRUTH, mean_var[i,j], std_var[i])
	}
}

final_choice = c(1,2)

Group = factor(sprintf("S%02d",
rep(final_choice,each=NO_OF_SAMP_PER_GROUP_GROUND_TRUTH)
))

t_test_pvals = rowttests(gxexprs_GroundTruth, fac=Group)$p.value



design <- model.matrix(~0+Group)
colnames(design) <- gsub("Group","",colnames(design))
contrastsX = sprintf("S01 - S%02d",2)

#fit <- lmFit(gxexprs_GroundTruth[, final_col_dest], design)
fit <- lmFit(gxexprs_GroundTruth, design)

contrast.matrix <- makeContrasts(contrasts=contrastsX,levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

TTable = topTable(fit2, coef=1, adjust="fdr", number = nrow(gxexprs_GroundTruth))
limma_pvals = TTable$P.Value[as.numeric(rownames(TTable))]


dev.new()
plot(-log10(limma_pvals), -log10(t_test_pvals), main=main_list[[count_i]])

}



 -- output of sessionInfo(): 

R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
 [1] grDevices datasets  tcltk     splines   graphics  utils     stats
 [8] grid      methods   base

other attached packages:
 [1] limma_3.14.4       genefilter_1.40.0  Biobase_2.18.0     BiocGenerics_0.4.0
 [5] RCurl_1.95-4.1     bitops_1.0-6       reshape2_1.2.2     Hmisc_3.14-3
 [9] Formula_1.1-1      survival_2.37-7    lattice_0.20-29

loaded via a namespace (and not attached):
 [1] annotate_1.36.0      AnnotationDbi_1.20.7 cluster_1.15.2
 [4] DBI_0.2-7            IRanges_1.16.6       latticeExtra_0.6-26
 [7] parallel_2.15.2      plyr_1.8             RColorBrewer_1.0-5
[10] RSQLite_0.11.4       stats4_2.15.2        stringr_0.6.2
[13] XML_3.98-1.1         xtable_1.7-3

--
Sent via the guest posting facility at bioconductor.org.



More information about the Bioconductor mailing list