[BioC] limma vs t-test

Gordon K Smyth smyth at wehi.EDU.AU
Sat May 24 11:06:29 CEST 2014


> Date: Thu, 22 May 2014 16:24:53 -0500
> From: Giovanni Bucci <giovannifbucci at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] limma vs t-test
>
> Hi everybody,
>
> I have a question regarding comparing limma to the t-test. I compared the p
> values obtained from both algorithms using 100 samples in each condition.
> Given the large number of observations my expectation was to see high
> correlation between the p values. As shown below, I ran the same code in
> three different conditions for the mean and standard deviation. The mean
> and standard deviation loaded from the google docs files are from a real
> micro array experiment.
>
> 1. mean and std from microarray exeperiment: no correlation between p values
>
> 2. constant fold change and std from microarray exeperiment: very small
> correlation
>
> 3. constant fold change and uniform std from 0.01 to 0.2: high correlation
>
> I understand that limma uses information across genes, but shouldn't this
> information be weighed with the number of observations for each condition?

limma does take account of the number of observations in each condition.

Gordon

> Thank you,
>
> Giovanni
>
> library(RCurl)
> library(genefilter)
> library(limma)
>
> setwd(tempdir())
>
>
> ##
> https://drive.google.com/file/d/0B__nP63GoFhMd042V09GcHFvVFE/edit?usp=sharing
> x = getBinaryURL("
> https://docs.google.com/uc?export=download&id=0B__nP63GoFhMd042V09GcHFvVFE",
> followlocation = TRUE, ssl.verifypeer = FALSE)
> writeBin(x, "std_var.txt", useBytes = TRUE)
> std_var =  as.matrix(read.table("std_var.txt",
> header = FALSE, sep = "\t", quote="\"", comment.char = ""
> ))
>
> ##
> https://drive.google.com/file/d/0B__nP63GoFhMc2tFT3hXVW52Q0E/edit?usp=sharing
> x = getBinaryURL("
> https://docs.google.com/uc?export=download&id=0B__nP63GoFhMc2tFT3hXVW52Q0E",
> followlocation = TRUE, ssl.verifypeer = FALSE)
> writeBin(x, "mean_var.txt", useBytes = TRUE)
> mean_var = as.matrix(read.table("mean_var.txt",
> header = FALSE, sep = "\t", quote="\"", comment.char = ""
> ))
>
>
> NO_OF_GROUPS = 2
>
> NO_OF_SAMP_PER_GROUP_GROUND_TRUTH = 100
>
> NO_OF_GENES = NROW(mean_var)
>
> gxexprs_GroundTruth = matrix(0, NO_OF_GENES,
> NO_OF_GROUPS*NO_OF_SAMP_PER_GROUP_GROUND_TRUTH)
>
>
> std_var_list = list(std_var, std_var,seq(.01,.2, length.out= NO_OF_GENES))
>
> simulated_mean = matrix(c(8,9),NO_OF_GENES,NO_OF_GROUPS, byrow=TRUE)
>
> mean_var_list = list(mean_var, simulated_mean, simulated_mean)
>
> main_list = list("micorarray mean and std", "means 8 and 9 and micorarray
> std",  "means 8 and 9 and uniform std from 0.01 to 0.2")
>
> for(count_i in 1:3)
> {
>
> mean_var = mean_var_list[[count_i]]
> std_var = std_var_list[[count_i]]
>
>
> for(i in 1:NO_OF_GENES)
> {
> if ((i%%1000)==1)
> {
> cat(i, "\n")
> }
> for(j in 1:NO_OF_GROUPS)
> {
> gxexprs_GroundTruth[i, NO_OF_SAMP_PER_GROUP_GROUND_TRUTH*(j-1) +
> 1:NO_OF_SAMP_PER_GROUP_GROUND_TRUTH]=
> rnorm(NO_OF_SAMP_PER_GROUP_GROUND_TRUTH, mean_var[i,j], std_var[i])
> }
> }
>
> final_choice = c(1,2)
>
> Group = factor(sprintf("S%02d",
> rep(final_choice,each=NO_OF_SAMP_PER_GROUP_GROUND_TRUTH)
> ))
>
> t_test_pvals = rowttests(gxexprs_GroundTruth, fac=Group)$p.value
>
>
> design <- model.matrix(~0+Group)
> colnames(design) <- gsub("Group","",colnames(design))
> contrastsX = sprintf("S01 - S%02d",2)
>
> fit <- lmFit(gxexprs_GroundTruth, design)
>
> contrast.matrix <- makeContrasts(contrasts=contrastsX,levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
>
> TTable = topTable(fit2, coef=1, adjust="fdr", number =
> nrow(gxexprs_GroundTruth))
> limma_pvals = TTable$P.Value[as.numeric(rownames(TTable))]
>
> dev.new()
> plot(-log10(limma_pvals), -log10(t_test_pvals), main=main_list[[count_i]])
>
> }

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list