[R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

Thomas Lumley tlumley at u.washington.edu
Wed Nov 26 21:37:03 CET 2008


You can do much better by doing the correlations as a matrix operation:
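> system.time({
+ 	n<-nrow(m1)               # observations per column
+ 	m1<-scale(m1)             # centre and scale each column
+ 	m2<-scale(m2)
+ 	r<-crossprod(m1,m2)/(n-1) # all column-by-column correlations
+ 	df<-n-2                   # as in cor.test
+ 	tstat<-sqrt(df)*r/sqrt(1-r^2)
+ 	p<-2*pt(-abs(tstat),df)   # two-sided p-values
+ 	})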
    user  system elapsed
   0.025   0.004   0.028

There might be a factor of n/(n-1) missing somewhere, which would be 
fixable if you could bring yourself to care about it.
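
One way to settle whether any such factor is missing is to compare the
matrix calculation against cor.test on a small case. The following is only
a sketch: the helper name cor_pvalues is made up for illustration, and it
assumes the variables are in columns, complete data, and that the two-sided
p-value (the cor.test default) is the one wanted.

```r
# Hypothetical helper (not from the thread): p-values for every pair of
# columns of x and y, computed with a single matrix product.
cor_pvalues <- function(x, y) {
  n <- nrow(x)                                  # observations per column
  stopifnot(n == nrow(y), n > 2)
  # scale() divides by the n-1 standard deviation, so dividing the
  # cross-product by n-1 gives the Pearson correlations.
  r <- crossprod(scale(x), scale(y)) / (n - 1)  # ncol(x) by ncol(y)
  df <- n - 2
  tstat <- sqrt(df) * r / sqrt(1 - r^2)
  2 * pt(-abs(tstat), df)                       # two-sided p-values
}

set.seed(1)
a <- matrix(rnorm(200), ncol = 4)   # 50 observations, 4 variables
b <- matrix(rnorm(150), ncol = 3)   # 50 observations, 3 variables
p <- cor_pvalues(a, b)

# Compare one entry with the value cor.test reports for the same pair
all.equal(p[2, 3], cor.test(a[, 2], b[, 3])$p.value)
```

If the two agree, the divisor n-1 and df = n-2 are the right ones for data
laid out this way.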

 	-thomas
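
The original question correlates rows rather than columns, and the full
result for 1000 rows against 100000 rows has 1e8 entries (about 800 MB as
doubles). A possible way to handle that is to transpose and work through the
rows of the larger matrix in blocks. This is a sketch under those
assumptions; row_cor_pvalues and its block argument are hypothetical names,
not something from the thread.

```r
# Hypothetical helper: two-sided p-values for every row of x against every
# row of y, processing the rows of y in blocks to limit temporary storage.
# The result has nrow(y) rows and nrow(x) columns, the same layout as the
# nested apply() in the original question.
row_cor_pvalues <- function(x, y, block = 1000) {
  n <- ncol(x)                          # observations per row
  stopifnot(n == ncol(y), n > 2)
  df <- n - 2
  xs <- scale(t(x))                     # rows of x become standardized columns
  out <- matrix(NA_real_, nrow(y), nrow(x))
  for (start in seq(1, nrow(y), by = block)) {
    idx <- start:min(start + block - 1, nrow(y))
    ys <- scale(t(y[idx, , drop = FALSE]))
    r <- crossprod(ys, xs) / (n - 1)    # length(idx) by nrow(x) correlations
    out[idx, ] <- 2 * pt(-abs(sqrt(df) * r / sqrt(1 - r^2)), df)
  }
  out
}

# Small check against the nested apply() approach
set.seed(2)
x <- matrix(rnorm(5 * 20), ncol = 20)   # 5 rows, 20 observations each
y <- matrix(rnorm(7 * 20), ncol = 20)   # 7 rows, 20 observations each
p <- row_cor_pvalues(x, y, block = 3)
slow <- apply(x, 1, function(u) apply(y, 1, function(v) cor.test(u, v)$p.value))
all.equal(p, slow)
```

The block size only trades memory for loop overhead; the p-values do not
depend on it.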



On Wed, 26 Nov 2008, Jorge Ivan Velez wrote:

> Hi Daren,
> Here is another approach, a little bit faster, taking into account that I'm
> using your original matrices.  My session info is at the end. I'm using a
> 2.4 GHz Core 2-Duo processor and 3 GB of RAM.
>
> # Data
> set.seed(123)
> m1 <- matrix(rnorm(100000), ncol=100)
> m2 <- matrix(rnorm(100000), ncol=100)
> colnames(m1)=paste('m1_',1:100,sep="")
> colnames(m2)=paste('m2_',1:100,sep="")
>
> # Combinations
> combs=expand.grid(colnames(m1),colnames(m2))
>
> # ---------------
> # Option 1
> #----------------
> system.time(apply(combs,1,function(x)
> cor.test(m1[,x[1]],m2[,x[2]])$p.value)->pvalues1)
> #  user  system elapsed
> #   8.12    0.01    8.20
>
> # ---------------
> # Option 2
> #----------------
> require(Hmisc)
> system.time(apply(combs,1,function(x)
> rcorr(m1[,x[1]],m2[,x[2]])$P[2])->pvalues2)
> #   user  system elapsed
> #   7.00    0.00    7.02
>
>
> HTH,
>
> Jorge
>
>
> # -------------  Session Info ----------------------------
> R version 2.8.0 Patched (2008-11-08 r46864)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
>
> On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan <daren76 at hotmail.com> wrote:
>
>>
>> My two matrices are roughly the sizes of m1 and m2. I tried using two nested
>> apply calls with cor.test to compute the correlation p-values. After more than
>> an hour, the code is still running. Please help me make it more efficient.
>>
>> m1 <- matrix(rnorm(100000), ncol=100)
>> m2 <- matrix(rnorm(10000000), ncol=100)
>>
>> cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
>> cor.test(x,y)$p.value }) })
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


