[R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
Thomas Lumley
tlumley at u.washington.edu
Wed Nov 26 21:37:03 CET 2008
You can do much better by doing the correlations as a matrix operation:
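> system.time({
+ n<-nrow(m1)                  # observations per column
+ m1<-scale(m1)                # center and scale each column (sd uses n-1)
+ m2<-scale(m2)
+ r<-crossprod(m1,m2)/(n-1)    # correlations of m1 columns with m2 columns
+ df<-n-2
+ tstat<-sqrt(df)*r/sqrt(1-r^2)
+ p<-2*pt(-abs(tstat),df)      # two-sided p-values, as cor.test reports
+ })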
user system elapsed
0.025 0.004 0.028
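The details that matter for matching cor.test() exactly: scale() uses the
n-1 standard deviation, so r needs a divisor of n-1; the t statistic has
n-2 degrees of freedom; and the p-value is two-sided. A stray factor of
n/(n-1) is fixable if you can bring yourself to care about it.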
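
To check that the matrix calculation agrees with cor.test(), here is a minimal sketch on a small example. The function name cor_pvalues and the matrix sizes are assumptions made for illustration, not part of the original code. It correlates the columns of two matrices with the same number of rows, uses the n-1 divisor and n-2 degrees of freedom, and returns two-sided p-values.

```r
# Hypothetical helper: p-values for the Pearson correlation between every
# column of `a` and every column of `b` (both must have the same row count).
cor_pvalues <- function(a, b) {
  stopifnot(nrow(a) == nrow(b))
  n  <- nrow(a)                                  # observations per column
  r  <- crossprod(scale(a), scale(b)) / (n - 1)  # same values as cor(a, b)
  df <- n - 2
  tstat <- sqrt(df) * r / sqrt(1 - r^2)
  2 * pt(-abs(tstat), df)                        # two-sided p-values
}

set.seed(123)
m1 <- matrix(rnorm(200), ncol = 4)   # 50 observations, 4 variables
m2 <- matrix(rnorm(150), ncol = 3)   # 50 observations, 3 variables

p <- cor_pvalues(m1, m2)             # 4 x 3 matrix of p-values

# Reference values from the slow approach: one cor.test() call per pair.
p_ref <- outer(seq_len(ncol(m1)), seq_len(ncol(m2)),
               Vectorize(function(i, j) cor.test(m1[, i], m2[, j])$p.value))

# The two agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(p, p_ref, check.attributes = FALSE)))
```

To correlate rows instead of columns, as in the original question, pass t(m1) and t(m2). With 1000 rows in m1 and 100000 rows in m2 the result has 10^8 entries, about 800 MB of doubles, so it may be necessary to process m2 in blocks of rows.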
-thomas
On Wed, 26 Nov 2008, Jorge Ivan Velez wrote:
> Hi Daren,
> Here is another approach, a little bit faster, taking into account that I'm
> using your original matrices. My session info is at the end. I'm using a
> 2.4 GHz Core 2-Duo processor and 3 GB of RAM.
>
> # Data
> set.seed(123)
> m1 <- matrix(rnorm(100000), ncol=100)
> m2 <- matrix(rnorm(100000), ncol=100)
> colnames(m1)=paste('m1_',1:100,sep="")
> colnames(m2)=paste('m2_',1:100,sep="")
>
> # Combinations
> combs=expand.grid(colnames(m1),colnames(m2))
>
> # ---------------
> # Option 1
> #----------------
> system.time(apply(combs,1,function(x)
> cor.test(m1[,x[1]],m2[,x[2]])$p.value)->pvalues1)
> # user system elapsed
> # 8.12 0.01 8.20
>
> # ---------------
> # Option 2
> #----------------
> require(Hmisc)
> system.time(apply(combs,1,function(x)
> rcorr(m1[,x[1]],m2[,x[2]])$P[2])->pvalues2)
> # user system elapsed
> # 7.00 0.00 7.02
>
>
> HTH,
>
> Jorge
>
>
> # ------------- Session Info ----------------------------
> R version 2.8.0 Patched (2008-11-08 r46864)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
>
> On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan <daren76 at hotmail.com> wrote:
>
>>
>> My two matrices are roughly the sizes of m1 and m2. I tried using two apply
>> and cor.test to compute the correlation p.values. More than an hour, and the
>> codes are still running. Please help to make it more efficient.
>>
>> m1 <- matrix(rnorm(100000), ncol=100)
>> m2 <- matrix(rnorm(10000000), ncol=100)
>>
>> cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
>> cor.test(x,y)$p.value }) })
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
Thomas Lumley, Assoc. Professor, Biostatistics
tlumley at u.washington.edu, University of Washington, Seattle