[R] Correlate rows of 2 matrices
arun
smartpink111 at yahoo.com
Mon Sep 23 23:55:24 CEST 2013
Hi Ira,
I tried ?lapply(). It looks like it slightly edges out the ?for() loop.
For example:
set.seed(435)
m1 <- matrix(rnorm(2000*30), ncol=30)
m2 <- matrix(rnorm(2000*30), ncol= 30)
corsP<-vector()
system.time({for(i in 1:2000) corsP[i] = cor(m1[i,], m2[i,])})
# user system elapsed
# 0.124 0.000 0.122
system.time({corsP2<- unlist(lapply(1:2000,function(i) cor(m1[i,],m2[i,])))})
# user system elapsed
# 0.108 0.000 0.110
identical(corsP,corsP2)
#[1] TRUE
system.time(corsP3<- diag(cor(t(m1),t(m2))))
# user system elapsed
# 0.272 0.004 0.276
mNew<- rbind(m1,m2)
indx<-rep(seq(nrow(mNew)/2),2)
system.time({corsP4<- tapply(seq_along(indx),list(indx),FUN=function(x) cor(t(mNew[x,]),t(mNew[x,]))[2])})
# user system elapsed
# 0.156 0.000 0.160
attr(corsP4,"dimnames")<- NULL
all.equal(corsP,as.vector(corsP4))
#[1] TRUE
A.K.
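For reference, the row-wise Pearson correlations timed above can also be computed without a loop and without building the full 2000 x 2000 matrix, by centering each row and using rowSums(). This is only a sketch: row_cor is a hypothetical helper name that does not appear in the thread, and it assumes both matrices have the same dimensions and contain no missing values.

```r
# Hypothetical helper (not from the thread): Pearson correlation of
# row i of a with row i of b, for every i, using vectorized row sums.
# Assumes identical dimensions and no NA values.
row_cor <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  # Subtracting a length-nrow vector recycles down the columns,
  # so every element has its own row mean removed.
  a_c <- a - rowMeans(a)
  b_c <- b - rowMeans(b)
  rowSums(a_c * b_c) / sqrt(rowSums(a_c^2) * rowSums(b_c^2))
}

set.seed(435)
m1 <- matrix(rnorm(2000 * 30), ncol = 30)
m2 <- matrix(rnorm(2000 * 30), ncol = 30)

corsV <- row_cor(m1, m2)

# Reference values from the explicit row-by-row approach
corsRef <- vapply(seq_len(nrow(m1)),
                  function(i) cor(m1[i, ], m2[i, ]),
                  numeric(1))
all.equal(corsV, corsRef)
```

Because the results are computed by a different sequence of floating-point operations, all.equal() is the appropriate comparison here rather than identical().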
________________________________
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, September 23, 2013 5:45 PM
Subject: Re: Correlate rows of 2 matrices
Arun,
What department are you in? Are you on LinkedIn?
The loop takes about a second. I do not know how to use lapply/sapply with more than one object and a function of two variables such as cor().
When there are 2,000 columns it cannot be right to compute 4,000,000 correlations in order to use the 2,000 that are along the diagonal.
Ira
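On the question of using the apply family with two objects and a two-argument function such as cor(): one common pattern is to iterate over the row index, and another is to hand mapply() two lists of rows. The following is a small sketch on made-up 20 x 30 matrices; the names rows1, rows2, by_index and by_pairs are illustrative only.

```r
set.seed(435)
m1 <- matrix(rnorm(20 * 30), ncol = 30)
m2 <- matrix(rnorm(20 * 30), ncol = 30)

# Option 1: sapply over the row index; the anonymous function
# picks the matching row from each matrix.
by_index <- sapply(seq_len(nrow(m1)), function(i) cor(m1[i, ], m2[i, ]))

# Option 2: split each matrix into a list of row vectors, then let
# mapply() call cor(x, y) on corresponding list elements.
rows1 <- split(m1, row(m1))
rows2 <- split(m2, row(m2))
by_pairs <- unname(mapply(cor, rows1, rows2))

all.equal(by_index, by_pairs)
```

Either way only one correlation per row pair is computed, rather than the full cross-correlation matrix.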
On 9/23/2013 2:12 PM, arun wrote:
Ira,
I work as a postdoc at Wayne State Univ. in Detroit. I didn't check the speed of ?diag(). It could be a bit slower because it first computes the whole correlation matrix and then takes the diagonal elements. In that respect, the loop will save time. It would be worth checking whether ?lapply() improves the speed compared to ?for().
Arun
________________________________
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, September 23, 2013 4:42 PM
Subject: Re: Correlate rows of 2 matrices
Arun,
On a contract, I work for this San Francisco firm. But I work from home.
http://www.manifoldpartners.com/Home.html
How about yourself? Where are you located?
Incidentally, for my large matrix, in addition to computing the Pearson correlation matrix with use = "pairwise.complete.obs" (85 seconds), I also have to do Spearman calculations. That code ran for 27 minutes. I only need about 2000 correlations, but I am computing 2000 * 2000 correlations. Using a loop reduced the time to about 1 second.
Please note that this initial data set is one of the smaller ones I will be working on.
Ira
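The loop described above, which computes one Spearman correlation per row pair with use = "pairwise.complete.obs", might look roughly like the sketch below. The actual code is not shown in the thread, so row_cor_na is a hypothetical helper name and the small matrices with one missing value are invented purely for illustration.

```r
# Hypothetical helper (not from the thread): correlate row i of a with
# row i of b, dropping positions where either value is missing.
row_cor_na <- function(a, b, method = "spearman") {
  stopifnot(identical(dim(a), dim(b)))
  vapply(seq_len(nrow(a)),
         function(i) cor(a[i, ], b[i, ],
                         method = method,
                         use = "pairwise.complete.obs"),
         numeric(1))
}

set.seed(22)
a <- matrix(rnorm(5 * 10), nrow = 5)
b <- matrix(rnorm(5 * 10), nrow = 5)
a[2, 3] <- NA  # made-up missing value

corsS <- row_cor_na(a, b)                      # Spearman
corsP <- row_cor_na(a, b, method = "pearson")  # Pearson
```

This evaluates cor() once per row, so the work grows linearly with the number of rows instead of quadratically.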
On 9/23/2013 11:54 AM, arun wrote:
Hi Ira,
Glad it worked for you. I would also choose the one you selected.
BTW, where do you work?
Regards,
Arun
________________________________
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, September 23, 2013 2:47 PM
Subject: Re: Correlate rows of 2 matrices
Arun,
Thanks for your help. I am very impressed with your ability to string together functions in order to achieve a desired result. On the other hand, I prefer simplicity, and I will have to explain my code to my boss, who might eventually have to modify it after I've moved on. I decided to go with your first option. It worked quite well.
diag(cor(t(m1),t(m2)))
Thanks again.
Ira
On 9/22/2013 6:57 PM, Ira Sharenow wrote:
Arun,
I have a new problem for you. I have two data frames (or matrices) and I want to take the correlations row by row. So if each is a 3 row by 10 column matrix, I would produce 3 correlations. Is there a way to merge the matrices and then use some sort of split? Ideas/solutions much appreciated.
set.seed(22)
m1 = matrix(rnorm(30), nrow = 3)
m2 = matrix(rnorm(30), nrow = 3)
corsP = numeric(3)  # preallocate the result vector
for(i in 1:3) corsP[i] = cor(m1[i,], m2[i,])
corsP
[1] -0.50865019 -0.27760046 0.01423144
Thanks.
Ira