[R] vectorization instead of using loop
Richard.Cotton at hsl.gov.uk
Thu Oct 9 18:11:06 CEST 2008
> I sent this question 2 days ago and got a response from Sarah. Thanks for
> that. But unfortunately, it did not really solve our problem. The main issue
> is that we want to use our own (manipulated) covariance matrix in the
> calculation of the Mahalanobis distance. Does anyone know how to vectorize
> the code below instead of using a loop (which slows it down)?
> I'd really appreciate any help on this, thank you all in advance!
> Cheers,
> Frank
>
> This is what I posted 2 days ago:
> We have a data frame x with n people as rows and k variables as columns.
> Now, for each person (i.e., each row) we want to calculate a distance
> between him/her and EACH other person in x. In other words, we want to
> create an n x n matrix of distances (with zeros on the diagonal).
> However, we do not want to calculate Euclidean distances. We want to
> calculate Mahalanobis distances, which take into account the covariance
> among variables.
> Below is the piece of code we wrote ("covmat" in the function below is the
> variance-covariance matrix among the variables in the data, which has to be
> fed into the mahalanobis function we are using).
> mahadist = function(x, covmat) {
>   dismat = matrix(0,ncol=nrow(x),nrow=nrow(x))
>   for (i in 1:nrow(x)) {
>     dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
>   }
>   return(dismat)
> }
>
> This piece of code works, but it is very slow. We were wondering if it's at
> all possible to somehow vectorize this function. Any help would be greatly
> appreciated.
You can save a substantial amount of time by calling as.matrix once, before
the loop, rather than on every iteration, e.g.
x <- data.frame(runif(1000), runif(1000), runif(1000))
covmat <- cov(x)

mahadist = function(x, covmat) #yours
{
  dismat = matrix(0,ncol=nrow(x),nrow=nrow(x))
  for (i in 1:nrow(x))
  {
    dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
  }
  return(dismat)
}

mahadist2 <- function(x, covmat) #my modification
{
  n <- nrow(x)
  dismat <- matrix(0,ncol=n,nrow=n)
  matx <- as.matrix(x)
  for (i in 1:n)
  {
    dismat[i,] <- mahalanobis(matx, matx[i,], covmat)^.5
  }
  dismat
}

system.time(mahadist(x, covmat))
#   user  system elapsed
#   2.82    0.06    2.95
system.time(mahadist2(x, covmat))
#   user  system elapsed
#   1.39    0.04    1.45
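
If you want to drop the loop altogether, one possible approach (a sketch only,
not something timed above) is to "whiten" the data with the Cholesky factor of
covmat and then take ordinary Euclidean distances with dist(). This assumes
that your manipulated covmat is still symmetric and positive definite,
otherwise chol() will fail. The name mahadist3 is made up for illustration.

```r
# Sketch: Mahalanobis distance matrix without an explicit loop.
# Assumption: covmat is symmetric positive definite, so chol() succeeds.
mahadist3 <- function(x, covmat) # hypothetical name, not from the thread
{
  matx <- as.matrix(x)
  # chol() returns an upper-triangular R with t(R) %*% R equal to covmat
  R <- chol(covmat)
  # Whitened data: squared Euclidean distances between rows of z equal
  # (x_i - x_j)' solve(covmat) (x_i - x_j)
  z <- matx %*% solve(R)
  # dist() computes all pairwise Euclidean distances in compiled code;
  # unname() drops the row/column labels that as.matrix() adds
  unname(as.matrix(dist(z)))
}

set.seed(1)
x <- data.frame(runif(200), runif(200), runif(200))
covmat <- cov(x)
d3 <- mahadist3(x, covmat)
```

The result should agree with the loop versions up to floating-point rounding,
which is worth checking with all.equal() on your own data before relying on it.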
Regards,
Richie.
Mathematical Sciences Unit
HSL