[R] avoid a loop
Joshua Wiley
jwiley.psych at gmail.com
Thu Nov 4 21:57:24 CET 2010
And to wrap it up and help you choose, here are four functions based
on these emails (the first one is my own slight variant):
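library(ecodist)  # provides crosstab(), used in foo2
# Simulated data: 10^4 (id, item) pairs over 1000 ids and 6 items
a <- sample(1:1000, 10^4, replace = TRUE)
b <- sample(letters[1:6], 10^4, replace = TRUE)
# Cross-tabulate ids (rows) by items (columns); x %*% t(x) then sums, for
# each pair of ids, the products of their item counts. This equals the
# number of shared items only when no (id, item) pair is repeated.
foo1 <- function() {
  x <- table(a, b)
  return(x %*% t(x))
}
# Same idea, building the cross-tabulation with crosstab() from ecodist;
# the rep(1, ...) values make each cell a count
foo2 <- function() {
  x <- crosstab(a, b, rep(1, length(a)))
  return(x %*% t(x))
}
# Direct pairwise approach: for every pair of ids in 1:1000, count the
# distinct items they share. It recomputes b[a==y] and b[a==x] for each
# of the 10^6 pairs, which is why it is so slow.
foo3 <- function() {
  sapply(1:1000, function(y) {
    sapply(1:1000, function(x) {
      length(intersect(b[a==y], b[a==x]))
    })
  })
}
# Same as foo1: crossprod(t(m)) equals m %*% t(m)
foo4 <- function() {crossprod(t(as.matrix(table(a, b))))}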
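To see why the matrix product answers the question, here is a minimal sketch run on the small example from the original question quoted below (note that it overwrites the simulated a and b). tcrossprod(x) is base R's shorthand for x %*% t(x); entry [i, j] is the sum over items k of x[i, k] * x[j, k], which counts the items ids i and j share as long as each (id, item) pair appears at most once.

```r
# Data from the original question (this overwrites the simulated a and b)
a <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
b <- c("a", "b", "c", "a", "d", "a", "b", "e", "f")

# Rows are ids, columns are items, entries are counts
x <- table(a, b)

# tcrossprod(x) computes x %*% t(x): entry [i, j] sums x[i, k] * x[j, k]
# over all items k, i.e. the number of items ids i and j have in common
shared <- tcrossprod(x)
print(shared)
```

The result has the same values as the 3 x 3 matrix requested in the question, with the ids as row and column names.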
> system.time(x1 <- foo1())
user system elapsed
0.028 0.008 0.038
> system.time(x2 <- foo2())
user system elapsed
0.076 0.008 0.087
## I got tired of waiting
> system.time(x3 <- foo3())
Timing stopped at: 104.951 1.336 110.909
> system.time(x4 <- foo4())
user system elapsed
0.024 0.020 0.043
> all.equal(x1, x2, check.attributes = FALSE)
[1] TRUE
> all.equal(x1, x4, check.attributes = FALSE)
[1] TRUE
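One caveat worth checking: table() and crosstab() count how often each (id, item) pair occurs, so when a pair is repeated the product sums products of counts rather than counting distinct shared items, while foo3 (through intersect()) counts distinct items. With 10^4 draws over 1000 ids and only 6 letters, repeated pairs are almost certain in the simulated data, so foo3 would generally return a different matrix from foo1, foo2 and foo4 there. A minimal sketch of a presence-based variant, on a small hypothetical example where id 1 lists item "a" twice (it overwrites a and b):

```r
# Hypothetical toy data: id 1 has item "a" listed twice
a <- c(1, 1, 1, 1, 2, 2)
b <- c("a", "a", "b", "c", "a", "d")

counts <- table(a, b)

# Count-based version (as in foo1/foo4): the duplicate "a" inflates [1, 1] and [1, 2]
by_count <- tcrossprod(counts)

# Presence-based version: 1 if the id has the item at least once, else 0
present <- (unclass(counts) > 0) * 1
by_presence <- tcrossprod(present)

# Reference answer in the spirit of foo3: distinct shared items per pair of ids
ids <- sort(unique(a))
reference <- outer(ids, ids, Vectorize(function(i, j) {
  length(intersect(b[a == i], b[a == j]))
}))

print(by_count)
print(by_presence)
```

Converting counts to 0/1 before the product keeps the speed of the table() approach while matching what the intersect() loop computes.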
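This suggests the elapsed times rank as follows (fastest first):
foo1 < foo4 < foo2 < foo3
although foo1 and foo4 differ by only a few milliseconds in a single run, so they are effectively tied.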
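Because each timing above is a single run of roughly 0.04 seconds, the gap between foo1 and foo4 is within run-to-run noise. A minimal sketch that repeats each call gives a steadier comparison; the seed and the 50 repetitions are arbitrary choices, and it is self-contained (it redefines a, b, foo1 and foo4) and uses only base R, so foo2, which needs ecodist, is left out.

```r
set.seed(1)  # arbitrary seed, for reproducible simulated data
a <- sample(1:1000, 10^4, replace = TRUE)
b <- sample(letters[1:6], 10^4, replace = TRUE)

foo1 <- function() {
  x <- table(a, b)
  return(x %*% t(x))
}
foo4 <- function() crossprod(t(as.matrix(table(a, b))))

n_rep <- 50  # arbitrary number of repetitions
t1 <- system.time(for (i in seq_len(n_rep)) foo1())
t4 <- system.time(for (i in seq_len(n_rep)) foo4())

# Average seconds per call (user, system, elapsed)
rbind(foo1 = t1[1:3], foo4 = t4[1:3]) / n_rep
```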
Cheers,
Josh
On Thu, Nov 4, 2010 at 12:42 PM, cory n <corynissen at gmail.com> wrote:
> Let's suppose I have user IDs and associated attributes in columns a and b:
>
> a <- c(1,1,1,2,2,3,3,3,3)
> b <- c("a","b","c","a","d","a", "b", "e", "f")
>
> so a unique list of a would be
>
> id <- unique(a)
>
> I want a matrix like this...
>
> [,1] [,2] [,3]
> [1,] 3 1 2
> [2,] 1 2 1
> [3,] 2 1 4
>
> Where element i,j is the number of items in b that id[i] and id[j] share...
>
> So for example, in element [1,3] of the result matrix, I want to see
> 2. That is, ids 1 and 3 share two common elements in b, namely "a"
> and "b".
>
> This is hard to articulate, so sorry for the terrible description
> here. The way I have solved it is to do a double loop, looping over
> every member of the id column and comparing it to every other member
> of id to see how many elements of b they share. This takes forever.
>
> Thanks
>
> cn
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/