[R] Lookups in R
Michael Frumin
michael at frumin.net
Thu Jul 5 11:56:20 CEST 2007
The problem I have is that user IDs are not just sequential from
1:n_users. If they were, of course I'd have made a big matrix that was
n_users x n_fields and that would be that. But I think what I can do is
just use the hash to store the index into the result matrix, nothing
more. Then the rest of it will be easy.
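A minimal sketch of that hash-as-index idea, using an environment as the hash (the same mechanism as in the quoted reply). The user IDs and the names `row_of` and `add_txn()` are hypothetical, chosen only for illustration.

```r
# Hypothetical, non-sequential user IDs
ids <- c("u1017", "u42", "u993")

# Hash (environment) that maps each user ID to a row number of the result matrix
row_of <- new.env(hash = TRUE, parent = emptyenv(), size = length(ids))
for (k in seq_along(ids)) row_of[[ids[k]]] <- k

# Result matrix: one row per user, one column per field
res <- matrix(0, nrow = length(ids), ncol = 2,
              dimnames = list(ids, c("amt", "n")))

# Hypothetical helper: look up the row once, then update by integer index
add_txn <- function(res, id, amt) {
  r <- row_of[[id]]
  res[r, ] <- res[r, ] + c(amt, 1)
  res
}

res <- add_txn(res, "u42", 2.5)
res <- add_txn(res, "u42", 1.0)
res <- add_txn(res, "u993", 4.0)
```

The hash only stores small integers, so all the numeric data lives in one ordinary matrix that can later be summarized with vectorized functions.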
But please tell me more about eliminating loops. In many cases in R I
have used lapply and its relatives to avoid loops, but in this case they
seem to add overhead just from building their result lists:
> system.time(lapply(1:10^4, mean))
user system elapsed
1.31 0.00 1.31
> system.time(for(i in 1:10^4) mean(i))
user system elapsed
0.33 0.00 0.32
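One hedged option for the comparison above is `vapply()`, which, like a for loop filling a preallocated vector, returns a plain numeric vector instead of a list. This sketch only checks that the three forms agree; relative timings depend on the machine and R version, so none are claimed here.

```r
n <- 10^4

# lapply() builds a list with one element per input
res_list <- lapply(1:n, mean)

# vapply() declares the result type (one double per element)
# and returns an atomic vector directly
res_vapply <- vapply(1:n, mean, numeric(1))

# Explicit loop writing into a preallocated vector
res_loop <- numeric(n)
for (i in 1:n) res_loop[i] <- mean(i)
```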
thanks,
mike
> I don't think that's a fair comparison; much of the overhead comes
> from the use of data frames and the creation of the indexing vector. I
> get:
>
> > n_accts <- 10^3
> > n_trans <- 10^4
> > t <- list()
> > t$amt <- runif(n_trans)
> > t$acct <- as.character(round(runif(n_trans, 1, n_accts)))
> > uhash <- new.env(hash=TRUE, parent=emptyenv(), size=n_accts)
> > for (acct in as.character(1:n_accts)) uhash[[acct]] <- list(amt=0, n=0)
> > system.time(for (i in seq_along(t$amt)) {
> + acct <- t$acct[i]
> + x <- uhash[[acct]]
> + uhash[[acct]] <- list(amt=x$amt + t$amt[i], n=x$n + 1)
> + }, gcFirst = TRUE)
> user system elapsed
> 0.508 0.008 0.517
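To get the accumulated totals back out of a hash like `uhash`, a sketch using `ls()` and `mget()`. The hash is rebuilt here with toy values (not from the session above) so the block runs on its own.

```r
# Toy stand-in for uhash: each key holds list(amt = ..., n = ...)
uhash <- new.env(hash = TRUE, parent = emptyenv())
uhash[["7"]] <- list(amt = 1.5, n = 2)
uhash[["3"]] <- list(amt = 0.25, n = 1)

# mget() fetches every binding as a named list; sorting keeps row order stable
keys <- sort(ls(uhash))
vals <- mget(keys, envir = uhash)

# Stack the per-account lists into an accounts x fields matrix;
# rbind() takes the row names from the list names
out <- do.call(rbind, lapply(vals, function(x) c(amt = x$amt, n = x$n)))
```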
> > udf <- matrix(0, nrow = n_accts, ncol = 2)
> > rownames(udf) <- as.character(1:n_accts)
> > colnames(udf) <- c("amt", "n")
> > system.time(for (i in seq_along(t$amt)) {
> + idx <- t$acct[i]
> + udf[idx, ] <- udf[idx, ] + c(t$amt[i], 1)
> + }, gcFirst = TRUE)
> user system elapsed
> 1.872 0.008 1.883
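In the matrix version, `udf[idx, ]` with a character `idx` makes R match the row name on every iteration, which is likely part of its cost. A hedged variation converts all account labels to integer row positions once with `match()` and then loops over integers. Here `t` is renamed to `amt`/`acct` to avoid masking base `t()`; no timing is claimed.

```r
set.seed(1)
n_accts <- 10^3
n_trans <- 10^4
amt  <- runif(n_trans)
acct <- as.character(round(runif(n_trans, 1, n_accts)))

udf <- matrix(0, nrow = n_accts, ncol = 2,
              dimnames = list(as.character(1:n_accts), c("amt", "n")))

# Translate every account label to its row number in one vectorized call
row_idx <- match(acct, rownames(udf))

# The loop now indexes by integer, with no per-iteration name lookup
for (i in seq_along(amt)) {
  r <- row_idx[i]
  udf[r, ] <- udf[r, ] + c(amt[i], 1)
}
```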
>
> The loop itself is still going to be the bottleneck for realistic examples.
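If the loop is the bottleneck, one way to drop it entirely is base R's `rowsum()`, which sums the rows of a matrix within each group in a single call. This is a technique swapped in here, not one proposed in the thread; `tapply(amt, acct, sum)` would give the amount column alone. Variable names mirror the session above, with `t` renamed.

```r
set.seed(1)
n_accts <- 10^3
n_trans <- 10^4
amt  <- runif(n_trans)
acct <- as.character(round(runif(n_trans, 1, n_accts)))

# Column 1 accumulates amounts, column 2 counts transactions (1 per row),
# grouped by account label; no explicit R-level loop
totals <- rowsum(cbind(amt = amt, n = 1), group = acct)
```

Only accounts that actually appear in `acct` get a row, which also sidesteps the non-sequential ID problem.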
>
> -Deepayan