[R] apply vs sapply vs loop - lm() call appl(y)ied on array
Christoph Lehmann
christoph.lehmann at gmx.ch
Thu Apr 21 17:07:31 CEST 2005
OK, thanks to Matthew's pointer to an earlier thread with a similar
request, I now have three faster solutions (see below). The last one is
the fastest, but the first two are also faster than the for-loop,
apply(lm(formula)) and sapply(lm(formula)) versions from my last mail.
One problem remains: with lsfit I can't directly get measures such as
r.squared ...
---------------
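## using lm with a matrix response (recommended by BDR)
date()
## one column per series: t.length rows, prod(d.dim[2:4]) columns
Ymat <- array(c(Y), dim = c(t.length, prod(d.dim[2:4])))
## summary() of a matrix-response fit is a list with one summary.lm per
## column, so r.squared can be picked by name instead of by position
rsq <- sapply(summary(lm(Ymat ~ X)), function(s) s$r.squared)
rsq <- array(rsq, dim = d.dim[2:4])  # array() drops the names
date()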
## using sapply and lsfit instead of lm (recommended by Kevin Wright)
## note: this extracts the slope of each fit, not r.squared
date()
fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4])))
z <- sapply(split(as.vector(Y), fac),
            FUN = function(x) lsfit(X, x)$coefficients[2])
dim(z) <- d.dim[2:4]
date()
## using lsfit with a matrix response (again the slope, not r.squared):
date()
Ymat <- array(c(Y), dim = c(t.length, prod(d.dim[2:4])))
slope <- lsfit(X, Ymat)$coefficients[2, ]  # row 1 = intercept, row 2 = slope
slope <- array(slope, dim = d.dim[2:4])    # array() drops the names
date()
------------------
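A possible way around the r.squared limitation with lsfit, as a minimal
sketch rather than a tested recipe: lsfit() returns the matrix of
residuals, so R^2 can be computed per column as 1 - RSS/TSS. The names
Ymat, n.series, rss and tss and the fixed seed are illustrative
assumptions; the setup repeats the small example from the quoted mail.

```r
## Sketch: r.squared for every series from a single lsfit() call
set.seed(1)                            # assumption: fixed seed, for reproducibility only
t.length <- 5
d.dim <- c(t.length, 7, 8, 9)          # dimensions: time, x, y, z
n.series <- prod(d.dim[2:4])
Y <- array(rep(1:t.length, n.series) + rnorm(prod(d.dim), 0, 0.1), d.dim)
X <- c(1, 3, 2, 4, 5)

## one column per series: t.length rows, n.series columns
Ymat <- matrix(Y, nrow = t.length)

fit <- lsfit(X, Ymat)                  # an intercept is included by default

## R^2 = 1 - RSS / TSS, computed column by column
rss <- colSums(fit$residuals^2)                   # residual sum of squares
tss <- colSums(sweep(Ymat, 2, colMeans(Ymat))^2)  # total sum of squares about the mean
rsq <- array(1 - rss / tss, dim = d.dim[2:4])
```

For a single series, rsq[k, j, i] should agree with
summary(lm(Y[, k, j, i] ~ X))$r.squared, since both fits include an
intercept.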
thanks
Christoph
Wiener, Matthew wrote:
> Christoph --
>
> There was just a thread on this earlier this week. You can search in the
> archives for the title: "refitting lm() with same x, different y".
>
> (Actually, it doesn't turn up in the R site search yet, at least for me.
> But if you just go to the archive of recent messages, available through
> CRAN, you can search on refitting and find it. The original post was from
> William Valdar, on April 19.)
>
> Hope this helps,
>
> Matt Wiener
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Christoph Lehmann
> Sent: Thursday, April 21, 2005 9:24 AM
> To: R-help at stat.math.ethz.ch
> Subject: [R] apply vs sapply vs loop - lm() call appl(y)ied on array
>
>
> Dear useRs
>
> (Code for the small example described here is below.)
>
> I have 7 * 8 * 9 = 504 series of data (each of length 5). For each of
> these series I want to fit an lm(), where the design matrix X is the
> same for all these computations.
>
> The 504 series are stored in an array of dimension
> d.dim <- c(5, 7, 8, 9), i.e. the first dimension holds the data series.
>
> The lm computation needs performance optimization, since in fact the
> dimensions are much larger. I compared the following approaches:
>
> a for-loop, apply, and sapply. All of these require roughly the same
> computation time. I was astonished, since I expected at least sapply
> to outperform the for-loop.
>
> Do you have another, faster solution? Many thanks.
>
> here is the code
> ## ------------------------------------------------------
> t.length <- 5
> d.dim <- c(t.length, 7, 8, 9) # dimensions: time, x, y, z
> Y <- array(rep(1:t.length, prod(d.dim[2:4])) + rnorm(prod(d.dim), 0, 0.1),
>            d.dim)
> X <- c(1,3,2,4,5)
>
> ## -------- performance tests
> ## using for loop
> date()
> z <- rep(0, prod(d.dim[2:4]))
> l <- 0
> for (i in 1:dim(Y)[4])
>   for (j in 1:dim(Y)[3])
>     for (k in 1:dim(Y)[2]) {
>       l <- l + 1
>       z[l] <- summary(lm(Y[, k, j, i] ~ X))$r.squared
>     }
> date()
>
> ## using apply
> date()
> z <- apply(Y, 2:4, function(x) summary(lm(x ~ X))$r.squared)
> date()
>
> ## using sapply
> date()
> fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4])))
> z <- sapply(split(as.vector(Y), fac),
>             FUN = function(x) summary(lm(x ~ X))$r.squared)
> dim(z) <- d.dim[2:4]
> date()
>
> ## ------------------------------------------------------
>
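The date() calls in the quoted code only print wall-clock timestamps.
As a hedged sketch of the same comparison, system.time() reports the
elapsed seconds directly. The names Ymat, t.apply, t.mlm, r2.apply and
r2.mlm and the fixed seed are illustrative assumptions, and actual
timings will depend on the machine and on the array size.

```r
## Sketch: timing one lm() per series against a single matrix-response lm()
set.seed(1)                            # assumption: fixed seed, for reproducibility only
t.length <- 5
d.dim <- c(t.length, 7, 8, 9)          # dimensions: time, x, y, z
Y <- array(rep(1:t.length, prod(d.dim[2:4])) + rnorm(prod(d.dim), 0, 0.1), d.dim)
X <- c(1, 3, 2, 4, 5)

## one lm() call per series
t.apply <- system.time(
  r2.apply <- apply(Y, 2:4, function(y) summary(lm(y ~ X))$r.squared)
)

## a single lm() call with a matrix response
Ymat <- matrix(Y, nrow = t.length)
t.mlm <- system.time({
  fits <- summary(lm(Ymat ~ X))        # list with one summary.lm per column
  r2.mlm <- array(sapply(fits, function(s) s$r.squared), dim = d.dim[2:4])
})

## elapsed seconds for each approach
print(c(apply = t.apply[["elapsed"]], mlm = t.mlm[["elapsed"]]))

## both approaches should give the same r.squared values
print(all.equal(r2.apply, r2.mlm))
```

Checking that the two results agree before comparing timings guards
against speeding up a computation that no longer returns the same
quantity.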