[R] apply vs sapply vs loop - lm() call appl(y)ied on array

Christoph Lehmann christoph.lehmann at gmx.ch
Thu Apr 21 17:07:31 CEST 2005


OK, thanks to Matthew's pointer to an earlier post with a similar request, I
now have three faster solutions (see below). The last one is the fastest,
but the first two are also faster than the for-loop, apply(lm(formula))
and sapply(lm(formula)) versions in my previous mail:

One remaining problem: with lsfit I can't directly get summary measures
such as r.squared ...
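For the r.squared problem, a shortcut may be enough here: with a single predictor plus an intercept, R² equals the squared Pearson correlation between X and the response, so it can be computed for all series at once with cor(), without fitting any model. This is a minimal sketch under that assumption (it does not carry over to several predictors). It reuses the simulated data from the original example with a seed added; Ymat is just a hypothetical name for the one-column-per-series matrix:

```r
## Sketch: R^2 of y ~ X (one predictor + intercept) equals cor(X, y)^2,
## so all series can be handled in one vectorised call.
set.seed(1)
t.length <- 5
d.dim <- c(t.length, 7, 8, 9)                 # time, x, y, z
Y <- array(rep(1:t.length, prod(d.dim[2:4])) + rnorm(prod(d.dim), 0, 0.1),
           d.dim)
X <- c(1, 3, 2, 4, 5)

Ymat <- matrix(Y, nrow = t.length)            # one column per series
rsq  <- drop(cor(X, Ymat))^2                  # vector of length 504
rsq  <- array(rsq, dim = d.dim[2:4])

## spot check one series against lm(); should print TRUE
all.equal(rsq[1, 1, 1], summary(lm(Y[, 1, 1, 1] ~ X))$r.squared)
```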

---------------

## using lm with a matrix response (recommended by BDR)
date()
fit <- lm(array(c(Y), dim = c(t.length, prod(d.dim[2:4]))) ~ X)
## summary() of a multi-response fit is a list of summary.lm objects,
## so take r.squared by name rather than by position in unlist()
rsq <- sapply(summary(fit), function(s) s$r.squared)
rsq <- array(rsq, dim = d.dim[2:4])
date()


## using sapply and lsfit instead of lm (recommended by Kevin Wright)
## note: this returns the slope coefficient of X, not r.squared
date()
fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4])))
z <- sapply(split(as.vector(Y), fac),
            FUN = function(x) lsfit(X, x)$coefficients[2])
dim(z) <- d.dim[2:4]
date()
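Why split(as.vector(Y), fac) (and the array(c(Y), dim = ...) reshape in the other two variants) picks out the right series: R stores arrays in column-major order, so the first (time) dimension varies fastest and every consecutive run of t.length values is one complete series. A small check, using random data; k, j, i and col are hypothetical names chosen for illustration:

```r
## Sketch: each block of t.length consecutive values in as.vector(Y)
## is one series Y[, k, j, i], because R arrays are column-major.
set.seed(1)
t.length <- 5
d.dim <- c(t.length, 7, 8, 9)
Y <- array(rnorm(prod(d.dim)), d.dim)

n.series <- prod(d.dim[2:4])
fac  <- rep(seq_len(n.series), each = t.length)  # same as rep(1:n, rep(t.length, n))
grp  <- split(as.vector(Y), fac)                 # list of 504 series
Ymat <- matrix(Y, nrow = t.length)               # 5 x 504 matrix

## series (k, j, i) ends up in column k + 7*(j-1) + 56*(i-1)
k <- 3; j <- 4; i <- 5
col <- k + d.dim[2] * (j - 1) + d.dim[2] * d.dim[3] * (i - 1)
identical(Y[, k, j, i], Ymat[, col])   # TRUE
identical(Y[, k, j, i], grp[[col]])    # TRUE
```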

## using lsfit with a matrix response
## (again the slope of X; lsfit gives no r.squared directly)
date()
slope <- lsfit(X, array(c(Y), dim = c(t.length, prod(d.dim[2:4]))))$coefficients[2, ]
slope <- array(slope, dim = d.dim[2:4])
date()
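If both the coefficients and r.squared are needed in one pass, one option (an assumption, not one of the suggestions above) is to decompose the shared design matrix once with qr() and apply qr.coef() and qr.resid() to the whole response matrix; R² then follows column-wise as 1 - RSS/TSS. A sketch with simulated data as in the original example; Ymat, qrX, B and res are hypothetical names:

```r
## Sketch: decompose the common design (intercept + X) once, then get
## coefficients and residuals for all series with two matrix calls.
set.seed(1)
t.length <- 5
d.dim <- c(t.length, 7, 8, 9)
Y <- array(rep(1:t.length, prod(d.dim[2:4])) + rnorm(prod(d.dim), 0, 0.1),
           d.dim)
X <- c(1, 3, 2, 4, 5)

Ymat <- matrix(Y, nrow = t.length)     # t.length x n.series
qrX  <- qr(cbind(1, X))                # QR of the design, computed once

B   <- qr.coef(qrX, Ymat)              # 2 x n.series: intercept, slope
res <- qr.resid(qrX, Ymat)             # residuals, same shape as Ymat

## R^2 = 1 - RSS / TSS for each column (model has an intercept)
rss <- colSums(res^2)
tss <- colSums(sweep(Ymat, 2, colMeans(Ymat))^2)

rsq   <- array(1 - rss / tss, dim = d.dim[2:4])
slope <- array(B[2, ], dim = d.dim[2:4])
```

Unlike the squared-correlation shortcut, this version should still work if more columns are added to the design passed to qr().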

------------------

thanks
Christoph

Wiener, Matthew wrote:
> Christoph --
> 
> There was just a thread on this earlier this week.  You can search in the
> archives for the title:   "refitting lm() with same x, different y".
> 
> (Actually, it doesn't turn up in the R site search yet, at least for me.
> But if you just go to the archive of recent messages, available through
> CRAN, you can search on refitting and find it.  The original post was from
> William Valdar, on April 19.)
> 
> Hope this helps,
> 
> Matt Wiener
> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Christoph Lehmann
> Sent: Thursday, April 21, 2005 9:24 AM
> To: R-help at stat.math.ethz.ch
> Subject: [R] apply vs sapply vs loop - lm() call appl(y)ied on array
> 
> 
> Dear useRs
> 
> (The code for the small example described below is at the end.)
> 
> I have 7 * 8 * 9 = 504 series of data (each of length 5). For each of
> these series I want to fit an lm(), where the design matrix X is the
> same for all of these computations.
> 
> The 504 series are in an array of dimension d.dim <- c(5, 7, 8, 9);
> i.e., the first dimension holds the data series.
> 
> The lm computation needs performance optimization, since in fact the 
> dimensions are much larger. I compared the following approaches:
> 
> I compared the following approaches: a for-loop, apply, and sapply.
> All of them take roughly the same computation time. I was astonished,
> since I expected at least sapply to outperform the for-loop.
> 
> Is there another, faster solution? Many thanks.
> 
> Here is the code:
> ## ------------------------------------------------------
> t.length <- 5
> d.dim <- c(t.length, 7, 8, 9) # dimensions: time, x, y, z
> Y <- array(rep(1:t.length, prod(d.dim[2:4])) + rnorm(prod(d.dim), 0, 0.1),
>            d.dim)
> X <- c(1,3,2,4,5)
> 
> ## -------- performance tests
> ## using for loop
> date()
> z <- rep(0, prod(d.dim[2:4]))
> l <- 0
> for (i in 1:dim(Y)[4])
>   for (j in 1:dim(Y)[3])
>    for (k in 1:dim(Y)[2]) {
>      l <- l + 1
>      z[l] <- summary(lm(Y[, k, j, i] ~ X))$r.squared
>    }
> date()
> 
> ## using apply
> date()
> z <- apply(Y, 2:4, function(x) summary(lm(x ~ X))$r.squared)
> date()
> 
> ## using sapply
> date()
> fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4])))
> z <- sapply(split(as.vector(Y), fac),
>             FUN = function(x) summary(lm(x ~ X))$r.squared)
> dim(z) <- d.dim[2:4]
> date()
> 
> ## ------------------------------------------------------
>



