[R] Processing logic for Huge Data set

Liaw, Andy andy_liaw at merck.com
Mon Oct 20 14:10:40 CEST 2003


> From: TyagiAnupam at aol.com [mailto:TyagiAnupam at aol.com] 
> 
> Loops are time consuming in R. Try one of the apply functions for
> vectorized calculations, like "apply", "lapply", "sapply" or "tapply".
> Also see help for "split".

Have you actually compared a for loop with apply() in terms of timing?  Have
you looked at the R code for apply()?  It has:

    <...>
    if (length(d.call) < 2) {
        if (length(dn.call)) 
            dimnames(newX) <- c(dn.call, list(NULL))
        for (i in 1:d2) ans[[i]] <- FUN(newX[, i], ...)
    }
    else for (i in 1:d2) ans[[i]] <- FUN(array(newX[, i], d.call, 
        dn.call), ...)
    <...>

Notice the for loop there!  While what you said about apply and for loops
might be true for (older versions of) Splus, it's not true for R.
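
One way to check this yourself is a quick timing comparison.  The sketch
below is only illustrative: the matrix size, the helper name loop_means and
the use of mean() are assumptions made for the example, and the timings
will vary by machine and R version.

```r
# Hypothetical test matrix; the dimensions are arbitrary.
set.seed(1)
x <- matrix(rnorm(1e5), nrow = 100, ncol = 1000)

# Column means with an explicit for loop and a preallocated result.
# loop_means is a made-up helper name for this example.
loop_means <- function(m) {
  out <- numeric(ncol(m))
  for (i in seq_len(ncol(m))) out[i] <- mean(m[, i])
  out
}

t_loop  <- system.time(r_loop  <- loop_means(x))
t_apply <- system.time(r_apply <- apply(x, 2, mean))

# Both approaches should return the same values.
stopifnot(isTRUE(all.equal(r_loop, r_apply)))

# Inspect the timings rather than assuming which one wins.
print(t_loop)
print(t_apply)
```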

lapply() does do the looping at the C level.  sapply() and tapply() use
lapply(), so they can be faster than a for loop at the R level.
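
For the grouped summary asked about in the original post, a rough sketch
along these lines might help.  It assumes a numeric matrix with columns
named x1 to x4 and uses the mean as the statistic; the simulated data, the
object names and the choice of statistic are all assumptions for
illustration, not the poster's actual setup.

```r
# Hypothetical data: x1 is the grouping column, x2..x4 are numeric.
set.seed(1)
n <- 1000
m <- cbind(x1 = sample(1:10, n, replace = TRUE),
           x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
vars <- c("x2", "x3", "x4")

# Split the row indices by group once, instead of subsetting the
# matrix by x1 inside a for loop.
idx <- split(seq_len(nrow(m)), m[, "x1"])

# Per-group column means; the result has one row per level of x1.
res <- t(sapply(idx, function(i) colMeans(m[i, vars, drop = FALSE])))

# If the statistic is a sum, rowsum() does the grouping in compiled code.
sums <- rowsum(m[, vars], group = m[, "x1"])

# For a single column, tapply() gives the same per-group means.
x2_means <- tapply(m[, "x2"], m[, "x1"], mean)
```

Working on row indices avoids copying the whole matrix for each group,
which may matter when memory is tight.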

Andy


>
> In a message dated 10/19/03 5:25:51 PM Pacific Daylight Time,
> Wanzare at HCJP.com writes:
>
> > Hello All,
> >       I am new to R. I am trying to process a huge data set: a
> > matrix containing four columns, say x1, x2, x3, x4, and n rows.
> >
> > I want to aggregate the matrix by x1 and compute statistics based on
> > columns x2, x3, x4. I tried the aggregate function, but it gave me a
> > memory allocation error (which does not surprise me), so I ended up
> > writing a for loop over x1 and subsetting the matrix based on x1.
> > However, I have a hunch that there should be a less expensive way of
> > doing this processing.  Any ideas or tips to optimize this processing
> > logic would be greatly appreciated.