[Rd] Any interest in "merge" and "by" implementations specifically for sorted data?
Seth Falcon
sfalcon at fhcrc.org
Thu Jul 27 16:20:29 CEST 2006

"Kevin B. Hendricks" <kevin.hendricks at sympatico.ca> writes:
> My first R attempt was a simple
>
> # sort the data.frame gd and the sort key
> sorder <- order(MDPC)
> gd  <- gd[sorder,]
> MDPC <- MDPC[sorder]
> attach(gd)
>
> # find the length and sum for each unique sort key
> XN <- by(MVE, MDPC, length)
> XSUM <- by(MVE, MDPC, sum)
> GRPS <- levels(as.factor(MDPC))
>
> Well the ordering and sorting was reasonably fast but the first "by"  
> statement was still running 4 hours later on my machine (a dual 2.6  
> gig Opteron with 4 gig of main memory).  This same snippet of code in  
> SAS running on a slower machine takes about 5 minutes of system
> time.

I wonder if split() would be of use here.  Once you have sorted the
data frame gd and the sort keys MDPC, you could do:

    gdList <- split(gd$MVE, MDPC)
    xn <- sapply(gdList, length)
    xsum <- sapply(gdList, sum)

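
To make the suggestion concrete, here is a minimal sketch on a toy data set. The values in MDPC and MVE and the extra column id are made up for illustration; only the names gd, MDPC, MVE, gdList, xn and xsum come from the thread. The last part is an assumption on my side about one way to exploit the fact that the keys are already sorted, using rle() and cumsum(); it is not something proposed in the original message, and for very long floating-point columns the running sum may lose some precision compared with summing each group separately.

```r
# Toy stand-ins for the poster's data (hypothetical values).
MDPC <- c("b", "a", "b", "c", "a", "c", "c")
gd   <- data.frame(id  = 1:7,
                   MVE = c(2, 1, 4, 10, 3, 20, 30))

# Sort the data frame and the key together, as in the original post.
sorder <- order(MDPC)
gd     <- gd[sorder, ]
MDPC   <- MDPC[sorder]

# The split()/sapply() approach: one vector of MVE values per key,
# then length and sum of each piece.
gdList <- split(gd$MVE, MDPC)
xn     <- sapply(gdList, length)
xsum   <- sapply(gdList, sum)

# Sketch of an alternative that relies on the keys being sorted:
# rle() gives the run length of each key, and differences of the
# cumulative sum at the end of each run give the per-key sums.
runs  <- rle(MDPC)
ends  <- cumsum(runs$lengths)
xn2   <- runs$lengths
xsum2 <- diff(c(0, cumsum(gd$MVE)[ends]))
names(xn2) <- names(xsum2) <- runs$values

print(xn)
print(xsum)
```

Note that split() groups by key whether or not the input is sorted, so the sorting step matters only for the rle()-based variant.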
+ seth
    
    
More information about the R-devel
mailing list