[Rd] Any interest in "merge" and "by" implementations specifically for sorted data?
Thomas Lumley
tlumley at u.washington.edu
Mon Jul 31 16:19:01 CEST 2006
On Sat, 29 Jul 2006, Kevin B. Hendricks wrote:
> Hi Bill,
>
>>>> sum : igroupSums
>
> Okay, after thinking about this ...
>
> # assumes i is the small integer factor with n levels
> # v is some long vector
> # no sorting required
>
> igroupSums <- function(v,i) {
>   sums <- rep(0,max(i))
>   for (j in 1:length(v)) {
>     sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
>   }
>   sums
> }
>
> If written in Fortran or C, this might be faster than using split(). It is
> at least linear in time in the length of vector v.
For sums you should look at rowsum(). It uses a hash table in C and last
time I looked was faster than using split(). It returns a vector of the
same length as the input, but that would easily be fixed.
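As a rough illustration of the rowsum() suggestion, here is a minimal sketch that puts it side by side with the split()-based approach. The data values are made up, and the names v and i simply follow the quoted igroupSums example. In the R versions this was checked against, rowsum() on a plain vector gives a one-column matrix with one row per distinct group value, so the return shape may differ from the version described above; the last line just drops the matrix dimensions.

```r
# Toy data: v is the long vector, i the small integer factor
# (values invented purely for illustration).
v <- c(1.5, 2.0, 3.5, 4.0, 5.0, 6.0)
i <- c(1L, 2L, 1L, 3L, 2L, 1L)

# rowsum() sums v within each distinct value of i.
rs <- rowsum(v, i)

# The split()-based version it is being compared against.
sp <- sapply(split(v, i), sum)

# Take the single column to get a plain named vector, one entry per group.
group_sums <- rs[, 1]
```

Both group_sums and sp should hold the same per-group totals; which one is faster on long vectors is best checked with system.time() on real data.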
The same approach would work for min, max, range, count, mean, but not for
arbitrary functions.
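The reason these particular statistics fit the same approach is that each can be updated from a small, fixed amount of running state per group, whereas something like a median needs all of a group's values at once. A minimal pure-R sketch of that idea follows; igroupStats is a hypothetical helper, not an existing R function, and it assumes i holds integer codes in 1..n as in the quoted igroupSums. A C version would follow the same loop.

```r
# Hypothetical helper, not part of R: one linear pass over v, keeping a
# running count, sum, min and max for each group.
# Assumes i holds integer codes in 1..n and that every group is non-empty.
igroupStats <- function(v, i, n = max(i)) {
  cnt <- integer(n)
  tot <- numeric(n)
  lo  <- rep(Inf, n)
  hi  <- rep(-Inf, n)
  for (j in seq_along(v)) {
    g <- i[[j]]
    cnt[g] <- cnt[g] + 1L
    tot[g] <- tot[g] + v[[j]]
    if (v[[j]] < lo[g]) lo[g] <- v[[j]]
    if (v[[j]] > hi[g]) hi[g] <- v[[j]]
  }
  # The mean is derived from the running sum and count; the range is (min, max).
  data.frame(count = cnt, mean = tot / cnt, min = lo, max = hi)
}

stats <- igroupStats(c(1.5, 2.0, 3.5, 4.0, 5.0, 6.0),
                     c(1L, 2L, 1L, 3L, 2L, 1L))
```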
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle