[Rd] Fastest non-overlapping binning mean function out there?
Martin Morgan
mtmorgan at fhcrc.org
Wed Oct 3 15:47:11 CEST 2012
On 10/02/2012 05:19 PM, Henrik Bengtsson wrote:
> Hi,
>
> I'm looking for a super-duper fast mean/sum binning implementation
> available in R, and before implementing z = binnedMeans(x y) in native
> code myself, does any one know of an existing function/package for
> this? I'm sure it already exists. So, given data (x,y) and B bins
> bx[1] < bx[2] < ... < bx[B] < bx[B+1], I'd like to calculate the
> binned means (or sums) 'z' such that z[1] = mean(x[bx[1] <= x & x <
> bx[2]]), z[2] = mean(x[bx[2] <= x & x < bx[3]]), .... z[B]. Let's
> assume there are no missing values and 'x' and 'bx' is already
> ordered. The length of 'x' is in the order of 10,000-millions. The
> number of elements in each bin vary.
since x and bx are ordered (sorting x would be expensive), the C code seems
pretty straight-forward and memory-efficient -- create a result vector as long
as bx, then iterate through x accumulating n and it's sum until x[i] >= bx[i].
(I think R's implementation of mean() actually pays more attention to numerical
error, but with this much data...)
library(inline)
binmean <- cfunction(signature(x="numeric", bx="numeric"),
" int nx = Rf_length(x), nb = Rf_length(bx), i, j, n;
SEXP ans = PROTECT(NEW_NUMERIC(nb));
double sum, *xp = REAL(x), *bxp = REAL(bx), *ansp = REAL(ans);
sum = j = n = 0;
for (i = 0; i < nx; ++i) {
while (xp[i] >= bxp[j]) {
ansp[j++] = n > 0 ? sum / n : 0;
sum = n = 0;
}
n += 1;
sum += xp[i];
}
ansp[j] = n > 0 ? sum / n : 0;
UNPROTECT(1);
return ans;
")
with a test case
nx <- 4e7
nb <- 1e3
x <- sort(runif(nx))
bx <- do.call(seq, c(as.list(range(x)), length.out=nb))
leading to
> bx1 <- c(bx[-1], bx[nb] + 1)
> system.time(res <- binmean(x, bx1))
user system elapsed
0.052 0.000 0.050
Martin
>
> Thanks,
>
> Henrik
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the R-devel
mailing list