# [Rd] Fastest non-overlapping binning mean function out there?

Hervé Pagès hpages at fhcrc.org
Wed Oct 3 03:11:14 CEST 2012

```
Hi Henrik,

On 10/02/2012 05:19 PM, Henrik Bengtsson wrote:
> Hi,
>
> I'm looking for a super-duper fast mean/sum binning implementation
> available in R, and before implementing z = binnedMeans(x, y) in native
> code myself, does any one know of an existing function/package for
> this?  I'm sure it already exists.  So, given data (x,y) and B bins
> bx[1] < bx[2] < ... < bx[B] < bx[B+1], I'd like to calculate the
> binned means (or sums) 'z' such that z[1] = mean(x[bx[1] <= x & x <
> bx[2]]), z[2] = mean(x[bx[2] <= x & x < bx[3]]), .... z[B].  Let's
> assume there are no missing values and 'x' and 'bx' are already
> ordered.  The length of 'x' is in the order of 10,000-millions.  The
> number of elements in each bin varies.

You didn't say if you have a lot of bins or not. If you don't have a lot
of bins (e.g. < 10000), something like

aggregate(x, by=list(bin=findInterval(x, bx)), FUN=mean)

might not be too bad:

> x <- seq(0, 8, by=0.1)
> bx <- c(2, 2.5, 4, 5.8)
> aggregate(x, by=list(bin=findInterval(x, bx)), FUN=mean)
  bin    x
1   0 0.95
2   1 2.20
3   2 3.20
4   3 4.85
5   4 6.90

I didn't try it on a 10,000-millions-elements vector though (and I've
no idea how I could do this).

H.

>
> Thanks,
>
> Henrik

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

```
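The `aggregate()` call above calls `mean()` once per bin. As a rough sketch of an alternative that stays in base R, the bin index from `findInterval()` can be fed to `tabulate()` for the counts and `rowsum()` for the sums, so the means come from one vectorized division. The function name `binned_means` and its return layout are made up for illustration, and this has not been benchmarked against `aggregate()` or tried on very long vectors.

```r
# Hypothetical helper (not from the thread): per-bin means via
# findInterval() + tabulate() + rowsum().
# Assumes 'x' has no missing values and 'bx' is sorted increasingly.
binned_means <- function(x, bx) {
  # Bin index: 0 for x < bx[1], k for bx[k] <= x < bx[k+1],
  # and length(bx) for x >= bx[length(bx)].
  bin <- findInterval(x, bx)
  nb <- length(bx) + 1L

  # Number of elements per bin, including empty bins.
  counts <- tabulate(bin + 1L, nbins = nb)

  # Per-bin sums; rowsum() only returns rows for bins that occur,
  # so place them into a zero-filled vector by bin index.
  rs <- rowsum(x, bin)
  sums <- numeric(nb)
  sums[as.integer(rownames(rs)) + 1L] <- rs[, 1]

  # Empty bins give 0/0 = NaN.
  data.frame(bin = 0:(nb - 1L), n = counts, mean = sums / counts)
}

x <- seq(0, 8, by = 0.1)
bx <- c(2, 2.5, 4, 5.8)
res <- binned_means(x, bx)
print(res)
```

Unlike the `aggregate()` result, this sketch keeps a row for every bin, so an empty bin shows up with `n = 0` and a `NaN` mean instead of being dropped.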