[R] fast rowCumsums wanted for calculating the cdf

Gregor mailinglist at gmx.at
Fri Oct 15 08:51:26 CEST 2010


Dear all,

Perhaps the "easiest" solution: is there any reason not to generalize cumsum
from base to handle matrices (as MATLAB does)? E.g.:

"cumsum(Matrix, 1)"
equivalent to
"t(apply(Matrix, 1, cumsum))"

The main advantage would be optimized code when the matrix is extremely non-square
(e.g. 100,000 x 10) and the summation runs along the short side (here, 10).
apply effectively loops over the 100,000 rows, whereas vectorizing along the long
side (a loop over just the 10 columns) gives considerable efficiency gains.
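
To illustrate the idea, here is a minimal sketch, not a proposal for the actual
implementation. The name rowCumsumsSketch is hypothetical (it is not a base R
function), and the code assumes a numeric matrix without NAs:

```r
# Hypothetical helper (not part of base R): row-wise cumulative sums,
# computed with one vectorized addition per column.
rowCumsumsSketch <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  out <- x
  # Loop over the short side only; each step adds two whole columns at once.
  # seq_len(...)[-1] is empty for a one-column matrix, so the loop is skipped.
  for (j in seq_len(ncol(x))[-1]) {
    out[, j] <- out[, j - 1] + x[, j]
  }
  out
}

# Small check against the apply-based version.
probs <- matrix(1:100, ncol = 10, byrow = TRUE)
stopifnot(isTRUE(all.equal(rowCumsumsSketch(probs),
                           t(apply(probs, 1, cumsum)))))
```

On a 100,000 x 10 matrix the two versions can be compared with, e.g.,
system.time(rowCumsumsSketch(m)) and system.time(t(apply(m, 1, cumsum))).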

Many regards,
Gregor




On Tue, 12 Oct 2010 10:24:53 +0200
Gregor <mailinglist at gmx.at> wrote:

> Dear all,
> 
> I am struggling with a (currently) cost-intensive problem: calculating the
> (non-normalized) cumulative distribution function, given the (non-normalized)
> probabilities. Something like:
> 
> probs <- t(matrix(1:100, nrow=10)) # matrix with row-wise probabilities
> F <- t(apply(probs, 1, cumsum)) # SLOOOW!
> 
> One (already faster, but surely not ideal) solution, thanks to Henrik Bengtsson:
> 
> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
> F[,1] <- probs[,1]
> # one vectorized addition per column, so the loop runs over the short side only
> for (cc in seq_len(ncol(F))[-1]) {
>   F[,cc] <- F[,cc-1] + probs[,cc]
> }
> 
> In my case, probs is a (30,000 x 10) matrix, and I need to repeat this step around
> 200,000 times, so speed is crucial. I can currently make sure there are no NAs, but
> handling them could be a nontrivial issue if this were added to matrixStats.
> 
> Any ideas for speeding up this - probably routine - task?
> 
> Thanks in advance,
> Gregor
> 


