[R] Alternative to apply in base R

William Dunlap wdunlap at tibco.com
Tue Nov 8 23:14:15 CET 2016


The version which allows any number of columns does not take
much more time than the one that requires exactly 7 columns.
If you have a very large number of columns, neither version is a good choice.

> f1 <- function(x) x[,1]*x[,2]*x[,3]*x[,4]*x[,5]*x[,6]*x[,7]
> f2 <- function(x) {
+    val <- rep(1, nrow(x))
+    for(i in seq_len(ncol(x))) {
+       val <- val * x[,i]
+    }
+    val
+ }
> z <- matrix(runif(10e6 * 7), ncol=7)
> system.time(v1 <- f1(z))
   user  system elapsed
  0.686   0.140   0.826
> system.time(v2 <- f2(z))
   user  system elapsed
  0.663   0.196   0.860
> all.equal(v1,v2,tolerance=0)
[1] TRUE

You might speed up f2 a tad by special-casing the ncol==0,
ncol==1, and ncol>1 cases.
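
A minimal sketch of that special-casing, assuming a numeric matrix as input. The name f3 is made up for illustration and is not part of the timings above. It returns a vector of ones when there are no columns, and otherwise starts from the first column so that the initial multiplication by rep(1, nrow(x)) is avoided.

```r
# Hypothetical variant of f2 that handles ncol == 0, ncol == 1 and ncol > 1
# separately. f3 is an illustrative name, not from the original post.
f3 <- function(x) {
   nc <- ncol(x)
   if (nc == 0L) {
      # The empty product is 1 for every row.
      return(rep(1, nrow(x)))
   }
   # Start from the first column instead of a vector of ones.
   val <- x[, 1]
   # seq_len(nc)[-1] is empty when nc == 1, so the loop body is skipped.
   for (i in seq_len(nc)[-1]) {
      val <- val * x[, i]
   }
   val
}

# Check against the row-wise definition on a small matrix.
z <- matrix(runif(1000 * 7), ncol = 7)
stopifnot(isTRUE(all.equal(f3(z), apply(z, 1, prod))))
```

One difference from f2 worth keeping in mind: for an integer matrix, f3 stays in integer arithmetic, whereas f2 starts from a double vector of ones.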

The versions that call prod() nrow(x) times, such as apply(x, 1, prod),
take about 25 seconds on this machine and dataset.
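
For anyone who wants to reproduce the comparison at a smaller scale, the following is a rough sketch. The names by_row, by_col and z_small are made up here, and the matrix is much smaller than the one timed above, so the timings will not match the figures in this message and will vary by machine.

```r
# Row-wise: prod() is called once per row, i.e. nrow(x) times.
by_row <- function(x) apply(x, 1, prod)

# Column-wise: one vectorized multiplication per column, i.e. ncol(x) times.
by_col <- function(x) {
   val <- rep(1, nrow(x))
   for (i in seq_len(ncol(x))) {
      val <- val * x[, i]
   }
   val
}

# A smaller matrix than in the post, so the slow version finishes quickly.
z_small <- matrix(runif(1e5 * 7), ncol = 7)
t_row <- system.time(r1 <- by_row(z_small))
t_col <- system.time(r2 <- by_col(z_small))
print(t_row)
print(t_col)

# The two results agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(r1, r2)))
```

The point of the comparison is the number of R-level function calls: by_row makes one call per row, by_col makes one vectorized multiplication per column, which matters when rows number in the millions and columns in the single digits.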



Bill Dunlap
TIBCO Software
wdunlap at tibco.com

On Tue, Nov 8, 2016 at 1:58 PM, Doran, Harold <HDoran at air.org> wrote:

> Well, I wish R-help had a “like” button as I would most certainly like
> this reply :)
>
> As usual, you’re right. I should have added a disclaimer that “in this
> instance” there are 7 columns as the function I wrote evaluates an
> N-dimensional integral and so as the dimensions change, so does the number
> of columns in this matrix (plus another factor). But the number of columns
> is never all that large.
>
>
>
> On 11/8/16, 4:37 PM, "peter dalgaard" <pdalgd at gmail.com> wrote:
>
> >
> >> On 08 Nov 2016, at 21:23 , Doran, Harold <HDoran at air.org> wrote:
> >>
> >> It's a good suggestion. Multiplication in this case is over 7 columns in
> >> the data, but the number of rows is millions. Unfortunately, the values
> >> are negative as these are actually gauss-quad nodes used to evaluate a
> >> multidimensional integral.
> >
> >If there really are only 7 cols, then there's also the blindingly obvious
> >
> >mm[,1]*mm[,2]*mm[,3]*mm[,4]*mm[,5]*mm[,6]*mm[,7]
> >
> >-pd
> >
> >
> >>
> >> colSums is better than something like apply(dat, 2, sum); I was hoping
> >> there was something similar to colSums/rowSums using prod().
> >>
> >> On 11/8/16, 3:00 PM, "Fox, John" <jfox at mcmaster.ca> wrote:
> >>
> >>> Dear Harold,
> >>>
> >>> If the actual data with which you're dealing are non-negative, you could
> >>> log all the values, and use colSums() on the logs. That might also have
> >>> the advantage of greater numerical accuracy than multiplying millions of
> >>> numbers. Depending on the numbers, the products may be too large or small
> >>> to be represented. Of course, logs won't work with your toy example,
> >>> where rnorm() will generate values that are both negative and positive.
> >>>
> >>> I hope this helps,
> >>> John
> >>> -----------------------------
> >>> John Fox, Professor
> >>> McMaster University
> >>> Hamilton, Ontario
> >>> Canada L8S 4M4
> >>> web: socserv.mcmaster.ca/jfox
> >>>
> >>>
> >>> ________________________________________
> >>> From: R-help [r-help-bounces at r-project.org] on behalf of Doran, Harold
> >>> [HDoran at air.org]
> >>> Sent: November 8, 2016 10:57 AM
> >>> To: r-help at r-project.org
> >>> Subject: [R] Alternative to apply in base R
> >>>
> >>> Without reaching out to another package in R, I wonder what the best way
> >>> is to speed up the following toy example? Over the years I have
> >>> become very comfortable with the family of apply functions and am generally
> >>> not good at finding an improvement for speed.
> >>>
> >>> This toy example is small, but my real data has many millions of rows and
> >>> the same operation is repeated many times, and so finding a less
> >>> expensive alternative would be helpful.
> >>>
> >>> mm <- matrix(rnorm(100), ncol = 10)
> >>> rn <- apply(mm, 1, prod)
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >
> >--
> >Peter Dalgaard, Professor,
> >Center for Statistics, Copenhagen Business School
> >Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> >Phone: (+45)38153501
> >Office: A 4.23
> >Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
