[R] tapply huge speed difference if X has names
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Aug 8 21:36:17 CEST 2005
Please use a current version of R!

This was fixed long ago, and you will find it in the NEWS file:

    split() now handles vectors with names internally and so is
    almost as fast as on vectors without names (and maybe 100x
    faster than before).
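
For anyone who cannot upgrade straight away, a minimal sketch of the workaround already used in the message quoted below: drop the names before calling tapply. The data mirror the quoted example; the variable names g and named are illustrative only, and unname() is used here in place of the as.vector() call in the original.

```r
# Data mirroring the quoted example: a named integer vector and a grouping index.
X <- 1:100000
names(X) <- X
g <- rep(1:10000, each = 10)

# Dropping the names lets an old split() take its internal fast path.
fast <- tapply(unname(X), g, mean)

# On a current R the named call should be comparably quick.
named <- tapply(X, g, mean)

# Both calls give the same group means.
stopifnot(identical(fast, named))

# Timings depend on the machine and the R version, so none are claimed here.
print(system.time(tapply(X, g, mean)))
```

If the names are needed afterwards they stay on X itself; only the copy passed to tapply loses them.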
On Mon, 8 Aug 2005, Matthew Dowle wrote:
>
> Hi all,
>
> Apologies if this has been raised before ... R's tapply is very fast, but if
> X has names in this example, there seems to be a huge slowdown: under 1
> second compared to 151 seconds. The following timings are repeatable and
> were taken on a single-user machine:
>
>> X = 1:100000
>> names(X) = X
>> system.time(fast<<-tapply(as.vector(X), rep(1:10000,each=10), mean))  # as.vector() to drop the names
> [1] 0.36 0.00 0.35 0.00 0.00
>> system.time(slow<<-tapply(X, rep(1:10000,each=10), mean))
> [1] 149.95 1.83 151.79 0.00 0.00
>> head(fast)
>    1    2    3    4    5    6
>  5.5 15.5 25.5 35.5 45.5 55.5
>> head(slow)
>    1    2    3    4    5    6
>  5.5 15.5 25.5 35.5 45.5 55.5
>> identical(fast,slow)
> [1] TRUE
>>
>
> Looking inside tapply, which then calls split, it seems there is an
> is.null(names(x)) check which prevents R's internal fast version from being
> called when x has names. Why is that there? Could it be removed? I often do something like
> tapply(mat[,"colname"],...) where mat has rownames. Therefore the rownames
> of mat become the names of the vector mat[,"colname"], and this seems to
> slow down tapply a lot. Perhaps other functions which call split also suffer
> this problem?
>
>> split.default
> function (x, f)
> {
>     if (is.list(f))
>         f <- interaction(f)
>     f <- factor(f)
>     if (is.null(attr(x, "class")) && is.null(names(x)))
>         return(.Internal(split(x, f)))
>     lf <- levels(f)
>     y <- vector("list", length(lf))
>     names(y) <- lf
>     for (k in lf) y[[k]] <- x[f %in% k]
>     y
> }
> <environment: namespace:base>
>>
>
>> version
>          _
> platform x86_64-redhat-linux-gnu
> arch     x86_64
> os       linux-gnu
> system   x86_64, linux-gnu
> status
> major    2
> minor    0.1
> year     2004
> month    11
> day      15
> language R
>>
>
>
> Thanks and regards,
> Matthew
>
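
On the question of why the names check was there, a small sketch under stated assumptions: the fallback loop in the quoted split.default subsets x once per level, which keeps element names within each group but scans all of f for every level. With 100000 elements and 10000 levels that is on the order of 10^9 comparisons, which would be consistent with the timing reported. The function name split_loop and the toy data are hypothetical; the loop body is copied from the quoted listing.

```r
# Toy named vector and grouping variable (illustrative values only).
x <- c(a = 1, b = 2, c = 3, d = 4)
f <- c("u", "v", "u", "v")

# The fallback loop from the quoted split.default as a standalone function.
# split_loop is a hypothetical name, not part of base R.
split_loop <- function(x, f) {
    f <- factor(f)
    lf <- levels(f)
    y <- vector("list", length(lf))
    names(y) <- lf
    # One full pass over f per level; subsetting keeps the element names.
    for (k in lf) y[[k]] <- x[f %in% k]
    y
}

parts <- split_loop(x, f)
print(parts$u)

# Element names survive within each group.
stopifnot(identical(names(parts$u), c("a", "c")))

# A current split() returns the same groups, names included.
stopifnot(identical(parts, split(x, f)))
```

The loop is correct but does work proportional to length(x) times the number of levels, so handling names inside the internal code removes that cost without changing the result.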
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595