[Rd] 'xtfrm' performance (influences 'order' performance) in R devel
Sklyar, Oleg (London)
osklyar at maninvestments.com
Tue Sep 9 16:40:40 CEST 2008
Thanks for a quick reply, I was thinking of [ methods myself, but there
are so many of them. I only tested [(x=TimeDate,i=TimeDate,j=missing),
which is a completely non-standard one. It did not seem to have any
effect though.
I was thinking of writing the 'order' method and will experiment with
getting the one for xtfrm. However, would it not be reasonable for the
default xtfrm to check whether the object inherits from a vector and, in
that case, simply return the .Data slot? This would solve this and similar
cases immediately:
if (inherits(x,"vector")) return(as.vector(x@.Data))
BTW, generally, xtfrm.default calls 'rank' and it is not clear why rank
should work for a generic S4 object... this is essentially where the
problem is.
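For illustration, here is a minimal sketch of what such an xtfrm method could look like. It assumes the two class definitions from my original example (repeated here so the snippet runs on its own); the method is hypothetical and is not something AHLCalendar currently defines. The idea is only to hand the underlying numeric data to the sorting code, so that xtfrm.default (and hence rank) is never reached:

```r
library(methods)

# Class definitions copied from the original example
setClass("TimeDateBase",
         representation("numeric", mode = "character"),
         prototype(mode = "posix"))
setClass("TimeDate",
         representation("TimeDateBase", tzone = "character"),
         prototype(tzone = "London"))

# Hypothetical method (an assumption, not part of any package):
# return the plain numeric data part so order()/sort() can work on it
# directly instead of falling through to xtfrm.default().
setMethod("xtfrm", "TimeDateBase", function(x) as.vector(x@.Data))

set.seed(1)
x <- new("TimeDate", 1220966224 + runif(1000))

# The ordering of the object should match the ordering of its data part
stopifnot(identical(order(x), order(x@.Data)))
```

Defining the method on TimeDateBase rather than TimeDate means that every class derived from it inherits the shortcut.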
On a side note, a week ago I submitted a patch for plot.default to
R-devel, but nobody reacted (I checked the most recent patched and devel
versions as well). It is a really ugly bug (e.g.
plot(1:5,1:5,xlim=c(-10,10),ylim=c(-8,3)) ) and the trivial patch fixes
it. I would be grateful if somebody from R-core could check it. Meanwhile
I patch the graphics library before compiling R, which is not the best
solution. Here is the patch for src/library/graphics/plot.R:
70,71c70,71
< localAxis(if(is.null(y)) xy$x else x, side = 1, ...)
< localAxis(if(is.null(y)) x else y, side = 2, ...)
---
> localAxis(xlim, side = 1, ...)
> localAxis(ylim, side = 2, ...)
Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
osklyar at maninvestments.com
> -----Original Message-----
> From: John Chambers [mailto:jmc at r-project.org]
> Sent: 09 September 2008 15:11
> To: Sklyar, Oleg (London)
> Cc: R-devel at r-project.org
> Subject: Re: [Rd] 'xtfrm' performance (influences 'order'
> performance) in R devel
>
> No definitive answers, but here are a few observations.
>
> In the call to order() code, I notice that you have dropped
> into the branch
> if (any(unlist(lapply(z, is.object))))
> where the alternative in your case would seem to have been
> going directly to the internal code.
>
> You can consider a method for xtfrm(), which would help but
> won't get you completely back to a trivial computation.
> Alternatively, order() should be eligible for the new
> mechanism of defining methods for "...".
>
> (Individual existing methods may not be the issue, and one
> can't infer anything definite from the evidence given, but a
> plausible culprit is the "[" method. Because [] expressions
> appear so often, it's always chancy to define a nontrivial
> method for this function.)
>
> John
>
> Sklyar, Oleg (London) wrote:
>
> Hello everybody,
>
> it looks like the presence of some (I do not know which) S4 methods for a
> given S4 class degrades the performance of xtfrm (used in 'order' in new
> R-devel) by a factor of millions. This is for classes that ARE derived
> from numeric directly and thus should be quite trivial to convert to
> numeric.
>
> Consider the following example:
>
> setClass("TimeDateBase",
> representation("numeric", mode="character"),
> prototype(mode="posix")
> )
> setClass("TimeDate",
> representation("TimeDateBase", tzone="character"),
> prototype(tzone="London")
> )
> x = new("TimeDate", 1220966224 + runif(1e5))
>
> system.time({ z = order(x) })
> ## > system.time({ z = order(x) })
> ## user system elapsed
> ## 0.048 0.000 0.048
>
> getClass("TimeDate")
> ## Class "TimeDate"
>
> ## Slots:
>
> ## Name: .Data tzone mode
> ## Class: numeric character character
>
> ## Extends:
> ## Class "TimeDateBase", directly
> ## Class "numeric", by class "TimeDateBase", distance 2
> ## Class "vector", by class "TimeDateBase", distance 3
>
>
> Now, if I load a library that not only defines these same classes, but
> also a bunch of methods for those, then I have the following result:
>
> library(AHLCalendar)
> x = now() + runif(1e5) ## just random times in POSIXct format
> x[1:5]
> ## TimeDate [posix] object in 'Europe/London' of length 5:
> ## [1] "2008-09-09 14:19:35.218" "2008-09-09 14:19:35.672"
> ## [3] "2008-09-09 14:19:35.515" "2008-09-09 14:19:35.721"
> ## [5] "2008-09-09 14:19:35.657"
>
>
>
> system.time({ z = order(x) })
>
>
>
>
> Enter a frame number, or 0 to exit
>
> 1: system.time({
> 2: order(x)
> 3: lapply(z, function(x) if (is.object(x)) xtfrm(x) else x)
> 4: FUN(X[[1]], ...)
> 5: xtfrm(x)
> 6: xtfrm.default(x)
> 7: as.vector(rank(x, ties.method = "min", na.last = "keep"))
> 8: rank(x, ties.method = "min", na.last = "keep")
> 9: switch(ties.method, average = , min = , max =
> .Internal(rank(x[!nas], ties.
> 10: .gt(c(1220966375.21811, 1220966375.67217, 1220966375.51470,
> 1220966375.7211
> 11: x[j]
> 12: x[j]
>
> Selection: 0
> Timing stopped at: 47.618 13.791 66.478
>
> At the same time:
>
> system.time({ z = as.numeric(x) }) ## same as x@.Data
> ## user system elapsed
> ## 0.001 0.000 0.001
>
> The only difference between the two is that I have the following methods
> defined for TimeDate (full listing below).
>
> Any idea why this could be happening? And yes, it is down to the xtfrm
> function; 'order' was just the place where the problem occurred. Should
> the xtfrm function be smarter with respect to classes that are actually
> derived from 'numeric'?
>
>
>
> showMethods(class="TimeDate")
>
>
> Function: + (package base)
> e1="TimeDate", e2="TimeDate"
> e1="TimeDate", e2="numeric"
> (inherited from: e1="TimeDateBase", e2="numeric")
>
> Function: - (package base)
> e1="TimeDate", e2="TimeDate"
>
> Function: Time (package AHLCalendar)
> x="TimeDate"
>
> Function: TimeDate (package AHLCalendar)
> x="TimeDate"
>
> Function: TimeDate<- (package AHLCalendar)
> x="TimeSeries", value="TimeDate"
>
> Function: TimeSeries (package AHLCalendar)
> x="data.frame", ts="TimeDate"
> x="matrix", ts="TimeDate"
> x="numeric", ts="TimeDate"
>
> Function: [ (package base)
> x="TimeDate", i="POSIXt", j="missing"
> x="TimeDate", i="Time", j="missing"
> x="TimeDate", i="TimeDate", j="missing"
> x="TimeDate", i="integer", j="missing"
> (inherited from: x="TimeDateBase", i="ANY", j="missing")
> x="TimeDate", i="logical", j="missing"
> (inherited from: x="TimeDateBase", i="ANY", j="missing")
> x="TimeSeries", i="TimeDate", j="missing"
> x="TimeSeries", i="TimeDate", j="vector"
>
> Function: [<- (package base)
> x="TimeDate", i="ANY", j="ANY", value="ANY"
> x="TimeDate", i="ANY", j="ANY", value="numeric"
> x="TimeDate", i="missing", j="ANY", value="ANY"
> x="TimeDate", i="missing", j="ANY", value="numeric"
>
> Function: add (package AHLCalendar)
> x="TimeDate"
>
> Function: addMonths (package AHLCalendar)
> x="TimeDate"
>
> Function: addYears (package AHLCalendar)
> x="TimeDate"
>
> Function: align (package AHLCalendar)
> x="TimeDate", to="character"
> x="TimeDate", to="missing"
>
> Function: as.POSIXct (package base)
> x="TimeDate"
>
> Function: as.POSIXlt (package base)
> x="TimeDate"
>
> Function: coerce (package methods)
> from="TimeDate", to="TimeDateBase"
>
> Function: coerce<- (package methods)
> from="TimeDate", to="numeric"
>
> Function: dates (package AHLCalendar)
> x="TimeDate"
>
> Function: format (package base)
> x="TimeDate"
>
> Function: fxFwdDate (package AHLCalendar)
> x="TimeDate", country="character"
>
> Function: fxSettleDate (package AHLCalendar)
> x="TimeDate", country="character"
>
> Function: holidays (package AHLCalendar)
> x="TimeDate"
>
> Function: index (package AHLCalendar)
> x="TimeDate", y="POSIXt"
> x="TimeDate", y="Time"
> x="TimeDate", y="TimeDate"
>
> Function: initialize (package methods)
> .Object="TimeDate"
> (inherited from: .Object="ANY")
>
> Function: leapYear (package AHLCalendar)
> x="TimeDate"
>
> Function: mday (package AHLCalendar)
> x="TimeDate"
>
> Function: mode (package base)
> x="TimeDate"
> (inherited from: x="TimeDateBase")
>
> Function: mode<- (package base)
> x="TimeDate", value="character"
> (inherited from: x="TimeDateBase", value="character")
>
> Function: month (package AHLCalendar)
> x="TimeDate"
>
> Function: pretty (package base)
> x="TimeDate"
>
> Function: prettyFormat (package AHLCalendar)
> x="TimeDate", munit="character"
> x="TimeDate", munit="missing"
>
> Function: print (package base)
> x="TimeDate"
>
> Function: show (package methods)
> object="TimeDate"
> (inherited from: object="TimeDateBase")
>
> Function: summary (package base)
> object="TimeDate"
>
> Function: td2tz (package AHLCalendar)
> x="TimeDate"
>
> Function: times (package AHLCalendar)
> x="TimeDate"
>
> Function: tojulian (package AHLCalendar)
> x="TimeDate"
>
> Function: toposix (package AHLCalendar)
> x="TimeDate"
>
> Function: tots (package AHLCalendar)
> x="TimeDate"
>
> Function: tzone (package AHLCalendar)
> x="TimeDate"
>
> Function: tzone<- (package AHLCalendar)
> x="TimeDate"
>
> Function: wday (package AHLCalendar)
> x="TimeDate"
>
> Function: yday (package AHLCalendar)
> x="TimeDate"
>
> Function: year (package AHLCalendar)
> x="TimeDate"
>
>
>
> Dr Oleg Sklyar
> Research Technologist
> AHL / Man Investments Ltd
> +44 (0)20 7144 3107
> osklyar at maninvestments.com
>
>
>
> **********************************************************************
> The contents of this email are for the named
> addressee(s...{{dropped:22}}
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>