[R] Very slow using S4 classes

Martin Maechler maechler at stat.math.ethz.ch
Sat Sep 10 19:49:24 CEST 2011


>>>>> Martin Morgan <mtmorgan at fhcrc.org>
>>>>>     on Sat, 10 Sep 2011 10:18:11 -0700 writes:

    > On 09/10/2011 08:08 AM, André Rossi wrote:
    >> Hi everybody!
    >> 
    >> I'm creating an object of an S4 class that has two slots: ListExamples,
    >> which is a list, and idx, which is an integer (see the code below).
    >> 
    >> Then, I read a data.frame with 10000 (ten thousand) rows and
    >> 10 columns, do some pre-processing and, basically, store each row
    >> as an element of a list in the slot ListExamples of the S4
    >> object. However, many operations after this take a considerable time.
    >> 
    >> Can anyone explain to me why this happens? Is it possible to speed up
    >> a script that deals with a large amount of data (it might be a
    >> data.frame or a list)?
    >> 
    >> Thank you,
    >> 
    >> André Rossi
    >> 
    >> setClass("Buffer", representation=representation( Listexamples =
    >> "list", idx = "integer" ) )

    > Hi André,

    > Can you provide a simpler and more reproducible example, for instance

    >> setClass("Buf", representation=representation(lst="list"))
    > [1] "Buf"
    >> b=new("Buf", lst=replicate(10000, list(10), simplify=FALSE))
    >> system.time({ b@lst[[1]][[1]] = 2 })
    > user  system elapsed
    > 0.005   0.000   0.005

    > Generally it sounds like you're modeling the rows as elements of
    > ListExamples, but you're better served by modeling the columns (lst =
    > replicate(10, integer(10000), simplify=FALSE), if all of your 10 columns
    > were integer-valued, for instance). Also, S4 is providing some measure of
    > type safety, and you're undermining that by having your class contain a
    > 'list'. I'd go after

    > setClass("Buffer",
    >     representation=representation(
    >         col1="integer",
    >         col2="character",
    >         col3="numeric"
    >         ## etc.
    >     ),
    >     validity=function(object) {
    >         nms <- slotNames(object)
    >         len <- sapply(nms, function(nm) length(slot(object, nm)))
    >         if (1L != length(unique(len)))
    >             "slots must all be of same length"
    >         else TRUE
    >     })

    > Buffer <-
    >     function(col1, col2, col3, ...)
    > {
    >     new("Buffer", col1=col1, col2=col2, col3=col3, ...)
    > }
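
As a rough illustration of how the column-oriented class and constructor above might be used, here is a minimal sketch. The three column types and the sample values are assumptions for illustration only, since the original data were not posted; vapply() is used in place of sapply() purely as a stylistic choice.

```r
library(methods)

## Column-oriented class: one typed slot per column (column types are assumed)
setClass("Buffer",
    representation = representation(
        col1 = "integer",
        col2 = "character",
        col3 = "numeric"
    ),
    validity = function(object) {
        ## all slots must have the same length, like columns of a table
        len <- vapply(slotNames(object),
                      function(nm) length(slot(object, nm)), integer(1))
        if (length(unique(len)) != 1L)
            "slots must all be of same length"
        else TRUE
    })

Buffer <- function(col1, col2, col3, ...) {
    new("Buffer", col1 = col1, col2 = col2, col3 = col3, ...)
}

n <- 10000L
buf <- Buffer(col1 = seq_len(n), col2 = rep("a", n), col3 = numeric(n))

## Updating a whole column is one vectorised slot assignment
buf@col3 <- buf@col1 * 0.5

## Columns of unequal length are rejected by the validity function
bad <- tryCatch(Buffer(col1 = 1:2, col2 = "a", col3 = 0),
                error = function(e) conditionMessage(e))
```

Here `bad` holds the error message raised by the validity check, so a malformed buffer is caught at construction time rather than later in the script.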

    > Let's see where the inefficiencies are before deciding that this is an 
    > S4 issue.
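
One way to look for the inefficiencies might be to time a row-wise update against a column-wise one. The sketch below is only an assumption about what the slow operations could look like; the class names RowBuf and ColBuf and the slots x and y are hypothetical, and the actual timings will depend on the machine and the R version.

```r
library(methods)

## Hypothetical classes: one stores rows in a list, the other stores columns
setClass("RowBuf", representation = representation(lst = "list"))
setClass("ColBuf", representation = representation(x = "integer", y = "numeric"))

n <- 10000L
rowbuf <- new("RowBuf",
              lst = replicate(n, list(x = 0L, y = 0), simplify = FALSE))
colbuf <- new("ColBuf", x = integer(n), y = numeric(n))

## Row-wise: one slot assignment per row
t_row <- system.time({
    for (i in seq_len(n)) rowbuf@lst[[i]]$y <- i * 2
})

## Column-wise: a single vectorised slot assignment
t_col <- system.time({
    colbuf@y <- seq_len(n) * 2
})

## Elapsed seconds for each approach
print(c(row = t_row[["elapsed"]], col = t_col[["elapsed"]]))
```

For a longer script, Rprof() together with summaryRprof() could be used in the same spirit to see which calls account for most of the time.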

    > Martin

Yes, indeed!  

--
another Martin


