[R] Very slow using S4 classes
Martin Maechler
maechler at stat.math.ethz.ch
Sat Sep 10 19:49:24 CEST 2011
>>>>> Martin Morgan <mtmorgan at fhcrc.org>
>>>>> on Sat, 10 Sep 2011 10:18:11 -0700 writes:
> On 09/10/2011 08:08 AM, André Rossi wrote:
>> Hi everybody!
>>
>> I'm creating an object of an S4 class that has two slots: ListExamples,
>> which is a list, and idx, which is an integer (as in the code below).
>>
>> Then, I read a data.frame file with 10000 (ten thousand) lines and
>> 10 columns, do some pre-processing and, basically, I store each line
>> as an element of a list in the slot ListExamples of the S4
>> object. However, many operations after this take a considerable time.
>>
>> Can anyone explain to me why this happens? Is it possible to speed up
>> a script that deals with a large amount of data (it might be a data.frame
>> or a list)?
>>
>> Thank you,
>>
>> André Rossi
>>
>> setClass("Buffer",
>>     representation=representation(ListExamples = "list", idx = "integer"))
> Hi André,
> Can you provide a simpler and more reproducible example, for instance
> > setClass("Buf", representation=representation(lst="list"))
> [1] "Buf"
> > b=new("Buf", lst=replicate(10000, list(10), simplify=FALSE))
> > system.time({ b@lst[[1]][[1]] = 2 })
>    user  system elapsed
>   0.005   0.000   0.005
> Generally it sounds like you're modeling the rows as elements of
> ListExamples, but you're better served by modeling the columns (lst =
> replicate(10, integer(10000), simplify=FALSE), if all of your 10 columns
> were integer-valued, for instance). Also, S4 is providing some measure of
> type safety, and you're undermining that by having your class contain a
> 'list'. I'd go after
> setClass("Buffer",
>     representation=representation(
>         col1="integer",
>         col2="character",
>         col3="numeric"
>         ## etc.
>         ),
>     validity=function(object) {
>         nms <- slotNames(object)
>         len <- sapply(nms, function(nm) length(slot(object, nm)))
>         if (1L != length(unique(len)))
>             "slots must all be of same length"
>         else TRUE
>     })
>
> Buffer <-
>     function(col1, col2, col3, ...)
> {
>     new("Buffer", col1=col1, col2=col2, col3=col3, ...)
> }
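As a usage note on the class above: a minimal sketch, assuming three columns of 10000 rows with made-up values (the data and the names `buf` and `bad` are hypothetical, not taken from André's script), of how a column-wise Buffer might be created and updated, and how the validity function rejects slots of unequal length. The class definition is repeated only so that the block runs on its own.

```r
library(methods)

## Column-wise class, as in the quoted message (three columns only).
setClass("Buffer",
    representation=representation(
        col1="integer",
        col2="character",
        col3="numeric"),
    validity=function(object) {
        nms <- slotNames(object)
        len <- sapply(nms, function(nm) length(slot(object, nm)))
        if (1L != length(unique(len)))
            "slots must all be of same length"
        else TRUE
    })

## Hypothetical data: 10000 'rows' stored as three vectors.
n <- 10000L
buf <- new("Buffer",
    col1=seq_len(n),
    col2=rep("a", n),
    col3=numeric(n))

## Updating a 'row' means assigning into each column vector.
buf@col1[1L] <- 42L
buf@col3[1L] <- 3.14

## Slots of different lengths fail the validity check in new();
## the error message is captured instead of stopping the script.
bad <- tryCatch(
    new("Buffer", col1=1:2, col2="a", col3=0),
    error=function(e) conditionMessage(e))
```

Note that assigning directly into a slot with `@` checks the slot's class but does not re-run the validity function; calling `validObject(buf)` after a batch of updates does.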
> Let's see where the inefficiencies are before deciding that this is an
> S4 issue.
> Martin
Yes, indeed!
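One way to see where the time goes before blaming S4 is to time the same update three ways. This is only a sketch under assumptions: the classes `RowBuf` and `ColBuf`, the data, and the choice of an element-by-element loop as the "slow operation" are all hypothetical, since the original script was not posted.

```r
library(methods)

## Hypothetical classes: one row-wise (list slot), one column-wise.
setClass("RowBuf", representation=representation(lst="list"))
setClass("ColBuf", representation=representation(col1="integer"))

n <- 10000L
rowBuf <- new("RowBuf", lst=replicate(n, list(10L), simplify=FALSE))
colBuf <- new("ColBuf", col1=rep(10L, n))
plain  <- replicate(n, list(10L), simplify=FALSE)   # same data, no S4

## (1) element-by-element update of a list held in an S4 slot
tRow <- system.time(for (i in seq_len(n)) rowBuf@lst[[i]][[1]] <- 2L)

## (2) the same loop on a plain list, to separate S4 cost from list cost
tPlain <- system.time(for (i in seq_len(n)) plain[[i]][[1]] <- 2L)

## (3) one vectorized assignment into a column-wise slot
tVec <- system.time(colBuf@col1[] <- 2L)

## Elapsed seconds for each variant.
print(c(s4_list_loop    = tRow[["elapsed"]],
        plain_list_loop = tPlain[["elapsed"]],
        s4_vectorized   = tVec[["elapsed"]]))
```

If (1) and (2) take similar time, the cost probably lies in the row-wise list layout rather than in S4 itself. Timings depend on the R version and the machine, so none are quoted here; for a real script, `Rprof()` and `summaryRprof()` give a per-function breakdown.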
--
another Martin