[R] Size of a refClass instance

David Kulp dkulp at fiksu.com
Sun May 5 13:45:59 CEST 2013


Yes, I agree.  How does one conceptually achieve polymorphic behavior without instantiating 100,000s of instances?  Perhaps one way around this is to represent the data in an efficient R way -- i.e. a data.frame -- and create a set of re-usable singleton instances of different node types.  To perform some polymorphic operation on a node, a singleton gets assigned to a node in the tree.  But behavior such as node$parent() or node$child(1) will require a small pool of these singletons.  Doable, I think.
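
A minimal sketch of that idea, under assumptions of my own: the table layout (id, parent, type, value), the handler class names and describe_node() are all hypothetical, chosen only for illustration. Each handler is a stateless refClass, one shared instance per node type, and the per-node data stays in the data.frame.

```r
library(methods)

# All per-node data lives in one table; NA marks the root's parent.
tree <- data.frame(
  id     = 1:5,
  parent = c(NA, 1L, 1L, 2L, 2L),
  type   = c("branch", "branch", "leaf", "leaf", "leaf"),
  value  = c(NA, NA, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Stateless handler classes: behaviour only, no per-node fields.
NodeHandler <- setRefClass("NodeHandler", methods = list(
  describe = function(tree, id) stop("subclass must implement describe()")
))
LeafHandler <- setRefClass("LeafHandler", contains = "NodeHandler", methods = list(
  describe = function(tree, id) {
    paste0("leaf ", id, " holding ", tree$value[tree$id == id])
  }
))
BranchHandler <- setRefClass("BranchHandler", contains = "NodeHandler", methods = list(
  describe = function(tree, id) {
    n <- sum(tree$parent == id, na.rm = TRUE)
    paste0("branch ", id, " with ", n, " children")
  }
))

# One shared instance per node type, however many rows the table has.
handlers <- list(leaf = LeafHandler$new(), branch = BranchHandler$new())

# Dispatch: look up the row's type, then call that type's singleton.
describe_node <- function(tree, id) {
  handlers[[tree$type[tree$id == id]]]$describe(tree, id)
}

descriptions <- vapply(tree$id, function(i) describe_node(tree, i), character(1))
```

Only two refClass instances exist regardless of tree size; whether this stays convenient once handlers need to hand back neighbouring nodes is exactly the open question about a small pool of singletons.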

PS. FWIW, I found another strike against the "massive tree of refClass instances".  It's save().  save() appears to unnecessarily expand/duplicate refClass structures.  Write time becomes prohibitive and loading in the data structure again results in a far greater memory usage.

On May 3, 2013, at 9:47 AM, Jeff Newmiller wrote:

> Interesting conclusion. Alternatively, that representation of your object model may not be computationally effective. This discrepancy may be less exaggerated in C++, but you may still find that large numbers of objects are less efficient in their use of memory or cpu time than vector processing even there. I would read the point of Martin's response as "Don't confuse your mental model of the solution with its implementation".
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                      Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> --------------------------------------------------------------------------- 
> Sent from my phone. Please excuse my brevity.
> 
> David Kulp <dkulp at fiksu.com> wrote:
> 
>> Good tip.  Thanks Morgan.
>> I agree that a different structure might (necessarily) be in order.  I
>> wanted to create a tree where nodes in a tree were of different derived
>> sub-classes -- possibly holding more data and behaving polymorphically.
>> OO programming seemed ideal for this: lots of small things with
>> specialized behavior -- but this isn't R's strength.
>> 
>> On May 2, 2013, at 4:57 PM, Martin Morgan wrote:
>> 
>>> On 05/01/2013 11:20 AM, David Kulp wrote:
>>>> I'm using refClass for a complex multi-directional tree structure with
>>>> possibly 100,000s of nodes.  The refClass design is very impressive and I'd
>>>> love to use it, but I've found that refClass instances are very large and
>>>> creation is slow.  For example, below are a RefClass and a normal S4
>>>> class.  The RefClass requires about 4KB per instance vs 500B for the S4
>>>> class -- based on adding the Ncells and Vcells of used memory reported by
>>>> gc().  And instantiation is more than twice as slow for a RefClass.  (R
>>>> 2.14.2)
>>>> 
>>>> Anyone have thoughts on this and whether there's any hope for improving
>>>> resources on either front?
>>> 
>>> Hi David -- not necessarily helpful but creating a few large objects is
>>> always better than creating many small in R, so perhaps re-conceptualize
>>> your data structure? As a rough analogy, instead of constructing a graph
>>> as a large number of 'Node' instances each pointing to one another, a
>>> graph could be represented as a data.frame containing columns of 'from'
>>> and 'to' indexes (neighbour-edge list, a few large objects) or as an
>>> adjacency matrix. One would also implement creation and update of the few
>>> large objects in an R-friendly (vectorized) way.
>>> 
>>> Perhaps there are existing packages that already model the data you're
>>> interested in? If your multi-directional tree can be represented as a
>>> graph, then perhaps
>>> 
>>> http://bioconductor.org/packages/release/bioc/html/graph.html
>>> 
>>> including facilities in the Boost graph library (RBGL, on the Bioconductor
>>> web site, too) or the igraph package can be put to use.
>>> 
>>> Martin
>>> 
>>>> 
>>>> I wonder what others are doing.  I've been thinking about lightweight
>>>> alternative implementations, but nothing particularly elegant has come to
>>>> mind, yet!
>>>> 
>>>> Thanks!
>>>> 
>>>> 
>>>> simple <- setRefClass('simple', fields = list(a = "character", b = "numeric"))
>>>> gc()
>>>> system.time(simple.list <- lapply(1:100000, function(i) {
>>>>   simple$new(a = 'foo', b = i)
>>>> }))
>>>> gc()
>>>> 
>>>> setClass('simple2', representation(a = "character", b = "numeric"))
>>>> setMethod("initialize", "simple2", function(.Object, a, b) {
>>>>   .Object@a <- a
>>>>   .Object@b <- b
>>>>   .Object
>>>> })
>>>> 
>>>> gc()
>>>> system.time(simple2.list <- lapply(1:100000, function(i) {
>>>>   new('simple2', a = 'foo', b = i)
>>>> }))
>>>> gc()
>>>> 
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>> 
>>> 
>>> -- 
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>> 
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
>> 
> 
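
Martin's suggestion quoted above, a data.frame of 'from' and 'to' indexes maintained in a vectorized way, might look roughly like the following. This is only a sketch: the five-node tree and the variable names (edges, children_of, parent_of, n_children) are made up for illustration.

```r
# Hypothetical five-node tree stored as a 'from'/'to' edge list.
edges <- data.frame(from = c(1L, 1L, 2L, 2L), to = c(2L, 3L, 4L, 5L))

# Children of every parent in one pass, instead of one object per node.
children_of <- split(edges$to, edges$from)

# Parent lookup for all nodes at once; nodes with no incoming edge stay NA.
n_nodes <- max(edges$from, edges$to)
parent_of <- rep(NA_integer_, n_nodes)
parent_of[edges$to] <- edges$from

# Number of children per node, again computed for the whole tree at once.
n_children <- tabulate(edges$from, nbins = n_nodes)
```

Here children_of[["2"]] gives the children of node 2 and parent_of[4] gives the parent of node 4, so navigation becomes indexing into a few vectors rather than following references between instances.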


