[Rd] recursive data-structures in R - An S4 "node" Class

Martin Morgan mtmorgan at fhcrc.org
Sun Jan 24 18:40:06 CET 2010


Hi Russ -- some ideas below...

On 01/23/2010 06:18 PM, rt wrote:
> Hi,
> 
> In an effort to learn S4 objects, I am trying to port some  c based tree
> code to S4 object.  My immediate goal is to use .Call() interface for
> calling my c code from R. My long term goal is to understand how to write c
> structs that follows S4 classes and not the other-way-around.
> 
> The c struct for the node is :
> typedef struct node{
>   int c;
>   int n;
>   inode **nodes;  //length  = n
> } inode;
> 
> I am trying to define corresponding S4 class. The closest I could come was:
> setClassUnion("listOrNULL",members=c("list", "NULL"))
> setClass(Class = "Inode", representation(c =
> "numeric",n="numeric",nodes="listOrNULL"))
> 
> This does not work, as setting an element of the list of nodes in a node
> object overwrites the original object. I have read that there are dirty ways
> of writing code that gets around this issue. For example, using the assign()
> operator.
> I am not even sure if it will work.
> 
> I would appreciate if someone suggest ways in which this problem is best
> approached.
> Is it possible to have recursive data-structures within R ?

Maybe for your C structure I'd have started with

  setClass("INodeList", contains="list")
  setValidity("INodeList", function(object) {
      ok <- sapply(object, is, "INode")
      if (!all(ok)) "'INodeList' elements must all be 'INode'"
      else TRUE

  })

  setClass("INode",
           representation=representation(
             c="integer", n="integer", nodes="INodeList"))

and then

> tree  <- new("INode", n=2L,
             nodes=new("INodeList", list(
                new("INode"),
                new("INode", n=1L,
                    nodes=new("INodeList", list(new("INode")))))))

creates a recursive structure. One might update with

> anode <- new("INode", n=1L,
               nodes=new("INodeList", list(new("INode", c=1L, n=0L))))
> tree at nodes[[2]]@nodes[[1]] <- anode

or write a recursive procedure (e.g., to count the number of nodes)

  setGeneric("icount", function(x) standardGeneric("icount"))
  setMethod(icount, "INode", function(x) {
      res <- sapply(x at nodes, icount)
      if (length(res) == 0) 1 else sum(res) + 1
  })

> icount(tree)
[1] 5

or maybe if 'c' were meant to be the number of nodes below the current

  setGeneric("icumm", function(x) standardGeneric("icumm"))
  setMethod("icumm", "INode", function(x) {
      if (length(x at nodes) == 0) new("INode", x, c=1L)
      else {
          nodes = new("INodeList", lapply(x at nodes, icumm))
          c <- 1L + sum(sapply(nodes, slot, "c"))
          new("INode", x, c=c, nodes=nodes)
      }
  })

> t1 <- icumm(tree)
> t1 at c
[1] 5
> t1 at nodes[[1]]@c
[1] 1
> t1 at nodes[[2]]@c
[1] 3

All of this says 'yes, recursive data structures are possible within [S4
classes in] R'.

But something like icumm makes a full copy of 'tree' (probably several
more!, actually, thinking about it, probably many more since the last
new("INodes", ...) likely recopies all the nodes below the current;
rewriting this to reduce the copying [a revised algorithm, rather than
clever R tricks or jumping to C [as ?rapply does] would be an
interesting exercise for the algorithmically inclined reader), and the
result is a different object, rather than an updated 'tree'. This is the
copy-on-change semantics of R, and you'll not get around it easily (the
usual approach is to use an environment, but then your R user accustomed
to copy-on-change is surprised with action-at-a-distance, where say
tree2 <- tree and changing 'tree' unexpectedly alters the value of
'tree2'). Another interesting exercise for the reader is to use standard
OOP patterns to replace the conditional code with method dispatch, e.g.,
with a derived class NullINode rather than the R-centric classUnion.

> What is the best way of writing an interface with pre-existing c code?

'Best' is a strong word. If you're trying to capture data structures
that have well-developed C-level support, then perhaps what you want is
to return an 'ExternalPtr'; your S4 class would have a slot containing
the ExternalPtr, and any access is via methods that end up doing .Call
to a C-level manipulation on the pointer's content. You have some
responsibility to make sure that the object is managed appropriately
(e.g., with a finalizer that frees memory when the ExternalPtr is no
longer referenced in R). You also need to protect or prepare your user
for the action-at-distance problems mentioned above.

Hope that helps.

Martin

> What are the best practices for deveolping new code in c for such purposes?
> 
> Thanks,
> 
> Russ
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the R-devel mailing list