[Rd] S4 class extending data.frame?

Martin Morgan mtmorgan at fhcrc.org
Thu Dec 13 16:01:35 CET 2007


Ben, Oleg --

Some solutions, which you've probably already thought of, are (a) move
the data.frame into its own slot, instead of extending it, (b) manage
the data.frame attributes yourself, or (c) reinvent the data.frame
from scratch as a proper S4 class (e.g., extending 'list' with
validity constraints on element length and homogeneity of element
content).

(b) places a lot of dependence on understanding the data.frame
implementation, and is probably too tricky (for me) to get right; (c)
is probably also tricky, and probably carries significant performance
overhead (e.g., object duplication during validity checking).
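
For concreteness, option (c) might start out something like the sketch
below. The class name 'Frame' and the validity message are made up for
illustration, and only the equal-length constraint is checked; nothing
is enforced about the content of each element.

```r
library(methods)

# Hypothetical class: a list whose elements must all have the same length.
setClass("Frame", contains = "list",
         validity = function(object) {
             lens <- vapply(object@.Data, length, integer(1))
             if (length(unique(lens)) > 1L)
                 "all elements must have the same length"
             else
                 TRUE
         })

# Equal-length elements pass validity checking.
ok <- new("Frame", list(1:3, 2:4))

# Unequal lengths fail in new(); capture the message instead of stopping.
bad <- tryCatch(new("Frame", list(1:3, 1:2)),
                error = function(e) conditionMessage(e))
```

The validity function runs whenever new() is given data, which is where
the overhead mentioned above would come from.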

(a) means that you don't get automatic method inheritance. On the plus
side, you still get the structure. It is trivial to implement methods
like [, [[, etc. that dispatch on your object and act on the appropriate
slot. And in some sense you now know exactly which methods are
supported on your object: those you've implemented.
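
A minimal sketch of (a), assuming a made-up class 'AnnotatedFrame' with
slots 'data' and 'comment' (the names are mine, not from Ben's code).
The [ method here always subsets as [rows, cols] and returns the same
class, which is a design choice rather than data.frame's own behaviour.

```r
library(methods)

# Hypothetical class: the data.frame lives in its own slot.
setClass("AnnotatedFrame",
         representation(data = "data.frame", comment = "character"))

# Subset rows/columns of the data slot; other slots are carried along.
setMethod("[", "AnnotatedFrame", function(x, i, j, ..., drop = FALSE) {
    rows <- if (missing(i)) seq_len(nrow(x@data)) else i
    cols <- if (missing(j)) seq_len(ncol(x@data)) else j
    x@data <- x@data[rows, cols, drop = FALSE]
    x
})

# Extract a single column from the data slot.
setMethod("[[", "AnnotatedFrame", function(x, i, j, ...) x@data[[i]])

# Report the dimensions of the data slot.
setMethod("dim", "AnnotatedFrame", function(x) dim(x@data))

z <- new("AnnotatedFrame",
         data = data.frame(a = 1:3, b = 2:4), comment = "goodbye")
sub <- z[1:2, ]
```

Here dim(sub) reflects the two selected rows, z[["b"]] returns the
column, and sub@comment is still there.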

Oleg, here's my cautionary tale for extending list, where manually
subsetting the .Data slot mixes up the names (callNextMethod would
have done the right thing, but was not appropriate). This was quite a
subtle bug for me, because I hadn't been expecting named lists in my
object; the problem surfaced when sapply used the (incorrectly subset)
names attribute of the list. My solution in this case was to make sure
'names' were removed from lists used to construct objects. As a
consequence I lose a nice little bit of sapply magic.

> setClass('A', 'list')
[1] "A"
> setMethod('[', 'A', function(x, i, j, ..., drop=TRUE) {
+     x@.Data <- x@.Data[i]
+     x
+ })
[1] "["
> names(new('A', list(x=1, y=2))[2])
[1] "x"
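
The workaround I describe above (dropping names before construction)
could be sketched roughly as follows. 'makeA' is a hypothetical helper
for illustration, not something from my actual code.

```r
library(methods)

setClass("A", contains = "list")

# Hypothetical constructor helper: strip names so that no stale
# 'names' attribute can end up attached to the object.
makeA <- function(elts) new("A", unname(as.list(elts)))

a <- makeA(list(x = 1, y = 2))
names(a)
```

The object then carries no names at all, which is what costs the sapply
convenience mentioned above.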

Martin

Oleg Sklyar <osklyar at ebi.ac.uk> writes:

> I had the same problem. Generally data.frames behave like lists, but
> while you can extend list, there are problems extending the data.frame
> class. This comes down to the internal representation of the object, I
> guess. Vectors, including list, contain their information in a (hidden)
> slot .Data (see the example below). data.frames do not seem to follow
> this convention.
>
> Any idea how to get around this?
>
> The following example is exactly the same as Ben's for a data.frame, but
> using a list. It works fine and one can see that the list structure is
> stored in .Data
>
> * ~: R
> R version 2.6.1 (2007-11-26) 
>> setClass("c3",representation(comment="character"),contains="list")
> [1] "c3"
>> l = list(1:3,2:4)
>> z3 = new("c3",l,comment="hello")
>> z3
> An object of class “c3”
> [[1]]
> [1] 1 2 3
>
> [[2]]
> [1] 2 3 4
>
> Slot "comment":
> [1] "hello"
>
>> z3@.Data
> [[1]]
> [1] 1 2 3
>
> [[2]]
> [1] 2 3 4
>
> Regards,
> Oleg
>
> On Thu, 2007-12-13 at 00:04 -0500, Ben Bolker wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> 
>> I would like to build an S4 class that extends
>> a data frame, but includes several more slots.
>> 
>> Here's an example using integer as the base
>> class instead:
>> 
>> setClass("c1",representation(comment="character"),contains="integer")
>> z1 = new("c1",55,comment="hello")
>> z1
>> z1+10
>> z1[1]
>> z1@comment
>> 
>>  -- in other words, it behaves exactly as an integer
>> for access and operations but happens to have another slot.
>> 
>>  If I do this with a data frame instead, it doesn't seem to work
>> at all.
>> 
>> setClass("c2",representation(comment="character"),contains="data.frame")
>> d = data.frame(1:3,2:4)
>> z2 = new("c2",d,comment="goodbye")
>> z2  ## data all gone!!
>> z2[,1]  ## Error ... object is not subsettable
>> z2@comment  ## still there
>> 
>>   I can achieve approximately the same effect by
>> adding attributes, but I was hoping for the structure
>> of S4 classes ...
>> 
>>   Programming with Data and the R Language Definition
>> contain 2 references each to data frames, and neither of
>> them has allowed me to figure out this behavior.
>> 
>>  (While I'm at it: it would be wonderful to have
>> a "rich data frame" that could include as a column
>> any object that had an appropriate length and
>> [ method ... has anyone done anything in this direction?
>> ?data.frame says the allowable types are
>>  "(numeric, logical, factor and character and so on)",
>>  but I'm having trouble sorting out what the limitations
>> are ...)
>> 
>>   hoping for enlightenment (it would be lovely to be
>> shown how to make this work, but a definitive statement
>> that it is impossible would be useful too).
>> 
>>   cheers
>>     Ben Bolker
>> 
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.6 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>> 
>> iD8DBQFHYL1pc5UpGjwzenMRAqErAJ9jj1KgVVSGIf+DtK7Km/+JBaDu2QCaAkl/
>> eMi+WCEWK6FPpVMpUbo+RBQ=
>> =huvz
>> -----END PGP SIGNATURE-----
>> 
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> -- 
> Dr Oleg Sklyar * EBI-EMBL, Cambridge CB10 1SD, UK * +44-1223-494466
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


