[R] printing a data.frame that contains a list-column of S4 objects

Martin Maechler maechler at stat.math.ethz.ch
Thu Jan 14 09:34:57 CET 2016


>>>>> boB Rudis <bob at rudis.net>
>>>>>     on Tue, 12 Jan 2016 13:51:50 -0500 writes:

    > I wonder if something like:
    > format.list <- function(x, ...) {
    > rep(class(x[[1]]), length(x))
    > }

    > would be sufficient? (prbly needs more 'if's though)

Dear Jenny,
for a different perspective (and a lot of musings), see inline below

    > On Tue, Jan 12, 2016 at 12:15 PM, Jenny Bryan <jenny at stat.ubc.ca> wrote:
    >> Is there a general problem with printing a data.frame when it has a
    >> list-column of S4 objects? Or am I just unlucky in my life choices?
    >> 
    >> I ran across this with objects from the git2r package but maintainer
    >> Stefan Widgren points out this example below from Matrix as well. I note
    >> that the offending object can be printed if sent through
    >> dplyr::tbl_df(). I accept that that printing doesn't provide much info
    >> on S4 objects. I'd just like those vars to not prevent data.frame-style
    >> inspection of the entire object.
    >> 
    >> I asked this on stack overflow, where commenter provided the lead to the
    >> workaround below. Is that the best solution?
    >> 
    >> library(Matrix)
    >> 
    >> m <- new("dgCMatrix")
    >> isS4(m)
    >> #> [1] TRUE
    >> df <- data.frame(id = 1:2)
    >> df$matrices <- list(m, m)

This only works by accident (I think), and fails for

  df <- data.frame(id = 1)
  df$matrices <- list(m, m)

    > df <- data.frame(id = 1)
    > df$matrices <- list(m, m)
    Error in `$<-.data.frame`(`*tmp*`, "matrices", value = list(<S4 object of class "dgCMatrix">,  : 
    replacement has 2 rows, data has 1
    > 


    >> df
    >> #> Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : first argument must be atomic
    >> #> Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : first argument must be atomic

Hmm,
As 'data.frame' is just an S3 class there is no formal
definition to go with and in this sense you are of course entitled
to all expectations. ;-)
Even though data frames are internally coded as lists, I
strongly believe data frames should be taught as (and thought of as)
	 "generalized matrices"
in the sense that a data frame should be thought of as having
n (say) rows and p (say) columns.

The help pages  for  data.frame()  and as.data.frame()
should make it clear that you can *not* put all kinds of entries
into data frame columns, but I agree the documentation is vague
and probably has to remain vague,
because if you provide  as.data.frame()  methods for your class
you should be able to go quite far.

In addition, the data frame columns need to support certain
operations, e.g., subsetting (aka "indexing") and also
subassignment ( df[i,j] <- v ).

Now the real "problem" here is that the '$<-' and '[<-' methods
for data frames, which you call via  df$m <- v  or  df[,co] <- V,
are too "forgiving". They only check that NROW(.) of the new
entry corresponds to nrow(<data.frame>).
Currently they allow very easy construction of illegal data
frames(*), as in your present case.

--
*) Yes, it is hard to say when a data.frame is illegal, as there
   is no formal definition
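
To illustrate the point about the "forgiving" assignment methods, here
is a minimal sketch in base R. The S4 class "Toy" and the object names
are made up for illustration only (they stand in for "dgCMatrix" so the
example does not need the Matrix package). It shows that  df$col <- v
accepts a list of S4 objects merely because NROW(v) matches, whereas
data.frame() itself refuses the same input.

```r
# Hypothetical toy S4 class, standing in for e.g. "dgCMatrix"
setClass("Toy", representation(x = "numeric"))
obj <- new("Toy", x = 1)
v <- list(obj, obj)

df <- data.frame(id = 1:2)

# '$<-' only compares NROW(v) with nrow(df), so this is accepted
stopifnot(NROW(v) == nrow(df))
df$objs <- v

# data.frame() goes through as.data.frame() for each entry, and there is
# no as.data.frame() method for "Toy", so the same content is rejected
res <- try(data.frame(id = 1:2, objs = v), silent = TRUE)
inherits(res, "try-error")
```

Whether the resulting 'df' can be printed depends on the R version, so
the sketch deliberately does not print it.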

There is more to be said and thought about if you really want
sparse matrices in a data frame, and as one of the 'Matrix'
maintainers, I'm quite interested in *why* you'd want that, but I
won't go there now.

One last issue though: The idea behind allowing a 'matrix' or
'array' to be put into a data frame is that each column of the
matrix becomes a separate column of the data frame:

> data.frame(D = diag(3), M = matrix(1:12, 3,4))
  D.1 D.2 D.3 M.1 M.2 M.3 M.4
1   1   0   0   1   4   7  10
2   0   1   0   2   5   8  11
3   0   0   1   3   6   9  12

.... and that would be quite inefficient for large sparse matrices.

---------

Final recommendation as a summary:

If  data.frame(.., .., ..) does not work to put entries into a
data frame, then don't do it, but rather think about how to make
data.frame() work with your objects -- namely by ensuring that
as.data.frame() works .. possibly by providing an
as.data.frame() method.
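
As a rough sketch of what providing such a method could look like (the
S4 class "Reading" and its slot 'values' are hypothetical, chosen only
for this illustration; a real class would need to decide for itself what
its rows and columns mean):

```r
# Hypothetical S4 class holding one numeric value per observation
setClass("Reading", representation(values = "numeric"))

# S3 method for the base generic as.data.frame(); the argument list
# follows the generic: (x, row.names = NULL, optional = FALSE, ...)
as.data.frame.Reading <- function(x, row.names = NULL, optional = FALSE, ...) {
  data.frame(values = x@values)
}

r <- new("Reading", values = c(10, 20, 30))

# data.frame() now works, because it calls as.data.frame() on each entry
df2 <- data.frame(id = 1:3, r)
dim(df2)
```

With the method in place the object enters the data frame as an
ordinary numeric column, so printing, subsetting and subassignment
behave as usual.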

Best regards,
Martin Maechler


