[Rd] rbind on data.frame that contains a column that is also a data.frame

Martin Maechler maechler at stat.math.ethz.ch
Sat Aug 7 13:03:06 CEST 2010


>>>>> Heinz Tuechler <tuechler at gmx.at>
>>>>>     on Sat, 07 Aug 2010 01:01:24 +0100 writes:

    > Surv objects are also matrices, and they share the same problem when
    > rbind-ing data.frames.
    > When contained in a data.frame, Surv objects lose their class after
    > rbind and therefore no longer represent Surv objects.
    > Using rbind with Surv objects outside of data.frames shows a similar
    > problem, although the resulting column names differ.
    > In conclusion, yes, matrices are common in data.frames, but not
    > without problems.

My understanding (from > 20 years of S and R experience) has been that
a dataframe definitely can have matrix-like "components",
and as Bill Dunlap (with equal S & R experience) has just
explained, that's actually more common than you may have thought.
Having *data frame*s as components instead of simple matrices should be
much less common, and I'm not sure it's a good idea.

But getting back to 'matrices',
I think they should work "without problems", at least for basic
R operations such as rbind().
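
The matrix case Martin refers to can be sketched in base R as follows. This is a minimal illustration, not code from the thread: `df` and `m` are made-up names, and only the shape of the result is checked, since whether extra class attributes on such a column (as with Surv objects) survive rbind() may depend on the R version.

```r
# A data frame with an ordinary column 'y' and a matrix-valued column 'm'
df <- data.frame(y = 1:3)
df$m <- matrix(1:6, nrow = 3)   # one component holding a 3 x 2 matrix

ncol(df)      # 2: the matrix counts as a single component
dim(df$m)     # 3 2

# rbind() stacks the matrix component row-wise together with 'y'
r <- rbind(df, df)
dim(r$m)
```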

I don't have time to analyze the Surv example below,
but at the moment I think that we'd be interested in
"fixing" the problems.

Martin Maechler, ETH Zurich

    > Heinz

    > ## example
    > library(survival)
    > ## create example data
    > starttime <- rep(0,5)
    > stoptime  <- 1:5
    > event     <- c(1,0,1,1,1)
    > group     <- c(1,1,1,2,2)

    > ## build Surv object
    > survobj <- Surv(starttime, stoptime, event)

    > ## build data.frame with Surv object
    > df.test <- data.frame(survobj, group)
    > df.test

    > ## rbind data.frames
    > rbind(df.test, df.test)

    > ## rbind Surv objects
    > rbind(survobj, survobj)



    > At 06.08.2010 09:34 -0700, William Dunlap wrote:
    >> > -----Original Message-----
    >> > From: r-devel-bounces at r-project.org
    >> > [mailto:r-devel-bounces at r-project.org] On Behalf Of Nicholas L Crookston
    >> > Sent: Friday, August 06, 2010 8:35 AM
    >> > To: Michael Lachmann
    >> > Cc: r-devel-bounces at r-project.org; r-devel at r-project.org
    >> > Subject: Re: [Rd] rbind on data.frame that contains a column that is also a data.frame
    >> >
    >> > OK... I'll put in my 2 cents' worth.
    >> >
    >> > It seems to me that the problem is with this line:
    >> >
    >> > b$a = a, where "a" is something other than a vector with
    >> > length equal to nrow(b).
    >> >
    >> > I had no idea that a dataframe could hold a dataframe. It is not just
    >> > rbind(b,b) that fails, apply(b,1,sum) fails and so does plot(b). I'll
    >> > bet other R commands fail as well.
    >> >
    >> > My point of view is that a dataframe is a list of vectors
    >> > of equal length and various types (this is not exactly what the help
    >> > page says, but it is what it suggests to me).
    >> >
    >> > Hmm, I wonder how much code is based on the idea that a
    >> > dataframe can hold a dataframe.
    >> 
    >> I used to think that non-vectors in data.frames were
    >> pretty rare things but when I started looking into
    >> the details of the modelling code I discovered that
    >> matrices in data.frames are common.  E.g.,
    >> > library(splines)
    >> > sapply(model.frame(data=mtcars, mpg~ns(hp)+poly(disp,2)), class)
    >> $mpg
    >> [1] "numeric"
    >> 
    >> $`ns(hp)`
    >> [1] "ns"     "basis"  "matrix"
    >> 
    >> $`poly(disp, 2)`
    >> [1] "poly"   "matrix"
    >> You may not see these things because you don't call model.frame()
    >> directly, but most modelling functions (e.g., lm() and glm())
    >> do call it and use the grouping provided by the matrices to encode
    >> how the columns of the design matrix are related to one another.
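
A small base-R sketch of the grouping described here (not part of Bill's message; the object names are illustrative): the `poly(disp, 2)` term is stored as a single two-column matrix component of the model frame, and the `assign` attribute of the design matrix maps both of its columns back to that one term. Only `stats` functions and the built-in `mtcars` data are used.

```r
# Model frame: 'poly(disp, 2)' is one matrix-valued component
mf <- model.frame(mpg ~ poly(disp, 2), data = mtcars)
names(mf)                        # "mpg" "poly(disp, 2)"
pd <- mf[["poly(disp, 2)"]]
is.matrix(pd)                    # TRUE
ncol(pd)                         # 2

# Design matrix: 'assign' gives the term index of each column
fit <- lm(mpg ~ poly(disp, 2), data = mtcars)
X <- model.matrix(fit)
attr(X, "assign")                # 0 1 1: intercept, then both poly columns
```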
    >> 
    >> If matrices are allowed, shouldn't data.frames be allowed as well?
    >> 
    >> Bill Dunlap
    >> Spotfire, TIBCO Software
    >> wdunlap tibco.com
    >> 
    >> > 15 years of using R just isn't enough! But I can say that not
    >> > one line of code I've written expects a dataframe to hold a dataframe.
    >> >
    >> > > Hi,
    >> >
    >> > > The following was already a topic on r-help, but after understanding
    >> > > what is going on, I think it fits better in r-devel.
    >> >
    >> > > The problem is this:
    >> > > When a data.frame has another data.frame in it, rbind doesn't work well.
    >> > > Here is an example:
    >> > > --
    >> > > > a=data.frame(x=1:10,y=1:10)
    >> > > > b=data.frame(z=1:10)
    >> > > > b$a=a
    >> > > > b
    >> > >     z a.x a.y
    >> > > 1   1   1   1
    >> > > 2   2   2   2
    >> > > 3   3   3   3
    >> > > 4   4   4   4
    >> > > 5   5   5   5
    >> > > 6   6   6   6
    >> > > 7   7   7   7
    >> > > 8   8   8   8
    >> > > 9   9   9   9
    >> > > 10 10  10  10
    >> > > > rbind(b,b)
    >> > > Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4",  :
    >> > >   duplicate 'row.names' are not allowed
    >> > > In addition: Warning message:
    >> > > non-unique values when setting 'row.names': ‘1’, ‘10’, ‘2’, ‘3’, ‘4’, ‘5’,
    >> > > ‘6’, ‘7’, ‘8’, ‘9’
    >> > > --
    >> >
    >> > >
    >> > > Looking at the code of rbind.data.frame, the error comes from these lines:
    >> > > --
    >> > > xij <- xi[[j]]
    >> > > if (has.dim[jj]) {
    >> > >     value[[jj]][ri, ] <- xij
    >> > >     rownames(value[[jj]])[ri] <- rownames(xij)   # <-- problem is here
    >> > > }
    >> > > --
    >> > > If the rownames() line is dropped, all works well. What this line
    >> > > tries to do is to join the rownames of the internal elements of the
    >> > > data.frames I try to rbind. So the result, in my case, should have a
    >> > > column 'a' whose rownames are the rownames of the original column 'a'.
    >> > > It isn't totally clear to me why this is needed. When would a data.frame
    >> > > have different rownames on the inside vs. the outside?
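
A base-R sketch (not from the thread) of why that `rownames()` line fails for a data-frame component but not for a matrix component: both copies of the inner data frame carry the row names "1" to "10", and a matrix accepts the duplicated names whereas a data frame rejects them. The objects `rn`, `m` and `d` are illustrative.

```r
a <- data.frame(x = 1:10, y = 1:10)

# Each copy of the inner data frame carries the row names "1" .. "10",
# so stacking two copies yields duplicated row names
rn <- c(rownames(a), rownames(a))
anyDuplicated(rn) > 0    # TRUE

# A matrix accepts duplicated row names ...
m <- matrix(1:40, nrow = 20)
rownames(m) <- rn

# ... but a data frame rejects them, as in the rbind(b, b) error
d <- data.frame(x = 1:20)
res <- suppressWarnings(
  tryCatch({ rownames(d) <- rn; "ok" },
           error = function(e) conditionMessage(e))
)
res
```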
    >> >
    >> > > Notice also that rbind takes into account whether the rownames of the
    >> > > data.frames to be joined are simply 1:n or something else.
    >> > > If they are 1:n, then the result will have rownames 1:(n+m). If not,
    >> > > then the rownames might be kept.
    >> >
    >> > > I think it would be more consistent to replace the lines above with
    >> > > something like:
    >> > > if (has.dim[jj]) {
    >> > >     value[[jj]][ri, ] <- xij
    >> > >     rnj <- rownames(value[[jj]])
    >> > >     rnj[ri] <- rownames(xij)
    >> > >     rnj <- make.unique(as.character(unlist(rnj)), sep = "")
    >> > >     rownames(value[[jj]]) <- rnj
    >> > > }
    >> >
    >> > > In this case, the rownames of inside elements will also be joined, but
    >> > > if they overlap, they will be made unique, just as they are for
    >> > > the overall result of rbind. A side effect here would be that the
    >> > > rownames of matrices would also be made unique, which until now didn't
    >> > > happen, and which also doesn't happen when one rbinds matrices that
    >> > > have rownames. So it would be better to test above whether we are dealing
    >> > > with a matrix or a data.frame.
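
A small base-R sketch of the two behaviours contrasted here: `make.unique(..., sep = "")`, as used in the proposed replacement, turns repeated row names into distinct ones, while `rbind()` on plain matrices keeps duplicated row names as they are. The vector `rn` and the matrix `m` are made-up illustrations.

```r
# Row names as they would be after stacking two 2-row pieces
rn <- c("1", "2", "1", "2")

# make.unique() with sep = "" appends a counter directly to each repeat
make.unique(rn, sep = "")        # "1"  "2"  "11" "21"

# rbind() on matrices does not de-duplicate row names
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), NULL))
rownames(rbind(m, m))            # "a" "b" "a" "b"
```

With `sep = ""` a repeated "1" becomes "11", which is unique but easy to mistake for the name of an eleventh row.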
    >> >
    >> > > But most people don't have different rownames inside and outside.
    >> > > Maybe it would be best to add a flag as to whether you care or don't
    >> > > care about the rownames of internal data.frames...
    >> >
    >> > > But maybe data.frames aren't meant to contain other data.frames?
    >> >
    >> > > If instead I do
    >> > > b = data.frame(z = 1:10, a = a)
    >> > > then rbind(b, b) works well. In this case the data.frame was converted to
    >> > > its columns. Maybe
    >> > > b$a = a
    >> > > should do the same?
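
A minimal sketch of the two constructions contrasted in this message, using the same `a` as in the example: assigning with `$<-` stores `a` as a single data-frame component, while passing it to `data.frame()` splits it into the ordinary columns `a.x` and `a.y`. The last step, flattening an already nested frame by passing its components back through `data.frame()` via `do.call()`, is an assumed workaround rather than something proposed in the thread; the `b_*` names are illustrative.

```r
a <- data.frame(x = 1:10, y = 1:10)

# Nested: 'a' becomes one component of the frame that is itself a data frame
b_nested <- data.frame(z = 1:10)
b_nested$a <- a
length(b_nested)            # 2 components: z and a
is.data.frame(b_nested$a)   # TRUE

# Flattened at construction: data.frame() splits 'a' into a.x and a.y
b_flat <- data.frame(z = 1:10, a = a)
names(b_flat)               # "z"   "a.x" "a.y"
nrow(rbind(b_flat, b_flat)) # 20

# Assumed workaround: rebuild an existing nested frame in flattened form
b_fixed <- do.call(data.frame, b_nested)
names(b_fixed)              # "z"   "a.x" "a.y"
```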
    >> >
    >> > > Michael
    >> > > --
    >> > > View this message in context: http://r.789695.n4.nabble.com/rbind-on-data-frame-that-contains-a-column-that-is-also-a-data-frame-tp2315682p2315682.html
    >> > > Sent from the R devel mailing list archive at Nabble.com.
    >> >
    >> > > ______________________________________________
    >> > > R-devel at r-project.org mailing list
    >> > > https://stat.ethz.ch/mailman/listinfo/r-devel