[Rd] rbind on data.frame that contains a column that is also a data.frame

Heinz Tuechler tuechler at gmx.at
Sat Aug 7 02:01:24 CEST 2010


Surv objects are also matrices, and they have the same problem when 
data.frames are rbind-ed.
If contained in a data.frame, Surv objects lose their class after 
rbind and therefore no longer represent Surv objects afterwards.
Using rbind with Surv objects outside of data.frames shows a similar 
problem, although the resulting column names differ.
In conclusion, yes, matrices are common in data.frames, but not 
without problems.

Heinz

## example
library(survival)
## create example data
starttime <- rep(0,5)
stoptime  <- 1:5
event     <- c(1,0,1,1,1)
group     <- c(1,1,1,2,2)

## build Surv object
survobj <- Surv(starttime, stoptime, event)

## build data.frame with Surv object
df.test <- data.frame(survobj, group)
df.test

## rbind data.frames
rbind(df.test, df.test)

## rbind Surv objects
rbind(survobj, survobj)



At 06.08.2010 09:34 -0700, William Dunlap wrote:
> > -----Original Message-----
> > From: r-devel-bounces at r-project.org
> > [mailto:r-devel-bounces at r-project.org] On Behalf Of Nicholas
> > L Crookston
> > Sent: Friday, August 06, 2010 8:35 AM
> > To: Michael Lachmann
> > Cc: r-devel-bounces at r-project.org; r-devel at r-project.org
> > Subject: Re: [Rd] rbind on data.frame that contains a column
> > that is also a data.frame
> >
> > OK...I'll put in my 2 cents worth.
> >
> > It seems to me that the problem is with this line:
> >
> > b$a=a, where "a" is something other than a vector with
> > length equal to nrow(b).
> >
> > I had no idea that a dataframe could hold a dataframe. It is not just
> > rbind(b,b) that fails, apply(b,1,sum) fails and so does plot(b). I'll
> > bet other R commands fail as well.
> >
> > My point of view is that a dataframe is a list of vectors
> > of equal length and various types (this is not exactly what the help
> > page says, but it is what it suggests to me).
> >
> > Hum, I wonder how much code is based on the idea that a
> > dataframe can hold a dataframe.
>
>I used to think that non-vectors in data.frames were
>pretty rare things but when I started looking into
>the details of the modelling code I discovered that
>matrices in data.frames are common.  E.g.,
>   > library(splines)
>   > sapply(model.frame(data=mtcars, mpg~ns(hp)+poly(disp,2)), class)
>   $mpg
>   [1] "numeric"
>
>   $`ns(hp)`
>   [1] "ns"     "basis"  "matrix"
>
>   $`poly(disp, 2)`
>   [1] "poly"   "matrix"
>You may not see these things because you don't call model.frame()
>directly, but most modelling functions (e.g., lm() and glm())
>do call it and use the grouping provided by the matrices to encode
>how the columns of the design matrix are related to one another.
>
>If matrices are allowed, shouldn't data.frames be allowed as well?
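
A minimal sketch of the point about model frames, using only base R and the built-in mtcars data. The name mf is introduced here for illustration; the formula is a shortened version of the one quoted above.

```r
## model.frame() keeps each term as one column, even when the term is a matrix
mf <- model.frame(mpg ~ poly(disp, 2), data = mtcars)

ncol(mf)            # one column for the response, one for the poly() term
is.matrix(mf[[2]])  # the poly() term is stored as a single matrix column
dim(mf[[2]])        # rows match nrow(mtcars); columns match the degree
```

So the frame has fewer columns than the design matrix built from it, which is the grouping that the modelling functions rely on.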
>
>Bill Dunlap
>Spotfire, TIBCO Software
>wdunlap tibco.com
>
> > 15 years of using R just isn't enough! But, I can say that not one
> > line of code I've written expects a dataframe to hold a dataframe.
> >
> > > Hi,
> >
> > > The following was already a topic on r-help, but after understanding
> > > what is going on, I think it fits better in r-devel.
> >
> > > The problem is this:
> > > When a data.frame has another data.frame in it, rbind doesn't work well.
> > > Here is an example:
> > > --
> > > > a=data.frame(x=1:10,y=1:10)
> > > > b=data.frame(z=1:10)
> > > > b$a=a
> > > > b
> > > z a.x a.y
> > > 1   1   1   1
> > > 2   2   2   2
> > > 3   3   3   3
> > > 4   4   4   4
> > > 5   5   5   5
> > > 6   6   6   6
> > > 7   7   7   7
> > > 8   8   8   8
> > > 9   9   9   9
> > > 10 10  10  10
> > > > rbind(b,b)
> > > Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4",  :
> > >   duplicate 'row.names' are not allowed
> > > In addition: Warning message:
> > > non-unique values when setting 'row.names': ‘1’, ‘10’, ‘2’, ‘3’, ‘4’, ‘5’,
> > > ‘6’, ‘7’, ‘8’, ‘9’
> > > --
> >
> > >
> > > Looking at the code of rbind.data.frame, the error comes from the
> > > lines:
> > > --
> > > xij <- xi[[j]]
> > > if (has.dim[jj]) {
> > > value[[jj]][ri, ] <- xij
> > > rownames(value[[jj]])[ri] <- rownames(xij)   # <--  problem is here
> > > }
> > > --
> > > if the rownames() line is dropped, all works well. What this line
> > > tries to do is to join the rownames of internal elements of the
> > > data.frames I try to rbind. So the result, in my case, should have a
> > > column 'a' whose rownames are the rownames of the original column 'a'.
> > > It isn't totally clear to me why this is needed. When would a
> > > data.frame have different rownames on the inside vs. the outside?
> >
> > > Notice also that rbind takes into account whether the rownames of the
> > > data.frames to be joined are simply 1:n, or they are something else.
> > > If they are 1:n, then the result will have rownames 1:(n+m). If not,
> > > then the rownames might be kept.
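
A small sketch of the two row-name cases described above, in base R only. The names d1 and d2 are hypothetical, chosen for this illustration.

```r
## automatic row names (1:n) are simply renumbered in the result
d1 <- data.frame(x = 1:2)
rownames(rbind(d1, d1))

## explicit row names are carried over and made unique where they clash
d2 <- data.frame(x = 1:2, row.names = c("a", "b"))
rownames(rbind(d2, d2))
```

The second call is the case where the outer row names "might be kept": they survive, with a counter appended to the repeats.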
> >
> > > I think, more consistent would be to replace the lines above with
> > > something like:
> > > if (has.dim[jj]) {
> > > value[[jj]][ri, ] <- xij
> > > rnj = rownames(value[[jj]])
> > > rnj[ri] = rownames(xij)
> > > rnj = make.unique(as.character(unlist(rnj)), sep = "")
> > > rownames(value[[jj]]) <- rnj
> > > }
> >
> > > In this case, the rownames of inside elements will also be joined, but
> > > in case they overlap, they will be made unique - just as they are for
> > > the overall result of rbind. A side effect here would be that the
> > > rownames of matrices will also be made unique, which till now didn't
> > > happen, and which also doesn't happen when one rbinds matrices that
> > > have rownames. So it would be better to test above if we are dealing
> > > with a matrix or a data.frame.
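
To illustrate the side effect mentioned above, here is a short base-R sketch of what make.unique() does with sep = "", next to what rbind() does for plain matrices with row names. The names rn and m are made up for this example.

```r
## make.unique() with sep = "" appends a counter to repeated names
rn <- make.unique(c("1", "2", "1", "2"), sep = "")
rn

## rbind() on plain matrices leaves duplicated row names untouched
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), NULL))
rownames(rbind(m, m))
```

Applying make.unique() to matrix-valued columns inside a data.frame would therefore behave differently from rbind() on the same matrices outside one.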
> >
> > > But most people don't have different rownames inside and outside.
> > > Maybe it would be best to add a flag as to whether you care or don't
> > > care about the rownames of internal data.frames...
> >
> > > But maybe data.frames aren't meant to contain other data.frames?
> >
> > > If instead I do
> > > b=data.frame( z=1:10, a=a)
> > > then rbind(b,b) works well. In this case the data.frame was converted to
> > > its columns. Maybe
> > > b$a = a
> > > should do the same?
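
The difference between the two constructions can be seen directly. This is only a sketch in base R; b_flat and b_nested are hypothetical names used to keep the two variants apart.

```r
a <- data.frame(x = 1:10, y = 1:10)

## data.frame() splices the columns of 'a' into the result
b_flat <- data.frame(z = 1:10, a = a)
names(b_flat)

## '$<-' stores 'a' as one data.frame-valued column
b_nested <- data.frame(z = 1:10)
b_nested$a <- a
names(b_nested)
is.data.frame(b_nested$a)

## the flattened form binds without trouble
nrow(rbind(b_flat, b_flat))
```

Printing the two objects looks the same, which is why the nesting is easy to miss; names() and ncol() tell them apart.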
> >
> > > Michael
> > > --
> > > View this message in context:
> > > http://r.789695.n4.nabble.com/rbind-on-data-frame-that-contains-a-column-that-is-also-a-data-frame-tp2315682p2315682.html
> > > Sent from the R devel mailing list archive at Nabble.com.
> >
> > > ______________________________________________
> > > R-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel


