[Rd] rbind on data.frame that contains a column that is also a data.frame

William Dunlap wdunlap at tibco.com
Fri Aug 6 18:34:22 CEST 2010


> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Nicholas L Crookston
> Sent: Friday, August 06, 2010 8:35 AM
> To: Michael Lachmann
> Cc: r-devel-bounces at r-project.org; r-devel at r-project.org
> Subject: Re: [Rd] rbind on data.frame that contains a column that is also a data.frame
> 
> OK... I'll put in my 2 cents' worth.
> 
> It seems to me that the problem is with this line:
> 
> b$a=a, where "a" is something other than a vector with
> length equal to nrow(b).
> 
> I had no idea that a dataframe could hold a dataframe. It is not just
> rbind(b,b) that fails: apply(b,1,sum) fails, and so does plot(b). I'll
> bet other R commands fail as well.
> 
> My point of view is that a dataframe is a list of vectors
> of equal length and various types (this is not exactly what the help
> page says, but it is what it suggests to me). 
> 
> Hmm, I wonder how much code is based on the idea that a dataframe can
> hold a dataframe.
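
One rough way to check whether a given data set relies on this is to look for columns that carry a dim attribute. The helper below, `nonvector_columns()`, is hypothetical (not part of base R): a minimal sketch assuming that "not a plain vector" means "has a non-NULL dim()", which covers both matrix columns and nested data.frames.

```r
# Hypothetical helper, not part of base R: return the names of the columns
# of a data.frame that are not plain vectors. Matrices and nested
# data.frames both have a non-NULL dim(), so one test covers both cases.
nonvector_columns <- function(df) {
  stopifnot(is.data.frame(df))
  has_dim <- vapply(df, function(col) !is.null(dim(col)), logical(1))
  names(df)[has_dim]
}

# The example from the original post: `b` ends up holding `a` as one column.
a <- data.frame(x = 1:10, y = 1:10)
b <- data.frame(z = 1:10)
b$a <- a
nonvector_columns(b)  # "a"
nonvector_columns(a)  # character(0)
```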

I used to think that non-vectors in data.frames were
pretty rare things but when I started looking into
the details of the modelling code I discovered that
matrices in data.frames are common.  E.g.,
  > library(splines)
  > sapply(model.frame(data=mtcars, mpg~ns(hp)+poly(disp,2)), class)
  $mpg
  [1] "numeric"
  
  $`ns(hp)`
  [1] "ns"     "basis"  "matrix"
  
  $`poly(disp, 2)`
  [1] "poly"   "matrix"
You may not see these things because you don't call model.frame()
directly, but most modelling functions (e.g., lm() and glm())
do call it and use the grouping provided by the matrices to encode
how the columns of the design matrix are related to one another.

If matrices are allowed, shouldn't data.frames be allowed as well?
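
A matrix column can be created with the same `$<-` assignment used for the nested data.frame in the original post. This is a minimal sketch (the names `d` and `m` are illustrative only); note that the whole matrix counts as a single column of the data.frame, just like the ns() and poly() terms above.

```r
# Build a data.frame whose second column is a 10 x 2 integer matrix,
# using the same `$<-` assignment as in the original example.
d <- data.frame(z = 1:10)
d$m <- cbind(x = 1:10, y = 11:20)

ncol(d)         # 2: the matrix is one column of the data.frame
dim(d$m)        # 10 2
is.matrix(d$m)  # TRUE
```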

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 15 years of using R just isn't enough! But I can say that not one
> line of code I've written expects a dataframe to hold a dataframe.
> 
> > Hi,
> 
> > The following was already a topic on r-help, but after understanding
> > what is going on, I think it fits better in r-devel.
> 
> > The problem is this:
> > When a data.frame has another data.frame in it, rbind doesn't work well.
> > Here is an example:
> > --
> > > a=data.frame(x=1:10,y=1:10)
> > > b=data.frame(z=1:10)
> > > b$a=a
> > > b
> > z a.x a.y
> > 1   1   1   1
> > 2   2   2   2
> > 3   3   3   3
> > 4   4   4   4
> > 5   5   5   5
> > 6   6   6   6
> > 7   7   7   7
> > 8   8   8   8
> > 9   9   9   9
> > 10 10  10  10
> > > rbind(b,b)
> > Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4",  :
> >   duplicate 'row.names' are not allowed
> > In addition: Warning message:
> > non-unique values when setting 'row.names': ‘1’, ‘10’, ‘2’, ‘3’, ‘4’, ‘5’,
> > ‘6’, ‘7’, ‘8’, ‘9’
> > --
> 
> > 
> > Looking at the code of rbind.data.frame, the error comes from the
> > lines:
> > --
> > xij <- xi[[j]]
> > if (has.dim[jj]) {
> > value[[jj]][ri, ] <- xij
> > rownames(value[[jj]])[ri] <- rownames(xij)   # <--  problem is here
> > }
> > --
> > If the rownames() line is dropped, all works well. What this line
> > tries to do is to join the rownames of internal elements of the
> > data.frames I try to rbind. So the result, in my case, should have a
> > column 'a' whose rownames are the rownames of the original column 'a'.
> > It isn't totally clear to me why this is needed. When would a
> > data.frame have different rownames on the inside vs. the outside?
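
The failure at the marked line comes down to a difference between matrices and data.frames: a matrix accepts duplicate row names, while `row.names<-.data.frame` refuses them, which is the error quoted above. A minimal sketch of that difference (object names are illustrative):

```r
# Matrices tolerate duplicate row names ...
m <- matrix(1:4, ncol = 1)
rownames(m) <- c("1", "2", "1", "2")

# ... but data.frames do not: `rownames<-` calls `row.names<-`, which for a
# data.frame signals an error on duplicates (after a warning, muffled here).
inner <- data.frame(x = 1:4)
msg <- tryCatch(
  {
    suppressWarnings(rownames(inner) <- c("1", "2", "1", "2"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg  # the error text, e.g. "duplicate 'row.names' are not allowed"
```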
> 
> > Notice also that rbind takes into account whether the rownames of the
> > data.frames to be joined are simply 1:n or something else. If they
> > are 1:n, the result will have rownames 1:(n+m). If not, the rownames
> > might be kept.
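
The two row-name regimes can be seen directly. In this small sketch (illustrative objects `d1` and `d2`), automatic row names are renumbered, while explicit row names are kept and, where they collide, de-duplicated by appending a counter as make.unique(sep = "") does.

```r
d1 <- data.frame(v = 1:2)                           # automatic row names
d2 <- data.frame(v = 1:2, row.names = c("a", "b"))  # explicit row names

rownames(rbind(d1, d1))  # "1" "2" "3" "4"
rownames(rbind(d2, d2))  # "a" "b" "a1" "b1"
```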
> 
> > I think it would be more consistent to replace the lines above with
> > something like:
> > if (has.dim[jj]) {
> > value[[jj]][ri, ] <- xij
> > rnj = rownames(value[[jj]])
> > rnj[ri] = rownames(xij)
> > rnj = make.unique(as.character(unlist(rnj)), sep = "")
> > rownames(value[[jj]]) <- rnj
> > }
> 
> > In this case, the rownames of inside elements will also be joined, but
> > where they overlap they will be made unique, just as they are for the
> > overall result of rbind. A side effect would be that the rownames of
> > matrices would also be made unique, which until now hasn't happened,
> > and which also doesn't happen when one rbinds matrices that have
> > rownames. So it would be better to test above whether we are dealing
> > with a matrix or a data.frame.
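
The proposal, including the suggested matrix-versus-data.frame test, can be sketched as a standalone function. `merge_inner_rownames()` is hypothetical and is not the actual code of rbind.data.frame; it assumes `col` already has row names and that `ri` indexes the rows being filled in.

```r
# Hypothetical sketch of the proposed change (not base R code): write the
# row names `new_rn` into positions `ri` of `col`, and make them unique only
# when `col` is a data.frame, leaving matrix row names untouched.
merge_inner_rownames <- function(col, ri, new_rn) {
  rn <- rownames(col)
  rn[ri] <- new_rn
  if (is.data.frame(col)) {
    rn <- make.unique(as.character(rn), sep = "")
  }
  rownames(col) <- rn
  col
}

inner <- data.frame(x = 1:4)  # row names "1" to "4"
m <- matrix(1:4, ncol = 1, dimnames = list(as.character(1:4), NULL))

rownames(merge_inner_rownames(inner, 3:4, c("1", "2")))  # "1" "2" "11" "21"
rownames(merge_inner_rownames(m, 3:4, c("1", "2")))      # "1" "2" "1" "2"
```

With sep = "" and numeric row names, the suffixed names (such as "11") look like ordinary row numbers; make.unique() still guarantees that they are distinct.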
> 
> > But most people don't have different rownames inside and outside.
> > Maybe it would be best to add a flag as to whether you care or don't
> > care about the rownames of internal data.frames...
> 
> > But maybe data.frames aren't meant to contain other data.frames?
> 
> > If instead I do
> > b=data.frame( z=1:10, a=a)
> > then rbind(b,b) works well. In this case the data.frame was converted
> > to its columns. Maybe
> > b$a = a
> > should do the same?
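
Both routes to a flat layout can be checked quickly: passing the nested frame to data.frame() splits it into prefixed columns, and calling data.frame() on the columns of an already-nested frame via do.call() has the same effect. This is a sketch of a workaround under those assumptions, not a change to `$<-`.

```r
a <- data.frame(x = 1:10, y = 1:10)

# Route 1: pass the nested frame to data.frame(); it is split into columns.
b2 <- data.frame(z = 1:10, a = a)
names(b2)            # "z" "a.x" "a.y"
nrow(rbind(b2, b2))  # 20

# Route 2: flatten a frame that already holds a data.frame column by
# re-building it from its columns.
b <- data.frame(z = 1:10)
b$a <- a
flat <- do.call(data.frame, as.list(b))
names(flat)          # "z" "a.x" "a.y"
```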
> 
> > Michael
> > --
> > View this message in context: http://r.789695.n4.nabble.com/rbind-on-data-frame-that-contains-a-column-that-is-also-a-data-frame-tp2315682p2315682.html
> > Sent from the R devel mailing list archive at Nabble.com.
> 
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel


