[Rd] rbind on data.frame that contains a column that is also a data.frame

Claudia Beleites cbeleites at units.it
Mon Aug 9 14:20:39 CEST 2010


Dear all,

I also use matrices inside data.frames in my (S4) class "hyperSpec".
So yes, it would be great if commands like apply worked as straightforwardly
on such data.frames as, e.g., rbind does.
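
For readers who have not met such columns, here is a minimal sketch in base R
of a data.frame that holds a matrix, and of rbind keeping that matrix intact.
The names d, spc and r are made up for illustration; they are not taken from
hyperSpec.

```r
# Hypothetical example data: 3 rows, one ordinary column
d <- data.frame (id = 1:3)

# Assigning with $<- stores the whole matrix as ONE column of the data.frame
d$spc <- matrix (1:6, ncol = 2)

ncol (d)           # counts the columns of the data.frame, not of the matrix
is.matrix (d$spc)  # the column is still a matrix

# rbind stacks the matrix column row-wise and keeps it a matrix
r <- rbind (d, d)
dim (r$spc)
```

The point of the sketch is only that the matrix survives as a single column;
functions that first coerce the data.frame to one big matrix (such as apply)
see it differently.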

The problem here seems to me different, though:
Is it possible that the relevant difference between matrices and data.frames is
that the rownames of a matrix do not need to be unique?

m <- matrix (1:12, ncol = 3)
rownames (m) <- rep (1, 4)
m

df <- as.data.frame (m) # seems to work, but:
df # throws error:
Error in data.frame(V1 = c("1", "2", "3", "4"), V2 = c("5", "6", "7",  :
   duplicate row.names: 1
dimnames (df)
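
One way to guard against this before converting, as a minimal sketch:
anyDuplicated reports whether the rownames of the matrix repeat, and
make.unique (base R) disambiguates them. The names m2 and df2 are only for
illustration. What as.data.frame itself does with duplicated rownames may
differ between R versions, so the sketch fixes the names first.

```r
m2 <- matrix (1:12, ncol = 3)
rownames (m2) <- rep (1, 4)

# a value > 0 means at least one rowname occurs more than once
anyDuplicated (rownames (m2))

# make the rownames unique before they become row.names of a data.frame
rownames (m2) <- make.unique (rownames (m2))
rownames (m2)

df2 <- as.data.frame (m2)
df2   # prints without the duplicate row.names error
```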

Greetings,

Claudia


Martin Maechler wrote:
>>>>>> Heinz Tuechler <tuechler at gmx.at>
>>>>>>     on Sat, 07 Aug 2010 01:01:24 +0100 writes:
> 
>     > Also Surv objects are matrices and they share the same problem when 
>     > rbind-ing data.frames.
>     > If contained in a data.frame, Surv objects lose their class after 
>     > rbind and therefore no longer represent Surv objects afterwards.
>     > Using rbind with Surv objects outside of data.frames shows a similar 
>     > problem, but not the same column names.
>     > In conclusion, yes, matrices are common in data.frames, but not 
>     > without problems.
> 
> My understanding (> 20 yr long S and R experience) has been that
> a dataframe definitely can have matrix-like "components",
> and as Bill Dunlap (with equal S & R experience) has just
> explained, that's actually more common than you have thought.
> To have *data frame*s instead of simple matrices should be much
> less common; I'm not sure if it's a good idea.
> 
> But getting back to 'matrices',
> I think they should work "without problems", at least for basic
> R operations such as rbind().
> 
> I don't have time to analyze the Surv example below,
> but at the moment I think that we'd be interested in
> "fixing" the problems.
> 
> Martin Maechler, ETH Zurich
> 
>     > Heinz
> 
>     > ## example
>     > library(survival)
>     > ## create example data
>     > starttime <- rep(0,5)
>     > stoptime  <- 1:5
>     > event     <- c(1,0,1,1,1)
>     > group     <- c(1,1,1,2,2)
> 
>     > ## build Surv object
>     > survobj <- Surv(starttime, stoptime, event)
> 
>     > ## build data.frame with Surv object
>     > df.test <- data.frame(survobj, group)
>     > df.test
> 
>     > ## rbind data.frames
>     > rbind(df.test, df.test)
> 
>     > ## rbind Surv objects
>     > rbind(survobj, survobj)
> 
> 
> 
>     > At 06.08.2010 09:34 -0700, William Dunlap wrote:
>     >> > -----Original Message-----
>     >> > From: r-devel-bounces at r-project.org
>     >> > [mailto:r-devel-bounces at r-project.org] On Behalf Of Nicholas
>     >> > L Crookston
>     >> > Sent: Friday, August 06, 2010 8:35 AM
>     >> > To: Michael Lachmann
>     >> > Cc: r-devel-bounces at r-project.org; r-devel at r-project.org
>     >> > Subject: Re: [Rd] rbind on data.frame that contains a column
>     >> > that is also a data.frame
>     >> >
>     >> > OK...I'll put in my 2 cents worth.
>     >> >
>     >> > It seems to me that the problem is with this line:
>     >> >
>     >> > b$a=a , where "a" is something other than a vector with
>     >> > length equal to nrow(b).
>     >> >
>     >> > I had no idea that a dataframe could hold a dataframe. It is not just
>     >> > rbind(b,b) that fails, apply(b,1,sum) fails and so does plot(b). I'll
>     >> > bet other R commands fail as well.
>     >> >
>     >> > My point of view is that a dataframe is a list of vectors
>     >> > of equal length and various types (this is not exactly what the help
>     >> > page says, but it is what it suggests to me).
>     >> >
>     >> > Hum, I wonder how much code is based on the idea that a
>     >> > dataframe can hold
>     >> > a dataframe.
>     >> 
>     >> I used to think that non-vectors in data.frames were
>     >> pretty rare things but when I started looking into
>     >> the details of the modelling code I discovered that
>     >> matrices in data.frames are common.  E.g.,
>     >> > library(splines)
>     >> > sapply(model.frame(data=mtcars, mpg~ns(hp)+poly(disp,2)), class)
>     >> $mpg
>     >> [1] "numeric"
>     >> 
>     >> $`ns(hp)`
>     >> [1] "ns"     "basis"  "matrix"
>     >> 
>     >> $`poly(disp, 2)`
>     >> [1] "poly"   "matrix"
>     >> You may not see these things because you don't call model.frame()
>     >> directly, but most modelling functions (e.g., lm() and glm())
>     >> do call it and use the grouping provided by the matrices to encode
>     >> how the columns of the design matrix are related to one another.
>     >> 
>     >> If matrices are allowed, shouldn't data.frames be allowed as well?
>     >> 
>     >> Bill Dunlap
>     >> Spotfire, TIBCO Software
>     >> wdunlap tibco.com
>     >> 
>     >> > 15 years of using R just isn't enough! But, I can say that not one
>     >> > line of code I've written expects a dataframe to hold a dataframe.
>     >> >
>     >> > > Hi,
>     >> >
>     >> > > The following was already a topic on r-help, but after understanding
>     >> > > what is going on, I think it fits better in r-devel.
>     >> >
>     >> > > The problem is this:
>     >> > > When a data.frame has another data.frame in it, rbind doesn't work well.
>     >> > > Here is an example:
>     >> > > --
>     >> > > > a=data.frame(x=1:10,y=1:10)
>     >> > > > b=data.frame(z=1:10)
>     >> > > > b$a=a
>     >> > > > b
>     >> > > z a.x a.y
>     >> > > 1   1   1   1
>     >> > > 2   2   2   2
>     >> > > 3   3   3   3
>     >> > > 4   4   4   4
>     >> > > 5   5   5   5
>     >> > > 6   6   6   6
>     >> > > 7   7   7   7
>     >> > > 8   8   8   8
>     >> > > 9   9   9   9
>     >> > > 10 10  10  10
>     >> > > > rbind(b,b)
>     >> > > Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4",  :
>     >> > > duplicate 'row.names' are not allowed
>     >> > > In addition: Warning message:
>     >> > > non-unique values when setting 'row.names': ‘1’, ‘10’, ‘2’, ‘3’, ‘4’, ‘5’,
>     >> > > ‘6’, ‘7’, ‘8’, ‘9’
>     >> > > --
>     >> >
>     >> > >
>     >> > > Looking at the code of rbind.data.frame, the error comes from the
>     >> > > lines:
>     >> > > --
>     >> > > xij <- xi[[j]]
>     >> > > if (has.dim[jj]) {
>     >> > > value[[jj]][ri, ] <- xij
>     >> > > rownames(value[[jj]])[ri] <- rownames(xij)   # <--  problem is here
>     >> > > }
>     >> > > --
>     >> > > if the rownames() line is dropped, all works well. What this line
>     >> > > tries to do is to join the rownames of internal elements of the
>     >> > > data.frames I try to rbind. So the result, in my case should have a
>     >> > > column 'a', whose rownames are the rownames of the original column 'a'.
>     >> > > It isn't totally clear to me why this is needed. When would a data.frame
>     >> > > have different rownames on the inside vs. the outside?
>     >> >
>     >> > > Notice also that rbind takes into account whether the rownames of the
>     >> > > data.frames to be joined are simply 1:n, or they are something else.
>     >> > > If they are 1:n, then the result will have rownames 1:(n+m). If not,
>     >> > > then the rownames might be kept.
>     >> >
>     >> > > I think, more consistent would be to replace the lines above with
>     >> > > something like:
>     >> > > if (has.dim[jj]) {
>     >> > > value[[jj]][ri, ] <- xij
>     >> > > rnj = rownames(value[[jj]])
>     >> > > rnj[ri] = rownames(xij)
>     >> > > rnj = make.unique(as.character(unlist(rnj)), sep = "")
>     >> > > rownames(value[[jj]]) <- rnj
>     >> > > }
>     >> >
>     >> > > In this case, the rownames of inside elements will also be joined, but
>     >> > > in case they overlap, they will be made unique - just as they are for
>     >> > > the overall result of rbind. A side effect here would be that the
>     >> > > rownames of matrices will also be made unique, which till now didn't
>     >> > > happen, and which also doesn't happen when one rbinds matrices that
>     >> > > have rownames. So it would be better to test above if we are dealing
>     >> > > with a matrix or a data.frame.
>     >> >
>     >> > > But most people don't have different rownames inside and outside.
>     >> > > Maybe it would be best to add a flag as to whether you care or don't
>     >> > > care about the rownames of internal data.frames...
>     >> >
>     >> > > But maybe data.frames aren't meant to contain other data.frames?
>     >> >
>     >> > > If instead I do
>     >> > > b=data.frame( z=1:10, a=a)
>     >> > > then rbind(b,b) works well. In this case the data.frame was converted to
>     >> > > its columns. Maybe
>     >> > > b$a = a
>     >> > > should do the same?
>     >> >
>     >> > > Michael
>     >> > > --
>     >> > > View this message in context: http://r.789695.n4.nabble.com/rbind-on-data-frame-that-contains-a-column-that-is-also-a-data-frame-tp2315682p2315682.html
>     >> > > Sent from the R devel mailing list archive at Nabble.com.
>     >> >
>     >> > > ______________________________________________
>     >> > > R-devel at r-project.org mailing list
>     >> > > https://stat.ethz.ch/mailman/listinfo/r-devel
