[Rd] rbind on data.frame that contains a column that is also a data.frame
Martin Maechler
maechler at stat.math.ethz.ch
Sat Aug 7 13:03:06 CEST 2010
>>>>> Heinz Tuechler <tuechler at gmx.at>
>>>>> on Sat, 07 Aug 2010 01:01:24 +0100 writes:
> Also Surv objects are matrices and they share the same problem when
> rbind-ing data.frames.
> If contained in a data.frame, Surv objects lose their class after
> rbind and therefore no longer represent Surv objects afterwards.
> Using rbind with Surv objects outside of data.frames shows a similar
> problem, but not the same column names.
> In conclusion, yes, matrices are common in data.frames, but not
> without problems.
My understanding (> 20 yr long S and R experience) has been that
a dataframe definitely can have matrix-like "components",
and as Bill Dunlap (with equal S & R experience) has just
explained, that's actually more common than you have thought.
Having *data frame*s instead of simple matrices as components should be much
less common, and I'm not sure it's a good idea.
But getting back to 'matrices',
I think they should work "without problems", at least for basic
R operations such as rbind().
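
A minimal sketch of the basic case in question, assuming a plain numeric
matrix (not a Surv object) stored as a single data.frame component. The
names `m`, `df` and `res` are made up for illustration, and the result may
differ between R versions:

```r
# Hypothetical example data: one ordinary column plus a two-column
# matrix stored as a single component of the data.frame.
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
df <- data.frame(g = 1:3)
df$m <- m

# rbind() dispatches to the data.frame method here.
res <- rbind(df, df)

# Inspect what happened to the matrix component.
str(res)
dim(res$m)
class(res$m)
```

Checking dim() and class() of the component after rbind() is a quick way to
see whether the matrix structure survived.
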
I don't have time to analyze the Surv example below,
but at the moment I think that we'd be interested in
"fixing" the problems.
Martin Maechler, ETH Zurich
> Heinz
> ## example
> library(survival)
> ## create example data
> starttime <- rep(0,5)
> stoptime <- 1:5
> event <- c(1,0,1,1,1)
> group <- c(1,1,1,2,2)
> ## build Surv object
> survobj <- Surv(starttime, stoptime, event)
> ## build data.frame with Surv object
> df.test <- data.frame(survobj, group)
> df.test
> ## rbind data.frames
> rbind(df.test, df.test)
> ## rbind Surv objects
> rbind(survobj, survobj)
> At 06.08.2010 09:34 -0700, William Dunlap wrote:
>> > -----Original Message-----
>> > From: r-devel-bounces at r-project.org
>> > [mailto:r-devel-bounces at r-project.org] On Behalf Of Nicholas
>> > L Crookston
>> > Sent: Friday, August 06, 2010 8:35 AM
>> > To: Michael Lachmann
>> > Cc: r-devel-bounces at r-project.org; r-devel at r-project.org
>> > Subject: Re: [Rd] rbind on data.frame that contains a column
>> > that is also a data.frame
>> >
>> > OK...I'll put in my 2 cents worth.
>> >
>> > It seems to me that the problem is with this line:
>> >
>> > b$a=a , where "a" is something other than a vector with
>> > length equal to nrow(b).
>> >
>> > I had no idea that a dataframe could hold a dataframe. It is not just
>> > rbind(b,b) that fails, apply(b,1,sum) fails and so does plot(b). I'll
>> > bet other R commands fail as well.
>> >
>> > My point of view is that a dataframe is a list of vectors
>> > of equal length and various types (this is not exactly what the help
>> > page says, but it is what it suggests to me).
>> >
>> > Hum, I wonder how much code is based on the idea that a
>> > dataframe can hold
>> > a dataframe.
>>
>> I used to think that non-vectors in data.frames were
>> pretty rare things but when I started looking into
>> the details of the modelling code I discovered that
>> matrices in data.frames are common. E.g.,
>> > library(splines)
>> > sapply(model.frame(data=mtcars, mpg~ns(hp)+poly(disp,2)), class)
>> $mpg
>> [1] "numeric"
>>
>> $`ns(hp)`
>> [1] "ns" "basis" "matrix"
>>
>> $`poly(disp, 2)`
>> [1] "poly" "matrix"
>> You may not see these things because you don't call model.frame()
>> directly, but most modelling functions (e.g., lm() and glm())
>> do call it and use the grouping provided by the matrices to encode
>> how the columns of the design matrix are related to one another.
>>
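
A small sketch of the grouping described above, using only base R and the
built-in mtcars data. The variable names `mf` and `X` are illustrative; the
point is that the model frame keeps poly() as one matrix component, while
the design matrix spreads it over several columns that are tied back to the
same term through the "assign" attribute:

```r
# Model frame with a matrix-valued component produced by poly().
mf <- model.frame(mpg ~ poly(disp, 2), data = mtcars)
ncol(mf)       # number of components in the model frame
dim(mf[[2]])   # the poly() component is itself a matrix

# The design matrix expands that component into separate columns;
# the "assign" attribute maps each column back to its model term.
X <- model.matrix(mpg ~ poly(disp, 2), data = mtcars)
colnames(X)
attr(X, "assign")
```
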
>> If matrices are allowed, shouldn't data.frames be allowed as well?
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>> > 15 years of using R just isn't enough! But I can say that not one
>> > line of code I've written expects a dataframe to hold a dataframe.
>> >
>> > > Hi,
>> >
>> > > The following was already a topic on r-help, but after understanding
>> > > what is going on, I think it fits better in r-devel.
>> >
>> > > The problem is this:
>> > > When a data.frame has another data.frame in it, rbind doesn't work well.
>> > > Here is an example:
>> > > --
>> > > > a=data.frame(x=1:10,y=1:10)
>> > > > b=data.frame(z=1:10)
>> > > > b$a=a
>> > > > b
>> > >     z a.x a.y
>> > > 1   1   1   1
>> > > 2   2   2   2
>> > > 3   3   3   3
>> > > 4   4   4   4
>> > > 5   5   5   5
>> > > 6   6   6   6
>> > > 7   7   7   7
>> > > 8   8   8   8
>> > > 9   9   9   9
>> > > 10 10  10  10
>> > > > rbind(b,b)
>> > > Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4", :
>> > >   duplicate 'row.names' are not allowed
>> > > In addition: Warning message:
>> > > non-unique values when setting 'row.names': ‘1’, ‘10’, ‘2’, ‘3’, ‘4’, ‘5’,
>> > > ‘6’, ‘7’, ‘8’, ‘9’
>> > > --
>> >
>> > >
>> > > Looking at the code of rbind.data.frame, the error comes from the
>> > > lines:
>> > > --
>> > > xij <- xi[[j]]
>> > > if (has.dim[jj]) {
>> > >     value[[jj]][ri, ] <- xij
>> > >     rownames(value[[jj]])[ri] <- rownames(xij) # <-- problem is here
>> > > }
>> > > --
>> > > if the rownames() line is dropped, all works well. What this line
>> > > tries to do is to join the rownames of internal elements of the
>> > > data.frames I try to rbind. So the result, in my case, should have a
>> > > column 'a', whose rownames are the rownames of the original column 'a'.
>> > > It isn't totally clear to me why this is needed. When would a data.frame
>> > > have different rownames on the inside vs. the outside?
>> >
>> > > Notice also that rbind takes into account whether the rownames of the
>> > > data.frames to be joined are simply 1:n, or they are something else.
>> > > If they are 1:n, then the result will have rownames 1:(n+m). If not,
>> > > then the rownames might be kept.
>> >
>> > > I think, more consistent would be to replace the lines above with
>> > > something like:
>> > > if (has.dim[jj]) {
>> > >     value[[jj]][ri, ] <- xij
>> > >     rnj = rownames(value[[jj]])
>> > >     rnj[ri] = rownames(xij)
>> > >     rnj = make.unique(as.character(unlist(rnj)), sep = "")
>> > >     rownames(value[[jj]]) <- rnj
>> > > }
>> >
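
To make the effect of the make.unique() step concrete, here is a minimal
sketch on a bare character vector. The vector `rn` is a made-up stand-in for
the combined inner row names of two stacked three-row pieces; it is not
taken from rbind.data.frame itself:

```r
# Row names of two stacked 3-row pieces that both use "1", "2", "3".
rn <- c("1", "2", "3", "1", "2", "3")

# These could not be used as data.frame row names as they stand,
# because data.frame row names must be unique.
anyDuplicated(rn) > 0

# make.unique() appends a sequence number to repeated entries;
# with sep = "" the number is pasted on without a separator.
make.unique(rn, sep = "")
```

With sep = "" a repeated "1" becomes "11", which reads like an ordinary row
number, so the choice of separator is worth a second look.
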
>> > > In this case, the rownames of inside elements will also be joined, but
>> > > in case they overlap, they will be made unique - just as they are for
>> > > the overall result of rbind. A side effect here would be that the
>> > > rownames of matrices will also be made unique, which till now didn't
>> > > happen, and which also doesn't happen when one rbinds matrices that
>> > > have rownames. So it would be better to test above if we are dealing
>> > > with a matrix or a data.frame.
>> >
>> > > But most people don't have different rownames inside and outside.
>> > > Maybe it would be best to add a flag as to whether you care or don't
>> > > care about the rownames of internal data.frames...
>> >
>> > > But maybe data.frames aren't meant to contain other data.frames?
>> >
>> > > If instead I do
>> > > b=data.frame( z=1:10, a=a)
>> > > then rbind(b,b) works well. In this case the data.frame was converted to
>> > > its columns. Maybe
>> > > b$a = a
>> > > should do the same?
>> >
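
The difference between the two constructions can be seen directly in a short
sketch. The names `nested` and `flat` are illustrative only; `a` is the same
data.frame as in the example above:

```r
a <- data.frame(x = 1:10, y = 1:10)

# Assigning with $<- stores the whole data.frame as one component.
nested <- data.frame(z = 1:10)
nested$a <- a
ncol(nested)
class(nested$a)

# Passing it to data.frame() splices its columns into the result.
flat <- data.frame(z = 1:10, a = a)
names(flat)

# The flattened version has only ordinary vector columns,
# so rbind() has no inner row names to reconcile.
nrow(rbind(flat, flat))
```
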
>> > > Michael
>> > > --
>> > > View this message in context: http://r.789695.n4.nabble.com/rbind-
>> > > on-data-frame-that-contains-a-column-that-is-also-a-data-frame-
>> > > tp2315682p2315682.html
>> > > Sent from the R devel mailing list archive at Nabble.com.
>> >
>> > > ______________________________________________
>> > > R-devel at r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-devel