[R] as.data.frame doesn't set col.names
Eric Berger
ericjberger at gmail.com
Wed Oct 25 14:15:49 CEST 2017
Hi Peter,
Thanks for contributing such a great answer. Can you please provide a
pointer to the documentation where it explains why dd$B <- s and dd["B"] <-
s have such different behavior?
(I am perfectly happy if you write the explanation but if it saves you time
to point to some reference that works fine for me.)
Regards,
Eric
On Wed, Oct 25, 2017 at 2:27 PM, Peter Dalgaard <pdalgd at gmail.com> wrote:
>
> > On 24 Oct 2017, at 22:45 , David L Carlson <dcarlson at tamu.edu> wrote:
> >
> > You left out all the most important bits of information. What is yo? Are
> you trying to assign a data frame to a single column in another data frame?
> Printing head(samples) tells us nothing about what data types you have,
> especially if the things that look like text are really factors that were
> created when you used one of the read.*() functions. Use str(samples) to
> see what you are dealing with.
>
> Actually, I think there is enough information to diagnose this. The main
> issue is as you point out, assignment of an entire data frame to a column
> of another data frame:
>
> > l <- letters[1:5]
> > s <- as.data.frame(sapply(l,toupper))
> > dput(s)
> structure(list(`sapply(l, toupper)` = structure(1:5, .Label = c("A",
> "B", "C", "D", "E"), class = "factor")), .Names = "sapply(l, toupper)",
> row.names = c("a",
> "b", "c", "d", "e"), class = "data.frame")
>
> (incidentally, setting col.names has no effect on this; notice that it is
> only documented as an argument to "list" and "matrix" methods, and sapply()
> returns a vector)
>
> Now, if we do this:
>
> > dd <- data.frame(A=l)
> > dd$B <- s
>
> we end up with a data frame whose B "column" is another data frame
>
> > dput(dd)
> structure(list(A = structure(1:5, .Label = c("a", "b", "c", "d",
> "e"), class = "factor"), B = structure(list(`sapply(l, toupper)` =
> structure(1:5, .Label = c("A",
> "B", "C", "D", "E"), class = "factor")), .Names = "sapply(l, toupper)",
> row.names = c("a",
> "b", "c", "d", "e"), class = "data.frame")), .Names = c("A",
> "B"), row.names = c(NA, -5L), class = "data.frame")
>
> in printing such data frames, the inner frame "wins" the column names,
> which is sensible if you consider what would happen if it had more than one
> column:
>
> > dd
> A sapply(l, toupper)
> 1 a A
> 2 b B
> 3 c C
> 4 d D
> 5 e E
>
> To get the effect that Ed probably expected, do
>
> > dd <- data.frame(A=l)
> > dd["B"] <- s
> > dd
> A B
> 1 a A
> 2 b B
> 3 c C
> 4 d D
> 5 e E
>
> (and notice that single-bracket indexing is crucial here)
>
> -pd
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list