[R] as.data.frame doesn't set col.names
Duncan Murdoch
murdoch.duncan at gmail.com
Wed Oct 25 15:54:11 CEST 2017
On 25/10/2017 8:15 AM, Eric Berger wrote:
> Hi Peter,
> Thanks for contributing such a great answer. Can you please provide a
> pointer to the documentation where it explains why dd$B <- s and dd["B"] <-
> s have such different behavior?
See "An Introduction to R", sections 6.1 (Lists) and 6.3 (Data frames). Note
that dd$B is nearly the same as dd[["B"]], not dd["B"].
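
A minimal sketch of that distinction on extraction. The data frame here is
made up for illustration and is not from the original question; the names
x_dollar, x_double and x_single are likewise just placeholders.

```r
# Small data frame used only to compare the three ways of indexing a column.
dd <- data.frame(A = letters[1:5], stringsAsFactors = FALSE)

x_dollar <- dd$A       # the column itself (a character vector)
x_double <- dd[["A"]]  # also the column itself
x_single <- dd["A"]    # a one-column data frame, not the column

identical(x_dollar, x_double)  # TRUE
is.data.frame(x_single)        # TRUE
is.data.frame(x_double)        # FALSE
```

The same split applies on assignment: `$` and `[[` set one element of the
list, while `[` replaces a sub-list of columns.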
Duncan Murdoch
>
> (I am perfectly happy if you write the explanation but if it saves you time
> to point to some reference that works fine for me.)
>
> Regards,
> Eric
>
>
> On Wed, Oct 25, 2017 at 2:27 PM, Peter Dalgaard <pdalgd at gmail.com> wrote:
>
>>
>>> On 24 Oct 2017, at 22:45 , David L Carlson <dcarlson at tamu.edu> wrote:
>>>
>>> You left out all the most important bits of information. What is yo? Are
>>> you trying to assign a data frame to a single column in another data frame?
>>> Printing head(samples) tells us nothing about what data types you have,
>>> especially if the things that look like text are really factors that were
>>> created when you used one of the read.*() functions. Use str(samples) to
>>> see what you are dealing with.
>>
>> Actually, I think there is enough information to diagnose this. The main
>> issue is as you point out, assignment of an entire data frame to a column
>> of another data frame:
>>
>>> l <- letters[1:5]
>>> s <- as.data.frame(sapply(l,toupper))
>>> dput(s)
>> structure(list(`sapply(l, toupper)` = structure(1:5, .Label = c("A",
>> "B", "C", "D", "E"), class = "factor")), .Names = "sapply(l, toupper)",
>> row.names = c("a", "b", "c", "d", "e"), class = "data.frame")
>>
>> (incidentally, setting col.names has no effect on this; notice that it is
>> only documented as an argument to "list" and "matrix" methods, and sapply()
>> returns a vector)
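
One possible way to get a chosen column name without relying on col.names,
sketched under assumptions: the column name B and the variable names v, s1
and s2 are picked for illustration only.

```r
l <- letters[1:5]
v <- sapply(l, toupper)  # a named character vector, not a list or a matrix

# Option 1: name the column while constructing the data frame.
s1 <- data.frame(B = v, stringsAsFactors = FALSE)

# Option 2: convert first, then rename the single column.
s2 <- as.data.frame(v, stringsAsFactors = FALSE)
names(s2) <- "B"

names(s1)  # "B"
names(s2)  # "B"
```

Either form gives a one-column data frame whose column is called B, so the
name no longer depends on how the call was deparsed.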
>>
>> Now, if we do this:
>>
>>> dd <- data.frame(A=l)
>>> dd$B <- s
>>
>> we end up with a data frame whose B "column" is another data frame:
>>
>>> dput(dd)
>> structure(list(A = structure(1:5, .Label = c("a", "b", "c", "d",
>> "e"), class = "factor"), B = structure(list(`sapply(l, toupper)` =
>> structure(1:5, .Label = c("A", "B", "C", "D", "E"), class = "factor")),
>> .Names = "sapply(l, toupper)", row.names = c("a", "b", "c", "d", "e"),
>> class = "data.frame")), .Names = c("A", "B"), row.names = c(NA, -5L),
>> class = "data.frame")
>>
>> In printing such data frames, the inner frame "wins" the column names,
>> which is sensible if you consider what would happen if it had more than one
>> column:
>>
>>> dd
>>   A sapply(l, toupper)
>> 1 a                  A
>> 2 b                  B
>> 3 c                  C
>> 4 d                  D
>> 5 e                  E
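
Because the printed output hides the nesting, a quick check along these lines
may help confirm what happened. This is only a sketch that reuses the objects
from the example above.

```r
l <- letters[1:5]
s <- as.data.frame(sapply(l, toupper))
dd <- data.frame(A = l)
dd$B <- s              # assigns the whole data frame as one list element

names(dd)              # "A" "B": the outer name is still B
is.data.frame(dd$B)    # TRUE: the B element is itself a data frame
names(dd$B)            # the inner column name, which is what print() shows
```

str(dd) shows the same structure in one call.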
>>
>> To get the effect that Ed probably expected, do
>>
>>> dd <- data.frame(A=l)
>>> dd["B"] <- s
>>> dd
>>   A B
>> 1 a A
>> 2 b B
>> 3 c C
>> 4 d D
>> 5 e E
>>
>> (and notice that single-bracket indexing is crucial here)
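
If `$` is preferred, one alternative (a sketch, not something proposed in the
original thread) is to pull the column out of s before assigning it. The names
d1 and d2 are placeholders.

```r
l <- letters[1:5]
s <- as.data.frame(sapply(l, toupper))

d1 <- data.frame(A = l)
d1["B"] <- s           # single bracket: s is treated as a list of columns

d2 <- data.frame(A = l)
d2$B <- s[[1]]         # extract the column from s first, then assign it

is.data.frame(d1$B)    # FALSE
is.data.frame(d2$B)    # FALSE
names(d2)              # "A" "B"
```

In both cases B ends up as an ordinary column rather than a nested data frame.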
>>
>> -pd
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>