[R] dplyr/summarize does not create a true data frame
John Posner
john.posner at MJBIOSTAT.COM
Fri Nov 21 18:10:16 CET 2014
I get an error when I try to extract a one-column subset of a data frame (called "my.output") created by dplyr's summarize(). ncol() reports that my.output has 4 columns, and "my.output[1]" works, but "my.output[4]" fails with "index out of bounds". Converting my.output with as.data.frame() first makes for a happy ending.
Is this the intended behavior of dplyr?
Tx,
John
> library(dplyr)
> # set up data frame
> rows = 100
> repcnt = 50
> sexes = c("Female", "Male")
> heights = c("Med", "Short", "Tall")
> frm = data.frame(
+ Id = paste("P", sprintf("%04d", 1:rows), sep=""),
+ Sex = sample(rep(sexes, repcnt), rows, replace=T),
+ Height = sample(rep(heights, repcnt), rows, replace=T),
+ V1 = round(runif(rows)*25, 2) + 50,
+ V2 = round(runif(rows)*1000, 2) + 50,
+ V3 = round(runif(rows)*350, 2) - 175
+ )
>
> # use dplyr/summarize to create data frame
> my.output = frm %>%
+ group_by(Sex, Height) %>%
+ summarize(V1sum=sum(V1), V2sum=sum(V2))
> # work with columns in the output data frame
> ncol(my.output)
[1] 4
> my.output[1]
Source: local data frame [6 x 1]
Groups: Sex
Sex
1 Female
2 Female
3 Female
4 Male
5 Male
6 Male
> my.output[4]
Error in eval(expr, envir, enclos) : index out of bounds ######## ERROR HERE
> as.data.frame(my.output)[4]
V2sum
1 12427.97
2 8449.82
3 8610.97
4 7249.20
5 12616.91
6 10372.15
>
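For anyone hitting the same error, here is a minimal sketch of three ways to reach the fourth column. It assumes the problem comes from subsetting a result that is still grouped (the printout above shows "Groups: Sex", because summarize() drops only the last grouping level). The setup is a trimmed version of the one above with a fixed seed added. Option 1 is the workaround from this post; whether options 2 and 3 also avoided the error in the dplyr release current in November 2014 is an assumption that has not been checked here.

```r
library(dplyr)

# Trimmed version of the setup above; set.seed() added for reproducibility
set.seed(1)
rows <- 100
frm <- data.frame(
  Id     = sprintf("P%04d", 1:rows),
  Sex    = sample(c("Female", "Male"), rows, replace = TRUE),
  Height = sample(c("Med", "Short", "Tall"), rows, replace = TRUE),
  V1     = round(runif(rows) * 25, 2) + 50,
  V2     = round(runif(rows) * 1000, 2) + 50
)

# summarize() removes the last grouping level (Height),
# so my.output is still grouped by Sex
my.output <- frm %>%
  group_by(Sex, Height) %>%
  summarize(V1sum = sum(V1), V2sum = sum(V2))

# Option 1: convert to a plain data frame first (the workaround above)
a <- as.data.frame(my.output)[4]

# Option 2: drop the remaining grouping, then subset
b <- ungroup(my.output)[4]

# Option 3: extract the column as a plain numeric vector
v <- my.output[["V2sum"]]
```

Option 3 returns a vector rather than a one-column table, which is often all that is needed when the goal is just the summed values.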