[R] dplyr: producing a good old data frame

Hadley Wickham h.wickham at gmail.com
Wed Feb 25 14:50:53 CET 2015


Hi John,

Just printing the result gives a good indication of where the problem lies:

> frm %>% rowwise() %>% do(MM=max(as.numeric(.)))
Source: local data frame [6 x 1]
Groups: <by row>

        MM
1 <dbl[1]>
2 <dbl[1]>
3 <dbl[1]>
4 <dbl[1]>
5 <dbl[1]>
6 <dbl[1]>

When you name its argument, do() is designed to store arbitrary objects
(e.g. a linear model), one per row, in a list-column. It doesn't join
the results back into a single atomic vector, which is why each MM entry
prints as <dbl[1]>. You can either fix this yourself with unlist(), or
use tidyr::unnest(), which will also handle vectors with length > 1.
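
A minimal sketch of the unlist() route, written in base R only so that it does not depend on a particular dplyr version. The list-column is built by hand to mimic what the do() call returns; the names row_max and res are purely illustrative, and the data values are copied from the frm printed in the original post.

```r
# Values copied from the frm shown in the original post.
frm <- data.frame(AA = c(123, 12, 127, 133, 66, 38),
                  BB = c(50, 30, 147, 32, 235, 264),
                  CC = c(45, 231, 100, 129, 71, 261))

# Mimic the shape of the do(MM = ...) result: a list holding one
# length-1 numeric per row (assumption: built by hand, not by dplyr).
row_max <- lapply(seq_len(nrow(frm)), function(i) max(as.numeric(frm[i, ])))

res <- data.frame(row = seq_len(nrow(frm)))
res$MM <- row_max              # MM is now a list-column

is.list(res$MM)                # TRUE, so mean(res$MM) would warn and return NA

# Flatten the list-column into an ordinary numeric vector.
res$MM <- unlist(res$MM)
mean(res$MM)                   # 188.8333, matching the value in the original post
```

Once the column has been flattened, res behaves like any other data frame with a numeric column.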

Hadley

On Mon, Feb 23, 2015 at 2:54 PM, John Posner <john.posner at mjbiostat.com> wrote:
> I'm using the dplyr package to perform one-row-at-a-time processing of a data frame:
>
>> rnd6 = function() sample(1:300, 6)
>> frm = data.frame(AA=rnd6(), BB=rnd6(), CC=rnd6())
>
>> frm
>    AA  BB  CC
> 1 123  50  45
> 2  12  30 231
> 3 127 147 100
> 4 133  32 129
> 5  66 235  71
> 6  38 264 261
>
> The interface is nice and straightforward:
>
>> library(dplyr)
>> dplyr_result = frm %>% rowwise() %>% do(MM=max(as.numeric(.)))
>
> I've gotten used to the fact that dplyr_result is not a good old "vanilla" data frame. The as.data.frame() function *seems* to do the trick:
>
>> dplyr_result_2 = as.data.frame(dplyr_result)
>> dplyr_result_2
>    MM
> 1 123
> 2 231
> 3 147
> 4 133
> 5 235
> 6 264
>
> ... but there's trouble ahead:
>
>> mean(dplyr_result_2$MM)
> [1] NA
> Warning message:
> In mean.default(dplyr_result_2$MM) :
>   argument is not numeric or logical: returning NA
>
> I need to enlist unlist() to get me to my destination:
>
>> mean(unlist(dplyr_result_2$MM))
> [1] 188.8333
>
> [NOTE: dplyr's as_data_frame() function does a better job than as.data.frame() of indicating that I was headed for trouble. ]
>
> By contrast, the plyr package's adply() function *does* produce a vanilla data frame:
>
>> library(plyr)
>> plyr_result = adply(frm, .margins=1, function(onerowfrm) max(as.numeric(onerowfrm[1,])))
>> mean(plyr_result$V1)
> [1] 188.8333
>
> Is there a good reason for dplyr to require the extra processing? My (naïve?) recommendation would be to have as_data_frame() produce a vanilla data frame.
>
> Tx,
> John



-- 
http://had.co.nz/


