[R] New PLYR issue

Thu Jan 19 12:17:30 CET 2012

Hi,

Thanks a lot. That was quite helpful, not only because it solved my 
problem, but also because it explained efficiently what the problem 
was.

On 17/01/2012 18:26, Jeff Newmiller wrote:
> Replying to old messages without including context is rather bad 
> netiquette.
>
> Thank you for at least providing a reproducible example. Now if you 
> can figure out how to read the documentation we will really make some 
> progress.
>
> Further responses below.
>
> On Tue, 17 Jan 2012, Gunnar Oehmichen wrote:
>
>>
>> Hello everyone,
>>
>> I have got the same problem, with the same error message.
>
> I wasn't able to draw a comparison between the problems, though the 
> error messages were the same.
>
>> Using R 2.14.1, plyr 1.7.1, R.Studio 0.94.110, Windows XP
>>
>> The plyr mailing list has not provided any help so far.
>>
>> >require(plyr)
>>
>> >c(sample(c(1:100), 50, replace=TRUE))->V1
>
> Much better to use " <- " than "->" for clarity of code (spaces and 
> direction of assignment make a difference for readability)
>
>> >c(rep( 1:5, 10))->f1 #variable to group V1
>>
>> >data.frame(cbind(V1, f1))->DF
>>
>> >str(DF)
>>
>> >ddply(DF$V1, DF$f1, "sd")
>> >ddply(.(DF$V1), .(DF$f1), "sd")
>>
>> >Error in if (empty(.data)) return(.data) :
>> missing value where TRUE/FALSE needed
>>
>> Thanks everyone,
>
> If you hand a toothpick to a mechanic you should not be surprised when 
> he tells you he cannot change a tire on your car.  You are giving a 
> vector where a data frame is needed, another vector where a name or 
> vector of names is required, and the name of a function where an 
> actual function is needed, and the function is complaining. In the 
> face of such confusion, it is not surprising that people were unable 
> to figure out where to start setting you straight.  However, in return 
> for your reproducible example I will give it a go.
>
> A basic unifying concept for the plyr package is that the name of the 
> function tells you something about what needs to go in, and what will 
> come out. "ddply" starts with a "d" so it expects a data frame as 
> input, and because the second letter is also a "d" it will yield a 
> data frame result when it is done.
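
A minimal sketch of the naming convention described above, assuming plyr is installed. The fixed seed and the object names res_dd, res_dl and res_ld are illustrative choices, not part of the original thread; dlply and ldply are the plyr siblings of ddply for list output and list input.

```r
library(plyr)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE),
                 f1 = rep(1:5, 10))

# "dd": data frame in, data frame out
res_dd <- ddply(DF, "f1", function(df) data.frame(sdV1 = sd(df$V1)))

# "dl": data frame in, list out (one element per group)
res_dl <- dlply(DF, "f1", function(df) sd(df$V1))

# "ld": list in, data frame out
res_ld <- ldply(split(DF$V1, DF$f1),
                function(v) data.frame(sdV1 = sd(v)))
```

All three compute the same per-group standard deviations; only the container going in and coming out differs.
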
>
> Argument 1:
>
> DF$V1 is a vector. It happens to be the column named V1 in the 
> data frame DF.  To specify a data frame, don't apply operators to it; 
> just write the name of the data frame, DF.
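
The distinction can be checked in base R before calling ddply at all. This is a small sketch with a made-up four-row data frame, not the data from the thread:

```r
# Hypothetical toy data, only to show the difference in type
DF <- data.frame(V1 = c(10, 20, 30, 40), f1 = c(1, 2, 1, 2))

whole_is_df  <- is.data.frame(DF)      # TRUE: DF itself is a data frame
column_is_df <- is.data.frame(DF$V1)   # FALSE: $ extracts a plain vector
column_nrow  <- nrow(DF$V1)            # NULL: a plain vector has no rows
```
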
>
> Argument 2:
>
> This argument tells ddply what the names of the grouping columns are. 
> Do not give ddply the grouping columns themselves (which is what $ 
> does).  While the .() function may look cleaner, I find it clearer to 
> use a vector of strings. In this case there is only one grouping 
> column, so I would forgo the usual c() concatenator and just give it 
> "f1".
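
A short sketch of the ways to name grouping columns mentioned above, assuming plyr is installed. The helper sd_v1 and the second grouping column f2 are hypothetical additions for illustration only.

```r
library(plyr)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE),
                 f1 = rep(1:5, 10))

# Hypothetical helper: one-row data frame with the group's standard deviation
sd_v1 <- function(df) data.frame(sdV1 = sd(df$V1))

by_string <- ddply(DF, "f1", sd_v1)   # column name as a string
by_quoted <- ddply(DF, .(f1), sd_v1)  # column name via .()

# Two grouping columns: pass a character vector of names
DF$f2 <- rep(c("a", "b"), each = 25)  # hypothetical second grouping column
by_two <- ddply(DF, c("f1", "f2"), sd_v1)
```

Both single-column forms should give the same grouping; the character vector form extends naturally to several columns.
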
>
> Argument 3:
>
> This argument is supposed to be a function that will take a data frame 
> (first d) and yield a data frame (second d) for one group of rows.  
> ddply will take care of stacking them as a single data frame for the 
> final result.  You have given ddply the name (first error) of a 
> function that takes a vector and returns a scalar (wrong type of 
> function is error two).
>
> The correct documentation for all of these arguments can be found by 
> typing ?ddply at the R command line (after you have loaded plyr).  It 
> looks like you have been reading the documentation for ?aggregate or 
> ?summaryBy (doBy package) and trying to use that to inform your use of 
> ddply.
>
> So the actual call should be:
>
>> ddply(DF,"f1",function(df){data.frame(sdV1=sd(df$V1))})
>   f1     sdV1
> 1  1 19.93016
> 2  2 35.96356
> 3  3 33.30349
> 4  4 26.62831
> 5  5 25.03087
>
> In general, to add more simultaneous calculations, you add more 
> columns to the data frame produced by your function that does the 
> calculations. If you want to give it a function name, don't put it in 
> quotes:
>
>> myfunction <- function(df){
> +  data.frame(sdV1=sd(df$V1),meanV1=mean(df$V1))
> + }
>> ddply(DF,"f1",myfunction)
>   f1     sdV1 meanV1
> 1  1 19.93016   49.1
> 2  2 35.96356   45.6
> 3  3 33.30349   44.7
> 4  4 26.62831   72.2
> 5  5 25.03087   30.1
>
> Note that although ddply does a lot for you, it doesn't reproduce all 
> of your calculations on all of the data columns like summaryBy does... 
> you have to explicitly create every calculated column in your function.
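
When every calculated column has to be spelled out anyway, plyr's summarise helper can stand in for the hand-written function. The following is a sketch under the same assumptions as the example in the thread (plyr installed, columns V1 and f1); the fixed seed and the names res_fun and res_sum are illustrative.

```r
library(plyr)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE),
                 f1 = rep(1:5, 10))

# Explicit function: data frame in, one-row data frame out per group
res_fun <- ddply(DF, "f1", function(df) {
  data.frame(sdV1 = sd(df$V1), meanV1 = mean(df$V1))
})

# Same calculation with summarise; each named argument becomes a column
res_sum <- ddply(DF, "f1", summarise, sdV1 = sd(V1), meanV1 = mean(V1))
```

Either way, each output column is still named and computed explicitly.
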
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
>

-- 
--------------------------------------------------------------------------------------------

Gunnar Oehmichen / Diploma Student of Environmental Sciences
Department of Conservation Biology

Helmholtz-Zentrum für Umweltforschung GmbH - UFZ
Helmholtz Centre for Environmental Research GmbH - UFZ	
Permoserstraße 15 / 04318 Leipzig / Germany
Phone +49 341 235 1269 / Fax +49 341 235 1468
max.mustermann at ufz.de / www.ufz.de
