[R] New PLYR issue
Gunnar Oehmichen
gunnar.oehmichen at ufz.de
Thu Jan 19 12:17:30 CET 2012
Hi,
thanks a lot. That was quite helpful, not only in providing a
solution to my problem, but also in efficiently explaining what the
problem is about.
On 17/01/2012 18:26, Jeff Newmiller wrote:
> Replying to messages (particularly old ones) without including
> context is rather bad netiquette.
>
> Thank you for at least providing a reproducible example. Now, if you
> can figure out how to read the documentation, we will really make some
> progress.
>
> Further responses below.
>
> On Tue, 17 Jan 2012, Gunnar Oehmichen wrote:
>
>>
>> Hello everyone,
>>
>> I have got the same problem, with the same error message.
>
> I wasn't able to draw a comparison between the problems, though the
> error messages were the same.
>
>> Using R 2.14.1, plyr 1.7.1, RStudio 0.94.110, Windows XP
>>
>> The plyr mailing list has not provided any help so far.
>>
>> >require(plyr)
>>
>> >c(sample(c(1:100), 50, replace=TRUE))->V1
>
> It is much better to use " <- " than "->" for clarity of code (spaces and
> direction of assignment make a difference for readability).
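For comparison, the question's setup could be written with leftward assignment as below. This is a base-R sketch; the `set.seed(1)` call is an addition (not in the original) so the random sample is reproducible, and `data.frame(V1, f1)` stands in for `data.frame(cbind(V1, f1))`, since `cbind()` is not needed to build the data frame.

```r
set.seed(1)                               # added only for reproducibility
V1 <- sample(1:100, 50, replace = TRUE)   # 50 values drawn from 1..100
f1 <- rep(1:5, 10)                        # grouping variable for V1
DF <- data.frame(V1, f1)                  # columns named V1 and f1
str(DF)
```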
>
>> >c(rep( 1:5, 10))->f1 #variable to group V1
>>
>> >data.frame(cbind(V1, f1))->DF
>>
>> >str(DF)
>>
>> >ddply(DF$V1, DF$f1, "sd")
>> >ddply(.(DF$V1), .(DF$f1), "sd")
>>
>> >Error in if (empty(.data)) return(.data) :
>> missing value where TRUE/FALSE needed
>>
>> Thanks everyone,
>
> If you hand a toothpick to a mechanic, you should not be surprised when
> he tells you he cannot change a tire on your car. You are giving a
> vector where a data frame is needed, another vector where a name or
> vector of names is required, and the name of a function where an
> actual function is needed, and the function is complaining. In the
> face of such confusion, it is not surprising that people were unable
> to figure out where to start setting you straight. However, in return
> for your reproducible example, I will give it a go.
>
> A basic unifying concept for the plyr package is that the name of the
> function tells you something about what needs to go in, and what will
> come out. "ddply" starts with a "d" so it expects a data frame as
> input, and because the second letter is also a "d" it will yield a
> data frame result when it is done.
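The "data frame in, data frame out" idea can be sketched in base R as split, apply, combine. The helper `ddply_sketch` below is hypothetical and is not how plyr is implemented internally; it only mimics the single-grouping-column case to show what kind of input and function `ddply` expects.

```r
# Hypothetical helper: a base-R sketch of the split-apply-combine idea
# behind ddply(df, group_col, fun). Not part of plyr.
ddply_sketch <- function(df, group_col, fun) {
  # Split: one data frame per value of the grouping column
  pieces <- split(df, df[[group_col]], drop = TRUE)
  # Apply: `fun` takes a data frame and must return a data frame
  results <- lapply(pieces, function(piece) {
    out <- fun(piece)
    out[[group_col]] <- piece[[group_col]][1]   # label rows with their group
    out[c(group_col, setdiff(names(out), group_col))]
  })
  # Combine: stack the per-group data frames into one
  combined <- do.call(rbind, unname(results))
  rownames(combined) <- NULL
  combined
}

DF <- data.frame(V1 = c(1, 3, 10, 20, 5, 7), f1 = c(1, 1, 2, 2, 3, 3))
res <- ddply_sketch(DF, "f1", function(df) data.frame(meanV1 = mean(df$V1)))
res
```

Here `res` has one row per value of `f1`, with `meanV1` equal to 2, 15 and 6.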
>
> Argument 1:
>
> DF$V1 is a vector. It happens to be the column named V1 in the
> data frame DF. To specify a data frame, don't apply operators to it;
> just write the name of the data frame, DF.
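A quick base-R check makes the difference between the data frame and one of its columns visible. The small `DF` here is made-up data for illustration.

```r
DF <- data.frame(V1 = c(12, 45, 7), f1 = c(1, 2, 1))

is.data.frame(DF)        # the whole data frame: what ddply expects first
is.data.frame(DF$V1)     # `$` pulls out a plain numeric vector instead
is.data.frame(DF["V1"])  # single-bracket indexing keeps a data frame
```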
>
> Argument 2:
>
> This argument tells ddply the names of the grouping columns.
> Do not give the grouping columns themselves to ddply (which is what $
> does). While the .() function looks cleaner, I find it
> clearer to use a vector of strings. In this case there is only one
> grouping column, so I would forgo the usual c() concatenator and just
> give it "f1".
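To see why a name is enough, here is a base-R sketch (not plyr code) of a function looking up grouping columns from their names by itself: `split()` accepts one column, or several columns selected by a character vector of names, whose combinations then define the groups. The column `f2` is a hypothetical addition for illustration.

```r
DF <- data.frame(V1 = c(10, 20, 30, 40),
                 f1 = c(1, 1, 2, 2),
                 f2 = c("a", "b", "a", "b"))   # f2: hypothetical second grouping column

group_cols <- "f1"                     # one grouping column, given by name
by_f1 <- split(DF, DF[[group_cols]])   # the column is fetched from its name
sapply(by_f1, nrow)                    # two groups of two rows each

group_cols <- c("f1", "f2")            # several grouping columns: a vector of names
by_both <- split(DF, DF[group_cols], drop = TRUE)
length(by_both)                        # one group per observed f1/f2 combination
```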
>
> Argument 3:
>
> This argument is supposed to be a function that will take a data frame
> (first d) and yield a data frame (second d) for one group of rows.
> ddply will take care of stacking them as a single data frame for the
> final result. You have given ddply the name of a function instead of
> the function itself (first error), and that function takes a vector
> and returns a scalar (the wrong type of function, error two).
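The contrast between the two kinds of function can be shown on a single group's rows. The wrapper `sd_frame` is a hypothetical name; it does the same job as the anonymous function in the corrected call further down.

```r
piece <- data.frame(V1 = c(2, 4, 4, 4, 5, 5, 7, 9))   # stands in for one group's rows

sd(piece$V1)            # vector in, single number out

# Hypothetical wrapper: data frame in, one-row data frame out
sd_frame <- function(df) {
  data.frame(sdV1 = sd(df$V1))
}
sd_frame(piece)
```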
>
> The correct documentation for all of these arguments can be found by
> typing ?ddply at the R command line (after you have loaded plyr). It
> looks like you have been reading the documentation for ?aggregate or
> ?summaryBy (doBy package) and trying to use that to inform your use of
> ddply.
>
> So the actual call should be:
>
>> ddply(DF,"f1",function(df){data.frame(sdV1=sd(df$V1))})
>   f1     sdV1
> 1  1 19.93016
> 2  2 35.96356
> 3  3 33.30349
> 4  4 26.62831
> 5  5 25.03087
>
> In general, to add more simultaneous calculations, you add more
> columns to the data frame produced by your function that does the
> calculations. If you want to give it a function name, don't put it in
> quotes:
>
>> myfunction <- function(df){
> + data.frame(sdV1=sd(df$V1),meanV1=mean(df$V1))
> + }
>> ddply(DF,"f1",myfunction)
>   f1     sdV1 meanV1
> 1  1 19.93016   49.1
> 2  2 35.96356   45.6
> 3  3 33.30349   44.7
> 4  4 26.62831   72.2
> 5  5 25.03087   30.1
>
> Note that although ddply does a lot for you, it doesn't apply your
> calculations to every data column the way summaryBy does;
> you have to explicitly create every calculated column in your function.
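If you do want summaryBy-like behaviour, one option is to let the per-group function loop over the columns itself. The helper `summarise_numeric` and its `exclude` argument are hypothetical, a sketch rather than a plyr feature; it computes the sd and mean of every numeric column and returns them as a one-row data frame.

```r
# Hypothetical helper: sd and mean of every numeric column except those in
# `exclude`, returned as a one-row data frame (data frame in, data frame out)
summarise_numeric <- function(df, exclude = character(0)) {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  cols <- setdiff(numeric_cols, exclude)
  out <- list()
  for (col in cols) {
    out[[paste0("sd", col)]] <- sd(df[[col]])
    out[[paste0("mean", col)]] <- mean(df[[col]])
  }
  as.data.frame(out)
}

piece <- data.frame(V1 = c(1, 3), V2 = c(10, 20), f1 = c(1, 1))
summarise_numeric(piece, exclude = "f1")
```

With plyr loaded, extra arguments are passed on to the function through ddply's `...`, so something like `ddply(DF, "f1", summarise_numeric, exclude = "f1")` should keep the grouping column itself out of the summary.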
>
> ---------------------------------------------------------------------------
> Jeff Newmiller
> DCN:<jdnewmil at dcn.davis.ca.us>
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---------------------------------------------------------------------------
>
--
--------------------------------------------------------------------------------------------
Gunnar Oehmichen / Diploma Student of Environmental Sciences
Department of Conservation Biology
Helmholtz-Zentrum für Umweltforschung GmbH - UFZ
Helmholtz Centre for Environmental Research GmbH - UFZ
Permoserstraße 15 / 04318 Leipzig / Germany
Telefon +49 341 235 1269 / Fax +49 341 235 1468
max.mustermann at ufz.de / www.ufz.de
Sitz der Gesellschaft: Leipzig
Registergericht: Amtsgericht Leipzig, Handelsregister Nr. B 4703
Vorsitzender des Aufsichtsrats: MinDirig Wilfried Kraus
Wissenschaftlicher Geschäftsführer: Prof. Dr. Georg Teutsch
Administrative Geschäftsführerin: Dr. Heike Graßmann