[R] New PLYR issue
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Tue Jan 17 18:26:21 CET 2012
Replying to messages (particularly old ones) without including context
is rather bad netiquette.
Thank you for at least providing a reproducible example. Now if you can
figure out how to read the documentation we will really make some
progress.
Further responses below.
On Tue, 17 Jan 2012, Gunnar Oehmichen wrote:
>
> Hello everyone,
>
> I have got the same problem, with the same error message.
I wasn't able to draw a comparison between the problems, though the error
messages were the same.
> Using R 2.14.1, plyr 1.7.1, R.Studio 0.94.110, Windows XP
>
> The plyr mailing list does not provide any help until now.
>
> >require(plyr)
>
> >c(sample(c(1:100), 50, replace=TRUE))->V1
Much better to use " <- " than "->" for clarity of code (spaces and
direction of assignment make a difference for readability)
> >c(rep( 1:5, 10))->f1 #variable to group V1
>
> >data.frame(cbind(V1, f1))->DF
>
> >str(DF)
>
> >ddply(DF$V1, DF$f1, "sd")
> >ddply(.(DF$V1), .(DF$f1), "sd")
>
> >Error in if (empty(.data)) return(.data) :
> missing value where TRUE/FALSE needed
>
> Thanks everyone,
If you hand a toothpick to a mechanic you should not be surprised when he
tells you he cannot change a tire on your car. You are giving a vector
where a data frame is needed, another vector where a name or vector of
names is required, and the name of a function where an actual function
is needed, and the function is complaining. In the face of such confusion,
it is not surprising that people were unable to figure out where to start
setting you straight. However, in return for your reproducible example I
will give it a go.
A basic unifying concept for the plyr package is that the name of the
function tells you something about what needs to go in, and what will come
out. "ddply" starts with a "d" so it expects a data frame as input, and
because the second letter is also a "d" it will yield a data frame result
when it is done.
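As a rough sketch of that naming rule (assuming plyr is installed; the
set.seed call and the names as_df, as_list and back_to_df are mine, purely
for illustration), the first letter names the input type and the second the
output type:

```r
library(plyr)

# Illustrative data in the shape of the original example (seed is my addition)
set.seed(42)
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE), f1 = rep(1:5, 10))

# d -> d: data frame in, data frame out
as_df <- ddply(DF, "f1", function(df) data.frame(sdV1 = sd(df$V1)))

# d -> l: data frame in, list out (one element per group)
as_list <- dlply(DF, "f1", function(df) sd(df$V1))

# l -> d: list in, data frame out
back_to_df <- ldply(as_list, function(x) data.frame(sdV1 = x))

str(as_df)
str(back_to_df)
```

The same pattern holds for the other members of the family that begin with
"a" (array) or end with "_" (discard the result).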
Argument 1:
DF$V1 is a vector. It happens to be the column named V1 in the data
frame DF. To specify a data frame, don't apply operators to it, just
write the name of the data frame DF.
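If in doubt about what you are handing over, a quick check along these
lines (base R only; the seed is my addition so the data are repeatable)
shows the difference:

```r
# Illustrative data in the shape of the original example
set.seed(42)
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE), f1 = rep(1:5, 10))

whole_is_df  <- is.data.frame(DF)     # the data frame itself
column_is_df <- is.data.frame(DF$V1)  # one column pulled out with $

print(whole_is_df)    # TRUE
print(column_is_df)   # FALSE: DF$V1 is a plain vector
str(DF$V1)
```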
Argument 2:
This argument tells ddply what the names of the grouping columns are. Do
not actually give the grouping columns to ddply (which $ does). While the
.() function may look cleaner, I find it clearer to use a vector of
strings ... in this case, there is only one grouping column, so I would
forgo the usual c() concatenator and just give it "f1".
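A small sketch of the two spellings, assuming plyr is loaded (the helper
name sd_piece and the seed are mine). Both name the column rather than
passing its contents, and as far as I know they give the same result:

```r
library(plyr)

set.seed(42)
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE), f1 = rep(1:5, 10))

# Hypothetical helper: one group of rows in, one-row data frame out
sd_piece <- function(df) data.frame(sdV1 = sd(df$V1))

by_string <- ddply(DF, "f1", sd_piece)   # grouping column named as a string
by_quoted <- ddply(DF, .(f1), sd_piece)  # grouping column named with .()

print(by_string)
print(all.equal(by_string, by_quoted))
```

With more than one grouping column the string form would become something
like c("f1", "f2"), where f2 is a column that does not exist in this example.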
Argument 3:
This argument is supposed to be a function that will take a data frame
(first d) and yield a data frame (second d) for one group of rows. ddply
will take care of stacking them as a single data frame for the final
result. You have given ddply the name (first error) of a function that
takes a vector and returns a scalar (wrong type of function is error two).
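To see what such a function is asked to do, you can call it by hand on one
group and then stack the pieces yourself. This is only a conceptual sketch
of the split/apply/combine idea in base R, not how ddply is actually
implemented; sd_piece, one_group and by_hand are names I made up:

```r
library(plyr)

set.seed(42)
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE), f1 = rep(1:5, 10))

# Takes the rows of one group as a data frame, returns a one-row data frame
sd_piece <- function(df) data.frame(sdV1 = sd(df$V1))

# Apply it by hand to the rows where f1 is 1
one_group <- DF[DF$f1 == 1, ]
print(sd_piece(one_group))

# Split on f1, apply to each piece, stack the results
pieces  <- split(DF, DF$f1)
by_hand <- do.call(rbind, lapply(pieces, sd_piece))

# ddply does the splitting and stacking for you and adds the f1 column
via_ddply <- ddply(DF, "f1", sd_piece)
print(via_ddply)
```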
The correct documentation for all of these arguments can be found by
typing ?ddply at the R command line (after you have loaded plyr). It
looks like you have been reading the documentation for ?aggregate or
?summaryBy (doBy package) and trying to use that to inform your use of
ddply.
So the actual call should be:
> ddply(DF,"f1",function(df){data.frame(sdV1=sd(df$V1))})
  f1     sdV1
1  1 19.93016
2  2 35.96356
3  3 33.30349
4  4 26.62831
5  5 25.03087
In general, to add more simultaneous calculations, you add more columns to
the data frame produced by your function that does the calculations. If
you want to give it a function name, don't put it in quotes:
> myfunction <- function(df){
+   data.frame(sdV1=sd(df$V1),meanV1=mean(df$V1))
+ }
> ddply(DF,"f1",myfunction)
  f1     sdV1 meanV1
1  1 19.93016   49.1
2  2 35.96356   45.6
3  3 33.30349   44.7
4  4 26.62831   72.2
5  5 25.03087   30.1
Note that although ddply does a lot for you, it doesn't reproduce all of
your calculations on all of the data columns like summaryBy does... you
have to explicitly create every calculated column in your function.
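For what it is worth, plyr also provides summarise(), which can stand in
for the hand-written function. You still list every calculated column
yourself, so it is a shorthand rather than a summaryBy replacement. A
sketch, assuming plyr is the only package loaded that defines summarise
(the seed and the names explicit and shorthand are mine):

```r
library(plyr)

set.seed(42)
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE), f1 = rep(1:5, 10))

# Explicit function: one output column per calculation
explicit <- ddply(DF, "f1", function(df) {
  data.frame(sdV1 = sd(df$V1), meanV1 = mean(df$V1))
})

# Same calculations, written as named arguments passed on to summarise()
shorthand <- ddply(DF, "f1", summarise, sdV1 = sd(V1), meanV1 = mean(V1))

print(explicit)
print(shorthand)
```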
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k