[R] New PLYR issue

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Jan 17 18:26:21 CET 2012


Replying to messages (particularly old ones) without including context 
is rather bad netiquette.

Thank you for at least providing a reproducible example. Now if you can 
figure out how to read the documentation we will really make some 
progress.

Further responses below.

On Tue, 17 Jan 2012, Gunnar Oehmichen wrote:

> 
> Hello everyone,
> 
> I have got the same problem, with the same error message.

I wasn't able to draw a comparison between the problems, though the error 
messages were the same.

> Using R 2.14.1, plyr 1.7.1, R.Studio 0.94.110, Windows XP
> 
> The plyr mailing list does not provide any help until now.
> 
> >require(plyr)
> 
> >c(sample(c(1:100), 50, replace=TRUE))->V1

Much better to use " <- " than "->" for clarity of code (spaces and 
direction of assignment make a difference for readability)

> >c(rep( 1:5, 10))->f1 #variable to group V1
> 
> >data.frame(cbind(V1, f1))->DF
> 
> >str(DF)
> 
> >ddply(DF$V1, DF$f1, "sd")
> >ddply(.(DF$V1), .(DF$f1), "sd")
> 
> >Error in if (empty(.data)) return(.data) :
> missing value where TRUE/FALSE needed
> 
> Thanks everyone,

If you hand a toothpick to a mechanic you should not be surprised when he 
tells you he cannot change a tire on your car.  You are giving a vector 
where a data frame is needed, another vector where a name or vector of 
names is required, and the name of a function where an actual function 
is needed, and the function is complaining. In the face of such confusion, 
it is not surprising that people were unable to figure out where to start 
setting you straight.  However, in return for your reproducible example I 
will give it a go.

A basic unifying concept for the plyr package is that the name of the 
function tells you something about what needs to go in, and what will come 
out. "ddply" starts with a "d" so it expects a data frame as input, and 
because the second letter is also a "d" it will yield a data frame result 
when it is done.
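
As a rough illustration of that naming rule, the sketch below runs the same 
grouped calculation through ddply (data frame in, data frame out) and dlply 
(data frame in, list out). The set.seed() call and the object names 
as_frame and as_list are my own additions for the example, not part of the 
original post.

```r
library(plyr)

# Rebuild the example data; set.seed() is an assumption added for repeatability
set.seed(42)
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE),
                 f1 = rep(1:5, 10))

# d + d: data frame in, data frame out
as_frame <- ddply(DF, "f1", function(df) data.frame(sdV1 = sd(df$V1)))

# d + l: data frame in, list out (one element per group)
as_list <- dlply(DF, "f1", function(df) sd(df$V1))

is.data.frame(as_frame)  # TRUE
is.list(as_list)         # TRUE
length(as_list)          # 5, one entry per level of f1
```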

Argument 1:

DF$V1 is a vector. It happens to be the column named V1 in the data 
frame DF.  To specify a data frame, don't apply operators to it, just 
write the name of the data frame DF.
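
If you want to check this for yourself, a quick test in base R shows the 
difference. The data are rebuilt here with <- assignments and named 
columns; the set.seed() call is my addition so the example repeats, it was 
not in the original post.

```r
# Rebuild the example data; set.seed() is an assumption added for repeatability
set.seed(42)
V1 <- sample(1:100, 50, replace = TRUE)
f1 <- rep(1:5, 10)                  # variable to group V1
DF <- data.frame(V1 = V1, f1 = f1)

is.data.frame(DF)     # TRUE: this is what ddply wants as its first argument
is.data.frame(DF$V1)  # FALSE: $ has pulled a plain vector out of DF
```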

Argument 2:

This argument tells ddply what the names of the grouping columns are. Do 
not actually give the grouping columns to ddply (which is what $ does).  
While the .() function seems cleaner, I find it clearer to use a vector 
of strings ... in this case, there is only one grouping column, so I 
would forgo the usual c() concatenator and just give it "f1".
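
For comparison, here is a small sketch of the two spellings side by side. 
Both name the grouping column rather than handing over its contents, and I 
would expect them to give the same result. The helper count_rows, the 
object names and the set.seed() call are made up for this illustration.

```r
library(plyr)

# Rebuild the example data; set.seed() is an assumption added for repeatability
set.seed(42)
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE),
                 f1 = rep(1:5, 10))

# Hypothetical helper: takes one group's rows, returns a one-row data frame
count_rows <- function(df) data.frame(n = nrow(df))

by_string <- ddply(DF, "f1", count_rows)   # grouping column named as a string
by_quoted <- ddply(DF, .(f1), count_rows)  # grouping column named with .()

all.equal(by_string, by_quoted)
```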

Argument 3:

This argument is supposed to be a function that will take a data frame 
(first d) and yield a data frame (second d) for one group of rows.  ddply 
will take care of stacking them as a single data frame for the final 
result.  You have given ddply the name (first error) of a function that 
takes a vector and returns a scalar (wrong type of function is error two).

The correct documentation for all of these arguments can be found by 
typing ?ddply at the R command line (after you have loaded plyr).  It 
looks like you have been reading the documentation for ?aggregate or 
?summaryBy (doBy package) and trying to use that to inform your use of 
ddply.

So the actual call should be:

> ddply(DF,"f1",function(df){data.frame(sdV1=sd(df$V1))})
   f1     sdV1
1  1 19.93016
2  2 35.96356
3  3 33.30349
4  4 26.62831
5  5 25.03087

In general, to add more simultaneous calculations, you add more columns to 
the data frame produced by your function that does the calculations. If 
you want to give it a function name, don't put it in quotes:

> myfunction <- function(df){
+  data.frame(sdV1=sd(df$V1),meanV1=mean(df$V1))
+ }
> ddply(DF,"f1",myfunction)
   f1     sdV1 meanV1
1  1 19.93016   49.1
2  2 35.96356   45.6
3  3 33.30349   44.7
4  4 26.62831   72.2
5  5 25.03087   30.1

Note that although ddply does a lot for you, it doesn't reproduce all of 
your calculations on all of the data columns like summaryBy does... you 
have to explicitly create every calculated column in your function.
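
If writing a wrapper function each time feels heavy, plyr also provides 
summarise, which can be passed to ddply as the function and builds the 
one-row data frame from named expressions. This is an alternative spelling 
of the same calculation, not what I used above; the set.seed() call and the 
name res are my additions for the sketch.

```r
library(plyr)

# Rebuild the example data; set.seed() is an assumption added for repeatability
set.seed(42)
DF <- data.frame(V1 = sample(1:100, 50, replace = TRUE),
                 f1 = rep(1:5, 10))

# Each named expression becomes one calculated column per group
res <- ddply(DF, "f1", summarise,
             sdV1   = sd(V1),
             meanV1 = mean(V1))
print(res)
```

You still have to spell out every calculated column, so the point above 
stands.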

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


