[R] How to use ddply

David Winsemius dwinsemius at comcast.net
Mon Jan 13 23:59:40 CET 2014


On Jan 13, 2014, at 1:29 PM, Amitabh Dugar wrote:

> I have never used R-help to pose a question to the R-users community; is sending this Email the right way to do so?
> 
> I am trying to use the ddply function in the plyr package to accomplish the following:
> I have a data frame of the type:
> 
>      ticker monthend_n wgtdiff    ret
> 156      AA   19990228  0.7172  -2.58
> 545    AAPL   19990228 -0.0828 -15.48
> 925    ABCW   19990228  0.0966  -7.36
> 1041   ABFS   19990228  0.1320  -8.89
> 1165    ABI   19990228  0.2355   4.61
> 1482    ABS   19990228  0.1668  -6.56
> 1563    ABT   19990228  0.1650  -0.27
> 1790   ACAT   19990228  0.1540 -13.82
> 2498    ACN   19990228  0.0000  12.15
> 2532    ACO   19990228  0.1320   8.48
> 2857    ACV   19990228  0.1540  -6.54
> 2942   ACXM   19990228  0.0000  -6.13
> 3303   ADCT   19990228  0.1035   1.73
> 3568    ADM   19990228  0.1540   0.33
> 4072   ADSK   19990228 -0.1035  -9.19
> 4672    AEH   19990228  0.1650     NA
> 4673   AEIC   19990228  0.1314  -6.95
> 4867    AEP   19990228  0.1540  -3.62
> 157      AA   19990331  0.1932   1.70
> 546    AAPL   19990331  0.0330   3.23
> 1005    ABF   19990331  0.1540 -20.51
> 1166    ABI   19990331  0.2860   8.33
> 1255    ABK   19990331  0.0966  -3.57
> 1483    ABS   19990331  0.0000  -4.50
> 1564    ABT   19990331  0.3955   1.08
> 1733    ABX   19990331  0.2340  -3.53
> 2533    ACO   19990331  0.0966   5.26
> 3304   ADCT   19990331  0.2925  17.75
> 3418    ADI   19990331  0.2688  18.70
> 3724    ADP   19990331  0.1540 -38.43
> 4514    AEE   19990331  0.1540  -1.31
> 4868    AEP   19990331 -0.0966  -4.65
> 
> I am trying to generate quintile cutoff points across the distribution of tickers for every month, using the command:
>> result <- ddply(test, .(monthend_n), .fun=cut, test$wgtdiff,5)
> 
> I get the message:
> Error in cut.default(piece, ...) : 'x' must be numeric
> 
> I tried creating a monthly list of data frames, extracting the wgtdiff column and passing that into the cut function, but that did not work either (as below)
> pieces <- split(test,test$monthend_n)
> vectors<- lapply(pieces,"[[","wgtdiff")
> quintiles <- lapply(vectors,cut(vectors[1:2],5))
> Error in cut.default(vectors[1:2], 5) : 'x' must be numeric
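
Both error messages above seem to have the same cause: cut() received something other than a numeric vector. With .fun=cut, ddply hands each monthly piece (a whole data frame) to cut() as its first argument. In the lapply attempt, cut(vectors[1:2], 5) is evaluated immediately on a list instead of being passed as a function. A minimal base-R sketch of the split/lapply route follows; the toy values in `test` are invented for illustration and are not the poster's data.

```r
# Toy data, invented for illustration (not the poster's data)
test <- data.frame(
  monthend_n = rep(c(19990228, 19990331), each = 5),
  wgtdiff    = c(0.72, -0.08, 0.24, 0.17, 0.16,
                 0.19,  0.03, 0.29, 0.00, 0.40)
)

# One numeric vector of wgtdiff per month
vectors <- split(test$wgtdiff, test$monthend_n)

# Pass the function itself; its extra arguments follow it
bins <- lapply(vectors, cut, breaks = 5)

# Interval labels used within each month
lapply(bins, levels)
```

Here lapply calls cut(x, breaks = 5) once per month, so each call sees a plain numeric vector.
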
> 
> However, the cut function does the job correctly when I pass it only an individual month's data, as below:
> first <- pieces[[1]]
> quintiles <- cut(first$wgtdiff,5)
> levels(quintiles)
> 
> What is the correct way to solve this problem?

The call below classifies wgtdiff into five intervals within each level of monthend_n. You do not need to write test$wgtdiff: summarise evaluates the expression inside each piece, so the bare column name wgtdiff is enough.


result <- ddply(test, .(monthend_n), summarise, cut(wgtdiff, breaks = 5))
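
A small self-contained sketch of the same idea on made-up data may help. The values in `test` are invented for illustration, and using transform instead of summarise is my own choice so that the original columns are kept. Note that cut(x, breaks = 5) produces five equal-width intervals over the range of x. If quantile-based groups are what is wanted (an assumption about the intent), one option is to take the breaks from quantile().

```r
library(plyr)

# Toy data, invented for illustration: two month-ends, six tickers each
test <- data.frame(
  ticker     = rep(c("AA", "AAPL", "ABI", "ABS", "ABT", "ACO"), times = 2),
  monthend_n = rep(c(19990228, 19990331), each = 6),
  wgtdiff    = c(0.72, -0.08, 0.24, 0.17, 0.16, 0.13,
                 0.19,  0.03, 0.29, 0.00, 0.40, 0.10)
)

# Equal-width bins within each month, keeping the original columns
binned <- ddply(test, .(monthend_n), transform,
                bin = cut(wgtdiff, breaks = 5))

# Quantile-based bins within each month, returned as integer codes 1..5
quint <- ddply(test, .(monthend_n), transform,
               quintile = cut(wgtdiff,
                              breaks = quantile(wgtdiff,
                                                probs = seq(0, 1, 0.2),
                                                na.rm = TRUE),
                              include.lowest = TRUE,
                              labels = FALSE))
```

If many wgtdiff values tie within a month, quantile() can return duplicated breaks and cut() will stop with an error; this sketch does not handle that case.
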


> 
> Thanks for your help, everyone!
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA



