[R] How to use ddply
David Winsemius
dwinsemius at comcast.net
Mon Jan 13 23:59:40 CET 2014
On Jan 13, 2014, at 1:29 PM, Amitabh Dugar wrote:
> I have never used R-help to pose a question to the R-users community; is sending this email the right way to do so?
>
> I am trying to use the ddply function in the plyr package to accomplish the following:
> I have a data frame of the type:
>
>      ticker monthend_n wgtdiff ret
> 156 AA 19990228 0.7172 -2.58
> 545 AAPL 19990228 -0.0828 -15.48
> 925 ABCW 19990228 0.0966 -7.36
> 1041 ABFS 19990228 0.1320 -8.89
> 1165 ABI 19990228 0.2355 4.61
> 1482 ABS 19990228 0.1668 -6.56
> 1563 ABT 19990228 0.1650 -0.27
> 1790 ACAT 19990228 0.1540 -13.82
> 2498 ACN 19990228 0.0000 12.15
> 2532 ACO 19990228 0.1320 8.48
> 2857 ACV 19990228 0.1540 -6.54
> 2942 ACXM 19990228 0.0000 -6.13
> 3303 ADCT 19990228 0.1035 1.73
> 3568 ADM 19990228 0.1540 0.33
> 4072 ADSK 19990228 -0.1035 -9.19
> 4672 AEH 19990228 0.1650 NA
> 4673 AEIC 19990228 0.1314 -6.95
> 4867 AEP 19990228 0.1540 -3.62
> 157 AA 19990331 0.1932 1.70
> 546 AAPL 19990331 0.0330 3.23
> 1005 ABF 19990331 0.1540 -20.51
> 1166 ABI 19990331 0.2860 8.33
> 1255 ABK 19990331 0.0966 -3.57
> 1483 ABS 19990331 0.0000 -4.50
> 1564 ABT 19990331 0.3955 1.08
> 1733 ABX 19990331 0.2340 -3.53
> 2533 ACO 19990331 0.0966 5.26
> 3304 ADCT 19990331 0.2925 17.75
> 3418 ADI 19990331 0.2688 18.70
> 3724 ADP 19990331 0.1540 -38.43
> 4514 AEE 19990331 0.1540 -1.31
> 4868 AEP 19990331 -0.0966 -4.65
>
> I am trying to generate quintile cutoff points across the distribution of tickers for every month, using the command:
> result <- ddply(test, .(monthend_n), .fun=cut, test$wgtdiff, 5)
>
> I get the message:
> Error in cut.default(piece, ...) : 'x' must be numeric
>
> I tried creating a monthly list of data frames, extracting the wgtdiff column and passing that into the cut function, but that did not work either (as below):
> pieces <- split(test,test$monthend_n)
> vectors<- lapply(pieces,"[[","wgtdiff")
> quintiles <- lapply(vectors,cut(vectors[1:2],5))
> Error in cut.default(vectors[1:2], 5) : 'x' must be numeric
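The lapply attempt fails because cut(vectors[1:2], 5) is evaluated immediately, on a list rather than a numeric vector, before lapply ever runs. lapply wants the function itself, followed by any extra arguments to forward to it. A minimal sketch in base R, using a few rows copied from the data above (the name toy is a hypothetical stand-in for test, and bins is just an illustrative name):

```r
# Hypothetical stand-in for `test`: four tickers from each of the two months
toy <- data.frame(
  ticker     = c("AA", "AAPL", "ABI", "ABT", "AA", "AAPL", "ABI", "ABT"),
  monthend_n = rep(c(19990228, 19990331), each = 4),
  wgtdiff    = c(0.7172, -0.0828, 0.2355, 0.1650,
                 0.1932,  0.0330, 0.2860, 0.3955)
)

pieces  <- split(toy, toy$monthend_n)        # one data frame per month
vectors <- lapply(pieces, "[[", "wgtdiff")   # one numeric vector per month

# Pass the function itself; `breaks = 5` is forwarded to cut() for each month
bins <- lapply(vectors, cut, breaks = 5)

# One factor per month, each with five equal-width levels
sapply(bins, nlevels)
```

Each element of bins is the factor you got from the single-month call, so levels(bins[[1]]) should match what levels(quintiles) showed for the first month of the same data.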
>
> However, the cut function does the job correctly when I pass it only an individual month's data, as below:
> first <- pieces[[1]]
> quintiles <- cut(first$wgtdiff,5)
> levels(quintiles)
>
> What is the correct way to solve this problem?
The call below will deliver classification results within each category of monthend_n. You do not need to supply the data name, as in test$wgtdiff; summarise evaluates the bare column name within each piece.
result <- ddply(test, .(monthend_n), summarise, cut(wgtdiff, breaks = 5))
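As for why the original call failed: ddply hands each piece, a whole data frame, to .fun as its first argument, so cut() received a data frame as 'x' and test$wgtdiff ended up in the breaks position. One caveat, offered tentatively since I may be misreading the goal: cut(x, breaks = 5) builds five equal-width intervals over the range of x, which are not quintiles. If quintile cutoff points are what is wanted, quantile() is one way to get them. A sketch, assuming plyr is installed; toy, bin, prob and cutoff are illustrative names that do not appear in the original post:

```r
library(plyr)  # assumed to be installed

# Hypothetical stand-in for `test`: four tickers from each of the two months
toy <- data.frame(
  ticker     = c("AA", "AAPL", "ABI", "ABT", "AA", "AAPL", "ABI", "ABT"),
  monthend_n = rep(c(19990228, 19990331), each = 4),
  wgtdiff    = c(0.7172, -0.0828, 0.2355, 0.1650,
                 0.1932,  0.0330, 0.2860, 0.3955)
)

# Equal-width bins within each month, keeping the ticker alongside its bin
binned <- ddply(toy, .(monthend_n), summarise,
                ticker = ticker,
                bin    = cut(wgtdiff, breaks = 5))

# Quintile cutoff points within each month (0%, 20%, ..., 100%)
cutoffs <- ddply(toy, .(monthend_n), summarise,
                 prob   = seq(0, 1, 0.2),
                 cutoff = unname(quantile(wgtdiff, probs = seq(0, 1, 0.2),
                                          na.rm = TRUE)))
```

Here binned has one row per ticker per month, and cutoffs has six rows per month. Naming the columns inside summarise avoids the awkward default column name built from the deparsed expression.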
>
> Thanks for your help, everyone!
David Winsemius
Alameda, CA, USA