[R] Help with ddply to eliminate a for..loop
Marc Schwartz
marc_schwartz at me.com
Thu Aug 26 22:49:10 CEST 2010
On Aug 26, 2010, at 3:40 PM, Marc Schwartz wrote:
> On Aug 26, 2010, at 3:33 PM, Bos, Roger wrote:
>
>> I created a small example to show something that I do a lot: scale
>> data by month and return a data.frame with the output. "id" represents
>> repeated observations over "time" and I want to scale the "slope"
>> variable. The "out" variable shows the output I want. My for..loop
>> does the job but is probably very slow versus other methods. ddply
>> seems ideal, but despite playing with the baseball examples quite a bit
>> I can't figure out how to get it to work with my sample dataset.
>>
>> TIA for any help, Roger
>>
>> Here is the sample code:
>>
>> dat <- data.frame(id=rep(letters[1:5],3),
>> time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
>> dat
>>
>> for (i in 1:3) {
>> mat <- dat[dat$time==i, ]
>> outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
>> if (i==1) {
>> out <- outi
>> } else {
>> out <- rbind(out, outi)
>> }
>> }
>> out
>>
>> Here is the sample output:
>>
>>> dat <- data.frame(id=rep(letters[1:5],3),
>> time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
>>
>>> dat
>> id time slope
>> 1 a 1 1
>> 2 b 1 2
>> 3 c 1 3
>> 4 d 1 4
>> 5 e 1 5
>> 6 a 2 6
>> 7 b 2 7
>> 8 c 2 8
>> 9 d 2 9
>> 10 e 2 10
>> 11 a 3 11
>> 12 b 3 12
>> 13 c 3 13
>> 14 d 3 14
>> 15 e 3 15
>>
>>> for (i in 1:3) {
>> + mat <- dat[dat$time==i, ]
>> + outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
>> + if (i==1) {
>> + out .... [TRUNCATED]
>>
>>> out
>> mat.time mat.id slope
>> 1 1 a -1.2649111
>> 2 1 b -0.6324555
>> 3 1 c 0.0000000
>> 4 1 d 0.6324555
>> 5 1 e 1.2649111
>> 6 2 a -1.2649111
>> 7 2 b -0.6324555
>> 8 2 c 0.0000000
>> 9 2 d 0.6324555
>> 10 2 e 1.2649111
>> 11 3 a -1.2649111
>> 12 3 b -0.6324555
>> 13 3 c 0.0000000
>> 14 3 d 0.6324555
>> 15 3 e 1.2649111
>>>
>
>
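> Roger, it seems like you might want ave(), which applies a function
> within each group and returns the results in the original row order.
> See ?ave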
>
>> cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
> id time slope slope
> 1 a 1 1 -1.2649111
> 2 b 1 2 -0.6324555
> 3 c 1 3 0.0000000
> 4 d 1 4 0.6324555
> 5 e 1 5 1.2649111
> 6 a 2 6 -1.2649111
> 7 b 2 7 -0.6324555
> 8 c 2 8 0.0000000
> 9 d 2 9 0.6324555
> 10 e 2 10 1.2649111
> 11 a 3 11 -1.2649111
> 12 b 3 12 -0.6324555
> 13 c 3 13 0.0000000
> 14 d 3 14 0.6324555
> 15 e 3 15 1.2649111

A quick fine-tune, since I forgot to remove the original 'slope' column above:

> cbind(dat[, -3], slope = ave(dat$slope, list(dat$time), FUN = scale))
id time slope
1 a 1 -1.2649111
2 b 1 -0.6324555
3 c 1 0.0000000
4 d 1 0.6324555
5 e 1 1.2649111
6 a 2 -1.2649111
7 b 2 -0.6324555
8 c 2 0.0000000
9 d 2 0.6324555
10 e 2 1.2649111
11 a 3 -1.2649111
12 b 3 -0.6324555
13 c 3 0.0000000
14 d 3 0.6324555
15 e 3 1.2649111
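
As a rough check on the ave() approach above, the sketch below compares it with a group-by-group reference built the same way as the original loop. The helper name scale_vec is hypothetical and not from the thread; it wraps scale() in as.numeric() because scale() returns a one-column matrix, and a plain numeric vector is usually easier to work with in a data frame.

```r
# Rebuild the sample data from the original post.
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = rep(1:3, each = 5),
                  slope = 1:15)

# Hypothetical helper: scale() returns a one-column matrix with
# attributes, so flatten it to a plain numeric vector.
scale_vec <- function(x) as.numeric(scale(x))

# ave() applies scale_vec within each level of 'time' and returns the
# values in the original row order.
res <- data.frame(dat[, c("id", "time")],
                  slope = ave(dat$slope, dat$time, FUN = scale_vec))

# Reference result computed one group at a time, as the loop does.
# The data are already sorted by 'time', so the orders line up.
ref <- unlist(lapply(split(dat$slope, dat$time), scale_vec),
              use.names = FALSE)

stopifnot(isTRUE(all.equal(res$slope, ref)))
res
```

Selecting columns by name (dat[, c("id", "time")]) rather than by position (dat[, -3]) is a small design choice that keeps the call working if the column order changes.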
Marc
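
Since the original question asked about ddply specifically, here is a minimal sketch of how the same result might be obtained with it. It assumes the plyr package is installed (it is not part of base R), and it is only one of several ways to write the call.

```r
# Assumes the plyr package is installed; it is not part of base R.
library(plyr)

# Sample data from the original post.
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = rep(1:3, each = 5),
                  slope = 1:15)

# Split 'dat' by 'time', apply transform() to each piece, and bind the
# pieces back together. as.numeric() drops the matrix attributes that
# scale() attaches to its result.
out <- ddply(dat, .(time), transform, slope = as.numeric(scale(slope)))
out
```

The columns come back as id, time, slope, with 'slope' replaced by its within-month scaled values.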