[R] Help with ddply to eliminate a for..loop

Marc Schwartz marc_schwartz at me.com
Thu Aug 26 22:49:10 CEST 2010


On Aug 26, 2010, at 3:40 PM, Marc Schwartz wrote:

> On Aug 26, 2010, at 3:33 PM, Bos, Roger wrote:
> 
>> I created a small example to show something that I do a lot: "scale"
>> data by month and return a data.frame with the output.  "id" represents
>> repeated observations over "time", and I want to scale the "slope"
>> variable within each "time" period.  The "out" variable shows the output
>> I want.  My for..loop does the job but is probably much slower than
>> other methods.  ddply seems ideal, but despite playing with the baseball
>> examples quite a bit I can't figure out how to get it to work with my
>> sample dataset.
>> 
>> TIA for any help, Roger
>> 
>> Here is the sample code:
>> 
>> dat <- data.frame(id=rep(letters[1:5],3),
>> time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
>> dat
>> 
>> for (i in 1:3) {
>>   mat <- dat[dat$time==i, ]
>>   outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
>>   if (i==1) {
>>       out <- outi
>>   } else {
>>       out <- rbind(out, outi)
>>   }
>> }
>> out
>> 
>> Here is the sample output:
>> 
>>> dat <- data.frame(id=rep(letters[1:5],3),
>> time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
>> 
>>> dat
>>  id time slope
>> 1   a    1     1
>> 2   b    1     2
>> 3   c    1     3
>> 4   d    1     4
>> 5   e    1     5
>> 6   a    2     6
>> 7   b    2     7
>> 8   c    2     8
>> 9   d    2     9
>> 10  e    2    10
>> 11  a    3    11
>> 12  b    3    12
>> 13  c    3    13
>> 14  d    3    14
>> 15  e    3    15
>> 
>>> for (i in 1:3) {
>> +     mat <- dat[dat$time==i, ]
>> +     outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
>> +     if (i==1) {
>> +         out  .... [TRUNCATED] 
>> 
>>> out
>>  mat.time mat.id      slope
>> 1         1      a -1.2649111
>> 2         1      b -0.6324555
>> 3         1      c  0.0000000
>> 4         1      d  0.6324555
>> 5         1      e  1.2649111
>> 6         2      a -1.2649111
>> 7         2      b -0.6324555
>> 8         2      c  0.0000000
>> 9         2      d  0.6324555
>> 10        2      e  1.2649111
>> 11        3      a -1.2649111
>> 12        3      b -0.6324555
>> 13        3      c  0.0000000
>> 14        3      d  0.6324555
>> 15        3      e  1.2649111
>>> 
> 
> 
> Roger, it seems like you might want ave(), which applies a function
> within groups.  See ?ave:
> 
>> cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
>   id time slope      slope
> 1   a    1     1 -1.2649111
> 2   b    1     2 -0.6324555
> 3   c    1     3  0.0000000
> 4   d    1     4  0.6324555
> 5   e    1     5  1.2649111
> 6   a    2     6 -1.2649111
> 7   b    2     7 -0.6324555
> 8   c    2     8  0.0000000
> 9   d    2     9  0.6324555
> 10  e    2    10  1.2649111
> 11  a    3    11 -1.2649111
> 12  b    3    12 -0.6324555
> 13  c    3    13  0.0000000
> 14  d    3    14  0.6324555
> 15  e    3    15  1.2649111


A quick refinement, as I forgot to remove the original 'slope' column above:

> cbind(dat[, -3], slope = ave(dat$slope, list(dat$time), FUN = scale))
   id time      slope
1   a    1 -1.2649111
2   b    1 -0.6324555
3   c    1  0.0000000
4   d    1  0.6324555
5   e    1  1.2649111
6   a    2 -1.2649111
7   b    2 -0.6324555
8   c    2  0.0000000
9   d    2  0.6324555
10  e    2  1.2649111
11  a    3 -1.2649111
12  b    3 -0.6324555
13  c    3  0.0000000
14  d    3  0.6324555
15  e    3  1.2649111
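
For anyone who wants to see what ave() is doing here, the following is a minimal sketch rather than anything taken from the posts above. The names zscore, via_ave, via_split and via_ddply are made up for the illustration. It assumes the default behaviour of scale() (subtract the mean, divide by the standard deviation), writes that out by hand, and checks that ave() gives the same answer as an explicit split/unsplit. The last part is one possible ddply() version, since that was the original question; it only runs if the plyr package happens to be installed.

```r
# Same sample data as in the original post.
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                  slope = 1:15)

# Hypothetical helper: a hand-written z-score, which is what scale()
# computes by default for a single numeric vector.
zscore <- function(x) (x - mean(x)) / sd(x)

# ave() splits 'slope' by 'time', applies FUN to each piece, and puts the
# results back in the original row order.
via_ave <- ave(dat$slope, dat$time, FUN = zscore)

# The same calculation with the split/apply/combine steps written out.
via_split <- unsplit(lapply(split(dat$slope, dat$time), zscore), dat$time)

# Both routes should agree.
stopifnot(isTRUE(all.equal(via_ave, via_split)))

# Drop the original 'slope' column by name and attach the scaled values.
res <- cbind(dat[, c("id", "time")], slope = via_ave)
print(res)

# Possible plyr version (assumes plyr is installed; skipped otherwise).
# scale() returns a one-column matrix, so as.numeric() flattens it.
if (requireNamespace("plyr", quietly = TRUE)) {
  via_ddply <- plyr::ddply(dat, "time", transform,
                           slope = as.numeric(scale(slope)))
  print(via_ddply)
}
```

Selecting the columns by name instead of with dat[, -3] is only a matter of taste; it keeps working if the column order of 'dat' ever changes.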


Marc


