[R] Executing for loop by grouping variable within dataframe

Dennis Murphy djmuser at gmail.com
Thu Jul 28 01:44:02 CEST 2011


Hi:

I don't get exactly the same results as you did in the second group
(how does its first temp.t come out as -2.0 instead of -2.2?), but try this:

locality=c("USC00020958", "USC00020958", "USC00020958", "USC00020958",
"USC00020958", "USC00021001","USC00021001", "USC00021001", "USC00021001",
"USC00021001", "USC00021001")
temp.a=c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6,-2.7, -2.8, -3.0)
month= c(12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11)
day= c(27, 28, 29, 30, 31, 1,  2,  3,  4,  5,  6)
df=data.frame(locality, temp.a, month, day)

f <- function(d) {
   k <- 0.8                       # smoothing weight
   # a single-row group has nothing to smooth
   if (nrow(d) == 1L) return(data.frame(d, temp.t = d$temp.a))
   tmp <- rep(NA_real_, nrow(d))
   tmp[1] <- d$temp.a[1]          # each group starts at its own first value
   for (j in 2:length(tmp))
      tmp[j] <- tmp[j - 1] + k * (d$temp.a[j] - tmp[j - 1])
   data.frame(d, temp.t = tmp)
}

library(plyr)
ddply(df, 'locality', f)
      locality temp.a month day    temp.t
1  USC00020958   -1.2    12  27 -1.200000
2  USC00020958   -1.2    12  28 -1.200000
3  USC00020958   -1.2    12  29 -1.200000
4  USC00020958   -1.2    12  30 -1.200000
5  USC00020958   -1.1    12  31 -1.120000
6  USC00021001   -2.2    11   1 -2.200000
7  USC00021001   -2.4    11   2 -2.360000
8  USC00021001   -2.6    11   3 -2.552000
9  USC00021001   -2.7    11   4 -2.670400
10 USC00021001   -2.8    11   5 -2.774080
11 USC00021001   -3.0    11   6 -2.954816
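
On the -2.0 in the second group of the original post: it is consistent with the ungrouped loop carrying the smoothed value from row 5 (-1.12) across the locality boundary, so that row 6 becomes -1.12 + 0.8 * (-2.2 - (-1.12)) = -1.984, which rounds to -2.0. A minimal sketch that reruns the recursion over the whole temp.a vector with no grouping (the name `ungrouped` is made up for illustration):

```r
k <- 0.8
temp.a <- c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6, -2.7, -2.8, -3.0)

# Same recursion as in f(), but run once over all rows, ignoring locality
ungrouped <- numeric(length(temp.a))
ungrouped[1] <- temp.a[1]
for (i in 2:length(temp.a))
  ungrouped[i] <- ungrouped[i - 1] + k * (temp.a[i] - ungrouped[i - 1])

ungrouped[6]          # about -1.984: row 5's value leaks into the second locality
round(ungrouped, 1)   # row 6 rounds to -2.0, as in the original post
```

Restarting the recursion at the first row of each locality, as f() does, gives -2.2 there instead.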

If you want to round the result, replace the last line of the function with
    data.frame(d, temp.t = round(tmp, 1))

Related functions are ceiling() and floor() in case they are of interest.
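
Since the original question mentions ave(), here is a rough base R variant that avoids plyr. It is only a sketch: `smooth_group` is a made-up helper name, and Reduce() with accumulate = TRUE is swapped in for the explicit for loop. The month and day columns are left out because they play no part in the calculation.

```r
locality <- rep(c("USC00020958", "USC00021001"), times = c(5, 6))
temp.a <- c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6, -2.7, -2.8, -3.0)
df <- data.frame(locality, temp.a)

k <- 0.8

# Hypothetical helper: running smoothed value within one group.
# Reduce() starts from x[1] and returns every intermediate result.
smooth_group <- function(x) {
  Reduce(function(prev, cur) prev + k * (cur - prev), x, accumulate = TRUE)
}

# ave() applies the helper to temp.a separately for each locality
# and puts the results back in the original row order
df$temp.t <- ave(df$temp.a, df$locality, FUN = smooth_group)
df
```

This should reproduce the temp.t column shown above; wrap the ave() call in round(..., 1) if you want one decimal place.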

HTH,
Dennis


On Wed, Jul 27, 2011 at 10:38 AM,  <ssobek at gwdg.de> wrote:
> Dear list,
>
> I have a large dataset which is structured as follows:
>
> locality=c("USC00020958", "USC00020958", "USC00020958", "USC00020958",
> "USC00020958", "USC00021001","USC00021001", "USC00021001", "USC00021001",
> "USC00021001", "USC00021001")
>
> temp.a=c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6,-2.7, -2.8, -3.0)
>
> month= c(12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11)
>
> day= c(27, 28, 29, 30, 31, 1,  2,  3,  4,  5,  6)
>
> df=data.frame(locality,temp.a,month,day)
>
>>      locality temp.a month day
>>1  USC00020958   -1.2    12  27
>>2  USC00020958   -1.2    12  28
>>3  USC00020958   -1.2    12  29
>>4  USC00020958   -1.2    12  30
>>5  USC00020958   -1.1    12  31
>>6  USC00021001   -2.2    11   1
>>7  USC00021001   -2.4    11   2
>>8  USC00021001   -2.6    11   3
>>9  USC00021001   -2.7    11   4
>>10 USC00021001   -2.8    11   5
>>11 USC00021001   -3.0    11   6
>
> I would like to calculate a fifth variable, temp.t, based on temp.a and
> on temp.t from the preceding time step. I successfully created a for loop
> as follows:
>
> temp.t=list()
>
> for(i in 2:nrow(df)){
> k=0.8
> temp.t[1]=df$temp.a[1]
> temp.t[i]=(as.numeric(temp.t[i-1]))+k*(as.numeric(df$temp.a[i])-(as.numeric(temp.t[i-1])))
> }
>
> temp.t <- unlist(temp.t)
>
>
> df["temp.t"] <- round(temp.t,1)
>
> df
>
>>     locality temp.a month day temp.t
>>1  USC00020958   -1.2    12  27   -1.2
>>2  USC00020958   -1.2    12  28   -1.2
>>3  USC00020958   -1.2    12  29   -1.2
>>4  USC00020958   -1.2    12  30   -1.2
>>5  USC00020958   -1.1    12  31   -1.1
>>6  USC00021001   -2.2    11   1   -2.0
>>7  USC00021001   -2.4    11   2   -2.3
>>8  USC00021001   -2.6    11   3   -2.5
>>9  USC00021001   -2.7    11   4   -2.7
>>10 USC00021001   -2.8    11   5   -2.8
>>11 USC00021001   -3.0    11   6   -3.0
>
> This worked fine as long as I was dealing with datasets that only
> contained one locality. However, as you can see above, my current dataset
> contains more than one locality, and I need to execute my loop for each
> locality separately. What is the best approach to do this?
>
> I have tried repeatedly to put the loop into a command using either ave,
> by or tapply and to specify locality as the grouping variable, but no
> matter what I try, nothing works, because I am unable to specify my loop
> as a function within ave, by, or tapply.
>
> I don't know if I am just doing it wrong (likely!) since I have no
> experience working with loops/functions, or if this is simply not the
> right approach to solve my problem. I was also considering using a nested
> for loop, but failed at setting it up. I would greatly appreciate it if
> someone could point me in the right direction.
>
> Thanks a lot,
>
> Stephanie
>


