[R] Executing for loop by grouping variable within dataframe

ssobek at gwdg.de
Thu Jul 28 19:26:17 CEST 2011


Dear Dennis,

Thank you very much for your quick response! Your code does indeed solve
my problem. I figured I would have to define a function somehow to use
anything like ddply or similar, but couldn't wrap my head around how to
set it up properly.
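
For reference, the same idea can also be sketched in base R. ave() accepts
any function that takes a vector and returns one of the same length, so the
loop can be wrapped in a small helper and applied per locality. The helper
name smooth_temp and its k argument are made up for this illustration; they
are not part of Dennis's code.

```r
# Example data from the original post
locality <- c(rep("USC00020958", 5), rep("USC00021001", 6))
temp.a   <- c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6, -2.7, -2.8, -3.0)
df <- data.frame(locality, temp.a)

# Hypothetical helper: runs the recursion over one group's temp.a values
smooth_temp <- function(x, k = 0.8) {
  if (length(x) == 0L) return(x)      # nothing to do for an empty group
  out <- numeric(length(x))
  out[1] <- x[1]                      # each group starts from its own first value
  for (j in seq_along(x)[-1])         # loop body is skipped when length(x) == 1
    out[j] <- out[j - 1] + k * (x[j] - out[j - 1])
  out
}

# ave() splits temp.a by locality, applies the helper, and reassembles the result
df$temp.t <- ave(df$temp.a, df$locality, FUN = smooth_temp)
```

Note that FUN has to be passed by name, because ave() treats every unnamed
argument after the first as a grouping variable.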

I only started using R for anything more sophisticated than plain
statistics and plotting a couple of months ago, and I have yet to find a
good resource (book?) for teaching myself the basics of actual
programming, rather than plugging in commands. If anyone has a
recommendation, I would be thankful to hear about it!

It also doesn't help that I'm the only one in my work environment who
insists on using R, while everyone else still sticks with Matlab (but
people seem to get increasingly interested in R now), so if I get stuck
I'm at a complete loss.

To answer your question about why temp.t[1] in the second group equalled
-2.0 instead of -2.2 in my example: my loop ran row by row over the entire
dataset, rather than over each group separately as I intended, so the
first row of the second group pulled temp.t from the last row of the
preceding group.
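
The arithmetic can be checked directly. This is only a sketch of that one
step, using the unrounded temp.t of -1.12 for the last row of the first
locality (row 5 in your output); the variable names prev and carried are
made up for the illustration.

```r
k    <- 0.8
prev <- -1.12                          # unrounded temp.t, last row of first locality

# What the ungrouped loop computes for the first row of the second locality
carried <- prev + k * (-2.2 - prev)    # -1.12 + 0.8 * (-1.08) = -1.984
round(carried, 1)                      # rounds to -2.0, the value in my table
```

With the loop restarted for each locality, that row is simply set to its
own temp.a of -2.2.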

Thanks again for your help! I greatly appreciate it, and I hope I can
improve my skill level soon.

Have a nice day,

Stephanie


> Hi:
>
> I don't get exactly the same results as you did in the second group
> (how does temp.t[1] = -2.0 instead of -2.2?) but try this:
>
> locality=c("USC00020958", "USC00020958", "USC00020958", "USC00020958",
> "USC00020958", "USC00021001","USC00021001", "USC00021001", "USC00021001",
> "USC00021001", "USC00021001")
> temp.a=c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6,-2.7, -2.8, -3.0)
> month= c(12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11)
> day= c(27, 28, 29, 30, 31, 1,  2,  3,  4,  5,  6)
> df=data.frame(locality, temp.a, month, day)
>
> f <- function(d) {
>    k <- 0.8                      # smoothing weight
>    # a single-row group has no preceding step: temp.t is just temp.a
>    if(nrow(d) == 1L) return(data.frame(d, temp.t = d$temp.a))
>    tmp <- rep(NA, nrow(d))
>    tmp[1] <- d[1, 'temp.a']      # start each group from its own first value
>    for(j in 2:length(tmp))
>       tmp[j] <- tmp[j - 1] + k * (d$temp.a[j] - tmp[j - 1])
>    data.frame(d, temp.t = tmp)
> }
>
> require('plyr')
> ddply(df, 'locality', f)
>       locality temp.a month day    temp.t
> 1  USC00020958   -1.2    12  27 -1.200000
> 2  USC00020958   -1.2    12  28 -1.200000
> 3  USC00020958   -1.2    12  29 -1.200000
> 4  USC00020958   -1.2    12  30 -1.200000
> 5  USC00020958   -1.1    12  31 -1.120000
> 6  USC00021001   -2.2    11   1 -2.200000
> 7  USC00021001   -2.4    11   2 -2.360000
> 8  USC00021001   -2.6    11   3 -2.552000
> 9  USC00021001   -2.7    11   4 -2.670400
> 10 USC00021001   -2.8    11   5 -2.774080
> 11 USC00021001   -3.0    11   6 -2.954816
>
> If you want to round the result, substitute the last line in the function
> with
>     data.frame(d, temp.t = round(tmp, 1))
>
> Related functions are ceiling() and floor() in case they are of interest.
>
> HTH,
> Dennis
>
>
> On Wed, Jul 27, 2011 at 10:38 AM,  <ssobek at gwdg.de> wrote:
>> Dear list,
>>
>> I have a large dataset which is structured as follows:
>>
>> locality=c("USC00020958", "USC00020958", "USC00020958", "USC00020958",
>> "USC00020958", "USC00021001","USC00021001", "USC00021001", "USC00021001",
>> "USC00021001", "USC00021001")
>>
>> temp.a=c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6,-2.7, -2.8, -3.0)
>>
>> month= c(12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11)
>>
>> day= c(27, 28, 29, 30, 31, 1,  2,  3,  4,  5,  6)
>>
>> df=data.frame(locality,temp.a,month,day)
>>
>>>      locality temp.a month day
>>>1  USC00020958   -1.2    12  27
>>>2  USC00020958   -1.2    12  28
>>>3  USC00020958   -1.2    12  29
>>>4  USC00020958   -1.2    12  30
>>>5  USC00020958   -1.1    12  31
>>>6  USC00021001   -2.2    11   1
>>>7  USC00021001   -2.4    11   2
>>>8  USC00021001   -2.6    11   3
>>>9  USC00021001   -2.7    11   4
>>>10 USC00021001   -2.8    11   5
>>>11 USC00021001   -3.0    11   6
>>
>> I would like to calculate a 5th variable, temp.t, based on temp.a, and
>> temp.t for the preceding time step. I successfully created a for loop as
>> follows:
>>
>> temp.t=list()
>>
>> for(i in 2:nrow(df)){
>> k=0.8
>> temp.t[1]=df$temp.a[1]
>> temp.t[i]=(as.numeric(temp.t[i-1]))+k*(as.numeric(df$temp.a[i])-(as.numeric(temp.t[i-1])))
>> }
>>
>> temp.t <- unlist(temp.t)
>>
>>
>> df["temp.t"] <- round(temp.t,1)
>>
>> df
>>
>>>     locality temp.a month day temp.t
>>>1  USC00020958   -1.2    12  27   -1.2
>>>2  USC00020958   -1.2    12  28   -1.2
>>>3  USC00020958   -1.2    12  29   -1.2
>>>4  USC00020958   -1.2    12  30   -1.2
>>>5  USC00020958   -1.1    12  31   -1.1
>>>6  USC00021001   -2.2    11   1   -2.0
>>>7  USC00021001   -2.4    11   2   -2.3
>>>8  USC00021001   -2.6    11   3   -2.5
>>>9  USC00021001   -2.7    11   4   -2.7
>>>10 USC00021001   -2.8    11   5   -2.8
>>>11 USC00021001   -3.0    11   6   -3.0
>>
>> This worked fine as long as I was dealing with datasets that only
>> contained one locality. However, as you can see above, my current
>> dataset contains more than one locality, and I need to execute my loop
>> for each locality separately. What is the best approach to do this?
>>
>> I have tried repeatedly to put the loop into a command using either ave,
>> by or tapply and to specify locality as the grouping variable, but no
>> matter what I try, nothing works, because I am unable to specify my loop
>> as a function within ave, by, or tapply.
>>
>> I don't know if I am just doing it wrong (likely!) since I have no
>> experience working with loops/functions, or if this is simply not the
>> right approach to solve my problem. I was also considering using a
>> nested for loop, but failed at setting it up. I would greatly
>> appreciate it if someone could point me in the right direction.
>>
>> Thanks a lot,
>>
>> Stephanie
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


