[R] lagging over consecutive pairs of rows in dataframe

Evan Cooch evan.cooch at gmail.com
Fri Mar 17 18:57:48 CET 2017


Thanks very much. I suspect 50% of my time in R is spent translating 
what I know how to do in SAS (25+ years of heavy use) into the 
equivalent in R. So far, I haven't found anything I can do in SAS that 
I can't do in R, with some help. ;-)

Cheers...

On 3/17/2017 1:51 PM, Bert Gunter wrote:
> Evan:
>
> Yes, I stand partially corrected. You have the concept correct, but R
> implements it differently than SAS.
>
> I think what you want for your approach is diff():
>
> evens <- (seq_len(nrow(mydata)) %% 2) == 0
> newdat <- data.frame(exp = mydata[evens, 1], reslt = diff(mydata[, 2])[evens[-1]])
>
> ... which seems neater to me than what I offered previously.
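
To see why the evens[-1] mask lines up with diff(), here is a small self-contained sketch using the data from the original post. The intermediate name d and the output column name diff are only illustrative and are not part of the code above.

```r
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# diff() returns one element fewer than its input: d[i] = rslt[i + 1] - rslt[i]
d <- diff(mydata$rslt)

# TRUE for rows 2, 4, 6, ... (the 'treatment' row of each pair)
evens <- (seq_len(nrow(mydata)) %% 2) == 0

# Dropping the first element of 'evens' shifts the mask so that it picks
# d[1], d[3], d[5], ..., i.e. only the within-pair differences
newdat <- data.frame(exp = mydata$exp[evens], diff = d[evens[-1]])
newdat
```

The elements of d that are skipped are the differences that straddle two experiments (for example, row 3 minus row 2), which is why they have to be masked out.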
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Mar 17, 2017 at 10:25 AM, Evan Cooch <evan.cooch at gmail.com> wrote:
>>
>> On 3/17/2017 1:19 PM, Bert Gunter wrote:
>>> Evan:
>>>
>>> You misunderstand the concept of a lagged variable.
>>
>> Well, lag in R, perhaps (and by my own admission). In SAS, that's exactly how
>> it works:
>>
>> data test;
>> input exp rslt;
>> cards;
>> <data in the data frame in OP>
>>      *;
>>
>>
>>      data test2; set test; by exp;
>>      diff=rslt-lag(rslt);
>>        if last.exp;
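
For comparison, a rough base R sketch of what the SAS step above does, assuming the data from the original post. The helper names prev and is_last are made up for illustration; this is one possible translation, not the only one.

```r
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Counterpart of lag(rslt): the value from the previous row, NA for row 1.
# Like the SAS lag here, this is not restricted to the current exp group.
prev <- c(NA, head(mydata$rslt, -1))
mydata$diff <- mydata$rslt - prev

# Counterpart of 'if last.exp;': keep only the last row of each exp group
is_last <- !duplicated(mydata$exp, fromLast = TRUE)
newdat <- mydata[is_last, c("exp", "diff")]
rownames(newdat) <- NULL
newdat
```

Because only the last row of each pair is kept, the differences that cross from one experiment into the next are discarded, just as in the SAS version.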
>>
>>> Ulrik:
>>>
>>> Well, yes, that is certainly a general solution that works. However,
>>> given the *specific* structure described by the OP, an even more
>>> direct (maybe more efficient?) way to do it just uses (logical)
>>> subscripting:
>>>
>>> odds <- (seq_len(nrow(mydata)) %% 2) == 1
>>> newdat <- data.frame(mydata[odds, 1], mydata[!odds, 2] - mydata[odds, 2])
>>> names(newdat) <- names(mydata)
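
The subscripting approach relies on the rows coming in strict control/treatment pairs. A hedged sketch of the same idea with that assumption checked up front; the stopifnot() guards and the output column name diff are additions for illustration only.

```r
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

n <- nrow(mydata)
odds <- (seq_len(n) %% 2) == 1

# Guard the structural assumptions: an even number of rows, and each
# odd row shares its exp value with the even row that follows it
stopifnot(n %% 2 == 0)
stopifnot(all(mydata$exp[odds] == mydata$exp[!odds]))

# treatment (even rows) minus control (odd rows)
newdat <- data.frame(exp = mydata$exp[odds],
                     diff = mydata$rslt[!odds] - mydata$rslt[odds])
newdat
```

If an experiment were missing a row, the guards would stop with an error instead of silently subtracting values from different experiments.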
>>>
>> Interesting - thanks!
>>
>>
>>> Bert Gunter
>>>
>>>
>>> On Fri, Mar 17, 2017 at 9:58 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com>
>>> wrote:
>>>> Hi Evan
>>>>
>>>> you can easily do this by applying diff() to each exp group.
>>>>
>>>> Either using dplyr:
>>>> library(dplyr)
>>>> mydata %>%
>>>>     group_by(exp) %>%
>>>>     summarise(difference = diff(rslt))
>>>>
>>>> Or with base R
>>>> aggregate(rslt ~ exp, data = mydata, FUN = diff)
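
Another base R route to the same per-group difference uses tapply(). This is a sketch that assumes exactly two rows per exp, as in the original post; the names d and newdat are illustrative.

```r
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Apply diff() to the rslt values within each level of exp.
# With two rows per group, each group yields a single difference.
d <- tapply(mydata$rslt, mydata$exp, diff)

# tapply() returns a named array; rebuild a plain data frame from it
newdat <- data.frame(exp = as.integer(names(d)), diff = as.vector(d))
newdat
```

With more than two rows per group, diff() would return several values per group and tapply() would return a list, so the last step would need adjusting.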
>>>>
>>>> HTH
>>>> Ulrik
>>>>
>>>>
>>>> On Fri, 17 Mar 2017 at 17:34 Evan Cooch <evan.cooch at gmail.com> wrote:
>>>>
>>>>> Suppose I have a dataframe that looks like the following:
>>>>>
>>>>> n=2
>>>>> mydata <- data.frame(exp = rep(1:5,each=n), rslt =
>>>>> c(12,15,7,8,24,28,33,15,22,11))
>>>>> mydata
>>>>>    exp rslt
>>>>> 1    1   12
>>>>> 2    1   15
>>>>> 3    2    7
>>>>> 4    2    8
>>>>> 5    3   24
>>>>> 6    3   28
>>>>> 7    4   33
>>>>> 8    4   15
>>>>> 9    5   22
>>>>> 10   5   11
>>>>>
>>>>> The variable 'exp' (for 'experiment') occurs in pairs over consecutive
>>>>> rows -- 1,1, then 2,2, then 3,3, and so on. The first row in a pair is
>>>>> the 'control', and the second is a 'treatment'. The rslt column is the
>>>>> result.
>>>>>
>>>>> What I'm trying to do is create a subset of this dataframe that consists
>>>>> of the exp number, and the lagged difference between the 'control' and
>>>>> 'treatment' result.  So, for exp=1, the difference is (15-12)=3. For
>>>>> exp=2,  the difference is (8-7)=1, and so on. What I'm hoping to do is
>>>>> take mydata (above), and turn it into
>>>>>
>>>>>   exp diff
>>>>> 1   1    3
>>>>> 2   2    1
>>>>> 3   3    4
>>>>> 4   4  -18
>>>>> 5   5  -11
>>>>>
>>>>> The basic 'trick' I can't figure out is how to create a lagged variable
>>>>> between the second row (record) for a given level of exp, and the first
>>>>> row for that exp.  This is easy to do in SAS (which I'm more familiar
>>>>> with), but I'm struggling with the equivalent in R. The brute force
>>>>> approach  I thought of is to simply split the dataframe into two (one
>>>>> of even rows, one of odd rows), merge by exp, and then calculate a
>>>>> difference. But this seems to require renaming the rslt column in the two
>>>>> new dataframes so they are different in the merge (say, rslt_cont in the
>>>>> odd dataframe, and rslt_trt in the even dataframe), allowing me to
>>>>> calculate a difference between the two.
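
For reference, the brute-force split-and-merge idea described above could be sketched in base R roughly as follows. The column names rslt_cont and rslt_trt come from the paragraph; the object names odd, ctrl, trt and merged are made up for illustration.

```r
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Positions of the odd rows (controls); the remaining rows are treatments
odd <- seq(1, nrow(mydata), by = 2)

# Split into two data frames, renaming rslt so the columns stay distinct
ctrl <- data.frame(exp = mydata$exp[odd],  rslt_cont = mydata$rslt[odd])
trt  <- data.frame(exp = mydata$exp[-odd], rslt_trt  = mydata$rslt[-odd])

# Merge on exp and take treatment minus control
merged <- merge(ctrl, trt, by = "exp")
merged$diff <- merged$rslt_trt - merged$rslt_cont
newdat <- merged[, c("exp", "diff")]
newdat
```

This works, but it needs two intermediate data frames and a merge to do what a single grouped diff() can do.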
>>>>>
>>>>> While I suppose this would work, I'm wondering if I'm missing a more
>>>>> elegant 'in place' approach that doesn't require me to split the data
>>>>> frame and do everything via a merge.
>>>>>
>>>>> Suggestions/pointers to the obvious welcome. I've tried playing with
>>>>> lag, and some approaches using lag in the zoo package,  but haven't
>>>>> found the magic trick. The problem (meaning, what I can't figure out)
>>>>> seems to be conditioning the lag on the level of exp.
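
One way to condition the lag on the level of exp in base R is ave(), which applies a function within each group and returns a vector aligned with the original rows. A minimal sketch, assuming the data from this post; the column name lagdiff is hypothetical.

```r
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Within each exp group: NA for the first row, then the difference from
# the previous row, so the lag never reaches across experiments
mydata$lagdiff <- ave(mydata$rslt, mydata$exp,
                      FUN = function(x) c(NA, diff(x)))

# Keep only the rows that have a within-group predecessor
newdat <- mydata[!is.na(mydata$lagdiff), c("exp", "lagdiff")]
names(newdat) <- c("exp", "diff")
rownames(newdat) <- NULL
newdat
```

Unlike the odd/even subscripting tricks, this does not depend on every group having exactly two rows, although with larger groups it would return one difference per consecutive pair of rows.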
>>>>>
>>>>> Many thanks...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>


