[R] lagging over consecutive pairs of rows in dataframe

Bert Gunter bgunter.4567 at gmail.com
Fri Mar 17 18:51:32 CET 2017


Evan:

Yes, I stand partially corrected. You have the concept correct, but R
implements it differently than SAS.

I think what you want for your approach is diff():

evens <- (seq_len(nrow(mydata)) %% 2) == 0
newdat <- data.frame(exp = mydata[evens, 1],
                     reslt = diff(mydata[, 2])[evens[-1]])

... which seems neater to me than what I offered previously.
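
For anyone following along, here is a minimal, self-contained sketch of why that subscripting works, using the data frame from the original post. diff() returns one fewer element than its input, so the logical selector has to drop its first element to stay aligned. The intermediate name d and the output column name diff are mine, purely for illustration.

```r
# Data from the original post: consecutive (control, treatment) pairs.
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# TRUE for rows 2, 4, 6, ... (the treatment rows).
evens <- (seq_len(nrow(mydata)) %% 2) == 0

# d[i] is rslt[i + 1] - rslt[i], so d has nrow(mydata) - 1 elements.
d <- diff(mydata$rslt)

# evens[-1] lines up with d: it keeps d[1], d[3], ..., i.e. the
# within-pair differences, and drops the between-pair ones.
newdat <- data.frame(exp = mydata$exp[evens], diff = d[evens[-1]])
newdat
```

The diff column comes out as 3, 1, 4, -18, -11, which matches the table requested in the original post.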

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Mar 17, 2017 at 10:25 AM, Evan Cooch <evan.cooch at gmail.com> wrote:
>
>
> On 3/17/2017 1:19 PM, Bert Gunter wrote:
>>
>> Evan:
>>
>> You misunderstand the concept of a lagged variable.
>
>
> Well, lag in R, perhaps (and by my own admission). In SAS, that's exactly how
> it works:
>
> data test;
> input exp rslt;
> cards;
> <data in the data frame in OP>
>     *;
>
>
>     data test2; set test; by exp;
>     diff=rslt-lag(rslt);
>       if last.exp;
>
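For comparison, a rough base R analogue of that data step, as I read it: take the difference from the previous row over the whole column, then keep only the last row of each exp group. The names prev, is_last and test2 are placeholders of mine, and the sketch assumes the rows are already sorted by exp.

```r
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Previous row's value, NA for the first row
# (roughly what lag(rslt) yields when called on every row).
prev <- c(NA, head(mydata$rslt, -1))

# Mimic "if last.exp": a row is the last of its group when the
# next row has a different exp; the final row is always last.
is_last <- c(mydata$exp[-1] != head(mydata$exp, -1), TRUE)

test2 <- data.frame(exp  = mydata$exp[is_last],
                    diff = (mydata$rslt - prev)[is_last])
test2
```

Note that the differences across group boundaries are still computed here; they are simply discarded by the is_last filter, much as in the SAS version.
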
>>
>> Ulrik:
>>
>> Well, yes, that is certainly a general solution that works. However,
>> given the *specific* structure described by the OP, an even more
>> direct (maybe more efficient?) way to do it just uses (logical)
>> subscripting:
>>
>> odds <- (seq_len(nrow(mydata)) %% 2) == 1
>> newdat <- data.frame(mydata[odds, 1], mydata[!odds, 2] - mydata[odds, 2])
>> names(newdat) <- names(mydata)
>>
>
> Interesting - thanks!
>
>
>>
>>
>> On Fri, Mar 17, 2017 at 9:58 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com>
>> wrote:
>>>
>>> Hi Evan
>>>
>>> you can easily do this by applying diff() to each exp group.
>>>
>>> Either using dplyr:
>>> library(dplyr)
>>> mydata %>%
>>>    group_by(exp) %>%
>>>    summarise(difference = diff(rslt))
>>>
>>> Or with base R
>>> aggregate(rslt ~ exp, data = mydata, FUN = diff)
>>>
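Per-group diff() gives a single number per group only when every exp has exactly two rows. A small base R sketch that makes that assumption explicit, using tapply(); the helper name pair_diff is hypothetical, not something from the thread.

```r
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Hypothetical helper: fail loudly unless the group is a
# (control, treatment) pair, then return treatment - control.
pair_diff <- function(x) {
  stopifnot(length(x) == 2)
  x[2] - x[1]
}

# tapply() applies pair_diff to rslt within each level of exp.
d <- tapply(mydata$rslt, mydata$exp, pair_diff)

# The group labels come back as names, so convert them to integers.
newdat <- data.frame(exp = as.integer(names(d)), diff = as.vector(d))
newdat
```
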
>>> HTH
>>> Ulrik
>>>
>>>
>>> On Fri, 17 Mar 2017 at 17:34 Evan Cooch <evan.cooch at gmail.com> wrote:
>>>
>>>> Suppose I have a dataframe that looks like the following:
>>>>
>>>> n=2
>>>> mydata <- data.frame(exp = rep(1:5,each=n), rslt =
>>>> c(12,15,7,8,24,28,33,15,22,11))
>>>> mydata
>>>>      exp rslt
>>>> 1    1   12
>>>> 2    1   15
>>>> 3    2    7
>>>> 4    2    8
>>>> 5    3   24
>>>> 6    3   28
>>>> 7    4   33
>>>> 8    4   15
>>>> 9    5   22
>>>> 10   5   11
>>>>
>>>> The variable 'exp' (for 'experiment') occurs in pairs over consecutive
>>>> rows -- 1,1, then 2,2, then 3,3, and so on. The first row in a pair is
>>>> the 'control', and the second is a 'treatment'. The rslt column is the
>>>> result.
>>>>
>>>> What I'm trying to do is create a subset of this dataframe that consists
>>>> of the exp number, and the lagged difference between the 'control' and
>>>> 'treatment' result.  So, for exp=1, the difference is (15-12)=3. For
>>>> exp=2,  the difference is (8-7)=1, and so on. What I'm hoping to do is
>>>> take mydata (above), and turn it into
>>>>
>>>>        exp  diff
>>>> 1   1      3
>>>> 2   2      1
>>>> 3   3      4
>>>> 4   4      -18
>>>> 5   5      -11
>>>>
>>>> The basic 'trick' I can't figure out is how to create a lagged variable
>>>> between the second row (record) for a given level of exp, and the first
>>>> row for that exp.  This is easy to do in SAS (which I'm more familiar
>>>> with), but I'm struggling with the equivalent in R. The brute force
>>>> approach I thought of is to simply split the dataframe into two (one
>>>> even rows, one odd rows), merge by exp, and then calculate a difference.
>>>> But this seems to require renaming the rslt column in the two new
>>>> dataframes so they are different in the merge (say, rslt_cont in the odd
>>>> dataframe, and rslt_trt in the even dataframe), allowing me to calculate
>>>> a difference between the two.
>>>>
>>>> While I suppose this would work, I'm wondering if I'm missing a more
>>>> elegant 'in place' approach that doesn't require me to split the data
>>>> frame and do everything via a merge.
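
The split-and-merge route described above can be sketched in a few lines of base R. The column names rslt_cont and rslt_trt come from the description; the object names cont, trt and both are made up for illustration, and the sketch assumes each exp has exactly one control row followed by one treatment row.

```r
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# TRUE for rows 1, 3, 5, ... (the control rows).
odd <- seq_len(nrow(mydata)) %% 2 == 1

# Split into control (odd rows) and treatment (even rows), renaming rslt
# so the two columns stay distinct after the merge.
cont <- data.frame(exp = mydata$exp[odd],  rslt_cont = mydata$rslt[odd])
trt  <- data.frame(exp = mydata$exp[!odd], rslt_trt  = mydata$rslt[!odd])

# Merge on exp, then take treatment minus control.
both <- merge(cont, trt, by = "exp")
both$diff <- both$rslt_trt - both$rslt_cont
both[, c("exp", "diff")]
```

This does work, but it builds two intermediate data frames to get what a single subscripted subtraction or diff() call can produce.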
>>>>
>>>> Suggestions/pointers to the obvious welcome. I've tried playing with
>>>> lag, and some approaches using lag in the zoo package,  but haven't
>>>> found the magic trick. The problem (meaning, what I can't figure out)
>>>> seems to be conditioning the lag on the level of exp.
>>>>
>>>> Many thanks...
>>>>
>>>>
>>>> mydata <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y =
>>>> c(6,17,26,37,44))
>>>>
>>>>
>>>>
>>>>          [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>
>