[R] lagging over consecutive pairs of rows in dataframe
Evan Cooch
evan.cooch at gmail.com
Fri Mar 17 18:57:48 CET 2017
Thanks very much. I suspect 50% of my time in R is spent translating
from what I know how to do in SAS (25+ years of heavy use) to what is
equivalent in R. So far, I haven't found anything I can do in SAS that
I can't do in R, with some help. ;-)
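
As a rough sketch of that kind of translation, the SAS data step quoted
below might be written in base R along these lines. The names lag_rslt
and last_exp are illustrative only, chosen to mirror lag() and last.exp;
this assumes the rows are already sorted by exp, as in the original post.

```r
# Data frame from the original post.
mydata <- data.frame(exp  = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Counterpart of SAS lag(rslt): the previous row's value,
# with NA (missing) for the first row.
lag_rslt <- c(NA, head(mydata$rslt, -1))

# Counterpart of SAS last.exp: TRUE on the final row of each exp group.
last_exp <- !duplicated(mydata$exp, fromLast = TRUE)

# Keep only the last row of each group, as "if last.exp;" does.
newdat <- data.frame(exp  = mydata$exp[last_exp],
                     diff = (mydata$rslt - lag_rslt)[last_exp])
newdat
```

Like the SAS version, the lag here is taken over the whole column rather
than within each group; it gives the within-pair difference only because
the row kept for each exp is immediately preceded by its control row.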
Cheers...
On 3/17/2017 1:51 PM, Bert Gunter wrote:
> Evan:
>
> Yes, I stand partially corrected. You have the concept correct, but R
> implements it differently than SAS.
>
> I think what you want for your approach is diff():
>
> evens <- (seq_len(nrow(mydata)) %% 2) == 0
> newdat <- data.frame(exp = mydata[evens, 1], reslt = diff(mydata[, 2])[evens[-1]])
>
> ... which seems neater to me than what I offered previously.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Mar 17, 2017 at 10:25 AM, Evan Cooch <evan.cooch at gmail.com> wrote:
>>
>> On 3/17/2017 1:19 PM, Bert Gunter wrote:
>>> Evan:
>>>
>>> You misunderstand the concept of a lagged variable.
>>
>> Well, lag in R, perhaps (and by my own admission). In SAS, that's exactly how
>> it works:
>>
>> data test;
>> input exp rslt;
>> cards;
>> <data in the data frame in OP>
>> *;
>>
>>
>> data test2; set test; by exp;
>> diff=rslt-lag(rslt);
>> if last.exp;
>>
>>> Ulrik:
>>>
>>> Well, yes, that is certainly a general solution that works. However,
>>> given the *specific* structure described by the OP, an even more
>>> direct (maybe more efficient?) way to do it just uses (logical)
>>> subscripting:
>>>
>>> odds <- (seq_len(nrow(mydata)) %% 2) == 1
>>> newdat <-data.frame(mydata[odds,1 ],mydata[!odds,2] - mydata[odds,2])
>>> names(newdat) <- names(mydata)
>>>
>> Interesting - thanks!
>>
>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>>
>>> On Fri, Mar 17, 2017 at 9:58 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com>
>>> wrote:
>>>> Hi Evan
>>>>
>>>> you can easily do this by applying diff() to each exp group.
>>>>
>>>> Either using dplyr:
>>>> library(dplyr)
>>>> mydata %>%
>>>>   group_by(exp) %>%
>>>>   summarise(difference = diff(rslt))
>>>>
>>>> Or with base R
>>>> aggregate(mydata, by = list(group = mydata$exp), FUN = diff)
>>>>
>>>> HTH
>>>> Ulrik
>>>>
>>>>
>>>> On Fri, 17 Mar 2017 at 17:34 Evan Cooch <evan.cooch at gmail.com> wrote:
>>>>
>>>>> Suppose I have a dataframe that looks like the following:
>>>>>
>>>>> n=2
>>>>> mydata <- data.frame(exp = rep(1:5,each=n), rslt =
>>>>> c(12,15,7,8,24,28,33,15,22,11))
>>>>> mydata
>>>>>    exp rslt
>>>>> 1    1   12
>>>>> 2    1   15
>>>>> 3    2    7
>>>>> 4    2    8
>>>>> 5    3   24
>>>>> 6    3   28
>>>>> 7    4   33
>>>>> 8    4   15
>>>>> 9    5   22
>>>>> 10   5   11
>>>>>
>>>>> The variable 'exp' (for 'experiment') occurs in pairs over consecutive
>>>>> rows -- 1,1, then 2,2, then 3,3, and so on. The first row in a pair is
>>>>> the 'control', and the second is a 'treatment'. The rslt column is the
>>>>> result.
>>>>>
>>>>> What I'm trying to do is create a subset of this dataframe that consists
>>>>> of the exp number, and the lagged difference between the 'control' and
>>>>> 'treatment' result. So, for exp=1, the difference is (15-12)=3. For
>>>>> exp=2, the difference is (8-7)=1, and so on. What I'm hoping to do is
>>>>> take mydata (above), and turn it into
>>>>>
>>>>>   exp diff
>>>>> 1   1    3
>>>>> 2   2    1
>>>>> 3   3    4
>>>>> 4   4  -18
>>>>> 5   5  -11
>>>>>
>>>>> The basic 'trick' I can't figure out is how to create a lagged variable
>>>>> between the second row (record) for a given level of exp, and the first
>>>>> row for that exp. This is easy to do in SAS (which I'm more familiar
>>>>> with), but I'm struggling with the equivalent in R. The brute force
>>>>> approach I thought of is to simply split the dataframe into two (one
>>>>> with the even rows, one with the odd rows), merge by exp, and then
>>>>> calculate a difference.
>>>>> But this seems to require renaming the rslt column in the two new
>>>>> dataframes so they are different in the merge (say, rslt_cont in the odd
>>>>> dataframe, and rslt_trt in the even dataframe), allowing me to calculate
>>>>> a difference between the two.
>>>>>
>>>>> While I suppose this would work, I'm wondering if I'm missing a more
>>>>> elegant 'in place' approach that doesn't require me to split the data
>>>>> frame and do everything via a merge.
>>>>>
>>>>> Suggestions/pointers to the obvious welcome. I've tried playing with
>>>>> lag, and some approaches using lag in the zoo package, but haven't
>>>>> found the magic trick. The problem (meaning, what I can't figure out)
>>>>> seems to be conditioning the lag on the level of exp.
>>>>>
>>>>> Many thanks...
>>>>>
>>>>>
>>>>> mydata <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y =
>>>>> c(6,17,26,37,44))
>>>>>
>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>