[R] lagging over consecutive pairs of rows in dataframe
Evan Cooch
evan.cooch at gmail.com
Fri Mar 17 18:25:08 CET 2017
On 3/17/2017 1:19 PM, Bert Gunter wrote:
> Evan:
>
> You misunderstand the concept of a lagged variable.
Well, lag in R, perhaps (and by my own admission). In SAS, that's exactly
how it works:
data test;
input exp rslt;
cards;
<data in the data frame in OP>
*;
data test2; set test; by exp;
diff=rslt-lag(rslt);
if last.exp;
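
For comparison, here is a rough base-R sketch of what I understand that data step to be doing: subtract the previous row's value, then keep only the last row of each exp. This is my own translation, not something checked against SAS output, and the names `prev`, `last_in_group` and `test2` are just made up for illustration.

```r
# Data from the original post
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Analogue of lag(rslt): the value from the previous row (NA for row 1)
prev <- c(NA, head(mydata$rslt, -1))
mydata$diff <- mydata$rslt - prev

# Analogue of "if last.exp;": keep only the final row of each exp group
last_in_group <- !duplicated(mydata$exp, fromLast = TRUE)
test2 <- mydata[last_in_group, c("exp", "diff")]
test2
```

Note that the lag here is not reset at group boundaries; it only gives the right answer because the last row of each pair is always preceded by the first row of the same pair.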
>
> Ulrik:
>
> Well, yes, that is certainly a general solution that works. However,
> given the *specific* structure described by the OP, an even more
> direct (maybe more efficient?) way to do it just uses (logical)
> subscripting:
>
> odds <- (seq_len(nrow(mydata)) %% 2) == 1
> newdat <- data.frame(mydata[odds, 1], mydata[!odds, 2] - mydata[odds, 2])
> names(newdat) <- names(mydata)
>
Interesting - thanks!
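
A quick sketch of the subscripting idea run against the data from my original post. It assumes the rows strictly alternate control/treatment with no missing or reordered pairs (the `stopifnot` line is a guard I added for that), and the column name `diff` is my own choice rather than anything from the suggestion.

```r
# Data from the original post
mydata <- data.frame(exp = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# TRUE for rows 1, 3, 5, ... (controls); FALSE for rows 2, 4, 6, ... (treatments)
odds <- (seq_len(nrow(mydata)) %% 2) == 1

# Guard: each odd row must be paired with an even row from the same exp
stopifnot(all(mydata$exp[odds] == mydata$exp[!odds]))

# Treatment minus control, one row per experiment
newdat <- data.frame(exp  = mydata$exp[odds],
                     diff = mydata$rslt[!odds] - mydata$rslt[odds])
newdat
```

If the pairing could ever be broken (an experiment with one row, or rows out of order), a grouped approach would be the safer choice.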
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Mar 17, 2017 at 9:58 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
>> Hi Evan
>>
>> you can easily do this by applying diff() to each exp group.
>>
>> Either using dplyr:
>> library(dplyr)
>> mydata %>%
>> group_by(exp) %>%
>> summarise(difference = diff(rslt))
>>
>> Or with base R
>> aggregate(rslt ~ exp, data = mydata, FUN = diff)
>>
>> HTH
>> Ulrik
>>
>>
>> On Fri, 17 Mar 2017 at 17:34 Evan Cooch <evan.cooch at gmail.com> wrote:
>>
>>> Suppose I have a dataframe that looks like the following:
>>>
>>> n=2
>>> mydata <- data.frame(exp = rep(1:5,each=n), rslt =
>>> c(12,15,7,8,24,28,33,15,22,11))
>>> mydata
>>>    exp rslt
>>> 1    1   12
>>> 2    1   15
>>> 3    2    7
>>> 4    2    8
>>> 5    3   24
>>> 6    3   28
>>> 7    4   33
>>> 8    4   15
>>> 9    5   22
>>> 10   5   11
>>>
>>> The variable 'exp' (for 'experiment') occurs in pairs over consecutive
>>> rows -- 1,1, then 2,2, then 3,3, and so on. The first row in a pair is
>>> the 'control', and the second is a 'treatment'. The rslt column is the
>>> result.
>>>
>>> What I'm trying to do is create a subset of this dataframe that consists
>>> of the exp number, and the lagged difference between the 'control' and
>>> 'treatment' result. So, for exp=1, the difference is (15-12)=3. For
>>> exp=2, the difference is (8-7)=1, and so on. What I'm hoping to do is
>>> take mydata (above), and turn it into
>>>
>>>   exp diff
>>> 1   1    3
>>> 2   2    1
>>> 3   3    4
>>> 4   4  -18
>>> 5   5  -11
>>>
>>> The basic 'trick' I can't figure out is how to create a lagged variable
>>> between the second row (record) for a given level of exp, and the first
>>> row for that exp. This is easy to do in SAS (which I'm more familiar
>>> with), but I'm struggling with the equivalent in R. The brute force
>>> approach I thought of is to simply split the dataframe into two (one
>>> of even rows, one of odd rows), merge by exp, and then calculate a difference.
>>> But this seems to require renaming the rslt column in the two new
>>> dataframes so they are different in the merge (say, rslt_cont in the odd
>>> dataframe, and rslt_trt in the even dataframe), allowing me to calculate
>>> a difference between the two.
>>>
>>> While I suppose this would work, I'm wondering if I'm missing a more
>>> elegant 'in place' approach that doesn't require me to split the data
>>> frame and do everything via a merge.
>>>
>>> Suggestions/pointers to the obvious welcome. I've tried playing with
>>> lag, and some approaches using lag in the zoo package, but haven't
>>> found the magic trick. The problem (meaning, what I can't figure out)
>>> seems to be conditioning the lag on the level of exp.
>>>
>>> Many thanks...
>>>
>>>
>>> mydata <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y =
>>> c(6,17,26,37,44))
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>