[R] lagging over consecutive pairs of rows in dataframe

Ulrik Stervbo ulrik.stervbo at gmail.com
Fri Mar 17 17:58:30 CET 2017


Hi Evan

you can easily do this by applying diff() to each exp group.

Either using dplyr:
library(dplyr)
mydata %>%
  group_by(exp) %>%
  summarise(difference = diff(rslt))

Or with base R:
aggregate(rslt ~ exp, data = mydata, FUN = diff)
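Since diff() on a pair returns the second value minus the first (treatment minus control here), the same result can also be had by indexing the odd and even rows directly, with no split or merge. This is only a minimal sketch: it assumes the rows are sorted so that every exp has exactly two consecutive rows, control first. The names ctrl, trt, result and by_group are made up for the illustration.

```r
# Example data from the original question
mydata <- data.frame(exp  = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Assumption: each exp occupies exactly two consecutive rows,
# control first and treatment second.
ctrl <- seq(1, nrow(mydata), by = 2)  # odd rows: controls
trt  <- ctrl + 1                      # even rows: treatments

result <- data.frame(exp  = mydata$exp[ctrl],
                     diff = mydata$rslt[trt] - mydata$rslt[ctrl])

# Cross-check against diff() applied to each exp group
by_group <- tapply(mydata$rslt, mydata$exp, diff)
stopifnot(all(result$diff == as.vector(by_group)))

result
```

For the example data this gives differences of 3, 1, 4, -18 and -11 for exp 1 to 5. If the pairing assumption might not hold, the grouped diff() versions are the safer choice.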

HTH
Ulrik


On Fri, 17 Mar 2017 at 17:34 Evan Cooch <evan.cooch at gmail.com> wrote:

> Suppose I have a dataframe that looks like the following:
>
> n=2
> mydata <- data.frame(exp = rep(1:5,each=n), rslt =
> c(12,15,7,8,24,28,33,15,22,11))
> mydata
>    exp rslt
> 1    1   12
> 2    1   15
> 3    2    7
> 4    2    8
> 5    3   24
> 6    3   28
> 7    4   33
> 8    4   15
> 9    5   22
> 10   5   11
>
> The variable 'exp' (for 'experiment') occurs in pairs over consecutive
> rows -- 1,1, then 2,2, then 3,3, and so on. The first row in a pair is
> the 'control', and the second is a 'treatment'. The rslt column is the
> result.
>
> What I'm trying to do is create a subset of this dataframe that consists
> of the exp number, and the lagged difference between the 'control' and
> 'treatment' result.  So, for exp=1, the difference is (15-12)=3. For
> exp=2,  the difference is (8-7)=1, and so on. What I'm hoping to do is
> take mydata (above), and turn it into
>
>   exp diff
> 1   1    3
> 2   2    1
> 3   3    4
> 4   4  -18
> 5   5  -11
>
> The basic 'trick' I can't figure out is how to create a lagged variable
> between the second row (record) for a given level of exp, and the first
> row for that exp.  This is easy to do in SAS (which I'm more familiar
> with), but I'm struggling with the equivalent in R. The brute force
> approach I thought of is to simply split the dataframe into two (one
> even rows, one odd rows), merge by exp, and then calculate a difference.
> But this seems to require renaming the rslt column in the two new
> dataframes so they are different in the merge (say, rslt_cont in the odd
> dataframe, and rslt_trt in the even dataframe), allowing me to calculate
> a difference between the two.
>
> While I suppose this would work, I'm wondering if I'm missing a more
> elegant 'in place' approach that doesn't require me to split the data
> frame and do everything via a merge.
>
> Suggestions/pointers to the obvious welcome. I've tried playing with
> lag, and some approaches using lag in the zoo package,  but haven't
> found the magic trick. The problem (meaning, what I can't figure out)
> seems to be conditioning the lag on the level of exp.
>
> Many thanks...
>
>
> mydata <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y =
> c(6,17,26,37,44))
