[R] lags of a variable, with a factor

Michael Friendly friendly at yorku.ca
Fri Aug 23 20:16:58 CEST 2013


For sequential analysis of sequences of events, I want to calculate a series
of lagged versions of a (numeric or character) variable.  The simple function
below does this, but I can't see how to generalize this to the case where
there is also a factor variable and I want to calculate lags separately for
each level of the factor (by).  Can anyone help?

# produce k lagged versions of a numeric or character variable
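lags <- function(x, k=1, prefix='lag', by) {
   if(missing(by)) {
      n <- length(x)
      res <- data.frame(lag0=x)
      # seq_len() makes k=0 return just lag0 (1:k would loop over 1, 0)
      for (i in seq_len(k)) {
         # min()/max() keep a lag with i >= n all NA instead of failing
         res <- cbind(res, c(rep(NA, min(i, n)), head(x, max(n - i, 0))))
      }
      colnames(res) <- paste0(prefix, 0:k)
      return(res)
   }
   else {
      stop('by not yet implemented')
   }
}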

# tests
 > events <- sample(letters[1:4], 10, replace=TRUE)
 > lags(events)
    lag0 lag1
1     c <NA>
2     a    c
3     b    a
4     d    b
5     d    d
6     c    d
7     d    c
8     c    d
9     c    c
10    d    c
 > lags(events, 3)
    lag0 lag1 lag2 lag3
1     c <NA> <NA> <NA>
2     a    c <NA> <NA>
3     b    a    c <NA>
4     d    b    a    c
5     d    d    b    a
6     c    d    d    b
7     d    c    d    d
8     c    d    c    d
9     c    c    d    c
10    d    c    c    d
 >

# similar, with by=sub variable

 > events2 <- data.frame(sub=rep(1:2, each=5),
+                       event=sample(letters[1:4], 10, replace=TRUE),
+                       stringsAsFactors=FALSE)
 > events2
    sub event
1    1     b
2    1     d
3    1     d
4    1     c
5    1     b
6    2     b
7    2     b
8    2     b
9    2     d
10   2     a

 > # do it separately for each sub ...
 > (lg <- lapply(split(events2$event, events2$sub), lags, 2))
$`1`
   lag0 lag1 lag2
1    b <NA> <NA>
2    d    b <NA>
3    d    d    b
4    c    d    d
5    b    c    d

$`2`
   lag0 lag1 lag2
1    b <NA> <NA>
2    b    b <NA>
3    b    b    b
4    d    b    b
5    a    d    b

This gives sort of what I want, but I need to have the 'sub' variable
explicit in the result:

 > do.call(rbind, lg)
     lag0 lag1 lag2
1.1    b <NA> <NA>
1.2    d    b <NA>
1.3    d    d    b
1.4    c    d    d
1.5    b    c    d
2.1    b <NA> <NA>
2.2    b    b <NA>
2.3    b    b    b
2.4    d    b    b
2.5    a    d    b
 >
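
One possible way to keep the grouping variable explicit is sketched below. It
is only a sketch under stated assumptions, not a tested solution: lags_by() is
a hypothetical helper name, the grouping column in the result is simply called
'by', and the rows within each level are assumed to be in time order already.
It uses base R's ave() to shift x within each level of the grouping variable
while keeping the original row order.

```r
# Sketch only: lags_by() is a hypothetical helper, not part of the post above.
# Assumes the rows within each level of `by` are already in time order.
lags_by <- function(x, by, k = 1, prefix = "lag") {
  stopifnot(length(x) == length(by))
  # keep the grouping variable as an explicit first column
  res <- data.frame(by = by, stringsAsFactors = FALSE)
  res[[paste0(prefix, 0)]] <- x
  for (i in seq_len(k)) {
    # ave() applies the shift separately within each level of `by`
    # and returns the values in the original row order
    res[[paste0(prefix, i)]] <- ave(x, by, FUN = function(v) {
      n <- length(v)
      c(rep(NA, min(i, n)), head(v, max(n - i, 0)))
    })
  }
  res
}

# usage, with the events2 data printed above entered by hand
events2 <- data.frame(sub = rep(1:2, each = 5),
                      event = c("b", "d", "d", "c", "b",
                                "b", "b", "b", "d", "a"),
                      stringsAsFactors = FALSE)
res <- lags_by(events2$event, events2$sub, k = 2)
print(res)
```

The first column can be renamed afterwards, e.g. names(res)[1] <- "sub". Working
on the full-length vector with ave() avoids the split / rbind round trip, so
the row names stay 1..n rather than becoming 1.1, 1.2, ...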

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA


