[R] How to speed up or avoid the for-loops in this example?

Thu Feb 15 02:24:37 CET 2007

Any advice, tips, clues or pointers to resources on how best to speed up
or, better still, avoid the loops in the following example code would be
much appreciated. My actual dataset has several tens of thousands of rows
and many columns, and these loops take a long time to run. Everything
else I need to do is done with vectorised operations, and those parts all
run very quickly. I spent quite a while searching r-help and re-reading
the various manuals, but couldn't find any relevant existing advice. I am
sure the solution is obvious, but it escapes me.

Tim C

# create an example data frame, multiple events per subject

year <- c(1980,1982,1996,1985,1987,1990,1991,1992,1999,1972,1983)
event.of.interest <- c(FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE)
subject <- c(1,1,1,2,2,3,3,3,3,4,4)
df <- data.frame(cbind(subject,year,event.of.interest))

# add a per-subject sequence number

df$subject.seq <- 1
for (i in 2:nrow(df)) {
  if (df$subject[i-1] == df$subject[i]) {
    df$subject.seq[i] <- df$subject.seq[i-1] + 1
  }
}
df
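
For comparison, here is one possible loop-free way to get the same
per-subject sequence number. It is only a sketch, and it assumes (as in
the example data) that the rows are already ordered so that each
subject's rows sit together. The subject vector is repeated so that the
snippet runs on its own.

```r
# Same subject ids as in the example data frame above.
subject <- c(1,1,1,2,2,3,3,3,3,4,4)

# rle() finds the runs of identical consecutive subject ids;
# sequence() then counts 1, 2, ..., n within each run.
subject.seq <- sequence(rle(subject)$lengths)
subject.seq
```

If the rows for a subject might not be contiguous,
ave(seq_along(subject), subject, FUN = seq_along) groups by subject
value instead of by run, which may or may not be what is wanted.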

# add an event sequence number which is zero until the first
# event of interest for that subject happens, and then increments
# thereafter

df$event.seq <- 0
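for (i in 1:nrow(df)) {
  # reset the counter at the first row of each subject
  if (df$subject.seq[i] == 1) {
    current.event.seq <- 0
  }
  # start counting at the first event of interest, then keep counting
  if (df$event.of.interest[i] == 1 || current.event.seq > 0) {
    current.event.seq <- current.event.seq + 1
  }
  df$event.seq[i] <- current.event.seq
}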
df
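
The event sequence number can probably also be computed without an
explicit loop. The sketch below uses ave() with a nested cumsum(), and
again assumes that each subject's rows are contiguous and in time order.
The vectors are repeated so that the snippet runs on its own.

```r
# Same data as in the example data frame above.
subject <- c(1,1,1,2,2,3,3,3,3,4,4)
event.of.interest <- c(FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,
                       TRUE,FALSE,TRUE,TRUE,FALSE)

# Within each subject:
#   cumsum(e) > 0 is TRUE from the first event of interest onwards;
#   the outer cumsum() then counts 1, 2, 3, ... from that row on,
#   and stays at 0 before it.
event.seq <- ave(as.integer(event.of.interest), subject,
                 FUN = function(e) cumsum(cumsum(e) > 0))
event.seq
```

On the example data this gives the same values as the loop version,
but it has not been timed on a dataset of realistic size.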