[R] How to speed up or avoid the for-loops in this example?
Tim Churches
tchur at optushome.com.au
Thu Feb 15 04:29:27 CET 2007
Marc Schwartz wrote:
> OK, here is one possible solution, though perhaps with a bit more time,
> there may be more optimal approaches.
>
> Using your example data above, but first noting that you do not want to
> use:
>
> df <- data.frame(cbind(subject,year,event.of.interest))
>
> Calling cbind() first creates a matrix, which coerces all columns to a
> common data type and so defeats the benefit of data frames, namely
> their ability to hold multiple data types.
Yes, quite right, the cbind() was unnecessary. I'm not making my real
data frame that way, however.
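To make the coercion point concrete, here is a minimal sketch. The three short vectors are made-up stand-ins for the example data, not the real data frame:

```r
# Hypothetical stand-in vectors (assumed values, for illustration only)
subject <- c(1, 1, 2)
year <- c(1980, 1982, 1985)
event.of.interest <- c(FALSE, TRUE, FALSE)

# cbind() builds a numeric matrix first, so the logical column becomes 0/1
via.cbind <- data.frame(cbind(subject, year, event.of.interest))

# data.frame() on the vectors directly keeps each column's own type
direct <- data.frame(subject, year, event.of.interest)

class(via.cbind$event.of.interest)
class(direct$event.of.interest)
```

With numeric and logical columns only, the damage is limited to TRUE/FALSE turning into 1/0; adding a character column would turn the whole matrix into character.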
> So, now on to the solution:
>
> # First, order the data frame by increasing order of
> # subject number and decreasing order for event.of.interest
> # This ensures that these columns are properly sorted
> # to facilitate the subsequent code.
>
> df <- df[order(df$subject, -df$event.of.interest), ]
>
>
> So, 'df' will look like:
>
>> df
>    subject year event.of.interest
> 2        1 1982              TRUE
> 3        1 1996              TRUE
> 1        1 1980             FALSE
> 4        2 1985             FALSE
> 5        2 1987             FALSE
> 7        3 1991              TRUE
> 9        3 1999              TRUE
> 6        3 1990             FALSE
> 8        3 1992             FALSE
> 10       4 1972              TRUE
> 11       4 1983             FALSE
>
>
> # Now use a combination of sapply(), rle(), seq() and unlist() to
> # generate per subject sequences. Note that rle() returns:
> #
> # > rle(df$subject)
> # Run Length Encoding
> # lengths: int [1:4] 3 2 4 2
> # values : num [1:4] 1 2 3 4
> #
> # See ?rle, ?seq, ?sapply and ?unlist
>
> df$subject.seq <- unlist(sapply(rle(df$subject)$lengths,
> function(x) seq(x)))
>
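A possible simplification of this step, offered as a sketch rather than as part of the posted solution: base R's sequence() concatenates seq_len() of each run length directly, so it can stand in for the unlist(sapply(...)) construction. It also sidesteps a pitfall: when every subject has the same number of rows, sapply() simplifies its result to a matrix instead of a list. The 'balanced' vector below is a made-up example for that case.

```r
# Subject column as it appears after sorting in the example above
subject <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4)

run.lengths <- rle(subject)$lengths

# sequence() returns seq_len() of each length, concatenated
subject.seq <- sequence(run.lengths)

# Pitfall: equal group sizes make sapply() return a matrix, not a list
balanced <- c(1, 1, 2, 2)   # hypothetical balanced subject column
m <- sapply(rle(balanced)$lengths, function(x) seq(x))
is.matrix(m)

# sequence() returns a plain vector in both cases
sequence(rle(balanced)$lengths)
```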
>
> So, 'df' now looks like:
>
>> df
>    subject year event.of.interest subject.seq
> 2        1 1982              TRUE           1
> 3        1 1996              TRUE           2
> 1        1 1980             FALSE           3
> 4        2 1985             FALSE           1
> 5        2 1987             FALSE           2
> 7        3 1991              TRUE           1
> 9        3 1999              TRUE           2
> 6        3 1990             FALSE           3
> 8        3 1992             FALSE           4
> 10       4 1972              TRUE           1
> 11       4 1983             FALSE           2
>
>
> # Now set event.seq to all 0's
>
> df$event.seq <- 0
>
>
> So, 'df' now looks like:
>
>> df
>    subject year event.of.interest subject.seq event.seq
> 2        1 1982              TRUE           1         0
> 3        1 1996              TRUE           2         0
> 1        1 1980             FALSE           3         0
> 4        2 1985             FALSE           1         0
> 5        2 1987             FALSE           2         0
> 7        3 1991              TRUE           1         0
> 9        3 1999              TRUE           2         0
> 6        3 1990             FALSE           3         0
> 8        3 1992             FALSE           4         0
> 10       4 1972              TRUE           1         0
> 11       4 1983             FALSE           2         0
>
>
> # Get the unique subject id's
> # See ?unique
>
> subj.id <- unique(df$subject)
>
>
> # Now get the indices for each subject where event.of.interest
> # is TRUE. See ?which
>
> events <- sapply(subj.id,
> function(x) which(df$subject == x & df$event.of.interest))
>
>
> So, 'events' looks like:
>
>> events
> [[1]]
> [1] 1 2
>
> [[2]]
> integer(0)
>
> [[3]]
> [1] 6 7
>
> [[4]]
> [1] 10
>
>
> # Now use sapply() on the above list to create
> # individual sequences per list element:
>
> seq <- sapply(events, function(x) seq(along = x))
>
>
> So 'seq' looks like:
>
>> seq
> [[1]]
> [1] 1 2
>
> [[2]]
> integer(0)
>
> [[3]]
> [1] 1 2
>
> [[4]]
> [1] 1
>
>
> # So, for the final step, assign the event sequence values in 'seq' to
> # the row indices in 'events':
>
> df$event.seq[unlist(events)] <- unlist(seq)
>
>
> So, 'df' now looks like this:
>
>> df
>    subject year event.of.interest subject.seq event.seq
> 2        1 1982              TRUE           1         1
> 3        1 1996              TRUE           2         2
> 1        1 1980             FALSE           3         0
> 4        2 1985             FALSE           1         0
> 5        2 1987             FALSE           2         0
> 7        3 1991              TRUE           1         1
> 9        3 1999              TRUE           2         2
> 6        3 1990             FALSE           3         0
> 8        3 1992             FALSE           4         0
> 10       4 1972              TRUE           1         1
> 11       4 1983             FALSE           2         0
>
>
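For anyone who wants to reproduce the final table in one go, here is a compact sketch. The data frame is rebuilt from the row names and values printed above. The use of ave() with cumsum() is a swapped-in alternative for the event.seq step, not the method walked through above and not the Holtzman/Nielsen approach (which is not shown in this message).

```r
# Example data reconstructed from the printed output (rows 1 to 11)
df <- data.frame(
  subject = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  year = c(1980, 1982, 1996, 1985, 1987, 1990, 1991, 1992, 1999, 1972, 1983),
  event.of.interest = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
                        TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Same sort as above: by subject, events first within each subject
df <- df[order(df$subject, -df$event.of.interest), ]

# Running row count within each subject
df$subject.seq <- ave(seq_along(df$subject), df$subject, FUN = seq_along)

# Running count of events within each subject, zeroed on non-event rows
df$event.seq <- ave(as.integer(df$event.of.interest), df$subject,
                    FUN = cumsum) * df$event.of.interest

df
```

Because the cumulative sum only advances on event rows and non-event rows are multiplied by zero, the event.seq step in this sketch does not depend on events being sorted to the top of each subject.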
> HTH,
>
> Marc Schwartz
Wow, that's very trick, or tricky. It works but it is a bit slower and
more complex than the Holtzman/Nielsen approach. But some interesting
ideas there which I shall bear in mind.
Tim C