[R] Optimize for loop / find last record for each person

David Winsemius dwinsemius at comcast.net
Fri Feb 27 21:10:34 CET 2009


"...from an SQL database."    How? Structure of the result?

You say "ordered by date" but then you don't reference any date  
variable? And your code creates an "order" column, but that would not  
appear necessary for the stated purpose and you don't output the last  
"order" within a "person_id".

See if the tapply  strategy below gives you alternate approaches  
(which should work for the typical default numbering of data.frame  
rows):

DF <-structure(list(Month = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 3L, 3L, 3L, 3L, 3L), .Label = c("Aug", "July", "Sept"), class =  
"factor"),
     Week = 27:39, Estpassage = c(665L, 2232L, 9241L, 28464L,
     41049L, 82216L, 230411L, 358541L, 747839L, 459682L, 609567L,
     979475L, 837189L), MedFL = c(34L, 35L, 35L, 35L, 35L, 35L,
     35L, 35L, 35L, 36L, 36L, 36L, 36L)), .Names = c("Month",
"Week", "Estpassage", "MedFL"), class = "data.frame", row.names = c(NA,
-13L))

 > DF
    Month Week Estpassage MedFL
1   July   27        665    34
2   July   28       2232    35
3   July   29       9241    35
4   July   30      28464    35
5    Aug   31      41049    35
6    Aug   32      82216    35
7    Aug   33     230411    35
8    Aug   34     358541    35
9   Sept   35     747839    35
10  Sept   36     459682    36
11  Sept   37     609567    36
12  Sept   38     979475    36
13  Sept   39     837189    36

tapply(as.numeric(rownames(DF)), DF$Month, max)
# substitute history for DF,  and history$person_id  for DF$Month

  Aug July Sept
    8    4   13

You need the as.numeric around the rownames to prevent the alpha  
interpretation of "maximum" from being used. The by() and aggregate()  
functions are convenience functions using tapply, but I have more  
success with tapply itself.

I rather wonder what use this is to be put to, and whether there might  
be more efficient overall approach. It is not that typical to need the  
location of the last member of a group within a dataframe.

-- 
David


On Feb 27, 2009, at 2:02 PM, Andrew Ziem wrote:

> I want to find the last record for each person_id in a data frame
> (from a SQL database) ordered by date.  Is there a better way than
> this for loop?
>
> for (i in 2:length(history[,1])) {
>    if (history[i, "person_id"] == history[i - 1, "person_id"])
>      history[i, "order"] = history[i - 1, "order"] + 1 # same person
>    else
>      history[i, "order"] = 1 # new person
> }
>
> # ignore all records except the last for each con_id
> history2 <- subset(history, order == 1)
>
>
> Andrew
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list