[R] Optimize for loop / find last record for each person

Andrew Ziem ahz001 at gmail.com
Fri Feb 27 23:47:24 CET 2009


On Fri, Feb 27, 2009 at 2:10 PM, William Dunlap <wdunlap at tibco.com> wrote:
> Andrew, it makes it easier to help if you supply a typical
> input and expected output along with your code.  I tried
> your code with the following input:

I'll be careful to avoid these mistakes in future.  Also, I should not
have named my variable history, since that masks a built-in R function,
and I should have mentioned that the data is sorted with the most recent
dates first. Talk about a bad day! :)

My original post omitted this code, which runs before the for loop:

history["order"] <- NA
history[1, "order"] <- 1

Here's a sample data set:
history_ <- data.frame(person_id = c(1, 2, 2),
                       date_ = c("2009-01-01", "2009-02-03", "2009-02-02"),
                       x = c(0.01, 0.05, 0.06))
history_

Jorge's suggestion[1] works for me, and it seems much faster than my
for loop.  I simply adapted it by replacing Jorge's variable x with a
sequential identifier already in the database.
[1] https://stat.ethz.ch/pipermail/r-help/2009-February/189981.html

> The following function, f2, does what I think you are saying
> you want.  It sorts the data by person_id, breaking ties with
> date, and then selects the rows where the person_id entry does

My data already comes sorted from the SQL database, like this:
 ORDER BY person_id, date_ DESC
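
With that sort order, the most recent record for each person is simply the
first row in which that person_id appears.  The following is a minimal
sketch of one way to pick those rows, given purely as an illustration; it
is not necessarily the approach Jorge or William posted.  The names sorted
and latest are made up for this example, and the order() step is only
needed if the sort order cannot be relied on.

```r
# Sample data in the shape described above:
# sorted by person_id, then by date_ with the most recent first.
history_ <- data.frame(
  person_id = c(1, 2, 2),
  date_     = as.Date(c("2009-01-01", "2009-02-03", "2009-02-02")),
  x         = c(0.01, 0.05, 0.06)
)

# Defensive re-sort (a no-op here): person_id ascending, date_ descending.
sorted <- history_[order(history_$person_id, -as.numeric(history_$date_)), ]

# duplicated() flags every repeat of a person_id after its first occurrence,
# so negating it keeps the first (most recent) row for each person.
latest <- sorted[!duplicated(sorted$person_id), ]
latest
```

Because duplicated() is vectorized, this avoids looping over rows in R,
which is usually where an explicit for loop over a large data frame
spends its time.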

Thanks everyone for responding and expanding my knowledge of R!


Best regards,
Andrew



