[R] Optimize for loop / find last record for each person
William Dunlap
wdunlap at tibco.com
Fri Feb 27 22:10:20 CET 2009
Andrew, it makes it easier to help if you supply a typical
input and expected output along with your code. I tried
your code with the following input:
> history
   person_id date
1       Mary    1
2       Mary    2
3        Sue    3
4       Alex    4
5        Joe    5
6       Alex    6
7       Alex    7
8        Sue    8
9        Sue    9
10       Joe   10
made with the function
f <- function(n) {
    cached.rs <- .Random.seed
    on.exit(.Random.seed <<- cached.rs)
    set.seed(1)
    data.frame(person_id=sample(c("Joe","Mary","Sue","Alex"),
                                size=n, replace=TRUE),
               date=seq_len(n))
}
and it failed because there was no column called 'order'.
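Since sample() can return different values for the same seed in other
versions of R (the default sampling method changed in R 3.6.0), f(10) may
not reproduce the table above exactly. A minimal sketch that types the
same ten rows in by hand, with the values copied from the printout and
stringsAsFactors=FALSE as an assumption about how the ids are stored:

```r
# The ten rows from the printout above, entered directly so the
# example does not depend on the random number generator.
history <- data.frame(
    person_id = c("Mary", "Mary", "Sue", "Alex", "Joe",
                  "Alex", "Alex", "Sue", "Sue", "Joe"),
    date = 1:10,
    stringsAsFactors = FALSE  # assumption: ids kept as plain character
)
print(history)
```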
The following function, f2, does what I think you are saying
you want. It sorts the data by person_id, breaking ties with
date, and then selects the rows where the person_id entry does
not match the person_id entry in the next row. It then sorts
the result by date. (I don't know if the last sort is needed
in your application.) It should be pretty quick for long
datasets with lots of distinct person_id values.
f2 <- function (history)
{
    # assume history has, at least, columns called "person_id" and "date"
    # Return rows containing the last entry (by date) for each person.
    last <- function(x) c(x[-1]!=x[-length(x)], TRUE)
    history <- history[with(history, order(person_id,date)),,drop=FALSE]
    history <- history[last(history[,"person_id"]),,drop=FALSE]
    history[order(history$date),,drop=FALSE]
}
> f2(history)
   person_id date
2       Mary    2
7       Alex    7
9        Sue    9
10       Joe   10
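As an aside, the same sort-then-pick-the-last-row idea can be sketched
with base R's duplicated() and its fromLast argument in place of the
hand-written last() helper. This is an alternative formulation, not the
f2 above; the data frame is typed in from the printout, and the names
sorted, keep and result are made up for the illustration:

```r
# Sample data copied from the printout above (entered by hand).
history <- data.frame(
    person_id = c("Mary", "Mary", "Sue", "Alex", "Joe",
                  "Alex", "Alex", "Sue", "Sue", "Joe"),
    date = 1:10,
    stringsAsFactors = FALSE
)

# Sort by person, breaking ties with date, so that each person's
# latest row is the last one in that person's block.
sorted <- history[order(history$person_id, history$date), , drop = FALSE]

# With fromLast=TRUE, duplicated() is FALSE only for the last
# occurrence of each person_id, which is the row to keep.
keep <- !duplicated(sorted$person_id, fromLast = TRUE)

# Put the surviving rows back into date order, as f2 does.
result <- sorted[keep, , drop = FALSE]
result <- result[order(result$date), , drop = FALSE]
print(result)
```

On this sample it selects rows 2, 7, 9 and 10, the same rows that
f2(history) returns.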
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
---------------------------------------------------------------
[R] Optimize for loop / find last record for each person
Andrew Ziem ahz001 at gmail.com
Fri Feb 27 20:02:31 CET 2009
I want to find the last record for each person_id in a data frame
(from a SQL database) ordered by date. Is there a better way than
this for loop?
for (i in 2:length(history[,1])) {
    if (history[i, "person_id"] == history[i - 1, "person_id"])
        history[i, "order"] = history[i - 1, "order"] + 1 # same person
    else
        history[i, "order"] = 1 # new person
}
# ignore all records except the last for each con_id
history2 <- subset(history, order == 1)
Andrew