[R] Optimize for loop / find last record for each person
David Winsemius
dwinsemius at comcast.net
Fri Feb 27 21:10:34 CET 2009
"...from an SQL database." How? Structure of the result?
You say "ordered by date" but then you don't reference any date
variable? And your code creates an "order" column, but that would not
appear necessary for the stated purpose and you don't output the last
"order" within a "person_id".
See if the tapply strategy below gives you alternate approaches
(which should work for the typical default numbering of data.frame
rows):
DF <-structure(list(Month = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 3L, 3L, 3L, 3L, 3L), .Label = c("Aug", "July", "Sept"), class =
"factor"),
Week = 27:39, Estpassage = c(665L, 2232L, 9241L, 28464L,
41049L, 82216L, 230411L, 358541L, 747839L, 459682L, 609567L,
979475L, 837189L), MedFL = c(34L, 35L, 35L, 35L, 35L, 35L,
35L, 35L, 35L, 36L, 36L, 36L, 36L)), .Names = c("Month",
"Week", "Estpassage", "MedFL"), class = "data.frame", row.names = c(NA,
-13L))
> DF
Month Week Estpassage MedFL
1 July 27 665 34
2 July 28 2232 35
3 July 29 9241 35
4 July 30 28464 35
5 Aug 31 41049 35
6 Aug 32 82216 35
7 Aug 33 230411 35
8 Aug 34 358541 35
9 Sept 35 747839 35
10 Sept 36 459682 36
11 Sept 37 609567 36
12 Sept 38 979475 36
13 Sept 39 837189 36
tapply(as.numeric(rownames(DF)), DF$Month, max)
# substitute history for DF, and history$person_id for DF$Month
Aug July Sept
8 4 13
You need the as.numeric around the rownames to prevent the alpha
interpretation of "maximum" from being used. The by() and aggregate()
functions are convenience functions using tapply, but I have more
success with tapply itself.
I rather wonder what use this is to be put to, and whether there might
be more efficient overall approach. It is not that typical to need the
location of the last member of a group within a dataframe.
--
David
On Feb 27, 2009, at 2:02 PM, Andrew Ziem wrote:
> I want to find the last record for each person_id in a data frame
> (from a SQL database) ordered by date. Is there a better way than
> this for loop?
>
> for (i in 2:length(history[,1])) {
> if (history[i, "person_id"] == history[i - 1, "person_id"])
> history[i, "order"] = history[i - 1, "order"] + 1 # same person
> else
> history[i, "order"] = 1 # new person
> }
>
> # ignore all records except the last for each con_id
> history2 <- subset(history, order == 1)
>
>
> Andrew
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list