[R] the first and last observation for each subject
hadley wickham
h.wickham at gmail.com
Mon Jan 5 18:10:29 CET 2009
> Another application of that technique can be used to quickly compute
> medians by groups:
>
> gm <- function(x, group){ # medians by group: sapply(split(x,group),median)
>    o <- order(group, x)
>    group <- group[o]
>    x <- x[o]
>    changes <- group[-1] != group[-length(group)]
>    first <- which(c(TRUE, changes))   # index of each group's first element
>    last <- which(c(changes, TRUE))    # index of each group's last element
>    lowerMedian <- x[floor((first+last)/2)]
>    upperMedian <- x[ceiling((first+last)/2)]
>    median <- (lowerMedian+upperMedian)/2
>    names(median) <- group[first]
>    median
> }
>
> For an x of length 10^5 and somewhat fewer than 3*10^4 distinct groups
> (in random order) the times are:
>
>> group<-sample(1:30000, size=100000, replace=TRUE)
>> x<-rnorm(length(group))*10 + group
>> unix.time(z0<-sapply(split(x,group), median))
> user system elapsed
> 2.72 0.00 3.20
>> unix.time(z1<-gm(x,group))
> user system elapsed
> 0.12 0.00 0.16
>> identical(z1,z0)
> [1] TRUE
I get:
> unix.time(z0<-sapply(split(x,group), median))
user system elapsed
2.733 0.017 2.766
> unix.time(z1<-gm(x,group))
user system elapsed
2.897 0.032 2.946
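
One possible cause, offered as a guess rather than a diagnosis: if the trailing comment on the first line of gm() gets wrapped when the code is pasted, sapply(split(x,group),median) runs as a statement inside gm(), and the two timings would come out about the same. A minimal sketch for checking that, where gm_fixed is a hypothetical name for the same function with the comment kept on one line, system.time() stands in for unix.time(), and the seed is arbitrary:

```r
# Sketch only: gm_fixed is a hypothetical name, not part of the original post.
# It is gm() with the comment on a single line, so no sapply() call runs inside.
gm_fixed <- function(x, group) { # medians by group, using one sort
  o <- order(group, x)           # sort by group, then by value within group
  group <- group[o]
  x <- x[o]
  changes <- group[-1] != group[-length(group)]
  first <- which(c(TRUE, changes))  # index of each group's first element
  last <- which(c(changes, TRUE))   # index of each group's last element
  lowerMedian <- x[floor((first + last) / 2)]
  upperMedian <- x[ceiling((first + last) / 2)]
  med <- (lowerMedian + upperMedian) / 2
  names(med) <- group[first]
  med
}

set.seed(1)  # arbitrary seed, assumed here only for reproducibility
group <- sample(1:30000, size = 100000, replace = TRUE)
x <- rnorm(length(group)) * 10 + group

system.time(z0 <- sapply(split(x, group), median))
system.time(z1 <- gm_fixed(x, group))
all.equal(z1, z0)
```

If gm_fixed() comes out much faster than the sapply() call, the wrapped comment would account for the numbers above.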
Hadley
--
http://had.co.nz/