[R] the first and last observation for each subject

hadley wickham h.wickham at gmail.com
Mon Jan 5 18:10:29 CET 2009


> Another application of that technique can be used to quickly compute
> medians by groups:
>
> gm <- function(x, group){ # medians by group:
> sapply(split(x,group),median)
>   o<-order(group, x)
>   group <- group[o]
>   x <- x[o]
>   changes <- group[-1] != group[-length(group)]
>   first <- which(c(TRUE, changes))
>   last <- which(c(changes, TRUE))
>   lowerMedian <- x[floor((first+last)/2)]
>   upperMedian <- x[ceiling((first+last)/2)]
>   median <- (lowerMedian+upperMedian)/2
>   names(median) <- group[first]
>   median
> }
>
> For a 10^5 long x and a somewhat fewer than 3*10^4 distinct groups
> (in random order) the times are:
>
>> group<-sample(1:30000, size=100000, replace=TRUE)
>> x<-rnorm(length(group))*10 + group
>> unix.time(z0<-sapply(split(x,group), median))
>   user  system elapsed
>   2.72    0.00    3.20
>> unix.time(z1<-gm(x,group))
>   user  system elapsed
>   0.12    0.00    0.16
>> identical(z1,z0)
> [1] TRUE

I get:

> unix.time(z0<-sapply(split(x,group), median))
   user  system elapsed
  2.733   0.017   2.766
> unix.time(z1<-gm(x,group))
   user  system elapsed
  2.897   0.032   2.946


Hadley


-- 
http://had.co.nz/




More information about the R-help mailing list