[R] Extracting data from dataframe with tied rows

William Dunlap wdunlap at tibco.com
Fri Aug 24 23:01:46 CEST 2012


Another strategy is to sort by month, id, and, in reverse order,
distance and select the rows that start each month/id run.  This
can be much faster than the other ways when there are lots of
month/id combinations.

f1 <- function (DATA) 
{
    stopifnot(is.data.frame(DATA),
                      all(c("distance", "id", "month") %in% names(DATA)))
    DATA <- DATA[order(DATA$month, DATA$id, -DATA$distance), ]
    ldiff <- function(x) c(TRUE, x[-1] != x[-length(x)])
    DATA[ldiff(DATA$month) | ldiff(DATA$id), ]
}

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: Peter Ehlers [mailto:ehlers at ucalgary.ca]
> Sent: Friday, August 24, 2012 10:51 AM
> To: William Dunlap
> Cc: Peter Alspach; rjb; r-help at r-project.org
> Subject: Re: [R] Extracting data from dataframe with tied rows
> 
> Here's another pretty straightforward solution, using the plyr pkg:
> 
>   DF <- data.frame(id, month, distance, bearing)
>     # variables as defined in the OP
> 
>   require(plyr)
>   DF1<-ddply(DF, .(id,month), summarize,
>         maxdist = max(distance),
>         maxbearing = bearing[which.max(distance)])
> 
> Peter Ehlers
> 
> On 2012-08-24 09:54, William Dunlap wrote:
> > Or use ave() to compute the within-group ranks (reversed, so max has rank 1) and
> select
> > the elements whose ranks are 1:
> > f2 <- function (DATA)
> > {
> >      stopifnot(is.data.frame(DATA), all(c("distance", "id", "month") %in%
> >          names(DATA)))
> >      revRanks <- ave(DATA[["distance"]], DATA[["id"]], DATA[["month"]],
> >          FUN = function(x) rank(-x, ties = "first"))
> >      DATA[revRanks == 1, ]
> > }
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf
> >> Of Peter Alspach
> >> Sent: Thursday, August 23, 2012 4:37 PM
> >> To: rjb; r-help at r-project.org
> >> Subject: Re: [R] Extracting data from dataframe with tied rows
> >>
> >> Tena koe John
> >>
> >> One way:
> >>
> >> johnData <- data.frame(id=rep(LETTERS[1:5],20), distance=rnorm(1:100, mean = 100),
> >> bearing=sample(1:360,100,replace=T), month=sample(1:12,100,replace=T))
> >> johnAgg <- aggregate(johnData[,'distance'], johnData[,c('id','month')], max)
> >> names(johnAgg)[3] <- 'distance'
> >> merge(johnAgg, johnData)
> >>
> >> HTH ....
> >>
> >> Peter Alspach
> >>
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf
> >> Of rjb
> >> Sent: Friday, 24 August 2012 9:19 a.m.
> >> To: r-help at r-project.org
> >> Subject: [R] Extracting data from dataframe with tied rows
> >>
> >> Hi R help,
> >>
> >> I'm a fairly experienced R user but this manipulation has me stumped, please
> >> help:
> >>
> >> DATA
> >> id<-rep(LETTERS[1:5],20)
> >> distance<-rnorm(1:100, mean = 100)
> >> bearing<-sample(1:360,100,replace=T)
> >> month<-sample(1:12,100,replace=T)
> >>
> >> I have a dataset with records of individuals (id) , each with a distance
> >> (distance) & direction (bearing) recorded for each month (month).
> >> I want to find the largest distance per individual per month, which is easy
> >> with /tapply/ or /melt/cast (reshape)/,
> >> head(DATA_m<-melt(DATA,id=c("id","month")))
> >> cast(DATA_m,id+month~.,max)
> >> OR
> >> na.omit(melt(tapply(distance,list(id,month),max)))
> >>
> >> *BUT THE CATCH IS* ,
> >> I also want the the *corresponding*  bearing for that maximum distance per
> >> month. I've tried the steps above plus using which.max() and loops, but
> >> can't solve the problem. The real dataset is about 6000 rows.
> >>
> >> I'm guessing the answer is in finding the row number from the original DATA
> >> but I can't figure how to do that with tapply or melt.
> >>
> >> Any suggestions would be greatly appreciated.
> >>
> >> John Burnside
> 




More information about the R-help mailing list