[R] Select rows based on matching conditions and logical operators

William Dunlap wdunlap at tibco.com
Wed Jul 25 23:13:50 CEST 2012


Rui,
  Your solution works, but for large data.frames it can be faster to compute
the indices of the desired rows of the input data.frame and then use one
subscripting call to select the rows, instead of splitting the input data.frame
into a list of data.frames, extracting the desired row from each component,
and then calling rbind to put the rows together again.  E.g., compare your
approach, which I've put into the function f1
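  f1 <- function (dataFrame)  {
      # lapply rather than sapply: sapply would simplify the result to a
      # matrix whenever no PTID/Year combination is empty
      retval <- with(dataFrame, lapply(split(dataFrame, list(PTID, 
          Year)), function(x) if (nrow(x)) 
          x[which.max(x$Count), ]))
      retval <- do.call(rbind, retval)
      rownames(retval) <- 1:nrow(retval)
      retval
  }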
with one that computes a logical subscripting vector (by splitting just the
Counts vector, not the whole data.frame)
  f2 <- function (dataFrame)  {
      keep <- as.logical(ave(dataFrame$Count, droplevels(interaction(dataFrame$PTID, 
          dataFrame$Year)), FUN = function(x) if (length(x)) seq_along(x) == 
          which.max(x)))
      dataFrame[keep, ]
  }
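
As a sketch of what the ave() call inside f2 produces, here it is on a small made-up Count vector and grouping factor (count, grp, flag and keep are illustrative names, not the thread's data). ave() stores the per-group logical values back into the numeric input vector, so they come out as 0/1, which is why f2 wraps the call in as.logical().

```r
# Hypothetical toy data: three groups, with a tie for the maximum in group "a".
count <- c(0, 0, 0, 0, 1, 0, 1, 2)
grp   <- factor(c("a", "a", "a", "b", "b", "c", "c", "c"))

# Within each group, flag only the first position that holds the maximum.
flag <- ave(count, grp, FUN = function(x) seq_along(x) == which.max(x))
flag               # numeric 0/1, same length and order as count

keep <- as.logical(flag)
which(keep)        # positions of the rows a subscripting call would keep
```

Because which.max() reports only the first maximum, a group whose counts are all tied still contributes exactly one row.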

They both compute the same thing, aside from the fact that the rows
are in a different order (f2 keeps the order of the original data.frame)
and f2 leaves the original row label with the row.
> f1(df1)
  PGID  PTID Year Visit Count
1 6755 53122 2008     3     1
2 6755 53121 2009     1     0
3 6755 53122 2009     3     2
> f2(df1)
  PGID  PTID Year Visit Count
1 6755 53121 2009     1     0
6 6755 53122 2008     3     1
9 6755 53122 2009     3     2
When there are a lot of output rows, f2 can be quite a bit faster.
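
A rough way to check this on your own machine, assuming made-up data with the same column names (the data.frame big and the size n_ptid are hypothetical, and timings will vary with both the machine and the number of groups). In this sketch f1 is written with lapply so that it also runs when no PTID/Year combination is empty, as is the case for big.

```r
# f1: split the whole data.frame, pick one row per piece, rbind the pieces.
f1 <- function(dataFrame) {
    pieces <- lapply(split(dataFrame, list(dataFrame$PTID, dataFrame$Year)),
        function(x) if (nrow(x)) x[which.max(x$Count), ])
    retval <- do.call(rbind, pieces)
    rownames(retval) <- seq_len(nrow(retval))
    retval
}

# f2: build one logical vector from Count alone, then subscript once.
f2 <- function(dataFrame) {
    grp <- droplevels(interaction(dataFrame$PTID, dataFrame$Year))
    keep <- as.logical(ave(dataFrame$Count, grp,
        FUN = function(x) seq_along(x) == which.max(x)))
    dataFrame[keep, ]
}

# Hypothetical test data: n_ptid PTIDs, 2 years each, 3 visits per year.
set.seed(1)
n_ptid <- 1000
PTID  <- rep(seq_len(n_ptid), each = 6)
Year  <- rep(rep(2008:2009, each = 3), times = n_ptid)
Visit <- rep(1:3, times = 2 * n_ptid)
big <- data.frame(PGID = 6755L, PTID = PTID, Year = Year, Visit = Visit,
    Count = rpois(length(PTID), lambda = 1))

# Both should select the same rows once sorted alike with row labels reset.
r1 <- f1(big)
r2 <- f2(big)
o1 <- r1[order(r1$PTID, r1$Year), ]
o2 <- r2[order(r2$PTID, r2$Year), ]
rownames(o1) <- NULL
rownames(o2) <- NULL
all.equal(o1, o2)

# Elapsed times; the numbers depend on the machine and on n_ptid.
system.time(f1(big))
system.time(f2(big))
```

Raising n_ptid increases the number of output rows, which is where the difference between the two approaches should show up most.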

(I put the call to droplevels(interaction(...)) into the call to ave because ave
can waste a lot of time calling FUN for nonexistent interaction levels.)
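
A small sketch of the difference, using the PTID and Year values from the example data (the vector names ptid, year, g_all, g_used and g_drop are mine). interaction() creates a level for every combination of its arguments, whether or not any row has that combination; droplevels() removes the unused ones.

```r
# PTID and Year columns from the example data, as plain vectors.
ptid <- c(53121, 53121, 53121, 53122, 53122, 53122, 53122, 53122, 53122)
year <- c(2009, 2009, 2009, 2008, 2008, 2008, 2009, 2009, 2009)

g_all  <- interaction(ptid, year)   # one level per PTID x Year combination
g_used <- droplevels(g_all)         # only combinations that occur in the data

nlevels(g_all)     # includes 53121.2008, which has no rows
nlevels(g_used)
table(g_all)       # the unused combination shows up with a count of 0

# interaction() also has a drop argument that gives the same set of levels.
g_drop <- interaction(ptid, year, drop = TRUE)
nlevels(g_drop)
```

With many PTIDs and years, most of which do not occur together, the number of unused levels can be far larger than the number of groups actually present.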

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Rui Barradas
> Sent: Wednesday, July 25, 2012 10:24 AM
> To: kborgmann
> Cc: r-help
> Subject: Re: [R] Select rows based on matching conditions and logical operators
> 
> Hello,
> 
> Apart from the output order this does it.
> (I have changed 'df' to 'df1' because 'df' is an R function, the F distribution
> density.)
> 
> 
> df1 <- read.table(text="
> PGID PTID Year Visit  Count
> 6755 53121 2009 1 0
> 6755 53121 2009 2 0
> 6755 53121 2009 3 0
> 6755 53122 2008 1 0
> 6755 53122 2008 2 0
> 6755 53122 2008 3 1
> 6755 53122 2009 1 0
> 6755 53122 2009 2 1
> 6755 53122 2009 3 2", header=TRUE)
> 
> 
> df2 <- with(df1, sapply(split(df1, list(PTID, Year)),
>      function(x) if (nrow(x)) x[which.max(x$Count), ]))
> df2 <- do.call(rbind, df2)
> rownames(df2) <- 1:nrow(df2)
> df2
> 
> Use which.max(), not which().
> 
> Hope this helps,
> 
> Rui Barradas
> Em 25-07-2012 18:10, kborgmann escreveu:
> > Hi,
> > I have a dataset in which I would like to select rows based on matching
> > conditions and return the maximum value of a variable else return one row if
> > duplicate counts exist.  My dataset looks like this:
> > PGID	PTID	Year	 Visit  Count
> > 6755	53121	2009	1	0
> > 6755	53121	2009	2	0
> > 6755	53121	2009	3	0
> > 6755	53122	2008	1	0
> > 6755	53122	2008	2	0
> > 6755	53122	2008	3	1
> > 6755	53122	2009	1	0
> > 6755	53122	2009	2	1
> > 6755	53122	2009	3	2
> >
> > I would like to select rows if PTID and Year match and return the maximum
> > count else return one row if counts are the same, such that I get this
> > output
> > PGID	PTID	Year	 Visit  Count
> > 6755	53121	2009	1	0
> > 6755	53122	2008	3	1
> > 6755	53122	2009	3	2
> >
> > I tried the following code and the output is almost correct but duplicate
> > values were included
> > df2<-with(df, sapply(split(df, list(PTID, Year)),
> > function(x) if (nrow(x)) x[which(x$Count==max(x$Count)),]))
> > df<-do.call(rbind,df)
> > rownames(df)<-1:nrow(df)
> >
> > Any suggestions?
> > Thanks much for your responses!
> >
> >
> >
> >
> > --
> > View this message in context: http://r.789695.n4.nabble.com/Select-rows-based-on-matching-conditions-and-logical-operators-tp4637809.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 


