[R] How to count rows with a condition

William Dunlap wdunlap at tibco.com
Thu Oct 18 02:50:46 CEST 2012


data[ ave(data$ac_name, data$ac_name, length) <= 5, ]
fails for two reasons:
  a) you need to name the FUN argument, FUN=length, since there
      is a ... in the middle of ave's argument list that catches all the
      grouping arguments, so an unnamed length is taken as another grouping variable
  b) the type of the first argument to ave() needs to be compatible with
      the type of the return value of FUN().  If ac_name is a factor
      you get NAs and warnings; if it is character, the "<= 5" comparison uses
      character order instead of numerical order, leading to incorrect results
      because "11" <= "5":

> data <- data.frame(ac_name=rep(c("Amos","Boris","Charlotte"),c(3,8,11)), n=101:122, stringsAsFactors=FALSE)
> data[ ave(data$ac_name, data$ac_name, FUN=length) <= 5, ]
     ac_name   n
1       Amos 101
2       Amos 102
3       Amos 103
12 Charlotte 112
13 Charlotte 113
... [ rows elided ] ...
22 Charlotte 122
> data <- data.frame(ac_name=rep(c("Amos","Boris","Charlotte"),c(3,8,11)), n=101:122, stringsAsFactors=TRUE)
> data[ ave(data$ac_name, data$ac_name, FUN=length) <= 5, ]
      ac_name  n
NA       <NA> NA
NA.1     <NA> NA
NA.2     <NA> NA
... [rows elided] ...
NA.21    <NA> NA
Warning messages:
1: In `[<-.factor`(`*tmp*`, i, value = 3L) :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, i, value = 8L) :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, i, value = 11L) :
  invalid factor level, NAs generated
4: In Ops.factor(ave(data$ac_name, data$ac_name, FUN = length), 5) :
  <= not meaningful for factors
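
The character case can be checked in isolation. The following is a minimal sketch using the same toy names as above; the variable names ac_name and counts_chr are made up for illustration. It relies on ave() returning a vector of the same type as its first argument, so the group sizes come back as strings and the comparison with 5 is done on strings.

```r
# Same toy grouping as in the example data above.
ac_name <- rep(c("Amos", "Boris", "Charlotte"), c(3, 8, 11))

# ave() fills the group sizes into a copy of its first argument,
# so a character input yields character counts.
counts_chr <- ave(ac_name, ac_name, FUN = length)
is.character(counts_chr)   # TRUE
unique(counts_chr)         # "3" "8" "11"

# Comparing a string with a number coerces the number to a string,
# so "11" is compared with "5" character by character.
"11" <= 5                  # TRUE, which is why Charlotte's 11 rows are kept
"8" <= 5                   # FALSE
```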

That is why I made the first argument integer:

> data[ ave(integer(nrow(data)), data$ac_name, FUN=length) <= 5, ]
  ac_name   n
1    Amos 101
2    Amos 102
3    Amos 103
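
The original question also asked for a list of the names that were excluded. A possible sketch, reusing the integer first argument from above; the names n_per_name, keep, kept and excluded_names are hypothetical and chosen only for this illustration:

```r
# Toy data as in the examples above.
data <- data.frame(ac_name = rep(c("Amos", "Boris", "Charlotte"), c(3, 8, 11)),
                   n = 101:122, stringsAsFactors = FALSE)

# Group size for every row, computed on an integer vector so the
# result stays integer whatever the type of ac_name is.
n_per_name <- ave(integer(nrow(data)), data$ac_name, FUN = length)

keep <- n_per_name <= 5
kept <- data[keep, ]                                          # rows whose name occurs at most 5 times
excluded_names <- unique(as.character(data$ac_name[!keep]))   # names that occur more than 5 times
```

With the toy data this keeps the three Amos rows and puts "Boris" and "Charlotte" in excluded_names. The as.character() call is there so the result is a plain character vector even if ac_name is a factor.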
  

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of David Winsemius
> Sent: Wednesday, October 17, 2012 1:25 PM
> To: fxen3k
> Cc: r-help at r-project.org
> Subject: Re: [R] How to count rows with a condition
> 
> 
> On Oct 17, 2012, at 5:44 AM, fxen3k wrote:
> 
> > Hi,
> >
> > I have a dataset called "data". There is one column called "ac_name". Some
> > names in this column appear very often, some less.
> > What I want is to filter this dataset with the following condition:
> >
> > Exclude the names which appear more than five times. (Example: House A
> > appears 8 times ==> exclude it; House B appears 5 times ==> include it, etc.)
> >
> > In the end, I want to have the old "data" dataset excluding the rows with
> > the above-mentioned condition, and another list with all the names which have
> > been excluded.
> >
> 
> data[ ave(data$ac_name, data$ac_name, length) <= 5, ]  # all with 5 or fewer entries
> 
> --
> 
> David Winsemius, MD
> Alameda, CA, USA
> 



