[R] selecting rows with more than x occurrences in a given column(data type is names)

Mike Jasper mikejjasper at gmail.com
Tue Mar 13 21:35:01 CET 2007


Thanks to all of you who got me the answer. The key I was missing was
%in%. Had never seen it before.
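
For anyone finding this in the archives, here is a minimal sketch of why
`==` failed and `%in%` works. The toy data, the `x` column, and the
threshold name `min.count` are assumptions for illustration, not the
original data set.

```r
# Toy data (assumed for illustration; not the original poster's data set)
all.data <- data.frame(
  name = c("Joe", "Jane", "Joe", "Mike", "Jane", "Joe"),
  x    = 1:6
)

# Count rows per name, then keep the names that occur more than min.count times
min.count <- 1                      # hypothetical threshold ("x" in the question)
counts    <- table(all.data$name)
keep      <- names(counts[counts > min.count])

# `==` compares element by element and recycles the shorter vector,
# so row i is tested against one recycled element of `keep`, not the whole set
wrong <- all.data[all.data$name == keep, ]

# `%in%` asks, for each row, whether its name appears anywhere in `keep`
new.data <- all.data[all.data$name %in% keep, ]
```

With this toy data `wrong` silently drops rows that should be kept, while
`new.data` holds every row for each name that passes the threshold. When
the lengths do not divide evenly, `==` also warns about recycling, which
is probably the source of the warnings mentioned in the original question.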

best.

On 3/13/07, Dimitris Rizopoulos <dimitris.rizopoulos at med.kuleuven.be> wrote:
> try this:
>
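> set.seed(123)
> # example data: 8 rows, names drawn at random so that some repeat
> all.data <- data.frame(
>     name = sample(c("Joe", "Elen", "Jane", "Mike"), 8, TRUE),
>     x = rnorm(8), y = runif(8))
> # count the rows per name; keep the names that occur at least twice
> # (for "more than x occurrences", use tab.nams > x instead)
> tab.nams <- table(all.data$name)
> nams <- names(tab.nams[tab.nams >= 2])
> # select the rows whose name is in that set
> all.data[all.data$name %in% nams, ]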
>
>
> I hope it helps.
>
> Best,
> Dimitris
>
> ----
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
>
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/(0)16/336899
> Fax: +32/(0)16/337015
> Web: http://med.kuleuven.be/biostat/
>      http://www.student.kuleuven.be/~m0390867/dimitris.htm
>
>
> ----- Original Message -----
> From: "Mike Jasper" <mikejjasper at gmail.com>
> To: <r-help at stat.math.ethz.ch>
> Sent: Tuesday, March 13, 2007 3:38 PM
> Subject: [R] selecting rows with more than x occurrences in a given
> column(data type is names)
>
>
> > Despite a long search of the archives, I couldn't find how to do
> > this. Thanks in advance for what is likely a simple issue.
> >
> > I have a data set where the first column is name (e.g., 'Joe Smith',
> > 'Jane Doe', etc.). The following columns are data associated with
> > that person. I have many people with multiple rows. What I want is
> > to get a new data frame out with only the people who have more than
> > x occurrences in the first column.
> >
> > Here's what I've done, that's not working:
> >
> > Let's call my old data.frame "all.data"
> >
> > table(all.data$names)>10
> >
> > I get a list of names and TRUE/FALSE values. I then want to make a
> > list of the TRUEs and pass that to some subset type command like
> >
> > dup.names=table(all.data$names)>10
> >
> > new.data=(all.data[all.data$names==dup.names,])
> >
> > That's not working because the dimensions are wrong (I think). But
> > even when I tried to do part of it manually (to troubleshoot) like
> > this
> >
> > dup.names=c('Joe Smith','Jane Doe','etc')
> >
> > I got warnings and it didn't work correctly. There must be a simple
> > way to do this that I'm just not seeing. Thanks.