[R] get the rows so that there is no redundant element in a certain column

Joshua Wiley jwiley.psych at gmail.com
Thu Oct 28 22:40:42 CEST 2010

On Thu, Oct 28, 2010 at 11:29 AM, boshao zhang <zboshao at yahoo.com> wrote:
> Dear everyone in the Mailing list:
> It is easy to get the unique elements in a column. But I would like to get rid of the rows in which this column's element is redundant. Sometimes it is also important to look at the rows whose element in this column is redundant. I guess it boils down to throwing out the indices of the redundant elements.

This is rather general.  Which indices do you want to throw out?  For
instance, suppose that "A" occurs in rows 1, 2, 3, 19, and 50.  Which
four do you throw out and which one do you keep?  Do you always want to
keep the first?  The middle?  The last?

I would look at ?unique and ?duplicated for starters.

unique() will keep the first instance, which may be fine for your
purposes (Dennis already gave an example of how to implement this).
If you need something else (e.g., the middle), you'll need something a
bit fancier.
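
As a rough sketch of these options (not a definitive recipe), assume a
data frame called df whose column id holds the repeated values.  Both
names, and the toy data, are made up for illustration.  duplicated()
is vectorized, so the same indexing should remain practical on large
data, though I have not timed it on millions of rows.

```r
# Toy data; 'df', 'id' and 'value' are hypothetical names for illustration.
df <- data.frame(id = c("A", "B", "A", "C", "B", "A"),
                 value = 1:6,
                 stringsAsFactors = FALSE)

# Flag every occurrence after the first, and every occurrence before the last.
dup_fwd <- duplicated(df$id)
dup_bwd <- duplicated(df$id, fromLast = TRUE)

# Keep the first row for each id.
first_rows <- df[!dup_fwd, ]

# Keep the last row for each id.
last_rows <- df[!dup_bwd, ]

# Look at every row whose id occurs more than once.
dup_rows <- df[dup_fwd | dup_bwd, ]

# Keep only the rows whose id occurs exactly once.
single_rows <- df[!(dup_fwd | dup_bwd), ]

# "Something fancier": keep the middle occurrence of each id
# (the lower middle when an id occurs an even number of times).
pos <- ave(seq_along(df$id), df$id, FUN = seq_along)  # occurrence number within id
n   <- ave(seq_along(df$id), df$id, FUN = length)     # total occurrences of that id
middle_rows <- df[pos == (n + 1) %/% 2, ]
```

Which of these is appropriate depends on the answer to the question
above, namely which occurrence you actually want to keep.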



> With millions of rows, how can I efficiently perform the task?
> Thank you in advance.
> Boshao

Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
