[R] Remove duplicates from a data frame but with some special requirements

gcam gcam032 at gmail.com
Thu Dec 17 04:55:52 CET 2009


Hi all.

So I have a data frame with multiple columns/variables.  The first variable
is a major sample name for which there are some sub-samples.  Currently I
have used the following command to remove the duplicates:

Samps_working <- Samps[!duplicated(Samps$ESR_Ref_edit), ]

This removes all of the duplicated sample rows.

However, I just realised that, of course, this keeps only the first
observation of each duplicated set and drops the rest, whatever they contain.
Instead, I wish to retain any row that has the code "Y" in another variable,
Samps$Loaded.  I'm at a bit of a loss as to how best to approach this problem.

Just to reiterate: I want to remove all duplicate rows based on sample name,
but when choosing which rows of a duplicated set to drop, those without a "Y"
in the Loaded column should go first, so that a "Y" row is kept where one
exists.
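
One possible approach, sketched here under some assumptions: sort the data
frame so that rows with Loaded == "Y" come first within each sample name, then
keep the first row per sample with duplicated().  The column names are the
ones from this post; the toy values, the helper names is_loaded, ord and
Samps_sorted, and the treatment of NA in Loaded as "not loaded" are
assumptions made only for illustration.

```r
# Toy data: column names follow the post, the values are invented.
Samps <- data.frame(
  ESR_Ref_edit = c("A", "A", "B", "B", "C"),
  Loaded       = c("N", "Y", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# TRUE where the row is marked as loaded; NA is treated as not loaded
# (an assumption, adjust if NA should be handled differently).
is_loaded <- !is.na(Samps$Loaded) & Samps$Loaded == "Y"

# Sort by sample name, and within each sample put the "Y" rows first
# (FALSE sorts before TRUE, hence the negation).
ord <- order(Samps$ESR_Ref_edit, !is_loaded)
Samps_sorted <- Samps[ord, ]

# duplicated() flags every occurrence after the first, so negating it keeps
# the first row per sample, which is now a "Y" row whenever one exists.
Samps_working <- Samps_sorted[!duplicated(Samps_sorted$ESR_Ref_edit), ]

print(Samps_working)
```

Note that the result comes back sorted by sample name rather than in the
original row order, and that if a sample has several "Y" rows only one of
them is kept.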