[R] subset
Guenther, Cameron
Cameron.Guenther at MyFWC.com
Tue May 16 20:54:04 CEST 2006
Marc,
I have tried unique but unique looks at the entire row. I have a data
set with a variable TRIPID. The dataset has 469,000 rows. In most
cases TRIPID is a unique value. However, in some cases I have the same
TRIPID value but different values for other variables. What this
amounts to is a data entry error. I need to get rid of the repeated
rows that have the same TRIPID but different co-variables.
Thanks for your help.
Cam
Cameron Guenther, Ph.D.
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
cameron.guenther at myfwc.com
-----Original Message-----
From: Marc Schwartz (via MN) [mailto:mschwartz at mn.rr.com]
Sent: Tuesday, May 16, 2006 2:50 PM
To: Guenther, Cameron
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] subset
On Tue, 2006-05-16 at 14:37 -0400, Guenther, Cameron wrote:
> Hello everyone,
>
> I have a large dataset (x) with some rows that have duplicate
> variables that I would like to remove. I find which rows are the
> duplicates with X1<-which(duplicated(x)). That gives me the rows with
> duplicated variables. Now, how can I remove just those rows from the
> original data frame. I think I can create a new data frame without
> the duplicates using subset. I have tried:
> Subset(x,!x1) and subset(x,!x[x1,])
> I can't seem to find the correct syntax. Any advice?
> Thanks in advance
Even easier would be to use unique():
NewDF <- unique(x)
NewDF will contain rows from 'x' with duplicates removed.
See ?unique for more information.
unique(), which has a data.frame method, is basically:
x[!duplicated(x), , drop = FALSE]
where 'drop = FALSE' covers the case where 'x' has only a single column,
so that the result remains a data frame rather than collapsing to a vector.
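As a small illustration of that equivalence, here is a sketch on a toy data frame. The CATCH column and all of the values are made up for the example; only TRIPID comes from the thread.

```r
# Toy data frame; CATCH and all values are invented for illustration
x <- data.frame(TRIPID = c(1, 2, 2, 3, 3),
                CATCH  = c(10, 20, 20, 30, 35))

# duplicated() flags rows that repeat an earlier row in every column
dup <- duplicated(x)

# Two routes to the same result
NewDF1 <- unique(x)
NewDF2 <- x[!dup, , drop = FALSE]
```

Only the third row is flagged here, because it matches the second row in both columns. The two rows with TRIPID 3 differ in CATCH, so both are kept.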
Note that the above presumes that you want to test all columns in 'x'
for dups.
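If only one column should be tested, duplicated() can be applied to that column alone. The following is a sketch, assuming the key column is called TRIPID as in the follow-up message; the CATCH column and the values are made up. Which variant is appropriate depends on whether the first record for a repeated TRIPID can be trusted.

```r
# Toy data frame; CATCH and all values are invented for illustration
x <- data.frame(TRIPID = c(1, 2, 2, 3, 3),
                CATCH  = c(10, 20, 20, 30, 35))

# Variant 1: keep the first row seen for each TRIPID
first_only <- x[!duplicated(x$TRIPID), , drop = FALSE]

# Variant 2: drop every row whose TRIPID occurs more than once
dup_ids <- unique(x$TRIPID[duplicated(x$TRIPID)])
no_repeats <- x[!(x$TRIPID %in% dup_ids), , drop = FALSE]
```

Variant 1 leaves one row per TRIPID, while variant 2 leaves only the rows whose TRIPID was never repeated.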
HTH,
Marc Schwartz