[R] Extracting Only Unique Rows Based on Only One Column
Bryan M Hangartner
hangartb at cecs.pdx.edu
Sat Jan 16 23:04:35 CET 2010
To Whomever is Interested,
I have spent several days searching the web, help files, the R wiki
and the archives of this mailing list for a solution to this problem,
but nonetheless I apologize in advance if I have missed something
obvious.
The problem is this: I have a 5-column data frame with about 4.2
million rows, and I want to create a new (and hopefully much smaller)
data frame that keeps only one row for each distinct value in the
first column. In other words, I do not care about the uniqueness of
the values in the other four columns, only the uniqueness of the
entries in the first column. The "unique" function does not seem to
have this option, at least based on what I've read in its help file.
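One way to key uniqueness on a single column is base R's `duplicated()`. This is a minimal sketch that assumes the first column of the data frame holds the identifier. Applied to a whole data frame, `duplicated()` compares entire rows, just as `unique()` does. Applied to one column, it flags only repeats of that column, so negating it keeps the first row for each value. The five-column frame `df` and its columns `A` to `D` are hypothetical stand-ins for the real 4.2-million-row data.

```r
# Hypothetical 5-column data frame standing in for the real data;
# only the first column (ID) should determine uniqueness.
df <- data.frame(
  ID = c(10, 10, 20, 30, 30, 30),
  A  = c(1, 2, 3, 4, 5, 6),
  B  = c(6, 5, 4, 3, 2, 1),
  C  = letters[1:6],
  D  = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Whole-row comparison: no two rows are identical, so nothing is dropped.
whole_row <- unique(df)

# First-column comparison: duplicated() on the ID vector alone marks the
# second and later occurrences of each ID; negating keeps the first one.
first_col <- df[!duplicated(df[[1]]), ]

nrow(whole_row)  # 6
nrow(first_col)  # 3
print(first_col)
```

Indexing the column by position (`df[[1]]`) rather than by name keeps the sketch independent of what the first column is actually called.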
A simplified example (a data frame called "traveltimes"):
ID  Time1  Time2
 1      3      4
 1      4      7
 2      3      5
 2      5      6
 3      4      5
 3      2      8
When I use a command such as
matches <- unique(traveltimes, incomparables = FALSE, fromLast = FALSE)
I end up with a 6-row data frame, exactly what I already have, because
no two rows are identical across all three columns. What I would like
to do is remove the rows that repeat a value in the "ID" column, along
with their associated Time1 and Time2 entries. This would give me a
3x3 data frame containing only one instance of each "ID" value. For
the purposes of this particular problem, the uniqueness of the Time1
and Time2 columns is not relevant.
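Applied to the "traveltimes" example, a minimal sketch using `duplicated()` (one option, not the only one) reproduces the 3x3 result described above. Which row survives for each ID is a choice. By default the first occurrence is kept, and `fromLast = TRUE` keeps the last occurrence instead. The object names `all_rows`, `first_per_id`, and `last_per_id` are illustrative only.

```r
# The simplified example typed in as an R data frame
traveltimes <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3),
  Time1 = c(3, 4, 3, 5, 4, 2),
  Time2 = c(4, 7, 5, 6, 5, 8)
)

# unique() compares whole rows; all six rows differ, so all six are kept.
all_rows <- unique(traveltimes)

# Keep the first row seen for each ID (original rows 1, 3, 5).
first_per_id <- traveltimes[!duplicated(traveltimes$ID), ]

# Keep the last row seen for each ID instead (original rows 2, 4, 6).
last_per_id <- traveltimes[!duplicated(traveltimes$ID, fromLast = TRUE), ]

print(first_per_id)
print(last_per_id)
```

The subsets keep the original row names (for example 1, 3, 5). Running `rownames(first_per_id) <- NULL` renumbers them if that matters downstream.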
If this question is not clear enough, please let me know. Thank you
for your time.
--
Bryan Hangartner
hangartb at cecs.pdx.edu