[R] subsetting large data frames.
hesicaia
dboyce at dal.ca
Sun Dec 7 18:16:38 CET 2008
Hi all,
I have a question about subsetting large data frames. I have two data frames,
“catches” and “tows”, and they both have the same 30 variables (columns). I
would like to select the rows of “tows” whose combination of six specific
variables is NOT found in “catches”. One or more of the individual variables
may match a row in “catches”, but the combination as a whole must not. This
is hard to explain in words, so here is a short example:
Example data catches:
Row  Cruise  Order  Townumber  Towtype  Ship  Netlocation  Var1  Var2
1    22      1      4          A        B     S            X1    X2
2    22      1      4          A        B     S            X1    X2
3    22      1      4          BL       AM    S            X1    X2
4    22      1      4          BL       AM    S            X1    X2
5    260     1      4          BL       B     S            X1    X2
6    260     1      4          BL       B     S            X1    X2
Example data tows:
Row  Cruise  Order  Townumber  Towtype  Ship  Netlocation  Var1  Var2
1    22      1      4          A        B     S            X1    X2
2    400     1      4          BL       AM    S            X1    X2
3    260     1      4          BL       B     S            X1    X2
4    260     10     10         BL       B     S            X1    X2
5    22      99     4          BL       B     S            X1    X2
I would want to select rows 2, 4, and 5 from “tows” because their
combinations of “cruise”, “order”, “townumber”, “towtype”, “ship”, and
“netlocation” are not found in “catches”. All rows in “tows” are unique.
Clear as mud? Sorry I couldn’t provide real data, but these datasets are
quite large.
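A minimal base-R sketch of one way to do this, assuming the column names shown in the example table (the attempt below uses lowercase names, so `keys` would need adjusting for the real data). It pastes the six key columns of each row into one string and keeps the tows whose string never occurs among the catches. `%in%` is built on `match()`, which uses hashing, so it should cope with large data frames. The helper `make_key` and the `"\r"` separator are hypothetical choices, assuming the separator never appears inside a key value.

```r
# Rebuild the example data (six key columns plus Var1/Var2).
catches <- data.frame(
  Cruise      = c(22, 22, 22, 22, 260, 260),
  Order       = c(1, 1, 1, 1, 1, 1),
  Townumber   = c(4, 4, 4, 4, 4, 4),
  Towtype     = c("A", "A", "BL", "BL", "BL", "BL"),
  Ship        = c("B", "B", "AM", "AM", "B", "B"),
  Netlocation = rep("S", 6),
  Var1        = rep("X1", 6),
  Var2        = rep("X2", 6),
  stringsAsFactors = FALSE
)
tows <- data.frame(
  Cruise      = c(22, 400, 260, 260, 22),
  Order       = c(1, 1, 1, 10, 99),
  Townumber   = c(4, 4, 4, 10, 4),
  Towtype     = c("A", "BL", "BL", "BL", "BL"),
  Ship        = c("B", "AM", "B", "B", "B"),
  Netlocation = rep("S", 5),
  Var1        = rep("X1", 5),
  Var2        = rep("X2", 5),
  stringsAsFactors = FALSE
)

# Columns whose combination identifies a tow (assumed names).
keys <- c("Cruise", "Order", "Townumber", "Towtype", "Ship", "Netlocation")

# Hypothetical helper: collapse each row's key columns into one string.
# Assumes "\r" never occurs inside any key value.
make_key <- function(df, cols) do.call(paste, c(df[cols], sep = "\r"))

# Keep the tows whose whole key combination never occurs in catches.
New <- tows[!(make_key(tows, keys) %in% make_key(catches, keys)), ]
New
```

On the example data this keeps rows 2, 4, and 5 of `tows`, with their original row names.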
So far I have tried:
New <- tows[(tows$cruise != catches$cruise) &
            (tows$order != catches$order) &
            (tows$townumber != catches$townumber) &
            (tows$towtype != catches$towtype) &
            (tows$ship != catches$ship) &
            (tows$netlocation != catches$netlocation), ]
But this didn’t work.
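A likely reason the attempt fails: `!=` compares two vectors position by position, so `tows$cruise != catches$cruise` pairs the first tow with the first catch, the second with the second, and so on, recycling the shorter vector (with a warning when the lengths are not multiples of each other). It never checks a tow against every catch. The sketch below uses the Cruise values from the example as hypothetical stand-ins to contrast this with `%in%`, which asks whether each value occurs anywhere in the other vector.

```r
tows_cruise    <- c(22, 400, 260, 260, 22)       # 5 tows
catches_cruise <- c(22, 22, 22, 22, 260, 260)    # 6 catches

# Position-by-position comparison: lengths 5 and 6, so R recycles
# tows_cruise and compares tows[1] with catches[1], ..., and finally
# tows[1] again with catches[6]. The warning is suppressed here.
pairwise <- suppressWarnings(tows_cruise != catches_cruise)
length(pairwise)   # one result per catch position, not one per tow

# Membership test: does each tow's cruise occur anywhere in catches?
member <- tows_cruise %in% catches_cruise
member
```

Because the condition has to hold for the whole combination of columns, the membership test must be applied to a combined key built from all six columns, not to each column separately.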
Thanks for your time and help (in advance).
Dan.
--
View this message in context: http://www.nabble.com/subsetting-large-data-frames.-tp20883217p20883217.html
Sent from the R help mailing list archive at Nabble.com.