[R] Problem dropping rows based on values in a column
Marc Schwartz
marc_schwartz at comcast.net
Mon Mar 26 05:07:49 CEST 2007
On Sun, 2007-03-25 at 22:19 -0400, John Sorkin wrote:
> I am trying to drop rows of a dataframe based on values of the column PID, but my strategy is not working. I hope someone can tell me what I am doing incorrectly.
>
>
> # Values of PID column
> > jdata[,"PID"]
> [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 15617 15615 15212 14862 16539
> [18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 15982 15825 15834 15491 15822
> [35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 15396 15477 15446 15374 14092
> [52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772
>
> #Prepare to drop last two rows, rows that have 14744 and 14772 in the PID column
> > delete<-c(14772,14744)
>
> #Try to delete last two rows, but as you will see, I am not able to drop the last two rows.
> > jdata[jdata$PID!=delete,"PID"]
> [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 15617 15615 15212 14862 16539
> [18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 15982 15825 15834 15491 15822
> [35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 15396 15477 15446 15374 14092
> [52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772
> >
John,
If you had:
delete <- c(14744, 14772)
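it would likely work, but only in this particular setting, where the
two values happen to sit in adjacent rows and in that order.
That is because '!=' compares element by element, recycling the two
values in 'delete' along the PID column: odd-numbered rows are tested
against the first value and even-numbered rows against the second. As
you have them above, the values are reversed from the order in which
they actually appear, so neither row is ever tested against its own
value.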
For example:
Vec <- 1:10
delete <- 10:9
> Vec[Vec != delete]
[1] 1 2 3 4 5 6 7 8 9 10
However:
delete <- 9:10
> Vec[Vec != delete]
[1] 1 2 3 4 5 6 7 8
Note what happens when the values in the source vector are not
sequential:
Vec <- sample(10)
> Vec
[1] 5 1 7 3 10 8 2 6 9 4
delete <- 9:10
> Vec[Vec != delete]
[1] 5 1 7 3 10 8 2 6 4
delete <- 10:9
> Vec[Vec != delete]
[1] 5 1 7 3 8 2 6 9 4
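In each case only the first value in 'delete' is removed, not the
second, because 9 and 10 both fall in odd-numbered positions of 'Vec'
and so are only ever compared with the first element of 'delete'.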
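To make the recycling visible, here is a minimal sketch that writes the
recycled comparison out by hand for the permuted vector shown earlier.
The name 'recycled' is only for illustration, and 'Vec' is typed in
explicitly so that the result does not depend on sample():

```r
Vec <- c(5, 1, 7, 3, 10, 8, 2, 6, 9, 4)
delete <- 10:9

# Written out by hand: 'delete' repeated to the length of 'Vec'
recycled <- rep(delete, length.out = length(Vec))
recycled
# [1] 10  9 10  9 10  9 10  9 10  9

# 'Vec != delete' is the same element-by-element test as 'Vec != recycled'
identical(Vec != delete, Vec != recycled)
# [1] TRUE

# Only position 5 matches (10 against 10); the 9 in position 9 is
# compared with 10, so it is kept
which(Vec == recycled)
# [1] 5
```

R only warns about recycling when the longer length is not a multiple
of the shorter one. With 62 PID values and 2 values in 'delete', no
warning is given, which is why the original attempt fails silently.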
When performing a logical comparison of a value to see if it is (or is
not) in a set of values, you want to use '%in%':
Vec <- 1:10
delete <- 10:9
> Vec[!Vec %in% delete]
[1] 1 2 3 4 5 6 7 8
delete <- 9:10
> Vec[!Vec %in% delete]
[1] 1 2 3 4 5 6 7 8
It also works in the permuted vector from above:
Vec <- c(5, 1, 7, 3, 10, 8, 2, 6, 9, 4)
> Vec[!Vec %in% delete]
[1] 5 1 7 3 8 2 6 4
See ?"%in%" for more information.
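Applied to the original question, a sketch along these lines should
drop the two rows. The small data frame below is only a stand-in for
'jdata', and the 'score' column and the name 'kept' are made up for
illustration:

```r
# Stand-in for 'jdata'; only the PID values come from the original post
jdata <- data.frame(PID = c(10481, 14793, 14744, 14772),
                    score = c(1.2, 3.4, 5.6, 7.8))
delete <- c(14772, 14744)

# Keep every row whose PID is not in 'delete'. The trailing comma
# selects rows and keeps all columns.
kept <- jdata[!jdata$PID %in% delete, ]
kept$PID
# [1] 10481 14793
```

The order of the values in 'delete' no longer matters here, since each
PID is checked against the whole set rather than against one recycled
element. subset(jdata, !PID %in% delete) should give the same rows.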
HTH,
Marc Schwartz