[R] Problem dropping rows based on values in a column

Mon Mar 26 05:07:49 CEST 2007

On Sun, 2007-03-25 at 22:19 -0400, John Sorkin wrote:
> I am trying to drop rows of a dataframe based on values of the column PID, but my strategy is not working. I hope someone can tell me what I am doing incorrectly.
> 
> 
> # Values of PID column
> > jdata[,"PID"]
>  [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 15617 15615 15212 14862 16539
> [18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 15982 15825 15834 15491 15822
> [35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 15396 15477 15446 15374 14092
> [52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772
> 
> #Prepare to drop last two rows, rows that have 14744 and 14772 in the PID column
> > delete<-c(14772,14744)
> 
> #Try to delete last two rows, but as you will see, I am not able to drop the last two rows.
> > jdata[jdata$PID!=delete,"PID"]
>  [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 15617 15615 15212 14862 16539
> [18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 15982 15825 15834 15491 15822
> [35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 15396 15477 15446 15374 14092
> [52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772
> > 

John,

If you had:

  delete <- c(14744, 14772)

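it would work, but only because of where those two values happen to sit
in the column.

'!=' compares element by element and recycles the shorter vector, so
your 'delete' is repeated along PID as 14772, 14744, 14772, 14744, ...
With the values in the order you have them above, reversed from the
order in which they appear in PID, elements 61 and 62 are each compared
with the other value, both tests come back TRUE, and nothing is dropped.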

For example:

Vec <- 1:10
delete <- 10:9

> Vec[Vec != delete]
 [1]  1  2  3  4  5  6  7  8  9 10

However:

delete <- 9:10

> Vec[Vec != delete]
[1] 1 2 3 4 5 6 7 8

Note what happens when the values in the source vector are not
sequential:

Vec <- sample(10)

> Vec
 [1]  5  1  7  3 10  8  2  6  9  4

delete <- 9:10

> Vec[Vec != delete]
[1]  5  1  7  3 10  8  2  6  4

delete <- 10:9

> Vec[Vec != delete]
[1] 5 1 7 3 8 2 6 9 4

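In both cases only the first value in 'delete' is removed, and only
because it happens to sit at a position where R's recycling of 'delete'
pairs it with itself. The second value is never paired with itself, so
it stays.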
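
To make the recycling visible, here is a small sketch using the permuted
vector from above, typed out literally so the result is reproducible.
The names 'recycled' and 'cmp' are only illustrative:

```r
Vec <- c(5, 1, 7, 3, 10, 8, 2, 6, 9, 4)
delete <- 9:10

# '!=' works position by position, repeating 'delete' to the length of 'Vec'
recycled <- rep(delete, length.out = length(Vec))

# Side-by-side view: a value is dropped only where 'keep' is FALSE
cmp <- data.frame(Vec = Vec, recycled = recycled, keep = Vec != recycled)
print(cmp)

# The explicit version gives the same logical index as the short form
identical(Vec != delete, cmp$keep)
```

Only the row where 'Vec' and 'recycled' hold the same number has keep set
to FALSE, which is why a single value disappears.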

When performing a logical comparison of a value to see if it is (or is
not) in a set of values, you want to use '%in%':

Vec <- 1:10

delete <- 10:9

> Vec[!Vec %in% delete]
[1] 1 2 3 4 5 6 7 8

delete <- 9:10

> Vec[!Vec %in% delete]
[1] 1 2 3 4 5 6 7 8

It also works in the permuted vector:

Vec <- c(5, 1, 7, 3, 10, 8, 2, 6, 9, 4)

> Vec[!Vec %in% delete]
[1] 5 1 7 3 8 2 6 4

See ?"%in%" for more information.
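
Applied to your data frame, the same idea would look roughly like the
sketch below. Since I do not have 'jdata', this uses a small stand-in
with made-up contents ('value' is a hypothetical second column, and
'kept' is just an illustrative name). Only the PID numbers are taken
from your post:

```r
# Stand-in for the real 'jdata'; 'value' is a hypothetical extra column
jdata <- data.frame(PID   = c(10481, 14793, 14744, 14772),
                    value = c("a", "b", "c", "d"))

delete <- c(14772, 14744)

# Keep every row whose PID is not in 'delete'.
# The empty column index after the comma keeps all columns.
kept <- jdata[!(jdata$PID %in% delete), ]
kept$PID
```

The order of the values in 'delete' does not matter here, since each PID
is tested against the whole set rather than against one recycled value.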

HTH,

Marc Schwartz