[R] can not extract rows which match a string

William Michels
Fri Oct 4 23:04:19 CEST 2019


Apologies, Ana. Of course Rui and Herve (and Richard) are correct here
in stating that NA values get 'carried through' when selecting with
the "==" operator.
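
As a minimal sketch of the difference at the vector level (the vector
"x" below is made up for illustration, not taken from anyone's data):
comparing NA with "==" gives NA, whereas "%in%" is built on match()
and simply reports no match.

```r
# Hypothetical vector standing in for a column that holds only 50 and NA
x <- c(NA, 50, NA, 50)

# "==" compares element by element; any comparison involving NA is NA
by_eq <- x == 50

# "%in%" asks "is this value in the set?"; NA is not in {50}, so FALSE
by_in <- x %in% 50

print(by_eq)  # NA, TRUE, NA, TRUE
print(by_in)  # FALSE, TRUE, FALSE, TRUE
```

The NAs in the first result are what later turn into all-NA rows when
the logical vector is used as a row index.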

To give an illustration of what (I believe) Herve means by "NAs
propagating", here's a small 11 x 8 dataframe ("zakaria") posted to
R-Help last year, which fortuitously has one column ("PO2T")
containing only the numeric value 50 as well as NAs. I compare
selecting with the "%in%" operator (as Herve suggests) and selecting
with the "==" operator. Notice the "propagating NAs" (last line of
code):

https://stat.ethz.ch/pipermail/r-help/2018-October/456798.html

> dim(zakaria)
[1] 11  8
> zakaria
   STUDENT_ID COURSE_CODE   PO1M PO1T PO2M PO2T  X X.1
1     AA15285     BAA1113 155.70  180   NA   NA NA  NA
2     AA15285     BAA1322  48.90   70   NA   NA NA  NA
3     AA15285     BAA2713  83.20  100   NA   NA NA  NA
4     AA15285     BAA2921     NA   NA   37   50 NA  NA
5     AA15285     BAA4273     NA   NA   NA   NA NA  NA
6     AA15285     BAA4513     NA   NA   NA   NA NA  NA
7     AA15286     BAA1322  48.05   70   NA   NA NA  NA
8     AA15286     BAA2113  68.40  100   NA   NA NA  NA
9     AA15286     BAA2513  41.65   60   NA   NA NA  NA
10    AA15286     BAA2713  82.35  100   NA   NA NA  NA
11    AA15286     BAA2921     NA   NA   41   50 NA  NA
> unique(zakaria$PO2T)
[1] NA 50
> table(zakaria$PO2T, exclude=NULL)

  50 <NA>
   2    9
> zakaria[!is.na(zakaria$PO2T), ]
   STUDENT_ID COURSE_CODE PO1M PO1T PO2M PO2T  X X.1
4     AA15285     BAA2921   NA   NA   37   50 NA  NA
11    AA15286     BAA2921   NA   NA   41   50 NA  NA
> zakaria[zakaria$PO2T %in% 50, ]
   STUDENT_ID COURSE_CODE PO1M PO1T PO2M PO2T  X X.1
4     AA15285     BAA2921   NA   NA   37   50 NA  NA
11    AA15286     BAA2921   NA   NA   41   50 NA  NA
> zakaria[zakaria$PO2T==50, ]
     STUDENT_ID COURSE_CODE PO1M PO1T PO2M PO2T  X X.1
NA         <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.1       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.2       <NA>        <NA>   NA   NA   NA   NA NA  NA
4       AA15285     BAA2921   NA   NA   37   50 NA  NA
NA.3       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.4       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.5       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.6       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.7       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.8       <NA>        <NA>   NA   NA   NA   NA NA  NA
11      AA15286     BAA2921   NA   NA   41   50 NA  NA
>
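
If one prefers to keep "==", the NA rows can still be dropped. Here is
a rough sketch on a toy data frame ("df" and its "id" column are
assumptions for illustration; only the column name PO2T is borrowed
from the example above). It shows three common ways to discard the NA
matches: which(), an explicit !is.na() guard, and subset().

```r
# Toy stand-in for the PO2T column above (not the original data)
df <- data.frame(id = 1:4, PO2T = c(NA, 50, NA, 50))

# An NA in a logical row index produces an all-NA row
rows_eq <- df[df$PO2T == 50, ]

# which() returns only the positions that are TRUE, dropping NA
rows_which <- df[which(df$PO2T == 50), ]

# Guarding with !is.na() turns each NA into FALSE before indexing
rows_guard <- df[!is.na(df$PO2T) & df$PO2T == 50, ]

# subset() treats NA in the condition as FALSE
rows_sub <- subset(df, PO2T == 50)

print(nrow(rows_eq))     # 4 rows, two of them all NA
print(nrow(rows_which))  # 2 rows
```

All three alternatives should return the same two rows that "%in%"
selects.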

I am certainly taking Herve's advice seriously, but I also believe
that when importing data into R, carefully setting parameters such as
the "na.strings" parameter of read.table() can help you avoid
surprises later on.
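
To make that concrete, here is a small sketch. The input text and the
placeholder codes "-" and "n/a" are invented for illustration; your
file may code missing values differently.

```r
# Hypothetical raw text in which missing values are coded as "-" and "n/a"
raw <- "id score
a 50
b -
c n/a
d 50"

# Without na.strings, the placeholders force the column to character
naive <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Declaring the placeholders lets R read the column as numeric with real NAs
clean <- read.table(text = raw, header = TRUE,
                    na.strings = c("NA", "-", "n/a"))

str(naive$score)
str(clean$score)

# Row selection then behaves as in the example above
clean[clean$score %in% 50, ]
```

With the placeholders declared up front, "score" is numeric and the
missing entries are genuine NAs, so is.na(), "%in%" and which() all
work as expected.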

HTH, Bill.

W. Michels, Ph.D.



