[R] can not extract rows which match a string
William Michels
wjm1 @end|ng |rom c@@@co|umb|@@edu
Fri Oct 4 23:04:19 CEST 2019
Apologies, Ana. Of course Rui and Herve (and Richard) are correct here
in stating that NA values get 'carried through' when selecting with
the "==" operator.
To give an illustration of what (I believe) Herve means by "NAs
propagating", here's a small 11 x 8 dataframe ("zakaria") posted to
R-Help last year, which fortuitously has one column ("PO2T")
containing only the numeric value 50 as well as NAs. I compare
selecting with the "%in%" operator (as Herve suggests) and selecting
with the "==" operator. Notice the "propagating NAs" (last line of
code):
https://stat.ethz.ch/pipermail/r-help/2018-October/456798.html
> dim(zakaria)
[1] 11 8
> zakaria
   STUDENT_ID COURSE_CODE   PO1M PO1T PO2M PO2T  X X.1
1     AA15285     BAA1113 155.70  180   NA   NA NA  NA
2     AA15285     BAA1322  48.90   70   NA   NA NA  NA
3     AA15285     BAA2713  83.20  100   NA   NA NA  NA
4     AA15285     BAA2921     NA   NA   37   50 NA  NA
5     AA15285     BAA4273     NA   NA   NA   NA NA  NA
6     AA15285     BAA4513     NA   NA   NA   NA NA  NA
7     AA15286     BAA1322  48.05   70   NA   NA NA  NA
8     AA15286     BAA2113  68.40  100   NA   NA NA  NA
9     AA15286     BAA2513  41.65   60   NA   NA NA  NA
10    AA15286     BAA2713  82.35  100   NA   NA NA  NA
11    AA15286     BAA2921     NA   NA   41   50 NA  NA
> unique(zakaria$PO2T)
[1] NA 50
> table(zakaria$PO2T, exclude=NULL)
50 <NA>
2 9
> zakaria[!is.na(zakaria$PO2T), ]
   STUDENT_ID COURSE_CODE PO1M PO1T PO2M PO2T  X X.1
4     AA15285     BAA2921   NA   NA   37   50 NA  NA
11    AA15286     BAA2921   NA   NA   41   50 NA  NA
> zakaria[zakaria$PO2T %in% 50, ]
   STUDENT_ID COURSE_CODE PO1M PO1T PO2M PO2T  X X.1
4     AA15285     BAA2921   NA   NA   37   50 NA  NA
11    AA15286     BAA2921   NA   NA   41   50 NA  NA
> zakaria[zakaria$PO2T==50, ]
     STUDENT_ID COURSE_CODE PO1M PO1T PO2M PO2T  X X.1
NA         <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.1       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.2       <NA>        <NA>   NA   NA   NA   NA NA  NA
4       AA15285     BAA2921   NA   NA   37   50 NA  NA
NA.3       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.4       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.5       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.6       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.7       <NA>        <NA>   NA   NA   NA   NA NA  NA
NA.8       <NA>        <NA>   NA   NA   NA   NA NA  NA
11      AA15286     BAA2921   NA   NA   41   50 NA  NA
>
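For anyone who wants to reproduce the effect without the "zakaria"
data, here is a minimal sketch. The data frame "df" and its columns
"id" and "score" are made up for illustration and are not from the
original thread. It shows why "==" produces the all-NA rows (the
logical index itself contains NA) and two common ways around it,
"%in%" and which():

```r
# Hypothetical toy data frame (not the "zakaria" data); "score" contains NAs
df <- data.frame(id = c("a", "b", "c", "d"),
                 score = c(NA, 50, NA, 50))

# "==" returns NA wherever score is NA, so the logical index contains NAs
idx_eq <- df$score == 50

# Each NA in a row index yields an all-NA row in the result
rows_eq <- df[idx_eq, ]

# %in% never returns NA, so only the true matches are selected
idx_in <- df$score %in% 50
rows_in <- df[idx_in, ]

# which() keeps only the positions that are TRUE, dropping the NAs from "=="
rows_which <- df[which(df$score == 50), ]
```

Both rows_in and rows_which should contain only the two rows where
score is 50, while rows_eq also carries one all-NA row per NA in
score.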
I am certainly taking Herve's advice seriously, but I also believe
that when importing data into R, carefully setting parameters such as
the "na.strings" parameter of read.table() can help you avoid
surprises later on.
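As a rough sketch of what I mean by setting "na.strings" (the input
text and the missing-value codes "-" and "missing" below are invented
for illustration, not taken from Ana's data):

```r
# Hypothetical raw text in which missing values are coded as "-" and "missing"
raw <- "id score
a 50
b -
c missing
d 50"

# Without na.strings, the codes are ordinary strings, so the whole
# column is read as character rather than numeric
d1 <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Declaring the codes via na.strings turns them into real NAs, and the
# column is then read as numeric
d2 <- read.table(text = raw, header = TRUE,
                 na.strings = c("-", "missing"))
```

With d2, is.na(d2$score) identifies the missing entries directly, and
selections such as d2[d2$score %in% 50, ] behave as in the transcript
above.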
HTH, Bill.
W. Michels, Ph.D.