[R] '[' vs subset() behaviors when filtering a dataframe with NA values

Jeff Newmiller
Sun Feb 20 05:06:36 CET 2022


That is how logical indexing works: an NA in the index returns an NA element, and when the index selects rows of a data frame, the result is an entire row of NAs, which is why every column is NA in those 15 rows. Integer indexing treats NA the same way, but which() returns only the positions where the condition is TRUE and drops the NAs, so

weight[ which(weight$weight_rec <= 24 & weight$weight_rec >= 0), ]

will skip the NAs.
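
A minimal sketch of the difference, using a made-up five-row data frame (the column names follow the thread, the values and the helper names keep, by_logical, by_which and by_subset are only for illustration):

```r
# Toy data: two of the five weight_rec values are NA.
weight <- data.frame(
  ICUSTAY_ID = c(1, 2, 3, 4, 5),
  weight_rec = c(10, NA, 30, 20, NA)
)

# Comparisons involving NA give NA, so the logical index contains NAs.
keep <- weight$weight_rec <= 24 & weight$weight_rec >= 0

by_logical <- weight[keep, ]         # each NA in the index yields an all-NA row
by_which   <- weight[which(keep), ]  # which() keeps only the TRUE positions
by_subset  <- subset(weight, weight_rec <= 24 & weight_rec >= 0)  # NA treated as FALSE

print(by_logical)
print(by_which)
print(by_subset)
```

Here by_logical has four rows (two of them entirely NA), while by_which and by_subset contain only the two rows that actually satisfy the condition.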

BTW: the subset function works best when you let it look up the columns in the data frame itself:

subset(weight, weight_rec <= 24 & weight_rec >= 0)

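You can introduce obscure bugs (in your logic, not in R) by naming the data frame explicitly inside non-standard evaluation functions like subset, with, and within, because the expression is then evaluated against whatever object you named rather than against the data the function is working on.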
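
One way such a bug can arise, as a hypothetical sketch (the data frame sorted and the values are invented for illustration): the condition names one data frame while subset is applied to another with the same number of rows, so R raises no error and silently returns the wrong rows.

```r
weight <- data.frame(id = 1:4, weight_rec = c(10, 50, 20, 60))

# A reordered copy: same rows, sorted by decreasing weight_rec.
sorted <- weight[order(-weight$weight_rec), ]

# The condition is computed from `weight`, but applied to the rows of `sorted`.
wrong <- subset(sorted, weight$weight_rec <= 24)

# Letting subset look up weight_rec in `sorted` gives the intended rows.
right <- subset(sorted, weight_rec <= 24)

print(wrong)
print(right)
```

In this sketch wrong contains a row with weight_rec of 60, even though the filter asked for values of at most 24, while right contains only the rows with 20 and 10.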

On February 19, 2022 7:38:03 AM PST, Fabio D'Agostino wrote:
>Hi All,
>I just have two questions since I did not understand the behavior of
>'['  vs the  subset() function when filtering a dataframe that has NA
>values
>
>I was filtering a dataframe named 'weight' according to values of the
>column named 'weight_rec' ...
>str(weight)
>'data.frame': 17307 obs. of  6 variables:
> $ ICUSTAY_ID: num  229904 229904 229904 247844 247844 ...
> $ INTIME    : chr  "2127-08-11 20:43:43 UTC" "2127-08-11 20:43:43 UTC" "2127-08-11 20:43:43 UTC" "2179-09-29 18:46:50 UTC" ...
> $ ITEMID    : num  224639 224639 226512 762 762 ...
> $ VALUENUM  : num  61 59.2 59.8 86 86 86 85.5 93 128 128 ...
> $ CHARTTIME : chr  "2127-08-14 08:00:00 UTC" "2127-08-13 08:00:00 UTC" "2127-08-11 21:01:00 UTC" "2179-10-02 19:39:00 UTC" ...
> $ weight_rec: num  51.3 27.3 -20.7 53.2 -18.8 ...
>
>... using the following script:
>weight[weight$weight_rec <= 24 & weight$weight_rec >= 0, ]   # I get an output of 1055 rows
>
>while using:
>subset(weight, weight$weight_rec <= 24 & weight$weight_rec >= 0)   # I get an output of 1040 rows
>
>analyzing the values in the column 'weight_rec' I found that
>sum(is.na(weight$weight_rec))
>[1] 15   #15 values are NA
>
>My two questions are:
>1) Why are NA values considered when using '[' ? I only filtered for a
>condition of numeric values (i.e., >=0 & <=24)... and subset() did
>what I expected.
>2) Why are the values in all columns of those 15 rows equal to NA,
>and not only the values in the column named 'weight_rec'?
>
>Thanks in advance for clarifying this!
>Fabio
>



