[R] '[' vs subset() behaviors when filtering a dataframe with NA values
Fabio D'Agostino
Sat Feb 19 16:38:03 CET 2022
Hi All,
I have two questions, since I do not understand why '[' and the
subset() function behave differently when filtering a data frame that
contains NA values.
I was filtering a dataframe named 'weight' according to values of the
column named 'weight_rec' ...
str(weight)
'data.frame': 17307 obs. of 6 variables:
$ ICUSTAY_ID: num 229904 229904 229904 247844 247844 ...
$ INTIME : chr "2127-08-11 20:43:43 UTC" "2127-08-11 20:43:43 UTC" "2127-08-11 20:43:43 UTC" "2179-09-29 18:46:50 UTC" ...
$ ITEMID : num 224639 224639 226512 762 762 ...
$ VALUENUM : num 61 59.2 59.8 86 86 86 85.5 93 128 128 ...
$ CHARTTIME : chr "2127-08-14 08:00:00 UTC" "2127-08-13 08:00:00 UTC" "2127-08-11 21:01:00 UTC" "2179-10-02 19:39:00 UTC" ...
$ weight_rec: num 51.3 27.3 -20.7 53.2 -18.8 ...
... using the following script:
weight[weight$weight_rec <= 24 & weight$weight_rec >= 0, ]  # I get an output of 1055 rows
while using:
subset(weight, weight$weight_rec <= 24 & weight$weight_rec >= 0)  # I get an output of 1040 rows
Analyzing the values in the column 'weight_rec', I found that
sum(is.na(weight$weight_rec))
[1] 15  # 15 values are NA, which matches the difference 1055 - 1040
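Since I cannot share the real data, here is a minimal sketch that should reproduce the same discrepancy. The data frame 'df' and its values are made up for illustration and are not taken from 'weight':

```r
# Hypothetical toy data standing in for the real 'weight' table
df <- data.frame(id = 1:6,
                 weight_rec = c(10, NA, 30, 5, NA, -2))

# Same two filters as above, applied to the toy data
by_bracket <- df[df$weight_rec <= 24 & df$weight_rec >= 0, ]
by_subset  <- subset(df, weight_rec <= 24 & weight_rec >= 0)

nrow(by_bracket)            # matching rows plus one extra row per NA in weight_rec
nrow(by_subset)             # matching rows only
sum(is.na(df$weight_rec))   # number of NA values, equal to the difference
```

As with the real data, the difference between the two row counts equals the number of NA values in 'weight_rec'.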
My two questions are:
1) Why are the NA values included when using '['? I only filtered on a
numeric condition (i.e., >= 0 & <= 24), and subset() did what I
expected.
2) Why are the values in all columns of those 15 rows equal to NA, and
not only the values in the column named 'weight_rec'?
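To narrow both questions down, the logical index itself can be inspected. The sketch below again uses made-up data (the names 'df' and 'keep' are hypothetical). As far as I understand it, the comparison returns NA wherever 'weight_rec' is NA, '[' returns an all-NA row for each NA in a logical index, and subset() is documented to treat NA as FALSE; wrapping the index in which() appears to give the same rows as subset():

```r
# Hypothetical toy data standing in for the real 'weight' table
df <- data.frame(id = 1:6,
                 weight_rec = c(10, NA, 30, 5, NA, -2))

# The logical index is NA (not FALSE) wherever weight_rec is NA
keep <- df$weight_rec <= 24 & df$weight_rec >= 0
keep

# '[' with an NA in the index yields a row in which every column is NA
by_bracket <- df[keep, ]
na_rows <- by_bracket[is.na(by_bracket$id), ]
na_rows

# which() keeps only the positions that are TRUE, dropping the NA entries
filtered <- df[which(keep), ]
filtered
```

An equivalent way to drop the NA entries would be df[!is.na(keep) & keep, ].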
Thanks in advance for clarifying this!
Fabio