[R] subset and na.rm not really suppressing <NA> values
peter dalgaard
pdalgd at gmail.com
Fri Jan 24 11:16:20 CET 2014
subset.data.frame() does not have an na.rm argument!
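The extra argument is silently absorbed by `...` and ignored, so nothing is filtered on EMAIL_ADDRESS. A minimal sketch of one way to drop the missing addresses, assuming the goal is rows where VALID_EMAIL is FALSE and EMAIL_ADDRESS is present. The toy data frame below is a hypothetical stand-in for mydf; only the column names come from the original post.

```r
# Hypothetical stand-in for mydf; values are made up for illustration.
mydf <- data.frame(
  EMAIL_ADDRESS = c(NA, "a@example.com", NA, "b@example.com"),
  VALID_EMAIL   = c(FALSE, FALSE, NA, TRUE),
  stringsAsFactors = FALSE
)

# The original call: na.rm falls into `...` and has no effect,
# so the row with a missing address is kept.
a <- subset(mydf, VALID_EMAIL == FALSE, na.rm = TRUE, select = EMAIL_ADDRESS)

# Test for missing addresses explicitly in the row condition instead.
b <- subset(mydf, VALID_EMAIL == FALSE & !is.na(EMAIL_ADDRESS),
            select = EMAIL_ADDRESS)

print(a)
print(b)
```

Note that subset() already drops rows where the condition itself evaluates to NA (here, a missing VALID_EMAIL), but that says nothing about NAs in the selected column. As for na.strings: <NA> is only how R prints a missing value in a character or factor column, so adding "<NA>" there should matter only if the file literally contains that text.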
-pd
On 23 Jan 2014, at 00:58 , Jeff Johnson <mrjefftoyou at gmail.com> wrote:
> I have a dataset "mydf" with a field EMAIL_ADDRESS. When importing, I
> specified:
> mydf <- read.csv(file = extract, header = TRUE, stringsAsFactors = FALSE,
> na.strings=c("NA",""))
>
> I've also tried setting na.strings= c("NA","","<NA>") but I don't know if
> it's appropriate to put <NA> there.
>
> I'm running
> a <- subset(mydf, VALID_EMAIL == FALSE, na.rm = TRUE, select =
> EMAIL_ADDRESS)
> dput(head(a,5))
>
> structure(list(EMAIL_ADDRESS = c(NA_character_, NA_character_,
> NA_character_, NA_character_, NA_character_)), .Names = "EMAIL_ADDRESS",
> row.names = c(17L, 22L, 23L, 24L, 30L), class = "data.frame")
>
> The results show a lot of <NA> values on screen and in the dput statement.
>
> I don't quite understand why it is doing that. I would have expected it to
> exclude those since I had the na.rm = TRUE statement. Do you have any
> suggestions?
>
> Thanks!
> --
> Jeff
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com