[R] subset and na.rm not really suppressing <NA> values

Fri Jan 24 11:16:20 CET 2014

subset.data.frame() does not have an na.rm argument!

-pd 
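A minimal sketch of the point above, using a made-up stand-in for mydf (the toy rows are an assumption; only the column names EMAIL_ADDRESS and VALID_EMAIL come from the original post). The extra na.rm = TRUE is absorbed by the "..." argument of subset() and has no effect, so the NA test has to go into the subset condition itself:

```r
# Hypothetical toy data standing in for the poster's mydf
mydf <- data.frame(
  EMAIL_ADDRESS = c("a@example.com", NA, "b@example.com", NA),
  VALID_EMAIL   = c(FALSE, FALSE, TRUE, NA),
  stringsAsFactors = FALSE
)

# As in the original post: na.rm = TRUE falls into "..." and is ignored,
# so rows with a missing EMAIL_ADDRESS are still returned
a <- subset(mydf, VALID_EMAIL == FALSE, na.rm = TRUE, select = EMAIL_ADDRESS)

# Test for missing addresses explicitly in the condition instead
b <- subset(mydf, VALID_EMAIL == FALSE & !is.na(EMAIL_ADDRESS),
            select = EMAIL_ADDRESS)

print(a)
print(b)
```

Note that <NA> is only how print() displays a missing character value; it is not a literal string stored in the data frame.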

On 23 Jan 2014, at 00:58 , Jeff Johnson <mrjefftoyou at gmail.com> wrote:

> I have a dataset "mydf" with a field EMAIL_ADDRESS. When importing, I
> specified:
> mydf <- read.csv(file = extract, header = TRUE, stringsAsFactors = FALSE,
> na.strings=c("NA",""))
> 
> I've also tried setting na.strings= c("NA","","<NA>") but I don't know if
> it's appropriate to put <NA> there.
> 
> I'm running
> a <- subset(mydf, VALID_EMAIL == FALSE, na.rm = TRUE, select =
> EMAIL_ADDRESS)
> dput(head(a,5))
> 
> structure(list(EMAIL_ADDRESS = c(NA_character_, NA_character_,
> NA_character_, NA_character_, NA_character_)), .Names = "EMAIL_ADDRESS",
> row.names = c(17L, 22L, 23L, 24L, 30L), class = "data.frame")
> 
> The results show a lot of <NA> values on screen and in the dput statement.
> 
> I don't quite understand why it is doing that. I would have expected it to
> exclude those since I had the na.rm = TRUE statement. Do you have any
> suggestions?
> 
> Thanks!
> -- 
> Jeff

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com