[R] evaluating NAs in a dataframe
Philipp Pagel
p.pagel at wzw.tum.de
Wed Dec 8 21:50:05 CET 2010
Hi!
> How can one evaluate NAs in a numeric dataframe column? For example, I have
> a dataframe (demo) with a column of numbers and several NAs. If I write
> demo.df >= 10, numerals will return TRUE or FALSE, but if the value is
> "NA", "NA" is returned. But if I write demo.df == "NA", it returns as "NA"
Sounds like you are looking for is.na(), which tests each element for a missing value:
> is.na(c(1,NA,3))
[1] FALSE TRUE FALSE
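To spell out why the comparisons in the question come back as NA: any comparison involving NA is itself NA, and if() cannot branch on NA. Here is a minimal sketch; the data frame and its values are made up for illustration, only the column name Area is taken from the question.

```r
# Made-up data; only the column name 'Area' comes from the question
demo <- data.frame(Area = c(5, NA, 12, 3300))

# Comparisons involving NA yield NA, not TRUE or FALSE
ge10 <- demo$Area >= 10        # FALSE NA TRUE TRUE

# Comparing against the string "NA" does not detect missing values either
eq_na <- demo$Area == "NA"     # FALSE NA FALSE FALSE

# is.na() is the test for missingness
missing <- is.na(demo$Area)    # FALSE TRUE FALSE FALSE

# if() needs TRUE or FALSE; an NA condition raises the error
# quoted in the question, captured here instead of aborting
msg <- tryCatch(
  if (demo$Area[2] > 0 && demo$Area[2] < 10) "S01",
  error = function(e) conditionMessage(e)
)

# Rows with a non-missing Area
complete <- demo[!missing, , drop = FALSE]
```

If the loop were kept, wrapping the body in if (!is.na(demo$Area[i])) { ... } would avoid the error.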
> As an example, I want to assign rows to classes based on values in
> demo$Area. Some of the values in demo$Area are "NA"
>
> for (i in 1:nrow(demo)) {
> if (demo$Area[i] > 0 && demo$Area[i] < 10) {Class[i]<-"S01"} ## 1-10 cm2
> if (demo$Area[i] >= 10 && demo$Area[i] < 25) {Class[i] <- "S02"} ## 10-25 cm2
[...]
> if (demo$Area[i] >=3200) {Class[i] <- "S10"} ## >3200 cm2
> }
>
> What happens is that I get the message "Error in if (demo$Area[i] > 0 &&
> demo$Area[i] < 10) { : missing value where TRUE/FALSE needed"
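The error arises because a comparison with NA yields NA, and if() needs
TRUE or FALSE. In any case, you don't need a loop here: cut() assigns
all values to classes in one step. Example: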
# make up some data
foo <- data.frame(a=sample(1:20, 20, replace=TRUE))
# assign to classes
foo$class <- cut(foo$a, breaks=c(-1, 7, 13, 20), labels=c('small', 'medium', 'large'))
This also works in the presence of NAs - but of course the class will
be NA in those cases which, at least in my opinion, is the correct
value.
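Applied to the Area classes from the question, this might look roughly like the sketch below. The data are made up, and since the intermediate classes are elided in the question, a single placeholder label stands in for them; the real breaks and labels would have to be filled in. right=FALSE makes the intervals closed on the left, matching the >= 10 and < 25 style of the original conditions.

```r
# Made-up data; only the column name 'Area' comes from the question
demo <- data.frame(Area = c(5, NA, 12, 3300, 10))

# Intervals: [0,10), [10,25), [25,3200), [3200,Inf)
# Note: unlike the original 'Area > 0' test, this puts Area == 0 into S01.
# The middle label is a placeholder for the classes elided in the question.
demo$Class <- cut(demo$Area,
                  breaks = c(0, 10, 25, 3200, Inf),
                  labels = c("S01", "S02", "S03-S09 (placeholder)", "S10"),
                  right  = FALSE)

classes <- as.character(demo$Class)   # "S01" NA "S02" "S10" "S02"
```

The row with a missing Area simply gets a missing Class, and no error is raised.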
cu
Philipp
--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/