[R] evaluating NAs in a dataframe

Philipp Pagel p.pagel at wzw.tum.de
Wed Dec 8 21:50:05 CET 2010


	Hi!

> How can one evaluate NAs in a numeric dataframe column?  For example, I have
> a dataframe (demo) with a column of numbers and several NAs. If I write
> demo.df >= 10, numerals will return TRUE or FALSE, but if the value is
> "NA", "NA" is returned. But if I write demo.df == "NA", it returns as "NA"

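Sounds like you are looking for is.na. Any comparison involving NA
(including == "NA") itself yields NA, so you have to test for missing
values explicitly: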

> is.na(c(1,NA,3))
[1] FALSE  TRUE FALSE
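
To make the connection to the error message explicit, here is a minimal sketch of how NA propagates through comparisons and how is.na can guard an if() condition. The vector x and the single class label are made up for illustration and are not the poster's data.

```r
# hypothetical values; NA marks a missing measurement
x <- c(5, NA, 12)

ge10    <- x >= 10     # FALSE NA TRUE: comparing with NA gives NA
eq_str  <- x == "NA"   # FALSE NA FALSE: "NA" is just a string, NA still gives NA
missing <- is.na(x)    # FALSE TRUE FALSE: the reliable test for missing values

# if() needs a single TRUE or FALSE, so check for NA first;
# && short-circuits, so the comparison is never evaluated for a missing value
Class <- rep(NA_character_, length(x))
for (i in seq_along(x)) {
  if (!is.na(x[i]) && x[i] > 0 && x[i] < 10) Class[i] <- "S01"
}
```

Without the !is.na() guard the condition for the second element would be NA, which is what triggers "missing value where TRUE/FALSE needed".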


> As an example, I want to assign rows to classes based on values in
> demo$Area. Some of the values in demo$Area are "NA"
> 
> for (i in 1:nrow(demo)) {
>   if (demo$Area[i] > 0 && demo$Area[i] < 10) {Class[i]<-"S01"} ## 1-10 cm2
>   if (demo$Area[i] >= 10 && demo$Area[i] < 25) {Class[i] <- "S02"} ## 10-25cm2

[...]

>   if (demo$Area[i] >=3200) {Class[i] <- "S10"} ## >3200 cm2
>   }
> 
> What happens is that I get the message "Error in if (demo$Area[i] > 0 &&
> demo$Area[i] < 10) { : missing value where TRUE/FALSE needed"

First of all, you don't need a loop here. Example:

# make up some data
foo <- data.frame(a=sample(1:20, 20, replace=TRUE))
# assign to classes
foo$class <- cut(foo$a, breaks=c(-1, 7, 13, 20), labels=c('small', 'medium', 'large'))

This also works in the presence of NAs: the class will simply be NA
for those rows, which, at least in my opinion, is the correct value.
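
Applied to something like your Area column, a sketch could look as follows. The data are invented, and only the first two class boundaries from your post are used because the rest were elided; you would extend breaks and labels with your remaining limits. Note the assumption that right=FALSE is what you want: it closes the intervals on the left, matching your ">= 10 and < 25" test, but it also puts an area of exactly 0 into S01, unlike your "> 0" condition.

```r
# hypothetical data; NA marks a missing area
demo <- data.frame(Area = c(5, 12, NA, 24.9, 10, NA))

# right = FALSE gives the intervals [0,10) and [10,25)
# values outside the breaks, and NAs, get class NA
demo$Class <- cut(demo$Area,
                  breaks = c(0, 10, 25),
                  labels = c("S01", "S02"),
                  right  = FALSE)

# rows that could not be classified
unclassified <- which(is.na(demo$Class))
```

No loop and no if() are involved, so the "missing value where TRUE/FALSE needed" error cannot occur.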

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


