[R] Function for finding NA's

David Winsemius dwinsemius at comcast.net
Sun Apr 3 20:19:40 CEST 2011


On Apr 3, 2011, at 1:44 PM, Tyler Rinker wrote:

>
> Quick question,
>
> I tried to find a function in available packages to find NA's for an  
> entire data set (or single variables) and report the row of missing  
> values (NA's for each column).  I searched the typical routes  
> through the blogs and the help manuals for 15 minutes.  Rather than  
> spend any more time searching I created my own function to do this  
> (probably in less time than it would have taken me to find the  
> function).
>
> Now I still have the same question:  Is this function (NAhunter I  
> call it) already in existence?  If so please direct me (because I'm  
> sure they've written better code more efficiently).  I highly doubt  
> I'm the first person to want to find all the missing values in a  
> data set so I assume there is a function for it but I just didn't  
> spend enough time looking.  If there is no existing function (big if  
> here), is this something people feel is worthwhile for me to put  
> into a package of some sort?

I'm not sure that it would have occurred to people to include it in a  
package. Consider:

getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )

 > cities
        long       lat         city pop
1 -58.38194 -34.59972 Buenos Aires  NA
2  14.25000  40.83333         <NA>  NA
 > getNa(cities)
$long
integer(0)

$lat
integer(0)

$city
[1] 2

$pop
[1] 1 2
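
If you would rather have every missing cell in one table than a list per column, base R can do that too. What follows is only a sketch: the `cities` data frame is rebuilt by hand from the printout above (so the column types are an assumption), and `na_pos`, `na_cells` and `na_counts` are names made up for the illustration.

```r
# Rebuild the example data; column types are assumed from the printout.
cities <- data.frame(
  long = c(-58.38194, 14.25000),
  lat  = c(-34.59972, 40.83333),
  city = c("Buenos Aires", NA),
  pop  = c(NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix; which(..., arr.ind = TRUE)
# turns the TRUE cells into a two-column matrix of (row, col) indices.
na_pos <- which(is.na(cities), arr.ind = TRUE)

# Attach the column names so each missing cell reads as (row, variable).
na_cells <- data.frame(
  row      = na_pos[, "row"],
  variable = names(cities)[na_pos[, "col"]]
)
print(na_cells)

# Count of NAs per column, and the rows with at least one NA.
na_counts <- colSums(is.na(cities))
incomplete_rows <- which(!complete.cases(cities))
```

The list returned by `getNa` is easier to index by variable name; the matrix form is handier when you want to sort or filter the missing cells as a whole.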

There are several packages with functions by the name `describe` that  
do most or all of the rest of what you have proposed. I happen to use  
Harrell's Hmisc, but the other versions should also be reviewed if you  
want to avoid re-inventing the wheel.
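
If all you need are the missing-value counts and not the full descriptive output, a few lines of base R will also do. This is a rough sketch and not a substitute for any of the `describe` functions: `naSummary` is a hypothetical name, not a function from an existing package, and the `cities` data frame is again rebuilt by hand from the printout above.

```r
# naSummary is a hypothetical helper, not part of any package.
# It returns one row per column: length, number of NAs, proportion of NAs.
naSummary <- function(dfrm) {
  na_mat <- is.na(dfrm)              # logical matrix, one column per variable
  data.frame(
    n            = rep(nrow(dfrm), ncol(dfrm)),
    n_missing    = colSums(na_mat),
    prop_missing = colMeans(na_mat),
    row.names    = names(dfrm)
  )
}

# Example data, with column types assumed from the printout.
cities <- data.frame(
  long = c(-58.38194, 14.25000),
  lat  = c(-34.59972, 40.83333),
  city = c("Buenos Aires", NA),
  pop  = c(NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

res <- naSummary(cities)
print(res)
```

Note that `prop_missing` is a proportion between 0 and 1, not a percentage.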
-- 
David.

>
> Tyler
>
> Here's the code:
>
> NAhunter<-function(dataset)
> {
> find.NA<-function(variable)
> {
> if(is.numeric(variable)){
> n<-length(variable)
> mean<-mean(variable, na.rm=TRUE)
> median<-median(variable, na.rm=TRUE)
> sd<-sd(variable, na.rm=TRUE)
> NAs<-is.na(variable)
> total.NA<-sum(NAs)
> percent.missing<-total.NA/n
> descriptives<-data.frame(n,mean,median,sd,total.NA,percent.missing)
> rownames(descriptives)<-c(" ")
> Case.Number<-1:n
> Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
> missing.value<-data.frame(Case.Number,Missing.Values)
> missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
> list("NUMERIC DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
> }
> else{
> n<-length(variable)
> NAs<-is.na(variable)
> total.NA<-sum(NAs)
> percent.missing<-total.NA/n
> descriptives<-data.frame(n,total.NA,percent.missing)
> rownames(descriptives)<-c(" ")
> Case.Number<-1:n
> Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
> missing.value<-data.frame(Case.Number,Missing.Values)
> missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
> list("CATEGORICAL DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
> }
> }
> dataset<-data.frame(dataset)
> options(scipen=100)
> options(digits=2)
> lapply(dataset,find.NA)
> }

David Winsemius, MD
West Hartford, CT


