[R] Function for finding NA's

David Winsemius dwinsemius at comcast.net
Sun Apr 3 23:44:55 CEST 2011


On Apr 3, 2011, at 3:46 PM, Tyler Rinker wrote:

> Thanks David,
>
> After seeing the simplicity of your function versus the convoluted  
> mess I worked up I now understand why it's not necessary to have a  
> package to find NA's (and from what you said is a part of other  
> packages such as Hmisc already).

I'm actually not aware that any of the `describe` variants will return  
the indices of NA's. In the case of a real dataset such an object could  
be fairly large.  It was the other descriptive functions that I said  
were probably already coded.
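
If only the counts are wanted rather than the indices, something along these lines may be enough. This is a minimal sketch; the data frame `dfrm` below is made up for illustration and is not from this thread.

```r
# Hypothetical data frame, invented for illustration
dfrm <- data.frame(x = c(1, NA, 3, NA),
                   y = c("a", "b", NA, "d"),
                   stringsAsFactors = FALSE)

# is.na() on a data frame gives a logical matrix,
# so colSums() counts the NA's in each column
na.counts <- colSums(is.na(dfrm))
na.counts

# Proportion of NA's in each column
na.props <- colMeans(is.na(dfrm))
na.props
```

The result stays one number per column no matter how many rows are missing, which is why it scales better than a list of indices.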

>
> I am at the 2 1/2 month mark as an R user and have loads to learn.   
> Simpler is better.  Thanks David for your time and I will take the  
> information you gave and put it to use in new situations.

You should also familiarize yourself with complete.cases() and the  
various functions that handle na.action parameters (linked from that  
help page). Note that complete.cases returns a logical vector (not the  
cases themselves) and is designed for indexing matrices or dataframes.
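
A short sketch of that indexing idiom, assuming a small made-up data frame (`dfrm` and its columns are hypothetical, not from this thread):

```r
# Hypothetical data frame, invented for illustration
dfrm <- data.frame(a = c(1, NA, 3),
                   b = c("x", "y", NA),
                   stringsAsFactors = FALSE)

# Logical vector: TRUE where a row has no NA in any column
ok <- complete.cases(dfrm)
ok

# Use it as a row index to keep only the complete rows
dfrm[ok, ]

# Negate it to look at the rows with at least one NA
dfrm[!ok, ]

# na.omit(), one of the na.action functions, drops the
# incomplete rows in a single step
na.omit(dfrm)
```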

>
> Tyler
>
> > CC: r-help at r-project.org
> > From: dwinsemius at comcast.net
> > To: tyler_rinker at hotmail.com
> > Subject: Re: [R] Function for finding NA's
> > Date: Sun, 3 Apr 2011 14:19:40 -0400
> >
> >
> > On Apr 3, 2011, at 1:44 PM, Tyler Rinker wrote:
> >
> > >
> > > Quick question,
> > >
> > > > I tried to find a function in available packages to find NA's for an
> > > > entire data set (or single variables) and report the row of missing
> > > > values (NA's for each column). I searched the typical routes
> > > through the blogs and the help manuals for 15 minutes. Rather than
> > > spend any more time searching I created my own function to do this
> > > (probably in less time than it would have taken me to find the
> > > function).
> > >
> > > Now I still have the same question: Is this function (NAhunter I
> > > call it) already in existence? If so please direct me (because I'm
> > > sure they've written better code more efficiently). I highly doubt
> > > > I'm the first person to want to find all the missing values in a
> > > data set so I assume there is a function for it but I just didn't
> > > > spend enough time looking. If there is no existing function (big if
> > > here), is this something people feel is worthwhile for me to put
> > > into a package of some sort?
> >
> > I'm not sure that it would have occurred to people to include it in a
> > package. Consider:
> >
> > getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )
> >
> > > cities
> >        long       lat         city pop
> > 1 -58.38194 -34.59972 Buenos Aires  NA
> > 2  14.25000  40.83333         <NA>  NA
> > > getNa(cities)
> > $long
> > integer(0)
> >
> > $lat
> > integer(0)
> >
> > $city
> > [1] 2
> >
> > $pop
> > [1] 1 2
> >
> > There are several packages with functions by the name `describe` that
> > do most or all of the rest of what you have proposed. I happen to use
> > Harrell's Hmisc but the other versions should also be reviewed if you
> > want to avoid re-inventing the wheel.
> > --
> > David.
> >
> > >
> > > Tyler
> > >
> > > Here's the code:
> > >
> > > > NAhunter<-function(dataset)
> > > > {
> > > >   find.NA<-function(variable)
> > > >   {
> > > >     if(is.numeric(variable)){
> > > >       n<-length(variable)
> > > >       mean<-mean(variable, na.rm=TRUE)
> > > >       median<-median(variable, na.rm=TRUE)
> > > >       sd<-sd(variable, na.rm=TRUE)
> > > >       NAs<-is.na(variable)
> > > >       total.NA<-sum(NAs)
> > > >       percent.missing<-total.NA/n
> > > >       descriptives<-data.frame(n,mean,median,sd,total.NA,percent.missing)
> > > >       rownames(descriptives)<-c(" ")
> > > >       Case.Number<-seq_len(n)
> > > >       Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
> > > >       missing.value<-data.frame(Case.Number,Missing.Values)
> > > >       missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
> > > >       list("NUMERIC DATA","DESCRIPTIVES"=t(descriptives),
> > > >            "CASE # OF MISSING VALUES"=missing.values[,1])
> > > >     }
> > > >     else{
> > > >       n<-length(variable)
> > > >       NAs<-is.na(variable)
> > > >       total.NA<-sum(NAs)
> > > >       percent.missing<-total.NA/n
> > > >       descriptives<-data.frame(n,total.NA,percent.missing)
> > > >       rownames(descriptives)<-c(" ")
> > > >       Case.Number<-seq_len(n)
> > > >       Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
> > > >       missing.value<-data.frame(Case.Number,Missing.Values)
> > > >       missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
> > > >       list("CATEGORICAL DATA","DESCRIPTIVES"=t(descriptives),
> > > >            "CASE # OF MISSING VALUES"=missing.values[,1])
> > > >     }
> > > >   }
> > > >   dataset<-data.frame(dataset)
> > > >   options(scipen=100)
> > > >   options(digits=2)
> > > >   lapply(dataset,find.NA)
> > > > }
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
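
For anyone wanting to run the getNa example quoted above, here is a self-contained sketch. The `cities` data frame is rebuilt by hand from the two printed rows, so the column types (numeric `pop`, character `city`) are an assumption. The last line, using `which(..., arr.ind = TRUE)`, is an alternative that was not part of the original exchange.

```r
# Rebuilt by hand from the printed output; column types are assumed
cities <- data.frame(long = c(-58.38194, 14.25000),
                     lat  = c(-34.59972, 40.83333),
                     city = c("Buenos Aires", NA),
                     pop  = c(NA_real_, NA_real_),
                     stringsAsFactors = FALSE)

# For each column, the row indices that hold an NA
getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x)))
res <- getNa(cities)
res

# Alternative: a two-column matrix of (row, col) positions of every NA
which(is.na(cities), arr.ind = TRUE)
```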

David Winsemius, MD
West Hartford, CT


