[R] Remove missings (quick question)

Marc Schwartz marc_schwartz at me.com
Fri Nov 9 19:17:44 CET 2012


On Nov 9, 2012, at 11:23 AM, Bert Gunter <gunter.berton at gene.com> wrote:

> Marc et al.:
> 
> On Fri, Nov 9, 2012 at 9:05 AM, Marc Schwartz <marc_schwartz at me.com> wrote:
>> On Nov 9, 2012, at 10:50 AM, Eiko Fried <torvon at gmail.com> wrote:
>> 
>>> A colleague wrote the following syntax for me:
>>> 
>>> D = read.csv("x.csv")
>>> 
>>> ## Convert -999 to NA
>>> for (k in 1:dim(D)[2]) {
>>>   I = which(D[,k]==-999)
>>>   if (length(I) > 0) {
>>>       D[I,k] = NA
>>>   }
>>> }
>>> 
>>> The dataset has many missing values. I am running several regressions on
>>> this dataset, and want to ensure every regression has the same subjects.
>>> 
>>> Thus I want to drop subjects listwise for dependent variables y1-y9 and
>>> covariates x1-x5 (if data is missing on ANY of these variables, drop
>>> subject).
>>> 
>>> How would I do this after running the syntax above?
>>> 
>>> Thank you
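For reference, the colleague's column-by-column loop can be collapsed into a single vectorized assignment. This is a minimal sketch. It uses a small made-up data frame in place of x.csv, so the column names and values are hypothetical:

```r
# Hypothetical stand-in for D <- read.csv("x.csv")
D <- data.frame(y1 = c(1, -999, 3), x1 = c(-999, 5, 6))

# Logical-matrix indexing: every cell equal to -999 is set to NA,
# which is what the for/which loop does one column at a time
D[D == -999] <- NA

print(D)
```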
>> 
>> 
>> Modify the initial read.csv() call to:
>> 
>>  D <- read.csv("x.csv", na.strings = "-999")
>> 
>> That will convert all -999 values to NAs upon import so that you don't have to post-process it.
>> 
>> See ?read.csv for more info.
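Here is a self-contained sketch of the na.strings approach. It reads a few made-up lines through the text = argument instead of a real x.csv, and the data are hypothetical. Setting na.strings replaces its default value of "NA", so list both codes if the file also contains literal NA entries. The codes are matched against the field text, so a file that writes the code as -999.0 would need that string listed too.

```r
# Made-up file contents standing in for x.csv
csv_text <- "id,y1,x1
1,2.5,-999
2,-999,4
3,1.0,7"

# Keep "NA" as a missing code and add "-999"
D <- read.csv(text = csv_text, na.strings = c("NA", "-999"))

str(D)
```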
>> 
>> Once that is done, R's default behavior is to remove observations with any missing data (e.g. NA values)
>> when using modeling functions.
> 
> This appears to be false. From ?lme (nlme package, nlme_3.1-105, R 2.15.2):
> 
> "na.action
> 
> a function that indicates what should happen when the data contain
> NAs. The default action (na.fail) causes lme to print an error message
> and terminate if there are any incomplete observations."
> 
> Frankly, I doubt that practically any modeling option is handled
> uniformly across the vast array of "modeling functions" in R
> and (even recommended?) packages.
> 
> Cheers,
> Bert
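Bert's point can be made concrete with base R. lm() takes its default from getOption("na.action"), which is "na.omit" in a standard session. The ?lme excerpt above documents na.fail as the default for nlme::lme(). Passing na.action explicitly avoids relying on either default, for any function that accepts the argument. This is a minimal sketch with lm() and made-up data:

```r
# Made-up data: row 3 is missing y, row 4 is missing x
df <- data.frame(y = c(1.2, 2.3, NA, 4.1, 5.0),
                 x = c(1, 2, 3, NA, 5))

# Session-wide default that lm() falls back on
print(getOption("na.action"))

# Explicitly drop incomplete rows: 3 of the 5 rows are used
fit_omit <- lm(y ~ x, data = df, na.action = na.omit)
print(nobs(fit_omit))

# Explicitly refuse incomplete rows, like the na.fail default in ?lme
res <- tryCatch(lm(y ~ x, data = df, na.action = na.fail),
                error = function(e) "error: missing values")
print(res)
```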


Good point, Bert. That's what I get for over-generalizing... :-)

Thanks,

Marc


> 
>> Or you can pre-process using:
>> 
>>  D.New <- na.omit(D)
>> 
>> and then use D.New for all of your subsequent analyses. See ?na.omit.
>> 
>> Regards,
>> 
>> Marc Schwartz
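One caveat with na.omit(D) is that it drops a row if any column is missing, including columns that never enter the regressions. To drop subjects listwise on y1-y9 and x1-x5 only, complete.cases() can be applied to just those columns. This is a minimal sketch, assuming the columns are named exactly as in the question. The data and the extra column z are made up:

```r
# Hypothetical data with the variable names from the question
set.seed(1)
n <- 6
model_vars <- c(paste0("y", 1:9), paste0("x", 1:5))
D <- as.data.frame(matrix(round(rnorm(n * length(model_vars)), 2), nrow = n,
                          dimnames = list(NULL, model_vars)))
D$z <- NA       # unrelated column, entirely missing
D$y3[2] <- NA   # subject 2 missing a dependent variable
D$x5[4] <- NA   # subject 4 missing a covariate

# Keep a subject only if it is complete on the model variables
D.New <- D[complete.cases(D[model_vars]), ]
print(nrow(D.New))      # rows 2 and 4 are dropped

# For comparison, na.omit() also checks z and drops every row here
print(nrow(na.omit(D)))
```

Fitting every model to D.New gives each regression the same subjects, whatever the na.action default of the modeling function.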



