[R] missing values in logistic regression

Frank E Harrell Jr f.harrell at vanderbilt.edu
Fri Oct 29 21:14:17 CEST 2004


(Ted Harding) wrote:
> On 29-Oct-04 Avril Coghlan wrote:
> 
>>Dear R help list,
>>
>>   I am trying to do a logistic regression
>>where I have a categorical response variable Y
>>and two numerical predictors X1 and X2. There
>>are quite a lot of missing values for predictor X2.
>>e.g.,
>>
>>Y     X1   X2
>>red   0.6  0.2    *
>>red   0.5  0.2    *
>>red   0.5  NA
>>red   0.5  NA
>>green 0.2  0.1    *
>>green 0.1  NA
>>green 0.1  NA
>>green 0.05 0.05   *
>>
>>I am wondering can I combine X1 and X2 in
>>a logistic regression to predict Y, using
>>all the data for X1, even though there are NAs in
>>the X2 data?
>>
>>Or do I have to take only the cases for which
>>there is data for both X1 and X2? (marked
>>with *s above)
> 
> 
> I don't know of any R routine directly aimed at logistic regression
> with missing values as you describe.
>

The aregImpute function in the Hmisc package can handle this, using 
predictive mean matching with weighted multinomial sampling of donor 
observations' binary covariate values.
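As a rough sketch of how that might look in practice (not taken from the original post): the data below are simulated, because the eight-row excerpt is too small to fit anything and Y is perfectly separated by X1 in it. The sample size, coefficients, missingness pattern and n.impute = 5 are all assumptions chosen for illustration. The complete-case fit is included only for comparison, and fit.mult.impute (also in Hmisc) is used here to refit the model on each completed data set and combine the results.

```r
library(Hmisc)

set.seed(1)

# Simulated stand-in for the poster's data (assumed, not the real data):
# two-level response Y, numeric predictors X1 and X2, X2 often missing.
n  <- 200
X1 <- runif(n)
X2 <- 0.5 * X1 + rnorm(n, sd = 0.1)
Y  <- factor(ifelse(rbinom(n, 1, plogis(-2 + 4 * X1)) == 1, "red", "green"))
X2[sample(n, 80)] <- NA            # 80 of 200 X2 values set to missing
d  <- data.frame(Y, X1, X2)

# Complete-case analysis: glm() drops rows with any NA by default,
# so only the rows where X2 is observed are used.
cc <- glm(Y ~ X1 + X2, family = binomial, data = d)

# Multiple imputation: aregImpute() uses predictive mean matching by
# default; every variable in the formula, including Y, helps impute X2.
imp <- aregImpute(~ Y + X1 + X2, data = d, n.impute = 5, pr = FALSE)

# Refit the logistic model on each completed data set and pool the
# coefficients and their variances across the imputations.
fit <- fit.mult.impute(Y ~ X1 + X2, glm, imp, data = d,
                       family = binomial, pr = FALSE)

nobs(cc)     # number of rows the complete-case fit actually used
coef(fit)    # pooled coefficients using all rows
```

Whether the imputation-based fit is trustworthy depends on why X2 is missing; the sketch assumes the missingness is unrelated to the unobserved X2 values given Y and X1.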

. . ..
> Ted.
> 
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



