[R] missing values in logistic regression
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Oct 29 12:48:37 CEST 2004
On 29 Oct 2004, Avril Coghlan wrote:
> Dear R help list,
>
> I am trying to do a logistic regression
> where I have a categorical response variable Y
> and two numerical predictors X1 and X2. There
> are quite a lot of missing values for predictor X2.
> e.g.,
>
>   Y      X1    X2
>   red    0.6   0.2   *
>   red    0.5   0.2   *
>   red    0.5   NA
>   red    0.5   NA
>   green  0.2   0.1   *
>   green  0.1   NA
>   green  0.1   NA
>   green  0.05  0.05  *
>
>
> I am wondering can I combine X1 and X2 in
> a logistic regression to predict Y, using
> all the data for X1, even though there are NAs in
> the X2 data?
>
> Or do I have to take only the cases for which
> there is data for both X1 and X2? (marked
> with *s above)

You need to do one of two things:

1) Train separate models for Y | X1 and Y | X1, X2, and for each case use
   the one that matches the predictors available for that case.

2) Produce an imputation model for X2 | X1, and use multiple imputation.
   Given that X1 and X2 look like [0, 1] scores, mix (as suggested by PD)
   is not likely to be appropriate, but e.g. a 2D kernel density estimate
   (kde) fit may well be.
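
The two options can be sketched in base R roughly as follows. This is only
an illustration under stated assumptions: the data are simulated, every
object name (dat, fit1, fit12, predict_y, imp, m) is made up for the
example, and the imputation model is a normal linear model for logit(X2)
given X1 and Y, used here as a simple stand-in and not the 2D kde fit
mentioned above. Conditioning the imputation on Y as well as X1 is a
further assumption of the sketch.

```r
# Simulated stand-in data (assumption: X1, X2 are scores in (0, 1))
set.seed(1)
n  <- 200
X1 <- runif(n)
X2 <- plogis(qlogis(X1) + rnorm(n))          # second score, correlated with X1
Y  <- factor(ifelse(runif(n) < plogis(-2 + 2 * X1 + 2 * X2), "red", "green"))
X2[sample(n, 80)] <- NA                      # make 80 of the X2 values missing
dat <- data.frame(Y, X1, X2)

## Option 1: separate models, use the one matching the available predictors
fit1  <- glm(Y ~ X1,      family = binomial, data = dat)  # all rows
fit12 <- glm(Y ~ X1 + X2, family = binomial, data = dat)  # rows with NA in X2 are dropped

predict_y <- function(newdata) {
  # P(Y == "red"): start from the X1-only model, overwrite where X2 is known
  p <- predict(fit1, newdata, type = "response")
  has_x2 <- !is.na(newdata$X2)
  if (any(has_x2))
    p[has_x2] <- predict(fit12, newdata[has_x2, , drop = FALSE],
                         type = "response")
  p
}
p <- predict_y(dat)

## Option 2: multiple imputation of X2 (hand-rolled sketch)
obs  <- !is.na(dat$X2)
imp  <- lm(qlogis(X2) ~ X1 + Y, data = dat)  # fitted on the complete cases
Xmis <- model.matrix(~ X1 + Y, dat[!obs, ])  # design matrix for the rows to impute
m    <- 20                                   # number of imputations (arbitrary)
est  <- matrix(NA_real_, m, 3)
vars <- est
for (i in seq_len(m)) {
  # draw the imputation model's parameters, then the missing values
  s2   <- sum(residuals(imp)^2) / rchisq(1, df.residual(imp))
  beta <- coef(imp) +
    drop(t(chol(s2 * summary(imp)$cov.unscaled)) %*% rnorm(3))
  d <- dat
  d$X2[!obs] <- plogis(drop(Xmis %*% beta) +
                       rnorm(sum(!obs), sd = sqrt(s2)))
  f <- glm(Y ~ X1 + X2, family = binomial, data = d)
  est[i, ]  <- coef(f)
  vars[i, ] <- diag(vcov(f))
}
# Pool with Rubin's rules: within- plus inflated between-imputation variance
qbar <- colMeans(est)
se   <- sqrt(colMeans(vars) + (1 + 1 / m) * apply(est, 2, var))
```

Here predict_y(dat) gives one fitted probability per case, and qbar and se
are the pooled coefficients and standard errors for the intercept, X1 and
X2. Any real use would need an imputation model checked against the
actual data.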
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595