[R] Caution on the use of model.matrix.
Rolf Turner
rolf at math.unb.ca
Thu Jun 2 16:14:38 CEST 2005
I have just been bitten by a quirk in the behaviour of model.matrix.
I used model.matrix inside a function, and passed to it a formula
that was built elsewhere.
The formula was of the form ``y ~ x + w + z''. Now, model.matrix
cheerfully accepts formulae of this form, although it only
***needs*** the right hand side, i.e. ``~ x + w + z'' --- the ``y''
can be dropped (but in general needn't be).
The quirk by which I was bitten was that if the y column of the data
frame being used contains missing values, then the corresponding rows
are dropped (silently) and the resulting design matrix has rows
corresponding only to the non-missing values of y. This was not the
desired behaviour in my application.
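A minimal sketch of the effect, using a made-up data frame (the values below are purely illustrative, not from my application). As far as I can tell, the rows disappear because model.matrix builds a model frame first, and the default na.action (na.omit) is applied to every variable in the formula, the response included.

```r
# Hypothetical toy data: y has a single missing value, in row 2.
XXX <- data.frame(y = c(1.2, NA, 3.4, 4.1),
                  x = c(0.5, 1.5, 2.5, 3.5),
                  w = c(10, 20, 30, 40),
                  z = c(1, 0, 1, 0))

# Two-sided formula: the response takes part in the NA handling.
with_lhs <- model.matrix(y ~ x + w + z, XXX)

# One-sided formula: only x, w and z are inspected for missing values.
without_lhs <- model.matrix(~ x + w + z, XXX)

nrow(with_lhs)      # 3: the row where y is NA has been dropped
nrow(without_lhs)   # 4: every row is kept
rownames(with_lhs)  # "1" "3" "4"
```

No warning or message is issued by either call; the only sign of trouble is the row count.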
Might I respectfully suggest to R Core that a WARNING be added to the
help for model.matrix to the effect that
    model.matrix(y ~ x + w + z, XXX)
and
    model.matrix(~ x + w + z, XXX)
give DIFFERENT results if the column ``y'' of the data frame XXX
contains missing values?
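In the meantime, one possible guard when the formula is built elsewhere is to strip the response before calling model.matrix. The sketch below is only an illustration: design_matrix_rhs is a hypothetical helper name of my own, and the data are made up. It relies on delete.response() and terms() from the stats package.

```r
# Hypothetical helper (not part of R): build the design matrix from the
# right-hand side only, so that missing values in the response cannot
# cause rows to be dropped.
design_matrix_rhs <- function(formula, data) {
  rhs <- delete.response(terms(formula, data = data))
  model.matrix(rhs, data)
}

# Made-up data with one missing response value.
XXX <- data.frame(y = c(1.2, NA, 3.4, 4.1),
                  x = c(0.5, 1.5, 2.5, 3.5),
                  w = c(10, 20, 30, 40),
                  z = c(1, 0, 1, 0))

X <- design_matrix_rhs(y ~ x + w + z, XXX)
nrow(X)  # 4: the row with the missing y is retained
```

Note that this only protects against missing values in y. Rows with missing values in x, w or z would still be removed under the default na.action.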
cheers,
Rolf Turner
rolf at math.unb.ca