[R] Caution on the use of model.matrix.

Rolf Turner rolf at math.unb.ca
Thu Jun 2 16:14:38 CEST 2005


I have just been bitten by a quirk in the behaviour of model.matrix.
I used model.matrix inside a function, and passed to it a formula
that was built elsewhere.

The formula was of the form ``y ~ x + w + z''.  Now, model.matrix
cheerfully accepts formulae of this form, although it only
***needs*** the right hand side, i.e. ``~ x + w + z'' --- the ``y''
can be dropped (but in general needn't be).

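The quirk by which I was bitten was that if the y column of the data
frame being used contains missing values, then the corresponding rows
are dropped (silently) and the resulting design matrix has rows
corresponding only to the non-missing values of y.  (As far as I can
tell, model.matrix builds a model frame from the whole formula, and
the default na.action, na.omit, removes every row with a missing
value in any variable named there, the response included.)  This was
not the desired behaviour in my application.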
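
A small sketch of the behaviour described above may help.  The data
frame and its values are made up purely for illustration, and the
sketch assumes the default setting options(na.action = "na.omit"):

```r
# Toy data frame (hypothetical values): y is missing in rows 2 and 4.
XXX <- data.frame(y = c(1.2, NA, 3.1, NA, 5.0),
                  x = c(0.5, 1.5, 2.5, 3.5, 4.5),
                  w = c(2, 1, 4, 3, 5),
                  z = c(1, 0, 1, 0, 1))

# Two-sided formula: the response takes part in NA handling.
m_with_y <- model.matrix(y ~ x + w + z, XXX)

# One-sided formula: only x, w and z are checked for missing values.
m_without_y <- model.matrix(~ x + w + z, XXX)

nrow(m_with_y)       # 3: rows 2 and 4 were dropped without a message
nrow(m_without_y)    # 5: every row is kept
rownames(m_with_y)   # "1" "3" "5"
```

No warning or message is issued in either call; comparing the row
names against those of the data frame is one way to spot the loss.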

Might I respectfully suggest to R Core that a WARNING be added to the
help for model.matrix to the effect that

    model.matrix(y ~ x + w + z, XXX)
and
    model.matrix(~ x + w + z, XXX)

give DIFFERENT results if the column ``y'' of the data frame XXX
contains missing values?
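
For anyone in the same position (a formula built elsewhere, with a
response attached), here are two possible ways of keeping all the rows.
This is only a sketch under assumptions: the data frame and the name f
are invented for illustration, and neither approach is claimed to be
the officially recommended one.

```r
# Toy data frame (hypothetical values): y is missing in rows 2 and 4.
XXX <- data.frame(y = c(1.2, NA, 3.1, NA, 5.0),
                  x = c(0.5, 1.5, 2.5, 3.5, 4.5),
                  w = c(2, 1, 4, 3, 5),
                  z = c(1, 0, 1, 0, 1))

f <- y ~ x + w + z   # stands in for the formula built elsewhere

# Option 1: remove the response from the terms object, so that y
# plays no part in the NA handling.
m1 <- model.matrix(delete.response(terms(f)), XXX)

# Option 2: build the model frame explicitly with na.pass, so that no
# rows are removed, then build the design matrix from that frame.
mf <- model.frame(f, XXX, na.action = na.pass)
m2 <- model.matrix(f, mf)

nrow(m1)   # 5
nrow(m2)   # 5
```

The two options are not equivalent when the predictors themselves have
missing values: option 1 still drops those rows, whereas option 2 keeps
them and leaves NA entries in the design matrix.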

					cheers,

						Rolf Turner
						rolf at math.unb.ca