[Rd] predict (PR#2686)

Mark.Bravington at csiro.au
Fri May 2 03:03:46 MEST 2003

Hmmm-- still looks like a bug to me! But as I don't want to hog the
airwaves, here's my last summary on this point, with a question:

#> Prediction from the original data was just an example, of course; my
#> proposal is that inactive factor levels in the prediction set should be
#> dropped. I don't see how this could ever cause inconsistent behaviour
#> between prediction sets-- have I missed something?

#Yes, repeatedly: `inactive' depends on the prediction set, and that's not 
#thought desirable.

But that doesn't explain why this "is not thought desirable". Could you
provide an actual example where automatically dropping inactive levels in a
prediction dataset would cause problems? Then at last the scales might fall
from my eyes...

(1) Suppose a prediction dataset contains a factor which has inactive levels
that weren't active (or didn't exist) in the original data. Then
'predict.lm' etc give an error message, even when statistically-sensible
predictions can be made. In particular, this happens even when 'predict' is
called with the original fitting dataset as the 'newdata' argument. This
appears to be inconsistent with the documentation, at least.

(2) The only generic way to prevent the error appearing, is for users to
insert code along the lines of

predict.set[] <- lapply( predict.set, function( x)
    if( is.factor( x)) x[, drop=TRUE] else x)

This doesn't look like a very helpful requirement. It is very awkward, and I
don't (yet) see how the user gains any security from it.

(3) My proposal is to change 'predict' to drop inactive factor levels, just
as 'lm' etc already do; see earlier emails for the one-line change. In
effect, step (2) gets done automatically. The code for 'predict' will still
rightly give an error if the prediction data actually uses levels that didn't
exist or weren't active in the fitting data. Is there a counterexample where
this proposal would cause trouble?
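
To make points (1) to (3) concrete, here is a minimal sketch in R. The data
and the names 'fit.set', 'bad.set', 'drop.inactive' and 'predictDropped' are
hypothetical, made up for illustration; the wrapper only mimics the proposed
behaviour from the outside and is not the actual one-line change to 'predict'.
Whether a bare 'predict' call fails on the first case may depend on the R
version in use, so the sketch does not rely on that.

```r
# Hypothetical fitting data: 'g' declares three levels, but only "a" and "b"
# are active; "c" is an inactive level.
fit.set <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  g = factor(c("a", "a", "a", "b", "b", "b"), levels = c("a", "b", "c"))
)

# lm() drops the inactive level when it builds its model frame, so the
# fit only records the levels that were actually used.
fit <- lm(y ~ g, data = fit.set)
fit$xlevels$g

# Hypothetical helper: the workaround from point (2), wrapped in a function.
drop.inactive <- function(df) {
  df[] <- lapply(df, function(x) if (is.factor(x)) x[, drop = TRUE] else x)
  df
}

# Hypothetical wrapper sketching point (3): drop inactive levels from
# 'newdata' first, then hand over to the usual predict method.
predictDropped <- function(object, newdata, ...) {
  predict(object, newdata = drop.inactive(newdata), ...)
}

# Case A: the prediction set carries the inactive level "c" but never uses
# it. Predictions are well defined and match the fitted values.
p.ok <- predictDropped(fit, fit.set)

# Case B: the prediction set actually uses "c", which the fit never saw.
# An error is still the right outcome; capture its message for inspection.
bad.set <- data.frame(g = factor(c("a", "c"), levels = c("a", "b", "c")))
p.bad <- tryCatch(predictDropped(fit, bad.set),
                  error = function(e) conditionMessage(e))
```

Case A is the situation in (1); case B shows that dropping inactive levels
does not hide a genuinely unseen level, because "c" is active in 'bad.set' and
so survives the drop.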



Mark Bravington
PO Box 1538
Castray Esplanade
TAS 7001

phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington at csiro.au
