[Rd] predict (PR#2686)

Thu Mar 27 08:23:02 MET 2003

On Thu, 27 Mar 2003 Mark.Bravington at csiro.au wrote:

> <Bravington wrote:>
> #> `predict' complains about new factor levels, even if the "new" levels are
> #> merely levels in the original that didn't occur in the original fit and were
> #> sensibly dropped, and that don't occur in the prediction data either.
> 
> <Ripley replied:>
> #This is intentional.  The coding for factors is based on the full set of
> #levels, and should be comparable for different prediction sets.
> #
> #If you are using factors with fictitious levels the fix is obvious:
> #improve the design.
> 
> There is still an inconsistency bug between `lm' and `predict.lm', though.
> `lm' intentionally overlooks inactive levels of a factor, but `predict.lm'

Only if the drop.unused.levels argument to model.frame is set, and
originally lm did not do so.

> doesn't, even when it legitimately could. In particular, it is a bit odd to
> have no problem predicting without a `newdata' argument even when the
> original data had inactive factor levels, but then to get an error if
> `newdata=<<original data>>' is supplied explicitly! (See example.)

Read again.  predict.lm is consistent across its inputs: unlike lm it can
take variable `newdata'.  As I said the intention is to be consistent
across *prediction sets*.  Omitting newdata is not giving a prediction
set.
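
For readers following the thread, the situation can be sketched roughly as
below.  This is not the example from the original report; the data frame `d'
and its values are made up for illustration.  The factor `f' declares a level
"c" that no row uses, so lm drops it when fitting.  Whether the explicit
`newdata' call then fails depends on the R version, so it is wrapped in
tryCatch rather than assumed to go either way.

```r
# Hypothetical data: factor f declares level "c", but no row uses it.
d <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2),
  f = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
)

fit <- lm(y ~ f, data = d)

levels(d$f)      # levels declared in the data
fit$xlevels$f    # levels recorded by the fit, after lm drops unused ones

# No prediction set supplied: this returns the fitted values.
p_fitted <- predict(fit)

# The original data supplied as an explicit prediction set.  Wrapped in
# tryCatch because whether this raises an error depends on the R version.
p_newdata <- tryCatch(
  predict(fit, newdata = d),
  error = function(e) conditionMessage(e)
)
```

Comparing levels(d$f) with fit$xlevels$f shows the mismatch at issue: the
prediction set carries a level that the fit never recorded.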

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595