[Rd] predict (PR#2686)

ripley at stats.ox.ac.uk
Wed Mar 26 07:19:39 MET 2003


This is intentional.  The coding for factors is based on the full set of 
levels, and should be comparable for different prediction sets.

If you are using factors with fictitious levels, the fix is obvious: 
improve the design.
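A minimal sketch of that design fix, in R, assuming data shaped like the reporter's example quoted below (the names `scrunge`, `fit` and `pred` are illustrative only, not from the thread): rebuild the factor with factor(), which keeps only the levels that actually occur, so the fit is made from a factor without the fictitious level.

```r
# Illustrative data shaped like the bug report: 'earwig' is declared
# as a level but never occurs in the data.
set.seed(1)
scrunge <- data.frame(
  y    = runif(3),
  disc = factor(c("cat", "dog", "cat"), levels = c("cat", "dog", "earwig"))
)

# Calling factor() on an existing factor rebuilds it from the observed
# values only, so the unused level 'earwig' is dropped.
scrunge$disc <- factor(scrunge$disc)
print(levels(scrunge$disc))  # "cat" "dog"

# Fit and predict on the same, now consistent, data frame.
fit  <- lm(y ~ disc, data = scrunge)
pred <- predict(fit, newdata = scrunge)
print(pred)
```

The same rebuilding step can be applied to any prediction data frame whose factor columns declare levels that never occur.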

On Wed, 26 Mar 2003 Mark.Bravington at csiro.au wrote:

> #       r-bugs at r-project.org
> 
> `predict' complains about new factor levels, even if the "new" levels are
> merely levels in the original that didn't occur in the original fit and were
> sensibly dropped, and that don't occur in the prediction data either. (At
> least if `drop.unused.levels' was set to TRUE, which is the default.)

Actually, the default is FALSE: see args(model.frame.default).  lm and glm
call model.frame.default with non-default arguments (including
drop.unused.levels = TRUE).

> test> scrunge.data.2_ data.frame( y=runif( 3), disc=factor( c( 'cat', 'dog', 'cat'), levels=c( 'cat', 'dog', 'earwig')))
> test> lm.predbug.2_ lm( y~disc, data=scrunge.data.2)
> test> predict(lm.predbug.2, newdata=scrunge.data.2)
> Error in model.frame.default(object, data, xlev = xlev) : 
>         factor disc has new level(s) earwig
> 
> 
> A cure for this seems to be to add the commented line below towards the end
> of `model.frame.default':
> 
>     <<...>>
>     if (length(xlev) > 0) {
>         for (nm in names(xlev)) if (!is.null(xl <- xlev[[nm]])) {
>             xi <- data[[nm]]
>             if (is.null(nxl <- levels(xi))) 
>                 warning(paste("variable", nm, "is not a factor"))
>             else {
>                 xi <- xi[, drop = TRUE]
>                 nxl <- levels( xi) # MVB: remove droppees
>                 if (any(m <- is.na(match(nxl, xl)))) 
>                   stop(paste("factor", nm, "has new level(s)", nxl[m]))
>             }
>         }
>     }
>     else if (drop.unused.levels) {
>     <<...>>
>     
> cheers
> Mark
> 
> *******************************
> 
> Mark Bravington
> CSIRO (CMIS)
> PO Box 1538
> Castray Esplanade
> Hobart
> TAS 7001
> 
> phone (61) 3 6232 5118
> fax (61) 3 6232 5012
> Mark.Bravington at csiro.au 
> 
> --please do not edit the information below--
> 
> Version:
>  platform = i386-pc-mingw32
>  arch = i386
>  os = mingw32
>  system = i386, mingw32
>  status = 
>  major = 1
>  minor = 6.2
>  year = 2003
>  month = 01
>  day = 10
>  language = R
> 
> Windows 2000 Professional (build 2195) Service Pack 3.0
> 
> Search Path:
>  .GlobalEnv, ROOT, package:handy, package:debug, mvb.session.info,
> package:mvbutils, package:tcltk, Autoloads, package:base
> 
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595