[Rd] predict (PR#2686)
Mark.Bravington at csiro.au
Thu Mar 27 02:59:35 MET 2003
<Bravington wrote:>
#> `predict' complains about new factor levels, even if the "new" levels are
#> merely levels in the original that didn't occur in the original fit and were
#> sensibly dropped, and that don't occur in the prediction data either.
<Ripley replied:>
#This is intentional. The coding for factors is based on the full set of
#levels, and should be comparable for different prediction sets.
#
#If you are using factors with fictitious levels the fix is obvious:
#improve the design.
There is still an inconsistency bug between `lm' and `predict.lm', though.
`lm' intentionally overlooks inactive levels of a factor, but `predict.lm'
doesn't, even when it legitimately could. In particular, it is a bit odd to
have no problem predicting without a `newdata' argument even when the
original data had inactive factor levels, but then to get an error if
`newdata=<<original data>>' is supplied explicitly! (See example.)
Given that the (IMHO sensible) decision has been taken for `lm' to drop
inactive levels, deliberately so that users don't have to change their
designs when they don't really need to, surely it's inconsistent for
`predict' not to do the same when it's statistically OK?
[When it's not OK-- i.e. when there are levels in the prediction data that
didn't appear in the fitting data-- the cleanest solution would perhaps be
for `predict' to return NA values and a warning, rather than an error. But
that's a separate issue.]
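A rough sketch of that NA-plus-warning behaviour, written as a user-level wrapper rather than a change to `predict' itself. The function name `predict_na_new_levels' is made up for illustration and is not part of R; the sketch assumes an `lm' fit whose factors appear in the formula as plain data-frame columns, so that the names in `fit$xlevels' match column names in `newdata'.

```r
# Hypothetical helper (not part of R): return NA, with a warning, for rows of
# `newdata` whose factor levels were not present when the model was fitted.
predict_na_new_levels <- function(fit, newdata) {
  xlev <- fit$xlevels                      # levels actually used in the fit
  bad <- rep(FALSE, nrow(newdata))
  for (nm in names(xlev)) {
    # A row is "bad" if its value for this factor was not seen in the fit
    bad <- bad | !(as.character(newdata[[nm]]) %in% xlev[[nm]])
  }
  if (any(bad))
    warning(sum(bad), " row(s) have factor levels not seen in the fit; returning NA")

  out <- rep(NA_real_, nrow(newdata))
  if (any(!bad)) {
    ok <- newdata[!bad, , drop = FALSE]
    # Re-code each factor on exactly the fitted levels before predicting
    for (nm in names(xlev))
      ok[[nm]] <- factor(as.character(ok[[nm]]), levels = xlev[[nm]])
    out[!bad] <- predict(fit, newdata = ok)
  }
  out
}

# Example: 'earwig' is declared as a level but never occurs in the fitting data
fit_data <- data.frame(y = c(0.2, 0.6, 0.4),
                       disc = factor(c("cat", "dog", "cat"),
                                     levels = c("cat", "dog", "earwig")))
fit <- lm(y ~ disc, data = fit_data)
new <- data.frame(disc = factor(c("cat", "earwig", "dog"),
                                levels = c("cat", "dog", "earwig")))
res <- predict_na_new_levels(fit, new)     # warns; second element is NA
```

Rows with known levels get ordinary predictions; only the row with the genuinely new level comes back as NA.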
cheers
Mark
mark.bravington at csiro.au
Slightly expanded example, and suggested fix to `model.frame.default':
test> scrunge.data.2 <- data.frame( y=runif( 3), disc=factor( c( 'cat', 'dog','cat'), levels=c( 'cat', 'dog', 'earwig')))
test> lm.predbug.2 <- lm( y~disc, data=scrunge.data.2)
test> predict( lm.predbug.2) # uses original data
1 2 3
0.2185388 0.5843139 0.2185388
test> predict(lm.predbug.2, newdata=scrunge.data.2) # newdata = original data
Error in model.frame.default(object, data, xlev = xlev) :
factor disc has new level(s) earwig
A cure for this seems to be to add the commented line below, towards the end
of `model.frame.default':
<<...>>
    if (length(xlev) > 0) {
        for (nm in names(xlev)) if (!is.null(xl <- xlev[[nm]])) {
            xi <- data[[nm]]
            if (is.null(nxl <- levels(xi)))
                warning(paste("variable", nm, "is not a factor"))
            else {
                xi <- xi[, drop = TRUE]
                nxl <- levels( xi) # MVB: remove droppees
                if (any(m <- is.na(match(nxl, xl))))
                    stop(paste("factor", nm, "has new level(s)", nxl[m]))
            }
        }
    }
    else if (drop.unused.levels) {
<<...>>
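To see why re-reading the levels matters, here is a small standalone illustration. The variable names `nxl_before' and `nxl_after' are made up for this sketch, and `xl' stands in for the levels recorded from the fit. Subsetting a factor with `drop = TRUE' discards levels that do not occur in the data, so the "new level" check only fires if the levels are taken before the drop.

```r
# A factor with a declared but unused level, as in the example above
xi <- factor(c("cat", "dog", "cat"), levels = c("cat", "dog", "earwig"))
xl <- c("cat", "dog")               # stand-in for the levels kept by the fit

nxl_before <- levels(xi)            # "cat" "dog" "earwig"
xi <- xi[, drop = TRUE]             # drop levels that do not occur in xi
nxl_after <- levels(xi)             # "cat" "dog"

# The check from model.frame.default, applied to both sets of levels
new_before <- any(is.na(match(nxl_before, xl)))   # TRUE: 'earwig' looks new
new_after  <- any(is.na(match(nxl_after, xl)))    # FALSE: nothing new
```

With the stale levels the check reports `earwig' as new even though no observation has that level; with the refreshed levels it passes.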
--please do not edit the information below--
Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status =
major = 1
minor = 6.2
year = 2003
month = 01
day = 10
language = R
Windows 2000 Professional (build 2195) Service Pack 3.0
Search Path:
.GlobalEnv, ROOT, package:handy, package:debug, mvb.session.info,
package:mvbutils, package:tcltk, Autoloads, package:base