[Rd] predict (PR#2685)
Mark.Bravington at csiro.au
Wed Mar 26 00:29:39 MET 2003
There is a bug in `predict' whereby the order of variables sometimes gets
re-arranged compared to the original fit, and then disaster results.
Specifically, the 'variables' and 'predvars' attributes of a 'terms' object
get out of synch. This only happens when the terms in the original formula
get re-ordered during fitting:
test> scrunge.data <- data.frame( contin=1:10, discrete=factor( rep( c( 'cat', 'dog'), 5)), resp=runif( 10))
test> lm.ok <- lm( resp ~ discrete + contin %in% discrete, data=scrunge.data)
test> predict( lm.ok, scrunge.data) # no problemo
         1          2          3          4          5          6          7
0.29663793 0.04572655 0.42661779 0.31668732 0.55659764 0.58764809 0.68657750
         8          9         10
0.85860886 0.81655736 1.12956963
test> lm.bug <- lm( resp ~ contin %in% discrete + discrete, data=scrunge.data) # terms will be re-ordered
test> predict( lm.bug, scrunge.data)
Error in "contrasts<-"(*tmp*, value = "contr.treatment") :
contrasts apply only to factors
In addition: Warning message:
variable discrete is not a factor in: model.frame.default(object, data, xlev = xlev)
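The two attributes in question can be inspected directly on a fitted model. The following is a minimal sketch that reuses the data from the transcript above (the `set.seed` call and the name `fit` are my additions). No output is shown because it depends on the R version in use.

```r
# Rebuild the example data; set.seed() is added only for reproducibility.
set.seed(1)
scrunge.data <- data.frame(contin = 1:10,
                           discrete = factor(rep(c("cat", "dog"), 5)),
                           resp = runif(10))

# Fit with the nested term written first, as in lm.bug.
fit <- lm(resp ~ contin %in% discrete + discrete, data = scrunge.data)

# The terms object stored with the fit carries both attributes.
tt <- terms(fit)
print(attr(tt, "term.labels"))  # model terms, as ordered by terms()
print(attr(tt, "variables"))    # call of the form list(<var>, <var>, ...)
print(attr(tt, "predvars"))     # same shape, used when predicting
```

If the i-th entry of `variables` and the i-th entry of `predvars` refer to different variables, anything that pairs them by position will pick up the wrong column.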
This actually turns out to be a bug in `model.frame.default', to do with an
inconsistency between `predvars' and `vars' when `model.frame.default' is
called inside `predict'. AFAICS it can be fixed by including the commented
line below in `model.frame.default':
<<...>>
vars <- attr(formula, "variables")
predvars <- attr(formula, "predvars")
if (is.null(predvars))
    predvars <- vars
varnames <- as.character(predvars[-1]) # MVB: was vars[-1] not predvars[-1]
variables <- eval(predvars, data, env)
<<...>>
This has the side-effect that there are some ugly column names in the
model.frame if e.g. a `poly' term is used, but doesn't actually seem to hurt
the prediction.
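To illustrate where the ugly names come from, here is a small sketch using hand-written calls rather than a real fit. The symbol `saved_coefs` is a hypothetical placeholder; in a real `predvars` the stored coefficients would be spelled out in full, which makes the name longer still.

```r
# Hand-written stand-ins for the two attributes (not taken from a real fit).
vars     <- quote(list(resp, poly(contin, 2)))
predvars <- quote(list(resp, poly(contin, 2, coefs = saved_coefs)))

# as.character() on a call deparses each element; [-1] drops the leading `list`.
names_from_vars     <- as.character(vars[-1])
names_from_predvars <- as.character(predvars[-1])

print(names_from_vars)
print(names_from_predvars)
```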
However, that doesn't fix it all. There is still a bug in `predict', even
after replacing `model.frame.default' with the above:
test> predict( lm.bug, scrunge.data)
Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
subscript out of bounds
test> # wot???
This time, the bug is in `delete.response', which calls `terms' to set most
of the attributes, including `variables', but adjusts the original `predvars'
by hand. Because `terms' returns the variables in a different order when
it's called by `predict.lm' than when it was originally called by `lm', the
two attributes get out of synch.
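The order dependence can be seen without fitting anything. This sketch compares `terms` on the two ways of writing the formula; the names `tt_a` and `tt_b` are mine. It only illustrates how the `variables` attribute follows the formula as written, and is not a reproduction of what `delete.response' does internally.

```r
# Same model, terms written in a different order.
tt_a <- terms(resp ~ contin %in% discrete + discrete)
tt_b <- terms(resp ~ discrete + contin %in% discrete)

# terms() sorts model terms by degree (unless keep.order = TRUE),
# so the main effect comes first in both cases.
print(attr(tt_a, "term.labels"))
print(attr(tt_b, "term.labels"))

# The variables attribute, however, follows the order in which
# names first appear in the formula as written.
vars_a <- all.vars(attr(tt_a, "variables"))
vars_b <- all.vars(attr(tt_b, "variables"))
print(vars_a)
print(vars_b)
```

So a formula rebuilt with its terms in sorted order yields a `variables` list in a different order from the one the original call produced, and a `predvars` copied across by position no longer matches it.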
This is a slightly tricky bug to fix, because `predvars' and `variables' can
look a bit different e.g. if there are `poly' terms, but I think the
following change near the end of `delete.response' does the trick:
<<...>>
if (length(formula(termobj)) == 3) {
    # Old code, reliant on maintaining the order of terms:
    # attr(tt, "predvars") <- attr(termobj, "predvars")[-2]
    reorder <- match( sapply( attr( tt, 'variables'), deparse),
                      sapply( attr( termobj, 'variables'), deparse)) # MVB
    attr( tt, 'predvars') <- attr( termobj, 'predvars')[ reorder] # MVB
}
<<...>>
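The matching step can be tried in isolation on hand-written calls. This is a sketch under assumptions: the three calls below are stand-ins I made up for the attributes of `termobj' and `tt', and `saved_coefs` is a hypothetical placeholder for the stored `poly' coefficients.

```r
# Stand-ins for the attributes of the original terms object (termobj).
old_vars     <- quote(list(resp, poly(contin, 2), discrete))
old_predvars <- quote(list(resp, poly(contin, 2, coefs = saved_coefs), discrete))

# Stand-in for the variables of the response-free terms object (tt),
# assumed here to come back in a different order.
new_vars <- quote(list(discrete, poly(contin, 2)))

# Match on the deparsed 'variables' entries, since the 'predvars'
# entries can look different; the leading `list` matches itself.
reorder <- match(sapply(new_vars, deparse), sapply(old_vars, deparse))

# Pick the predvars entries in the new order; the response drops out
# because it has no counterpart in new_vars.
new_predvars <- old_predvars[reorder]
print(reorder)
print(new_predvars)
```

Matching on `variables' and then indexing `predvars' works here because the two lists in the original object are position-for-position parallel, even when their entries deparse differently.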
cheers
Mark
*******************************
Mark Bravington
CSIRO (CMIS)
PO Box 1538
Castray Esplanade
Hobart
TAS 7001
phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington at csiro.au