[Rd] predict (PR#2685)

Mark.Bravington at csiro.au
Thu Mar 27 11:42:27 MET 2003


( summary: predict now gets round problems caused by re-ordering of terms )

Sorry -- glad it's fixed, and surprised I left the info off; it's in
yesterday's other bug reports, though :).

I know that I searched the bug tracking page yesterday to see whether the
bugs had already been reported before sending them. Didn't find anything
yesterday, but there they are today in black & white. Must have done
something daft.

cheers
Mark

*******************************

Mark Bravington
CSIRO (CMIS)
PO Box 1538
Castray Esplanade
Hobart
TAS 7001

phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington at csiro.au 

#-----Original Message-----
#From: ripley at stats.ox.ac.uk [mailto:ripley at stats.ox.ac.uk]
#Sent: Wednesday, 26 March 2003 6:09 PM
#To: Mark.Bravington at csiro.au
#Cc: r-devel at stat.math.ethz.ch; R-bugs at biostat.ku.dk
#Subject: Re: [Rd] predict (PR#2685)
#
#
#You forgot to give the R version, platform etc.
#
#This is already fixed in R-devel, and your example works there provided a
#valid assignment operator is used.
#
#It is the same as PR#2206, which is marked as fixed in R-bugs.
#
#On Wed, 26 Mar 2003 Mark.Bravington at csiro.au wrote:
#
#> There is a bug in `predict' whereby the order of variables sometimes gets
#> re-arranged compared to the original fit, and then disaster results.
#> Specifically, the 'variables' and 'predvars' attributes of a 'terms' object
#> get out of synch. This only happens when the terms in the original formula
#> get re-ordered during fitting:
#> 
#> test> scrunge.data_ data.frame( contin=1:10, discrete=factor( rep( c( 'cat', 'dog'), 5)), resp=runif( 10))
#> test> lm.ok_ lm( resp ~ discrete + contin %in% discrete, data=scrunge.data)
#> test> predict( lm.ok, scrunge.data) # no problemo
#>          1          2          3          4          5          6          7
#> 0.29663793 0.04572655 0.42661779 0.31668732 0.55659764 0.58764809 0.68657750
#>          8          9         10
#> 0.85860886 0.81655736 1.12956963
#> 
#> test> lm.bug_ lm( resp ~ contin %in% discrete + discrete, data=scrunge.data)
#> # terms will be re-ordered
#> test> predict( lm.bug, scrunge.data)
#> Error in "contrasts<-"(*tmp*, value = "contr.treatment") :
#>         contrasts apply only to factors
#> In addition: Warning message:
#> variable discrete is not a factor in: model.frame.default(object, data, xlev = xlev)
#> 
#> This actually turns out to be a bug in `model.frame.default', to do with an
#> inconsistency between `predvars' and `vars' when `model.frame.default' is
#> called inside `predict'. AFAICS it can be fixed by including the commented
#> line below in `model.frame.default':
#> 
#>     <<...>>
#>     vars <- attr(formula, "variables")
#>     predvars <- attr(formula, "predvars")
#>     if (is.null(predvars)) 
#>         predvars <- vars
#>     varnames <- as.character(predvars[-1]) # MVB: was vars[-1], not predvars[-1]
#>     variables <- eval(predvars, data, env)
#>     <<...>>
#>     
#> This has the side-effect that the model frame gets some ugly column names
#> if e.g. a `poly' term is used, but it doesn't actually seem to hurt the
#> prediction.
#> 
#> However, that doesn't fix it all. There is still a bug in `predict', even
#> after replacing `model.frame.default' with the above:
#> 
#> test> predict( lm.bug, scrunge.data)
#> Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) : 
#>         subscript out of bounds
#> test> # wot???        
#> 
#> This time, the bug is in `delete.response', which calls `terms' to set most
#> of the attributes, including `variables', but adjusts the original `predvars'
#> by hand. Because `terms' returns the variables in a different order when
#> it's called by `predict.lm' from when it was originally called by `lm',
#> things get out of synch.
#> 
#> This is a slightly tricky bug to fix, because `predvars' and `variables' can
#> look a bit different e.g. if there are `poly' terms, but I think the
#> following change near the end of `delete.response' does the trick:
#>   
#>   <<...>>
#>   if (length(formula(termobj)) == 3) {
#> #   Old code, reliant on maintaining the order of terms:
#> #   attr(tt, "predvars") <- attr(termobj, "predvars")[-2]
#>     reorder <- match( sapply( attr( tt, 'variables'), deparse),
#>                       sapply( attr( termobj, 'variables'), deparse)) # MVB
#>     attr( tt, 'predvars') <- attr( termobj, 'predvars')[ reorder] # MVB
#>   }
#>   <<...>>
#>   
#> cheers
#> Mark
#
#-- 
#Brian D. Ripley,                  ripley at stats.ox.ac.uk
#Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
#University of Oxford,             Tel:  +44 1865 272861 (self)
#1 South Parks Road,                     +44 1865 272866 (PA)
#Oxford OX1 3TG, UK                Fax:  +44 1865 272595
#
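The core of the proposed `delete.response' change is: match the new `variables' against the original ones by their deparsed text, then use the resulting positions to pick elements out of the original `predvars'. A minimal standalone sketch of that idea in R follows. The helper name `realign_predvars' and the example calls (including `coefs = cf', a purely illustrative stand-in for the extra information a `poly' entry in `predvars' can carry) are hypothetical; this is not the change that was committed to R.

```r
# Hypothetical helper (not part of R): reorder a 'predvars' call so that its
# elements follow the order of a new 'variables' call, using the
# match()-on-deparsed-text idea from the delete.response() change above.
realign_predvars <- function(new_vars, old_vars, old_predvars) {
  # Deparse each element of a call (element 1 is the function name `list`)
  # into a single string so that variables can be compared by their text.
  as_text <- function(cl) {
    vapply(as.list(cl), function(e) paste(deparse(e), collapse = " "), "")
  }
  # Position of each new variable within the original 'variables' call.
  idx <- match(as_text(new_vars), as_text(old_vars))
  if (anyNA(idx)) stop("some variables were not found in the original terms")
  # Pick the matching elements of the original 'predvars'; position 1 maps to
  # position 1, so the result is still a call to list().
  old_predvars[idx]
}

# Original fit: response first, and a poly() term whose 'predvars' entry has
# extra (illustrative) arguments that its 'variables' entry lacks.
old_vars     <- quote(list(resp, poly(x, 2), g))
old_predvars <- quote(list(resp, poly(x, 2, coefs = cf), g))

# Response-free terms, with the variables in a different order.
new_vars <- quote(list(g, poly(x, 2)))

print(realign_predvars(new_vars, old_vars, old_predvars))
```

Matching on `variables' but indexing into `predvars' is what lets the two differ in content (as with `poly' terms) while staying in the same order. The sketch assumes each variable deparses to the same text in both `variables' calls.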


