[Rd] update forgets about offset() (PR#6656)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Mar 10 17:10:23 MET 2004
On Wed, 10 Mar 2004, Prof Brian Ripley wrote:
> On Tue, 9 Mar 2004 Mark.Bravington at csiro.au wrote:
>
> > In R 1.7 and above (including R 1.9 alpha), 'update.formula' fails to copy any offset(...) term from the original formula when expanding '.':
> >
> > test> df <- data.frame( x=1:4, y=sqrt( 1:4), z=c(2:4,1))
> > test> fit1 <- glm( y~offset(x)+z, data=df)
> > test> fit1$call
> > glm(formula = y ~ offset(x) + z, data = df)
> >
> > test> fit1u <- update( fit1, ~.)
> > test> fit1u$call
> > glm(formula = y ~ z, data = df)
> >
> >
> > The problem occurs when 'update.formula' calls 'terms.formula(..., simplify=TRUE)' which defines and calls a function 'fixFormulaObject'. The first line of 'fixFormulaObject' attempts to extract the contents of the RHS of the formula via
> >
> > tmp <- attr(terms(object), "term.labels")
> >
> > but this omits any offsets. Replacing that line with the following,
> > which I think pulls in everything except the response, *seems* to fix
> > the problem without disrupting the guts of 'terms' itself:
> >
> > tmp <- dimnames( attr(terms(object), "factors"))[[1]][ -attr( terms, 'response')]
> >
> > The suggested line might be simpler than checking the 'offset' component
> > of 'terms(object)', which won't always exist.
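For comparison, here is a minimal sketch of reading the offset component so that its absence is harmless. It assumes a current R. attr(tt, "offset") is NULL when a formula has no offsets, and indexing by NULL returns nothing, so no special case is needed. The helper name rhs_with_offsets is hypothetical and not part of R.

```r
# Hypothetical helper (not part of R): right-hand-side labels of a formula,
# keeping any offset() terms that attr(, "term.labels") leaves out.
rhs_with_offsets <- function(f) {
  tt <- terms(f)
  labs <- attr(tt, "term.labels")                   # ordinary terms only
  # Deparse list(y, offset(x), z) and drop the leading "list"
  vars <- as.character(attr(tt, "variables"))[-1L]
  off <- attr(tt, "offset")                         # NULL if no offsets
  c(labs, vars[off])                                # vars[NULL] is character(0)
}

rhs_with_offsets(y ~ offset(x) + z)
rhs_with_offsets(y ~ z)
rhs_with_offsets(y ~ offset(x))
```

The design choice is to take offsets from the "variables" and "offset" attributes rather than from the rownames of the "factors" matrix. This also covers y ~ offset(x), where there are no ordinary terms and the factors attribute is empty.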
>
> Sorry, but that is a common programming error. The possible values of
> attr(terms, "response") are 0 or 1 (although code should not rely on the
> non-existence of 2, 3, ...). foo[-0] is the same as foo[0], a length-0 vector.
>
> Also, in R please use rownames(): it is easier to read and safer.
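A small illustration of the zero-index pitfall, using a one-sided formula so that the response attribute is 0. The helper drop_response is hypothetical. It compares positions instead of negating the index, which makes a zero response a no-op rather than discarding everything.

```r
tt <- terms(~ offset(x) + z)            # one-sided formula: no response
resp <- attr(tt, "response")            # 0 here
rn <- rownames(attr(tt, "factors"))     # variable names from the factors matrix

rn[-resp]                               # -0 is 0, so this selects nothing

# Hypothetical helper: drop the response position only if there is one
drop_response <- function(labels, response) {
  labels[seq_along(labels) != response]
}
drop_response(rn, resp)                 # keeps every element when resp is 0
```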
There is a second level of problems. The rownames include all variables, even
those of terms removed with - signs, so that code would turn
y ~ x + z - z
back into y ~ x + z!
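To see the difference between the two attributes, the sketch below compares them for the formula above. The "term.labels" attribute honours the subtraction. Going by the message above, the rownames of "factors" can still list z, so they should not be used to rebuild the right-hand side. The printed rownames are left without an expected output here.

```r
tt <- terms(y ~ x + z - z)
attr(tt, "term.labels")                 # only "x" survives the subtraction
rownames(attr(tt, "factors"))           # variables, which may still include z

# Rebuilding from term.labels keeps the subtraction in effect
reformulate(attr(tt, "term.labels"), response = "y")
```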
> > Footnote: strange things happen when there is more than one offset (OK,
> > there probably shouldn't be, but I thought I'd experiment):
>
> That is allowed, and works in general.
>
> > test> fit2 <- glm( y ~ offset( x) + offset( log( x)) + z, data=df)
> > test> fit2$call
> > glm(formula = y ~ offset(x) + offset(log(x)) + z, data = df)
> >
> > test> fit2u <- update( fit2, ~.)
> > test> fit2u$call
> > glm(formula = y ~ offset(log(x)) + z, data = df)
> >
> > Curiously, the 'term.labels' attribute of 'terms(object)' now includes the second offset, but not the first.
>
> The issue here is that the code to remove offset terms fails if two
> successive terms are offsets, but not otherwise.
In fact, it fails only if the two successive offsets come first or last, and
for two separate reasons, which made it hard to track down.
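A quick regression check along these lines, assuming an R version that includes the patches, is to confirm that update(f, ~ .) keeps every offset, with the adjacent offsets placed both first and last. The helper offsets_of is hypothetical. Because the updated formula may reorder terms, the check compares sets of offsets rather than formulas.

```r
# Hypothetical helper: deparsed offset() variables recorded by terms()
offsets_of <- function(f) {
  tt <- terms(f)
  vars <- as.character(attr(tt, "variables"))[-1L]  # drop leading "list"
  vars[attr(tt, "offset")]                          # character(0) if none
}

f_first <- y ~ offset(x) + offset(log(x)) + z      # adjacent offsets first
f_last  <- y ~ z + offset(x) + offset(log(x))      # adjacent offsets last

offsets_of(f_first)
setequal(offsets_of(update(f_first, ~ .)), offsets_of(f_first))
setequal(offsets_of(update(f_last, ~ .)), offsets_of(f_last))
```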
I have now committed patches for both problems.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595