[Rd] update forgets about offset() (PR#6656)

Wed Mar 10 17:10:23 MET 2004

On Wed, 10 Mar 2004, Prof Brian Ripley wrote:

> On Tue, 9 Mar 2004 Mark.Bravington at csiro.au wrote:
> 
> > In R1.7 and above (including R 1.9 alpha), 'update.formula' forgets to copy any offset(...) term in the original '.' formula:
> > 
> > test> df <- data.frame( x=1:4, y=sqrt( 1:4), z=c(2:4,1))
> > test> fit1 <- glm( y~offset(x)+z, data=df)
> > test> fit1$call
> > glm(formula = y ~ offset(x) + z, data = df)
> > 
> > test> fit1u <- update( fit1, ~.)
> > test> fit1u$call
> > glm(formula = y ~ z, data = df)
> > 
> > 
> > The problem occurs when 'update.formula' calls 'terms.formula(..., simplify=TRUE)' which defines and calls a function 'fixFormulaObject'. The first line of 'fixFormulaObject' attempts to extract the contents of the RHS of the formula via 
> > 
> > tmp <- attr(terms(object), "term.labels")
> > 
> > but this omits any offsets. Replacing that line with the following,
> > which I think pulls in everything except the response, *seems* to fix
> > the problem without disrupting the guts of 'terms' itself:
> > 
> > tmp <- dimnames( attr(terms(object), "factors"))[[1]][ -attr( terms, 'response')]
> > 
> > The suggested line might be simpler than checking the 'offset' component
> > of 'terms(object)', which won't always exist.
> 
> Sorry, but that is a common programming error.  The possible values of
> attr(terms, "response") are 0 or 1 (although code should not rely on the 
> non-existence of 2, 3, ...).  foo[-0] == foo[0] is a length-0 vector.
>
> Also, in R please use rownames(): it is easier to read and safer.
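
A minimal sketch of that pitfall, assuming a one-sided formula so that the "response" attribute is 0. The names tt, resp, rn, bad and good are illustrative only and are not taken from the source of terms.formula:

```r
# One-sided formula: there is no response, so attr(tt, "response") is 0.
tt   <- terms(~ offset(x) + z)
resp <- attr(tt, "response")
rn   <- rownames(attr(tt, "factors"))

# Indexing with -0 is the same as indexing with 0: it selects nothing,
# rather than "dropping nothing".
bad  <- rn[-resp]

# Guarding on resp > 0 keeps every row name when there is no response.
good <- if (resp > 0) rn[-resp] else rn
```

The guard is only one way to avoid the problem; it does not address the separate issue with removed terms.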

There is a second level of problems.  The rownames include all terms, even 
those with - signs, so that code would collapse

y ~ x + z - z

to y ~ x + z!
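
A short sketch of the difference, using the formula above. The variable names are illustrative assumptions; the point is only that the row names of "factors" list every variable mentioned, whereas "term.labels" lists the terms that survive:

```r
tt <- terms(y ~ x + z - z)

# Terms remaining after the "- z" has been applied.
labels_kept <- attr(tt, "term.labels")

# Row names of the factors matrix: every variable that appears in the
# formula, including the response and the removed z.
rows_all <- rownames(attr(tt, "factors"))

# Variables that a formula rebuilt from the row names would wrongly bring back.
resurrected <- setdiff(rows_all[-attr(tt, "response")], labels_kept)
```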

> > Footnote: strange things happen when there is more than one offset (OK,
> > there probably shouldn't be, but I thought I'd experiment):
> 
> That is allowed, and works in general.
> 
> > test> fit2 <- glm( y ~ offset( x) + offset( log( x)) + z, data=df)
> > test> fit2$call
> > glm(formula = y ~ offset(x) + offset(log(x)) + z, data = df)
> > 
> > test> fit2u <- update( fit2, ~.)
> > test> fit2u$call
> > glm(formula = y ~ offset(log(x)) + z, data = df)
> > 
> > Curiously, the 'term.labels' attribute of 'terms(object)' now includes the second offset, but  not the first.
> 
> The issue here is that the code to remove offset terms fails if two 
> successive terms are offsets, but not otherwise.

In fact, it failed only if the two successive offsets were first or last, 
and for two separate reasons, which made it hard to track down.
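
For reference, a sketch of how offsets are recorded in a terms object, assuming a current version of R. They are kept out of "term.labels" and indexed by the "offset" attribute instead; the names tt, vars and offs are illustrative only:

```r
tt <- terms(y ~ offset(x) + offset(log(x)) + z)

# Offsets are not ordinary terms, so only z is listed here.
labs <- attr(tt, "term.labels")

# The "offset" attribute holds positions in the variables list.
vars <- as.character(attr(tt, "variables"))[-1]  # drop the leading "list"
offs <- vars[attr(tt, "offset")]
```

Here offs holds both offset terms, so a formula rebuilt from labs alone would lose them.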

I have now committed patches for both problems.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595