[Rd] using predict method with an offset

Sun Mar 1 11:51:06 CET 2009

The paragraph quoted dates from before the stats package was split off in 
2003, and is historical (it was true at one point, but not in R 
2.0.0, the earliest running version I have).

There is however still some truth lurking there: the code is

 	offset <- if (!is.null(off.num <- attr(tt, "offset")))
 	    eval(attr(tt, "variables")[[off.num+1]], newdata)
 	else if (!is.null(object$offset))
 	    eval(object$call$offset, newdata)

so if there is an offset term in the formula, the offset argument is 
ignored (unlike when fitting).  Further, this is wrong if there is 
more than one offset term.  Both of those would be pretty unusual, but 
I'll commit fixes for them.
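
As a minimal sketch of what handling several offset terms could look like
(illustrative only, not the committed fix; the toy data and the names y, x,
a, b and total_offset are assumptions made for this example), each offset
variable listed in attr(tt, "offset") can be evaluated in newdata and the
results summed:

```r
# Hypothetical toy data; the column names x, a, b are illustrative only.
newdata <- data.frame(x = 1:3, a = c(0.1, 0.2, 0.3), b = c(1, 2, 3))

# A terms object for a formula with two offset terms.
tt <- terms(y ~ x + offset(a) + offset(b))

off.num <- attr(tt, "offset")     # positions of the offset terms among the variables
vars    <- attr(tt, "variables")  # a call: list(y, x, offset(a), offset(b))

# Evaluate every offset term in newdata and accumulate the sum.
# The "+ 1" skips the leading `list` symbol of the call.
total_offset <- rep(0, nrow(newdata))
for (i in off.num)
    total_offset <- total_offset + eval(vars[[i + 1]], newdata)
```

An offset supplied through the 'offset' argument could be added to the same
running total, so that it is no longer ignored when the formula also
contains an offset term.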

predict.glm calls predict.lm to do this, so the same issues apply to 
it.
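
For reference, a sketch of the comparison discussed in the quoted messages
below, assuming a reasonably recent R version: the data are those from the
original report (rebuilt with data.frame() rather than structure()), newdata
carries a q25 column as suggested in the reply, and the object names p_arg
and p_formula are made up for this example. With the offset available in
newdata, the two ways of specifying it should give the same predictions on
the response scale:

```r
# Data from the original report.
c1 <- data.frame(Contr     = c(0.028, 0.043, 0.064, 0.097, 0.146, 0.219),
                 Correct   = c(34L, 57L, 94L, 152L, 160L, 160L),
                 Incorrect = c(126L, 103L, 66L, 8L, 0L, 0L))
c1$q25 <- qlogis(0.25)   # the offset, stored as a column of the data

# Offset given through the 'offset' argument.
c1.glm  <- glm(cbind(Correct, Incorrect) ~ Contr - 1, family = binomial,
               data = c1, offset = q25)
# Offset given as a term in the formula.
c1f.glm <- glm(cbind(Correct, Incorrect) ~ Contr + offset(q25) - 1,
               family = binomial, data = c1)

# New data must supply q25 with the same length as Contr.
cc <- seq(0, 1, length.out = 10)
nd <- data.frame(Contr = cc, q25 = qlogis(0.25))

p_arg     <- predict(c1.glm,  newdata = nd, type = "response")
p_formula <- predict(c1f.glm, newdata = nd, type = "response")

# Compare the two sets of predictions.
isTRUE(all.equal(p_arg, p_formula, tolerance = 1e-6))
```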

I've always thought that an 'offset' argument to lm and glm was an 
unnecessary complication (I think it predates the offset() function in 
R, although it is in the White Book p.222, for glm only).

On Fri, 27 Feb 2009, Heather Turner wrote:

> Hi Ken,
>
> First of all, whether you specify the offset by the argument or in the
> formula, your code requires that q25 is the same length as the variable
> Contr. You can set this up by defining your new data as follows:
>
> nd <- data.frame( Contr = cc , q25 = qlogis(0.25))
>
> This sorts out the problem of the warnings/errors. Secondly your two
> calls to predict give different results because you have not specified
> the same type - the first is predicting on the response scale and the
> second is predicting on the link scale. If you use
>
> predict(c1.glm, newdata = nd, type = "response")
> predict(c1f.glm, newdata = nd, type = "response")
>
> you get the same result. This does seem to go against the documentation
> however, so it would seem that the paragraph you quoted should be taken
> out of the help file for predict.lm.
>
> Best wishes,
>
> Heather
>
> Kenneth Knoblauch wrote:
>> Hi,
>>
>> I have run into another problem using offsets, this time with
>> the predict function, where there seems to be a contradiction
>> again between the behavior and the help page.
>>
>> On the man page for predict.lm, it says
>>
>> Offsets specified by offset in the fit by lm will not be included in
>> predictions, whereas those specified by an offset term in the formula
>> will be.
>>
>> While it indicates nothing about offsets under ?predict.glm, predict.glm
>> calls predict.lm when there is a newdata argument.
>>
>> In the example below, the behavior is the opposite of the help
>> page, if I am understanding it correctly, and a warning is thrown
>> when it does seem to work as desired.
>>
>> c1 <- structure(list(Contr = c(0.028, 0.043, 0.064, 0.097, 0.146, 0.219
>> ), Correct = c(34L, 57L, 94L, 152L, 160L, 160L), Incorrect = c(126L,
>> 103L, 66L, 8L, 0L, 0L)), .Names = c("Contr", "Correct", "Incorrect"
>> ), row.names = c("13", "15", "17", "19", "21", "23"), class = "data.frame")
>>
>> q25 <- rep( qlogis( 0.25 ), nrow(c1) )
>>
>> # offset defined in arguments
>> c1.glm <- glm( cbind(Correct, Incorrect) ~ Contr - 1, binomial,
>>     c1, offset = q25 )
>> # offset defined in formula
>> c1f.glm <- glm( cbind(Correct, Incorrect) ~ Contr + offset(q25) -1,
>>     binomial, c1 )
>> cc <- seq( 0, 1, len = 10 )
>> nd <- data.frame( Contr = cc )
>>
>> When predict is used with a model for which the offset was defined in
>> the arguments, the offset is taken into account and a warning
>> is emitted.
>>
>> predict(c1.glm, newdata = nd, type = "response")
>>
>>         1         2         3         4         5         6         7
>> 0.2500000 0.8859251 0.9945037 0.9997628 0.9999898 0.9999996 1.0000000
>>         8         9        10
>> 1.0000000 1.0000000 1.0000000
>> Warning message:
>> In predictor + offset :
>>   longer object length is not a multiple of shorter object length
>>
>> When predict is used with a model for which the offset was defined in
>> the formula, an error occurs
>>
>> predict( c1f.glm, newdata = nd )
>>
>> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev
>> = object$xlevels) :
>>   variable lengths differ (found for 'offset(q25)')
>>
>> even if a column for offset is included in newdata,
>>
>> ndf <- cbind( nd, "offset(q25)" = rep( qlogis(0.25), length(cc) ) )
>> predict( c1f.glm, newdata = ndf )
>>
>> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev
>> = object$xlevels) :
>>   variable lengths differ (found for 'offset(q25)')
>>
>> unless there is a special way to specify the offset to predict
>> that I haven't been able to figure out.
>>
>> traceback indicates the problem, again, with model.frame.default
>>
>> Thank you for any clarification.
>>
>> best,
>>
>> Ken
>>
>>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595