[R-sig-eco] Offsets in Poisson or Neg. Bin regression

Wed Jun 26 11:42:44 CEST 2013

Hi again Ivailo,

Yes, the `offset' and the covariate are the same variable.  Including them 
both simply alters the functional form of the linear predictor in your 
model.  No, they are not collinear in the usual sense, as there is only 
one estimated parameter between them -- the offset term has its 
coefficient fixed at 1, so nothing is estimated for it.  For 
example, with log( effort) added as a linear covariate the log-link GLM is

log( E(y)) = offset + beta * log( effort) + other_stuff
           = log( effort) + beta * log( effort) + other_stuff
           = beta_1 * log( effort) + other_stuff,

where beta_1 = 1 + beta.

If you test that beta==0 (note: beta, not beta_1) then you are testing 
whether the effect of effort is purely scaling (as per the nomenclature 
before).  This is the same as McCullagh and Nelder's test of whether 
beta_1==1.  
Thanks for the pointer to McCullagh and Nelder -- I didn't know that 
they suggested that.
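
The equivalence of the two parameterisations can be checked numerically. 
Below is a minimal sketch in plain Python, not anything from McCullagh and 
Nelder: the effort and count values are made up, and fit_poisson is a 
hypothetical helper written only for this illustration.  It fits the 
model once with the offset and once without, then compares the slopes and 
the Wald z statistics for beta==0 and beta_1==1.

```python
import math

def fisher_info(mu, x):
    """Entries of the 2x2 Fisher information for (intercept, slope)."""
    h00 = sum(mu)
    h01 = sum(m * xi for m, xi in zip(mu, x))
    h11 = sum(m * xi * xi for m, xi in zip(mu, x))
    return h00, h01, h11

def fit_poisson(x, y, offset, n_iter=50):
    """Fit log(E[y]) = offset + a + b * x by Newton-Raphson.

    Returns (a, b, se_b); se_b comes from the inverse Fisher information.
    """
    # Start from the intercept-only estimate with zero slope.
    a = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b = 0.0
    for _ in range(n_iter):
        mu = [math.exp(o + a + b * xi) for o, xi in zip(offset, x)]
        g0 = sum(yi - m for yi, m in zip(y, mu))                     # score, intercept
        g1 = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))       # score, slope
        h00, h01, h11 = fisher_info(mu, x)
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    mu = [math.exp(o + a + b * xi) for o, xi in zip(offset, x)]
    h00, h01, h11 = fisher_info(mu, x)
    se_b = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return a, b, se_b

# Made-up data, purely for illustration.
effort = [1, 2, 3, 4, 5, 6, 8, 10]
counts = [2, 3, 6, 7, 8, 9, 13, 14]
log_effort = [math.log(e) for e in effort]
no_offset = [0.0] * len(effort)

# Offset plus covariate: log E(y) = log(effort) + a + beta * log(effort)
_, beta, se_beta = fit_poisson(log_effort, counts, log_effort)
# Covariate only:        log E(y) = a + beta_1 * log(effort)
_, beta_1, se_beta_1 = fit_poisson(log_effort, counts, no_offset)

z_offset = beta / se_beta               # Wald test of beta == 0
z_plain = (beta_1 - 1.0) / se_beta_1    # Wald test of beta_1 == 1
p_value = math.erfc(abs(z_offset) / math.sqrt(2.0))  # two-sided normal p-value

print("beta   =", beta)
print("beta_1 =", beta_1, "(should equal 1 + beta)")
print("z statistics:", z_offset, z_plain, " p =", p_value)
```

Both fits describe the same model, so the two z statistics agree up to 
numerical error; only the null value being tested (0 versus 1) differs.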

My depiction of the effect of effort as f( effort) is to allow for the 
possibility that the effect of effort may be non-linear on the link 
scale.  A simple example is when f(effort) is a low-order polynomial.  
Departures from effort being a purely scaling term may extend beyond 
linearity.  One may even want to consider regression splines or even 
more flexible GAMs.
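
As a rough sketch of the low-order polynomial case, the plain Python 
below keeps log( effort) as an offset and adds a quadratic f(effort), 
then compares that fit with the offset-only model by a likelihood ratio 
test.  Everything here is an assumption for illustration: the data are 
made up, solve and fit_poisson are hypothetical helpers, and effort is 
rescaled to (0, 1] purely for numerical stability (this changes the 
polynomial coefficients but not the fitted model).  The p-value relies 
on the usual asymptotic chi-square approximation with 2 degrees of 
freedom, which may be poor with so few observations.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_poisson(X, y, offset, n_iter=100):
    """Newton-Raphson fit of log(E[y]) = offset + X @ beta.

    Assumes the first column of X is the intercept.
    Returns (beta, maximised log-likelihood).
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / sum(math.exp(o) for o in offset))

    def means(bt):
        return [math.exp(o + sum(bj * xj for bj, xj in zip(bt, row)))
                for o, row in zip(offset, X)]

    for _ in range(n_iter):
        mu = means(beta)
        score = [sum((yi - m) * row[j] for yi, m, row in zip(y, mu, X))
                 for j in range(p)]
        info = [[sum(m * row[j] * row[k] for m, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    mu = means(beta)
    loglik = sum(yi * math.log(m) - m - math.lgamma(yi + 1)
                 for yi, m in zip(y, mu))
    return beta, loglik

# Made-up data: catch per unit effort falls as effort grows.
effort = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
counts = [3, 6, 8, 10, 12, 13, 14, 14, 15, 15]
offset = [math.log(e) for e in effort]
scaled = [e / max(effort) for e in effort]

X_null = [[1.0] for _ in effort]              # offset only (pure scaling)
X_poly = [[1.0, s, s * s] for s in scaled]    # offset + quadratic f(effort)

_, ll_null = fit_poisson(X_null, counts, offset)
beta_poly, ll_poly = fit_poisson(X_poly, counts, offset)

lr = max(0.0, 2.0 * (ll_poly - ll_null))      # likelihood ratio statistic
p_value = math.exp(-lr / 2.0)                 # chi-square tail area, 2 df

print("f(effort) coefficients (scaled effort):", beta_poly[1:])
print("LR statistic:", lr, " approximate p-value:", p_value)
```

A small p-value would suggest that effort is doing more than scaling; a 
large one gives no reason to move beyond the plain offset.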

Having said all this though, it is my practice to be quite conservative 
with including effort as anything but a scaling variable (offset).  It 
seems to me that there needs to be good reason before jumping to strong 
conclusions that may have no basis in the phenomenon under study.

Hope this helps,

Scott

On 26/06/13 18:14, Ivailo wrote:
> On Tue, Jun 25, 2013 at 2:02 PM, Scott Foster <scott.foster at csiro.au> wrote:
>> Hi Ivailo,
>>
>> Good question.  Difficult to answer, which is probably why you haven't had
>> any responses yet (that the list has seen).
>>
>> If you include an offset term with a log link function then you are assuming
>> that the random variable (counts, say) depends on the offset with a known
>> relationship.  Generally, this is precisely what you want to do -- for
>> example standardising counts for the sampling effort taken to obtain those
>> counts.
>>
>> However, in some situations it is conceivable that the sampling effort
>> itself affects the count random variable.  An example may be fish in a trawl
>> net -- as the net gets full it becomes less and less efficacious.  In this
>> case you may expect that a single unit of effort change will have different
>> effect when there has been lots of previous effort to when there hasn't.
> Thanks for commenting on that, Scott!
>
> Both alternatives you mention above assume that the RV
> depends on either the "offset" or the "sampling effort", but aren't
> these essentially the same?
>
>> If I thought that I was in the latter case, I may fit a model like
>>
>> log( E( count)) = log( effort) + f(effort) + other stuff.
>>
>> The function f(effort) can take any form, including beta*log(effort).  In
>> such a case a test of beta==0 is equivalent to testing if the effect of
>> effort is purely scaling or if it is something else/sinister.  General forms
>> of f(effort) may tell you much more but may also be much more confusing.
>>
>> To choose between the two cases above (offset versus offset+covariate), I
>> would base my choice largely on prior knowledge of the system under study.
>> This is especially so if I don't have much data.
> My approach to modeling counts was primarily based on the widespread
> advice that varying effort should be considered by adding an offset to
> the model, but when I consulted the book by McCullagh and Nelder
> (1989), I found on pp. 206-207 that they actually estimated the
> log(effort) term as being ~ 1. So started my confusion on the topic
> "to offset or to estimate" ;-)
>
> It never occurred to me, though, that the effort could be entered both
> as an offset *and* as a covariate into the model. As these two terms
> have a good chance of being collinear, I wonder how one can then separate
> their influence on the RV. I do not fully understand your idea
> regarding the form of the function "f(effort)", but I get that if the
> coefficient of effort is estimated as == 0, then it should be
> concluded that the effect of effort should be retained *only* as an offset
> to account for the "scaling". Am I right?
>
> Thanks again for your elucidating comment,
> Ivailo
> --
> UBUNTU: a person is a person through other persons.
>
>

-- 
Scott Foster
CSIRO Mathematics, Informatics and Statistics
GPO Box 1538
Castray Esplanade
Hobart 7001
Tasmania
Australia

Phone:     (03) 6232 5178
Fax:       (03) 6232 5000
Email:     scott.foster at csiro.au