[R-sig-ME] When can the intercept be removed from regression models

Tue Jul 26 12:08:02 CEST 2016

>>>>> Shadiya Al Hashmi <saah500 at york.ac.uk>
>>>>>     on Tue, 26 Jul 2016 12:40:26 +0300 writes:

    > Thanks, Thierry, for your response.  I tried the model
    > before and after removing the intercept a while ago and I
    > remember that the coefficients were pretty much the same.

But other things are *not* pretty much the same, and you
really, really, really should follow Thierry's advice:

   ALWAYS KEEP THE INTERCEPT IN THE MODEL !!!

(at least until you become a very experienced statistician / data
 scientist / .. )

    >  The only salient difference was that the levels of
    > the first categorical variable in the model formula were
    > all given in the output table instead of the reference
    > level being embedded in the intercept as in the model with
    > intercept.

    > It would be nice to find examples from the literature
    > where the intercept is removed from the model. 

hopefully *not*!  at least not apart from the exceptions that
Thierry mentions below.

    > Can you think of any?

    > Shadiya


    >> On Jul 26, 2016, at 11:32 AM, Thierry Onkelinx
    >> <thierry.onkelinx at inbo.be> wrote:
    >> 
    >> Dear Shadiya,
    >> 
    >> Thou shalt always keep the intercept in the model. Its
    >> p-value doesn't matter.
    >> 
    >> I use two exceptions to that rule:
    >> 1. There is a physical/biological/... reason why the
    >>    intercept should be 0.
    >> 2. Removing the intercept gives a different, more
    >>    convenient parametrisation (but does not change the
    >>    model fit!).
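A minimal sketch of exception 2, assuming ordinary least squares on made-up data (a plain linear model, not the mixed logistic model under discussion) with a single three-level factor `g`. The helpers `solve`, `ols` and `fitted` are hypothetical names written for this illustration. The sketch fits the treatment-coded design that R builds for `y ~ g` and the cell-means design for `y ~ 0 + g`. The coefficients differ, and the no-intercept fit lists every level, as Shadiya observed. However, the fitted values coincide, because both designs span the same column space. This only holds when a factor's dummies can absorb the intercept. With purely numeric predictors, dropping the intercept does constrain the fit.

```python
# Hypothetical toy data: a response y and a 3-level factor g (levels "a", "b", "c").
y = [4.1, 3.9, 4.3, 6.0, 6.4, 5.8, 2.9, 3.3, 3.1]
g = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
levels = ["a", "b", "c"]

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def fitted(X, beta):
    """Fitted values X beta."""
    return [sum(xi * bi for xi, bi in zip(row, beta)) for row in X]

# Treatment coding with intercept, like R's y ~ g: columns 1, [g == "b"], [g == "c"].
X_int = [[1.0] + [1.0 if gi == lv else 0.0 for lv in levels[1:]] for gi in g]
# Cell-means coding without intercept, like R's y ~ 0 + g: one column per level.
X_noint = [[1.0 if gi == lv else 0.0 for lv in levels] for gi in g]

b_int = ols(X_int, y)      # reference-level mean, then differences from it
b_noint = ols(X_noint, y)  # one mean per level
fit_int = fitted(X_int, b_int)
fit_noint = fitted(X_noint, b_noint)

print("with intercept:   ", [round(b, 3) for b in b_int])
print("without intercept:", [round(b, 3) for b in b_noint])
print("max |fitted difference|:", max(abs(a - b) for a, b in zip(fit_int, fit_noint)))
```

The same argument carries over to a logistic model. Its maximum-likelihood fit depends on the design matrix only through the space of linear predictors it can produce, and that space is unchanged here.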
    >> 
    >> Note that in logistic regression you use a logit
    >> transformation. Hence forcing the model through the origin
    >> on the logit scale forces the model to a 50% probability
    >> on the original scale when all predictors are 0. I
    >> haven't seen an example where that makes sense.
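A small check of the arithmetic behind this point, with hypothetical slope and intercept values. `inv_logit` is a helper written here, not a library function. In a no-intercept model logit(p) = beta * x, the linear predictor at x = 0 is 0, and the inverse logit of 0 is exactly 0.5 regardless of the slope.

```python
import math

def inv_logit(eta):
    """Inverse logit: map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# No-intercept model logit(p) = beta * x: at x = 0 the linear predictor is 0,
# so the predicted probability is 0.5 whatever the (hypothetical) slope is.
for beta in [-2.0, 0.5, 3.0]:
    print("beta =", beta, "-> p(x=0) =", inv_logit(beta * 0.0))

# With a (hypothetical) intercept alpha, p(x=0) = inv_logit(alpha) is estimated from the data.
alpha = -2.0
print("alpha =", alpha, "-> p(x=0) =", round(inv_logit(alpha), 3))
```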
    >> 
    >> Bottom line: only remove the intercept when you really
    >> know what you are doing.
    >> 
    >> Best regards,
    >> 
    >> ir. Thierry Onkelinx
    >> Instituut voor natuur- en bosonderzoek / Research
    >> Institute for Nature and Forest
    >> team Biometrie & Kwaliteitszorg / team Biometrics &
    >> Quality Assurance
    >> Kliniekstraat 25, 1070 Anderlecht, Belgium
    >> 
    >> To call in the statistician after the experiment is done
    >> may be no more than asking him to perform a post-mortem
    >> examination: he may be able to say what the experiment
    >> died of. ~ Sir Ronald Aylmer Fisher
    >>
    >> The plural of anecdote is not data. ~ Roger Brinner
    >>
    >> The combination of some data and an aching desire for an
    >> answer does not ensure that a reasonable answer can be
    >> extracted from a given body of data. ~ John Tukey
    >> 
    >> 2016-07-26 9:50 GMT+02:00 Shadiya Al Hashmi
    >> <saah500 at york.ac.uk>:
    >>> Good morning,
    >>> 
    >>> I am in a dilemma regarding the inclusion of the
    >>> intercept in my mixed effects logistic regression
    >>> models.  Most statisticians that I talked to insist that
    >>> I shouldn’t remove the constant from my models.  One
    >>> argument for keeping it is that the models would fit
    >>> better, since the R2 value would be improved. Conversely,
    >>> removing the constant means that there is no guarantee
    >>> that we would not end up with biased coefficients, since
    >>> the slopes would be forced to pass through 0.
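A minimal sketch of the bias concern, assuming a simple linear model (not the mixed logistic model in the question). It uses hypothetical, noise-free data generated from y = 5 + 2x, so the true intercept is 5. The sketch compares the closed-form least-squares slope with an intercept (R: `lm(y ~ x)`) against the slope of a line forced through the origin (R: `lm(y ~ 0 + x)`).

```python
# Hypothetical noise-free data from y = 5 + 2 * x: the true intercept is 5, not 0.
x = [float(v) for v in range(1, 11)]
y = [5.0 + 2.0 * xi for xi in x]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Ordinary least squares with an intercept: slope = Sxy / Sxx, intercept = ybar - slope * xbar.
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope_with = sxy / sxx
intercept_with = ybar - slope_with * xbar

# Least squares forced through the origin: slope = sum(x * y) / sum(x ** 2).
slope_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

print("with intercept: intercept =", round(intercept_with, 3), "slope =", round(slope_with, 3))
print("through origin: slope =", round(slope_origin, 3))
```

The model with an intercept recovers the slope of 2. The through-origin slope (1045/385, about 2.71) is pulled upward to compensate for the missing intercept.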
    >>> 
    >>> I found only one textbook which does not state it
    >>> explicitly but seems to imply that the constant can
    >>> sometimes be removed. The reference is given below.
    >>> 
    >>> Cornillon, P.A., Guyader, A., Husson, F., Jégou, N.,
    >>> Josse, J., Kloareg, M., Matzner-Løber, E. and Rouvière,
    >>> L. (2012). *R for Statistics*. CRC Press, Taylor &
    >>> Francis Group.
    >>> 
    >>> On p.136, it says that “The p-value of less than 5% for
    >>> the constant (intercept) indicates that the constant
    >>> must appear in the model”.  So based on this, I am
    >>> assuming that a p-value of more than 5% for the
    >>> intercept would mean that the intercept should be
    >>> removed.
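Why the intercept's p-value is a poor guide can be sketched with a simple linear regression on made-up data. The intercept is the predicted response at x = 0, so its test depends on where the origin of x happens to lie. The sketch fits the same data twice, once with raw x and once with x centred on its mean (in R, e.g. `lm(y ~ I(x - mean(x)))`). The slope and residual sum of squares are identical, but the intercept's t statistic changes drastically. `simple_fit` is a hypothetical helper, and t statistics are reported instead of p-values so the sketch stays within the standard library.

```python
import math

# Hypothetical data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.3, 2.1, 3.4, 4.4, 5.5, 6.7, 7.6, 8.9]

def simple_fit(x, y):
    """Fit y = b0 + b1 * x by least squares; return b0, b1, RSS and the t statistic of b0."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # residual variance estimate
    se_b0 = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))  # standard error of the intercept
    return b0, b1, rss, b0 / se_b0

raw = simple_fit(x, y)
x_mean = sum(x) / len(x)
centered = simple_fit([xi - x_mean for xi in x], y)  # same model, origin moved to mean(x)

print("raw x:      b0=%.3f b1=%.3f RSS=%.4f t(b0)=%.2f" % raw)
print("centered x: b0=%.3f b1=%.3f RSS=%.4f t(b0)=%.2f" % centered)
```

Because the fit itself is unchanged, a rule such as "drop the intercept if p > 5%" would give different models for the same data depending on an arbitrary choice of origin.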
    >>> 
    >>> I would appreciate it if someone could help me with this
    >>> conundrum.
    >>> 
    >>> --
    >>> Shadiya