[R-sig-ME] When can the intercept be removed from regression models

Tue Jul 26 15:12:59 CEST 2016

  Comments below.

On 16-07-26 06:31 AM, Tom Fritzsche wrote:
> Hi,
> 
> since all the stats experts are on this list, I have to ask a question
> about models without an intercept.
> 
> In my layman's understanding, in a model without an intercept like this one:
> 
> glmer(response ~ 0 + condition + (1 | study_participant ) + (1 |
> test_item), data=data_frame, family=binomial,
> control=glmerControl(optimizer="bobyqa"))
> 
> the levels of the predictor condition are not estimated relative to
> the intercept but against an absolute zero. With binomial data this seems
> quite handy, as for each level of condition the model tells me whether
> performance was significantly different from chance (like multiple
> intercepts), something a binomial test could do as well (albeit
> without accounting for the random-effects structure).
> This can be (and in psycholinguistic research often is) a research question.

  In this case (where the model has a categorical variable as a main
effect), you're right that the overall model fit is identical whether we
use 0+condition or 1+condition; the model is just differently
parameterized.  I think that in general computing these individual
effects *after* model fitting, e.g. via the effects or lsmeans packages,
is more sensible.  Also keep in mind that if you're comparing lots of
individual levels to zero, (1) you might want to take multiple
comparisons into account (see the multcomp package), and (2) you
shouldn't fall into the trap of saying that two levels are different
because one is significantly different from zero and the other isn't.
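A minimal sketch of the reparameterization point in base R. It uses `glm()` on simulated data rather than the `glmer()` model above (there are no random effects here), and the data and object names (`d`, `fit_int`, `fit_noint`) are hypothetical. The two fits have the same log-likelihood, each `0 + condition` coefficient equals the intercept plus the corresponding treatment difference, and the per-level tests against zero (a 50% probability on the response scale) are Holm-adjusted with `p.adjust()`, used here as a simpler stand-in for the multcomp approach.

```r
# Hypothetical simulated data: binary responses under three conditions
set.seed(101)
n_per <- 50
condition <- factor(rep(c("A", "B", "C"), each = n_per))
p_true <- c(A = 0.5, B = 0.7, C = 0.85)
response <- rbinom(length(condition), size = 1,
                   prob = p_true[as.character(condition)])
d <- data.frame(response, condition)

# Treatment contrasts: intercept = level A on the logit scale,
# other coefficients = differences from level A
fit_int <- glm(response ~ 1 + condition, data = d, family = binomial)
# Cell means: one coefficient per level, each tested against logit 0 (p = 0.5)
fit_noint <- glm(response ~ 0 + condition, data = d, family = binomial)

# Same fit, different parameterization
print(c(with_intercept = as.numeric(logLik(fit_int)),
        without_intercept = as.numeric(logLik(fit_noint))))

# Each cell-means coefficient is the intercept plus the matching difference
level_logits <- coef(fit_int)[1] + c(0, coef(fit_int)[-1])
print(cbind(no_intercept = coef(fit_noint),
            from_intercept_model = level_logits))

# Per-level Wald p-values against zero, then a Holm adjustment
p_raw <- summary(fit_noint)$coefficients[, "Pr(>|z|)"]
p_holm <- p.adjust(p_raw, method = "holm")
print(cbind(raw = p_raw, holm = p_holm))
```

Adjusting these p-values does not address point (2): deciding whether two levels differ still requires testing their difference directly.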

> 
> Or is this total nonsense?
> 
> I have to say that I am confused when it comes to the intercepts in
> the random components ....
> 
> Tom
> 
> ---
> 
> Tom Fritzsche
> University of Potsdam
> Department of Linguistics
> Karl-Liebknecht-Straße 24-25
> 14476 Potsdam
> Germany
> 
> office: 14.140
> phone: +49 331 977 2296
> fax: +49 331 977 2095
> e-mail: tom.fritzsche at uni-potsdam.de
> web:    www.ling.uni-potsdam.de/~fritzsche
> 
> 2016-07-26 12:08 GMT+02:00 Martin Maechler <maechler at stat.math.ethz.ch>:
>>>>>>> Shadiya Al Hashmi <saah500 at york.ac.uk>
>>>>>>>     on Tue, 26 Jul 2016 12:40:26 +0300 writes:
>>
>>     > Thanks Thierry for your response.  I tried the model
>>     > before and after removing the intercept a while ago and I
>>     > remember that the coefficients were pretty much the same.
>>
>> but other things are *not* pretty much the same, and you
>> really really really should obey the advice by Thierry:
>>
>>    ALWAYS KEEP THE INTERCEPT IN THE MODEL !!!
>>
>> (at least until you become a very experienced statistician / data
>>  scientist / .. )
>>
>>
>>     >> p-value doesn't matter.
>>     >  The only salient difference was that the levels of
>>     > the first categorical variable in the model formula were
>>     > all given in the output table instead of the reference
>>     > level being embedded in the intercept as in the model with
>>     > intercept.
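The reason the output table changes is visible in the design matrix. A sketch assuming a hypothetical three-level factor `cond`: with an intercept, R's default treatment contrasts absorb the first level into the `(Intercept)` column; without one, each level gets its own indicator column. Both matrices span the same column space (the rank of the combined matrix is still 3), which is why, for a categorical predictor like this, the fitted values do not change.

```r
# Hypothetical factor with three levels
cond <- factor(c("A", "A", "B", "B", "C", "C"))

# With intercept: columns "(Intercept)", "condB", "condC";
# level A is the reference, absorbed into the intercept
X_int <- model.matrix(~ cond)
# Without intercept: one indicator column per level
X_noint <- model.matrix(~ 0 + cond)
print(X_int)
print(X_noint)

# Same column space: stacking the two matrices side by side adds no new
# directions, so the combined rank equals the rank of each one
rank_of <- function(M) qr(M)$rank
print(c(with_intercept = rank_of(X_int),
        without_intercept = rank_of(X_noint),
        combined = rank_of(cbind(X_int, X_noint))))
```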
>>
>>     > It would be nice to find examples from the literature
>>     > where the intercept is removed from the model.
>>
>> hopefully *not*!  at least not apart from the exceptions that
>> Thierry mentions below.
>>
>>     > Can you think of any?
>>
>>     > Shadiya
>>
>>     >> On Jul 26, 2016, at 11:32 AM, Thierry Onkelinx
>>     >> <thierry.onkelinx at inbo.be> wrote:
>>     >>
>>     >> Dear Shadiya,
>>     >>
>>     >> Thou shalt always keep the intercept in the model. Its
>>     >> p-value doesn't matter.
>>     >>
>>     >> I use two exceptions to that rule:
>>     >> 1. There is a physical/biological/... reason why the
>>     >> intercept should be 0.
>>     >> 2. Removing the intercept gives a different, more
>>     >> convenient parametrisation (but does not change the
>>     >> model fit!)
>>     >>
>>     >> Note that in logistic regression you use a logit
>>     >> transformation. Hence forcing the model through the
>>     >> origin on the logit scale forces the predicted
>>     >> probability to 50% on the original scale when all
>>     >> predictors are zero. I haven't seen an example where
>>     >> that makes sense.
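A quick numerical check of this point, using base R's `plogis()` (the inverse logit). The slopes `b` and the grid `x` are hypothetical: whatever the slope, a no-intercept model logit(p) = b * x predicts p = 0.5 at x = 0.

```r
# plogis() is the inverse logit: exp(eta) / (1 + exp(eta))
plogis(0)  # returns 0.5

# No-intercept logistic model on the logit scale: eta = b * x
# (hypothetical slopes; the intercept is fixed at 0)
x <- c(-2, -1, 0, 1, 2)
for (b in c(-1.5, 0.3, 2)) {
  p <- plogis(b * x)
  cat("b =", b, " predicted p at x = 0:", p[x == 0], "\n")
}
```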
>>     >>
>>     >> Bottom line: only remove the intercept when you really
>>     >> know what you are doing.
>>     >>
>>     >> Best regards,
>>     >>
>>     >> ir. Thierry Onkelinx
>>     >> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
>>     >> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>>     >> Kliniekstraat 25
>>     >> 1070 Anderlecht
>>     >> Belgium
>>     >>
>>     >> To call in the statistician after the experiment is done
>>     >> may be no more than asking him to perform a post-mortem
>>     >> examination: he may be able to say what the experiment
>>     >> died of. ~ Sir Ronald Aylmer Fisher
>>     >>
>>     >> The plural of anecdote is not data. ~ Roger Brinner
>>     >>
>>     >> The combination of some data and an aching desire for an
>>     >> answer does not ensure that a reasonable answer can be
>>     >> extracted from a given body of data. ~ John Tukey
>>     >>
>>     >> 2016-07-26 9:50 GMT+02:00 Shadiya Al Hashmi
>>     >> <saah500 at york.ac.uk>:
>>     >>> Good morning,
>>     >>>
>>     >>> I am in a dilemma regarding the inclusion of the
>>     >>> intercept in my mixed-effects logistic regression
>>     >>> models.  Most statisticians that I talked to insist that
>>     >>> I shouldn’t remove the constant from my models.  One of
>>     >>> the pros is that the models would fit better, since
>>     >>> the R2 value would be improved. Conversely, removing the
>>     >>> constant means that there is no guarantee that we would
>>     >>> avoid biased coefficients, since the regression line
>>     >>> would be forced through the origin.
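Both points can be illustrated with an ordinary linear model, which is simpler than the mixed logistic case; the same issue arises on the logit scale in the logistic case. In this sketch the data are simulated with a true intercept of 5 and slope of 2 (hypothetical values). Forcing the line through the origin pushes the slope away from 2. The sketch also shows that `summary.lm()` computes R2 differently when there is no intercept (relative to zero rather than to the mean of y), so R2 values from models with and without an intercept are not directly comparable; the no-intercept value is typically larger when the mean of y is far from zero.

```r
# Hypothetical simulation: true intercept 5, true slope 2
set.seed(42)
n <- 100
x <- runif(n, min = 0, max = 10)
y <- 5 + 2 * x + rnorm(n, sd = 3)

fit_int   <- lm(y ~ 1 + x)  # intercept estimated
fit_noint <- lm(y ~ 0 + x)  # line forced through the origin

# Slope estimates: the no-intercept slope absorbs the missing intercept
print(c(true = 2,
        with_intercept = unname(coef(fit_int)["x"]),
        without_intercept = unname(coef(fit_noint)["x"])))

# R^2 as reported by summary.lm(): centred on mean(y) with an intercept,
# but computed relative to 0 without one, so the two are not comparable
r2_int   <- summary(fit_int)$r.squared
r2_noint <- summary(fit_noint)$r.squared
print(c(with_intercept = r2_int, without_intercept = r2_noint))

# The no-intercept R^2 equals sum(fitted^2) / sum(y^2) (uncentred)
print(all.equal(r2_noint, sum(fitted(fit_noint)^2) / sum(y^2)))
```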
>>     >>>
>>     >>> I found only one textbook which, without stating it
>>     >>> explicitly, seems to imply that we can sometimes remove
>>     >>> the constant. This is the reference provided below.
>>     >>>
>>     >>> Cornillon, P.A., Guyader, A., Husson, F., Jégou, N.,
>>     >>> Josse, J., Kloareg, M., Matzner-Løber, E. and Rouvière,
>>     >>> L. (2012). *R for Statistics*. CRC Press, Taylor &
>>     >>> Francis Group.
>>     >>>
>>     >>> On p.136, it says that “The p-value of less than 5% for
>>     >>> the constant (intercept) indicates that the constant
>>     >>> must appear in the model”.  So based on this, I am
>>     >>> assuming that a p-value of more than 5% for the
>>     >>> intercept would mean that the intercept should be
>>     >>> removed.
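One way to see why the intercept's p-value is a poor criterion: the intercept only describes the linear predictor at the point where all predictors are zero, so merely shifting a predictor changes that test without changing the model. A minimal sketch with base R `glm()` on simulated logistic data (all values hypothetical, no random effects), fitting the same model with the raw and the centered predictor:

```r
# Hypothetical simulated logistic data
set.seed(7)
n <- 200
x <- rnorm(n, mean = 10, sd = 2)
# True model: logit(p) = -5 + 0.5 * x, so p = 0.5 at x = 10
y <- rbinom(n, size = 1, prob = plogis(-5 + 0.5 * x))

fit_raw <- glm(y ~ x, family = binomial)
x_c <- x - mean(x)  # same predictor, origin moved to its mean
fit_centered <- glm(y ~ x_c, family = binomial)

coefs_raw <- summary(fit_raw)$coefficients
coefs_centered <- summary(fit_centered)$coefficients

# The intercept test refers to x = 0 in one model and x = mean(x) in the other
p_intercept <- c(raw = coefs_raw["(Intercept)", "Pr(>|z|)"],
                 centered = coefs_centered["(Intercept)", "Pr(>|z|)"])
print(p_intercept)

# ... but the slope and the fit are the same
print(c(slope_raw = coefs_raw["x", "Estimate"],
        slope_centered = coefs_centered["x_c", "Estimate"]))
print(c(loglik_raw = as.numeric(logLik(fit_raw)),
        loglik_centered = as.numeric(logLik(fit_centered))))
```

The slope and log-likelihood agree between the two fits, while the intercept's p-value typically differs a lot, because it tests the logit at x = 0 in one case and at the mean of x in the other.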
>>     >>>
>>     >>> I would appreciate it if someone could help me with this
>>     >>> conundrum.
>>     >>>
>>     >>> --
>>     >>> Shadiya
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models