[R-sig-ME] Quadratic term in linear model and model over-parameterization

Wed Mar 15 11:13:34 CET 2017

Dear Fred,

I have to say... wow! Really, you got *only* the comment about adding a
quadratic effect to the model?!? The review process itself seems to be
much more ill-conditioned these days than I thought...

First, this is the mailing list for mixed effects models, i.e.,
multilevel models. Your question seems to be about "normal" linear models
(which are a special case of the former without any random effects), and
you gave no hint of any pseudoreplication in your data, i.e., of any
random effects that you fitted. Are all of your data points really
independent? If so, this might not be the right mailing list for your
question.

Second, to give you some hints about where to start your modelling
endeavour:

You need about 10-15 data points per parameter (rule of thumb) to
estimate it reliably, so with 382 data points you can afford roughly
382 / 15 ≈ 25 parameters. Your models are overfitted! And the p-values
you are getting are totally nonsensical in my opinion (quite apart from
the debate about whether p-values make sense at all).
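
To make that rule of thumb concrete, here is a rough sketch in Python
(not taken from the original analysis) that counts the coefficients
implied by the terms described in the quoted message and sets them
against 382 / 15 and 382 / 10. It assumes that every factor has two
levels, so that each main effect or interaction costs exactly one
coefficient, that the intercept is counted, and that the unexplained
`x` in the formula is left out. The short names (`s` for
'Sympatry/Allopatry', `e2` for 'Parasite Weight^2') and the helpers
`count_terms` and `summarize` are made up for illustration.

```python
from itertools import combinations

N_OBS = 382  # data points reported in the quoted message


def count_terms(predictors, excluded_pairs=(), max_order=3):
    """Return the number of terms of order 1..max_order.

    Combinations containing an excluded pair (e.g. weight together with
    weight squared) are skipped.
    """
    excluded = [frozenset(pair) for pair in excluded_pairs]
    counts = []
    for order in range(1, max_order + 1):
        n = 0
        for combo in combinations(predictors, order):
            if not any(pair <= frozenset(combo) for pair in excluded):
                n += 1
        counts.append(n)
    return counts


# Hypothetical short names: a, b, c, d as in the message,
# s = Sympatry/Allopatry, e = Parasite Weight, e2 = Parasite Weight^2.
MODELS = {
    "model 1": (["a", "b", "c", "d", "e"], []),
    "model 1.2": (["a", "b", "c", "d", "e", "e2"], [("e", "e2")]),
    "model 2": (["a", "s", "d", "e"], []),
    "model 2.2": (["a", "s", "d", "e", "e2"], [("e", "e2")]),
}


def summarize(models=MODELS):
    """Map each model name to (main, two_way, three_way, total)."""
    summary = {}
    for name, (predictors, excluded) in models.items():
        main, two_way, three_way = count_terms(predictors, excluded)
        # Assumption: two-level factors, so one coefficient per term,
        # plus one for the intercept.
        total = 1 + main + two_way + three_way
        summary[name] = (main, two_way, three_way, total)
    return summary


if __name__ == "__main__":
    print(f"budget: {N_OBS // 15} (15 points per parameter) "
          f"to {N_OBS // 10} (10 points per parameter)")
    for name, (main, two_way, three_way, total) in summarize().items():
        print(f"{name}: {main} main, {two_way} two-way, "
              f"{three_way} three-way, {total} coefficients incl. intercept")
```

The interaction counts this produces agree with those given in the
quoted message (10 and 10 for model 1, 14 and 16 for model 1.2, 6 and 4
for model 2, 9 and 7 for model 2.2); only the two-level assumption and
the intercept are added here.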

So, regarding your question 5: NONE!

You should start by reading Frank Harrell's 2015 book "Regression
Modeling Strategies" (2nd ed.) to give yourself a better foundation in
linear models. You really have to begin with the basics of what you are
doing...
And in that book you'll also find answers to all the other questions you
asked.

Sorry for being a bit harsh here, but I do not know another way of
telling you this.

Good luck!

On 15.03.2017 at 10:46, f_fran03 at uni-muenster.de wrote:
> Dear all,
> 
> I’m new to this mailing list and really hope that somebody here can help me with the following issue:
> 
> I fitted the following linear models to a Box-Cox-transformed response variable with 382 data points:
> Model 1: Y~x+a+b+c+d+e+(a*b)+(a*c)+ (a*d)+…+(a*b*c)+(a*b*d)+(a*b*e)+…
> a: 'Experimental Temperature' (Temp1, Temp2)
> b: 'Host Population' (PopX, PopY)
> c: 'Parasite Population' (PopX, PopY)
> d: 'Host Gender' (male, female)
> Additionally, I included the continuous predictor variable 'Parasite Weight' (e) and all possible 2-way (10 interactions) and 3-way (10 interactions) interactions in the model.
> 
> In model 2 I replaced the two main effects 'Host Population' and 'Parasite Population' with one variable ('Sympatry/Allopatry') that combines the two effects. Apart from this, model 2 (six 2-way interactions and four 3-way interactions) was identical to model 1.
> 
> I am now interested in all interactions that include the continuous predictor variable 'Parasite Weight'. I found one such significant interaction ('Experimental Temperature x Parasite Population x Parasite Weight', p = 0.010) in model 1.
> 
> We sent a manuscript containing these two models to a journal for review and got it back now with a comment from a reviewer who suggested that we look for non-linear relationships involving 'Parasite Weight'.
> 
> Thus, I fitted model 1.2, which corresponds to model 1 plus the quadratic term of 'Parasite Weight' ('Parasite Weight^2') and the respective interactions (in total 14 2-way interactions and 16 3-way interactions). I did the same for model 2, which resulted in model 2.2 with nine 2-way interactions and seven 3-way interactions.
> 
> The significant interaction I found with model 1 was no longer significant in model 1.2, and in model 2.2 two interactions became significant ('Host Gender x Sympatry/Allopatry x Parasite Weight', p = 0.038 and 'Host Gender x Sympatry/Allopatry x Parasite Weight^2', p = 0.044) that were not significant in model 2.
> 
> Here are my questions:
> 1. Why is it that including the quadratic term removes some significant effects while adding others?
> 2. What does it mean when both an interaction including the linear term and the same interaction including the quadratic term become significant? Does this suggest a non-linear relationship or both a linear and a non-linear relationship?
> 3. Could it be that the disappearance of the interaction that was significant in model 1 is caused by over-parameterization of model 1.2, and how can I prove this (with all the models we have the potential problem of many interactions and main effects)?
> 4. Are there any general arguments for when to include a quadratic term in a model and when quadratic terms should be avoided?
> 5. Which model can I trust?
> 
> Thank you very much in advance for any advice you can give me,
> 
> Fred.