[R-sig-ME] correlation of fixed effects coefficients all close to +/-1
Alessandra Bielli
bielli.alessandra using gmail.com
Mon May 25 20:52:49 CEST 2020
Dear Ben
I compared the predictions from a) the model with the singleton category as
last level of my factor vs b) the model that excluded the singleton
category. They are extremely similar.
Also, in both cases the correlations of the fixed effects are weaker than in
the initial model: in a) all correlations are > -0.660, while in
b) they are > -0.750.
So I am thinking that the best option is to use model a), because the
correlation is weaker and because it avoids deleting a category. But
correct me if I am wrong!
Thank you very much!
Alessandra
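
A minimal sketch of this kind of comparison, using simulated data and plain
lm() instead of the original lmer() fit. The data frame `d`, the level names
and the group sizes are all invented for illustration, and the boatID random
effect is left out so the example runs in base R. It fits the same one-factor
model three ways: singleton as the reference (first) level, singleton moved to
the last level, and singleton dropped.

```r
# Simulated stand-in data (hypothetical levels and sample sizes)
set.seed(1)
lev <- c("single", "b", "c", "d")
d <- data.frame(type = factor(rep(lev, times = c(1, 30, 30, 30)), levels = lev))
d$y <- rnorm(nrow(d), mean = as.integer(d$type))

# Correlation matrix of the coefficient estimates, and its largest
# off-diagonal entry in absolute value
coef_cor    <- function(fit) cov2cor(vcov(fit))
max_offdiag <- function(m) max(abs(m[upper.tri(m)]))

# (1) singleton is the reference level
fit_first <- lm(y ~ type, data = d)

# (2) singleton moved to the last level
d$type_last <- factor(d$type, levels = c("b", "c", "d", "single"))
fit_last <- lm(y ~ type_last, data = d)

# (3) singleton observation dropped
d_drop <- droplevels(subset(d, type != "single"))
fit_drop <- lm(y ~ type, data = d_drop)

round(c(first = max_offdiag(coef_cor(fit_first)),
        last  = max_offdiag(coef_cor(fit_last)),
        drop  = max_offdiag(coef_cor(fit_drop))), 3)

# Predictions for the levels that all models share
shared <- c("b", "c", "d")
p_last <- predict(fit_last,
                  newdata = data.frame(type_last = factor(shared, levels = levels(d$type_last))))
p_drop <- predict(fit_drop,
                  newdata = data.frame(type = factor(shared, levels = levels(d_drop$type))))
cbind(p_last, p_drop)
```

In this toy setup the largest correlation is about 0.98 when the singleton is
the reference level and about 0.71 in the other two fits. The predictions for
the shared levels are identical here because lm() just returns group means;
with a random effect in the model they would be expected to be close rather
than identical.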
On Mon, May 25, 2020 at 10:34 AM Ben Bolker <bbolker using gmail.com> wrote:
> In this case it seems to make sense to drop the singleton value.
> You're not going to get very much information out of it anyway. Other
> things you could try:
>
> * make sure the singleton category is not the first level of your
> factor. (Since effects of levels are by default quantified with respect
> to the first level, having a wonky first level will make your results
> look worse/crazier overall.)
>
> * compare the results with/without the extra observation; if you get
> substantively similar results both ways, you can pick one to present as
> primary and mention that the results are similar with the other choice.
> (But be careful about cherry-picking.)
>
> On 5/25/20 1:12 PM, Alessandra Bielli wrote:
> > UPDATE
> >
> > Dear Phillip and list
> >
> > As you can see from the graph attached, one of the categories of the
> > predictor variable ("madera") only has one observation.
> > I decided to remove this observation and I ran the model again, this is the
> > corr matrix I get:
> >
> > Correlation of Fixed Effects:
> >             (Intr) Tp_rsdOr Tp_rsdOt Tp_Pyc Tp_rsP Tp_rsR
> > Tp_rsdOrgnc -0.725
> > Tip_rsdOtrs -0.747  0.593
> > Tp_rsdPplyc -0.575  0.458    0.470
> > Tp_rsdPlstc -0.659  0.526    0.542    0.419
> > Tipo_resdRd -0.445  0.356    0.367    0.282  0.328
> > Tipo_rsdVdr -0.747  0.593    0.612    0.470  0.542  0.367
> >
> > I am aware that modifying a dataset is unacceptable, but I think it showed
> > that the source of the problem was lack of observations, am I correct?
> > Is there a better way to deal with this? I would rather not delete a line
> > of my dataset, even though it is a very uncommon observation for which I do
> > not aim to get predictions.
> >
> > Thank you again for your advice
> >
> >
> > On Mon, May 25, 2020 at 10:52 AM Alessandra Bielli <bielli.alessandra using gmail.com> wrote:
> >
> >> Hi Phillip
> >>
> >> Thank you so much for your explanation.
> >>
> >> I have a couple more questions
> >>
> >> 1. In my model, the regression coefficients of each one of the categories
> >> of my predictor are correlated, but I just have one categorical predictor.
> >> In case of collinearity I would usually drop one predictor, but here I only
> >> have one and my goal is to use the model to predict the dependent variable.
> >> What's the procedure here?
> >>
> >> 2. Is there a test or visual way to determine if I have enough data to get
> >> good estimates?
> >>
> >> 3. A couple of days ago I came across this post on Cross Validated, which
> >> states that the correlation of fixed effects part of the output is only
> >> useful in special cases:
> >> https://stats.stackexchange.com/questions/57240/how-do-i-interpret-the-correlations-of-fixed-effects-in-my-glmer-output
> >> The post references the book
> >> http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenCUPstats.pdf,
> >> page 268,
> >>
> >> "The summary concludes with a table listing the correlations of the fixed
> >> effects. The numbers listed here can be used to construct confidence
> >> ellipses for pairs of fixed-effects parameters, and should not be confused
> >> with the normal correlation obtained by applying cor() to pairs of
> >> predictor vectors in the input data. Since constructing confidence ellipses
> >> is beyond the scope of the book we will often suppress this table".
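
A rough sketch of how such a confidence ellipse could be built from the
estimates and their covariance matrix. It assumes a large-sample (Wald)
chi-square approximation and uses simulated data with plain lm(); the
variable names and numbers are invented. For a mixed model, the matrix from
vcov() on the fitted object would play the role of `V`, possibly after
as.matrix().

```r
# Simulated example: intercept and slope of a simple regression
set.seed(2)
x <- seq(10, 20, length.out = 50)
y <- 1 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

b <- coef(fit)   # point estimates (centre of the ellipse)
V <- vcov(fit)   # covariance matrix of the estimates

# Approximate 95% region: all points p with
# (p - b)' V^{-1} (p - b) = qchisq(0.95, df = 2)
r      <- sqrt(qchisq(0.95, df = 2))
theta  <- seq(0, 2 * pi, length.out = 200)
circle <- rbind(cos(theta), sin(theta))

# Map the unit circle through the Cholesky factor of V and shift by b;
# each row of `ell` is one point on the ellipse boundary
ell <- t(b + r * t(chol(V)) %*% circle)

# Correlation of the two estimates, i.e. the number printed in the
# "Correlation of Fixed Effects" table for this pair
cov2cor(V)[1, 2]
```

Calling plot(ell, type = "l") draws the ellipse. The stronger the correlation
between the two estimates, the narrower and more tilted the ellipse becomes.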
> >>
> >> What I understand is that the correlation matrix is useful for prediction
> >> of future values, which is also my case, but I am not entirely sure I am
> >> interpreting this correctly.
> >>
> >> I really appreciate your advice!
> >>
> >> Alessandra
> >>
> >>
> >> On Sun, May 24, 2020 at 3:15 PM Phillip Alday <phillip.alday using mpi.nl> wrote:
> >>
> >>> Hi,
> >>>
> >>> Very high correlations of the fixed-effects estimates can indicate two
> >>> problems (which are actually just different manifestations of the same
> >>> deeper problem):
> >>>
> >>> 1. Multicollinearity -- this is the same as multicollinearity in
> >>> classical/standard/non mixed-effects regression. Basically this means
> >>> that some of your variables are expressing the same thing and so you
> >>> have some redundancies that could be eliminated. Perfect
> >>> multicollinearity leads to a rank-deficient model matrix, which R will
> >>> catch and correct, but near multicollinearity may not be caught.
> >>>
> >>> 2. You don't have enough data to get good estimates of all your
> >>> coefficients.
> >>>
> >>> The bigger problem for your inference is that both of these problems
> >>> will inflate your standard errors. In both cases, there isn't enough
> >>> information to fully tease apart the contribution from the different
> >>> variables, which means that you have a lot of variability in your
> >>> estimates and thus large standard errors.
> >>>
> >>> Note that some correlation between estimates is expected. If you think
> >>> of a very simple case with the intercept and one slope/predictor then
> >>> you'll see that if you change the intercept, then you have to change the
> >>> slope a bit to get the line to stay close to the observed data.
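
The intercept/slope case can be checked numerically. A small sketch with
simulated data and lm() (all names and numbers are made up): when the
predictor values sit far from zero, the intercept and slope estimates are
strongly negatively correlated, and centering the predictor removes that
correlation.

```r
set.seed(3)
x <- seq(10, 20, length.out = 50)   # predictor values far from zero
y <- 2 + 0.3 * x + rnorm(50)

# Correlation between the intercept and slope estimates, raw predictor
cor_raw <- cov2cor(vcov(lm(y ~ x)))[1, 2]

# Same model with the predictor centered at its mean
xc <- x - mean(x)
cor_centered <- cov2cor(vcov(lm(y ~ xc)))[1, 2]

# For simple linear regression the correlation depends only on the
# design: -mean(x) / sqrt(mean(x^2))
cor_formula <- -mean(x) / sqrt(mean(x^2))

round(c(raw = cor_raw, centered = cor_centered, formula = cor_formula), 3)
```

The correlation here comes entirely from the design (the x values), not from
the response, which is the same reason the correlation table can change when
the reference level of a factor changes.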
> >>>
> >>> (Once again, I worry that I've oversimplified and said something
> >>> horribly infelicitous, but I'm always happy to be corrected and learn
> >>> something myself!)
> >>>
> >>> Best,
> >>>
> >>> Phillip
> >>>
> >>> On 11/5/20 11:42 pm, Alessandra Bielli wrote:
> >>>> Dear list,
> >>>>
> >>>> I am fitting the mixed effect model:
> >>>> > lmer(log(percapita_day) ~ Type_residuo + (1|boatID), data=all)
> >>>>
> >>>> where percapita_day is a non-negative continuous response variable (on the
> >>>> log scale to have residuals normally distributed), Type_residuo is a
> >>>> categorical explanatory variable and boatID is a random effect with 4
> >>>> levels.
> >>>>
> >>>> I have found values very close to +/-1 in the correlation of fixed effects
> >>>> matrix below, and after some research I learnt that the coefficients are
> >>>> not about the correlation of the variables but the expected correlation of
> >>>> the regression coefficients.
> >>>>
> >>>> Correlation of Fixed Effects:
> >>>>             (Intr) Tp_rsM Tp_rsdOr Tp_rsdOt Tp_Pyc Tp_rsP Tp_rsR
> >>>> Type_rsdMtl -0.944
> >>>> Tp_rsdOrgnc -0.951  0.945
> >>>> Typ_rsdOtrs -0.959  0.953  0.959
> >>>> Tp_rsdPplyc -0.926  0.919  0.925    0.933
> >>>> Tp_rsdPlstc -0.951  0.945  0.951    0.958    0.925
> >>>> Type_resdRd -0.870  0.867  0.873    0.878    0.850  0.872
> >>>> Type_rsdVdr -0.954  0.949  0.955    0.962    0.928  0.954  0.876
> >>>>
> >>>> However I still can't explain why all coefficients are so close to +/-1 and
> >>>> I was wondering if these are indicators that something is wrong with my
> >>>> model?
> >>>> Is that due to the presence of outliers in the response variable (see
> >>>> attached)?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Alessandra
> >>>> _______________________________________________
> >>>> R-sig-mixed-models using r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models