[R-sig-ME] correlation of fixed effects coefficients all close to +/-1

Alessandra Bielli bielli.alessandra using gmail.com
Mon May 25 19:12:15 CEST 2020


UPDATE

Dear Phillip and list,

As you can see from the attached graph, one of the categories of the
predictor variable ("madera") has only one observation.
I decided to remove this observation and ran the model again. This is the
correlation matrix I get:

Correlation of Fixed Effects:
            (Intr) Tp_rsdOr Tp_rsdOt Tp_Pyc Tp_rsP Tp_rsR
Tp_rsdOrgnc -0.725
Tip_rsdOtrs -0.747  0.593
Tp_rsdPplyc -0.575  0.458    0.470
Tp_rsdPlstc -0.659  0.526    0.542    0.419
Tipo_resdRd -0.445  0.356    0.367    0.282  0.328
Tipo_rsdVdr -0.747  0.593    0.612    0.470  0.542  0.367

I am aware that modifying a dataset is unacceptable, but I think this showed
that the source of the problem was a lack of observations. Am I correct?
Is there a better way to deal with this? I would rather not delete a row
from my dataset, even though it is a very uncommon observation for which I
do not aim to get predictions.
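
A minimal sketch of why a sparsely populated reference level could produce
this pattern, and of an alternative to deleting the row. "madera" has no row
in the original matrix, which suggests it was the reference level, so every
other coefficient was a contrast against a mean estimated from a single
observation. Everything below is an illustration under assumptions: the data
are simulated, the level names and sample sizes are invented, and plain lm()
is used instead of lmer(), so the random effect for boatID is ignored and the
numbers will not match the output above.

```r
# Simulated one-way layout; level names and sample sizes are invented.
set.seed(1)
n_per_level <- c(madera = 1, metal = 20, organico = 20, plastico = 20)
d <- data.frame(
  tipo = factor(rep(names(n_per_level), times = n_per_level),
                levels = names(n_per_level))
)
d$y <- rnorm(nrow(d))

# Counts per category: a quick check for sparse levels.
counts <- table(d$tipo)
print(counts)

# "madera" is the first level, so treatment contrasts use it as the reference.
fit_sparse <- lm(y ~ tipo, data = d)
cor_sparse <- cov2cor(vcov(fit_sparse))
print(round(cor_sparse, 3))

# Same data and same fitted values, but a well-populated reference level.
d$tipo2 <- relevel(d$tipo, ref = "metal")
fit_relevel <- lm(y ~ tipo2, data = d)
cor_relevel <- cov2cor(vcov(fit_relevel))
print(round(cor_relevel, 3))
```

With relevel() no observation is dropped and the fitted values are unchanged;
only the parameterisation, and therefore the correlation matrix, differs. In
this simulated design the correlation between the intercept and the contrasts
for the well-populated levels moves from about -0.98 to about -0.71.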

Thank you again for your advice


On Mon, May 25, 2020 at 10:52 AM Alessandra Bielli <
bielli.alessandra using gmail.com> wrote:

> Hi Phillip
>
> Thank you so much for your explanation.
>
> I have a couple more questions
>
> 1. In my model, the regression coefficients for the categories of my
> predictor are correlated with each other, but I have only one categorical
> predictor. In a case of collinearity I would usually drop one predictor,
> but here I have only one, and my goal is to use the model to predict the
> dependent variable. What is the procedure here?
>
> 2. Is there a test or visual way to determine if I have enough data to get
> good estimates?
>
> 3. A couple of days ago I came across a post on Cross Validated which
> states that the "Correlation of Fixed Effects" part of the output is only
> useful in special cases:
> https://stats.stackexchange.com/questions/57240/how-do-i-interpret-the-correlations-of-fixed-effects-in-my-glmer-output
> The post references the book
> http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenCUPstats.pdf,
> page 268:
>
> "The summary concludes with a table listing the correlations of the fixed
> effects. The numbers listed here can be used to construct confidence
> ellipses for pairs of fixed-effects parameters, and should not be confused
> with the normal correlation obtained by applying cor() to pairs of
> predictor vectors in the input data. Since constructing confidence ellipses
> is beyond the scope of the book we will often suppress this table".
>
> What I understand is that the correlation matrix is useful for prediction
> of future values, which is also my case, but I am not entirely sure I am
> interpreting this correctly.
>
> I really appreciate your advice!
>
> Alessandra
>
>
> On Sun, May 24, 2020 at 3:15 PM Phillip Alday <phillip.alday using mpi.nl>
> wrote:
>
>> Hi,
>>
>> Very high correlations of the fixed-effects estimates can indicate two
>> problems (which are actually just different manifestations of the same
>> deeper problem):
>>
>> 1. Multicollinearity -- this is the same as multicollinearity in
>> classical/standard/non-mixed-effects regression. Basically this means
>> that some of your variables are expressing the same thing and so you
>> have some redundancies that could be eliminated. Perfect
>> multicollinearity leads to a rank-deficient model matrix, which R will
>> catch and correct, but near multicollinearity may not be caught.
>>
>> 2. You don't have enough data to get good estimates of all your
>> coefficients.
>>
>> The bigger problem for your inference is that both of these problems
>> will inflate your standard errors. In both cases, there isn't enough
>> information to fully tease apart the contribution from the different
>> variables, which means that you have a lot of variability in your
>> estimates and thus large standard errors.
>>
>> Note that some correlation between estimates is expected. If you think
>> of a very simple case with the intercept and one slope/predictor then
>> you'll see that if you change the intercept, then you have to change the
>> slope a bit to get the line to stay close to the observed data.
>>
>> (Once again, I worry that I've oversimplified and said something
>> horribly infelicitous, but I'm always happy to be corrected and learn
>> something myself!)
>>
>> Best,
>>
>> Phillip
>>
>> On 11/5/20 11:42 pm, Alessandra Bielli wrote:
>> > Dear list,
>> >
>> > I am fitting the mixed effect model:
>> >  > lmer(log(percapita_day) ~ Type_residuo + (1|boatID), data=all)
>> >
>> > where percapita_day is a non-negative continuous response variable (on
>> > the log scale to have residuals normally distributed), Type_residuo is
>> > a categorical explanatory variable and boatID is a random effect with 4
>> > levels.
>> >
>> > I have found values very close to +/-1 in the correlation of fixed
>> > effects matrix below, and after some research I learnt that these values
>> > are not the correlations of the variables but the expected correlations
>> > of the regression coefficients.
>> >
>> > Correlation of Fixed Effects:
>> >             (Intr) Tp_rsM Tp_rsdOr Tp_rsdOt Tp_Pyc Tp_rsP Tp_rsR
>> > Type_rsdMtl -0.944
>> > Tp_rsdOrgnc -0.951  0.945
>> > Typ_rsdOtrs -0.959  0.953  0.959
>> > Tp_rsdPplyc -0.926  0.919  0.925    0.933
>> > Tp_rsdPlstc -0.951  0.945  0.951    0.958    0.925
>> > Type_resdRd -0.870  0.867  0.873    0.878    0.850  0.872
>> > Type_rsdVdr -0.954  0.949  0.955    0.962    0.928  0.954  0.876
>> >
>> > However, I still can't explain why all the correlations are so close to
>> > +/-1, and I was wondering whether this indicates that something is wrong
>> > with my model.
>> > Is it due to the presence of outliers in the response variable (see
>> > attached)?
>> >
>> > Thanks,
>> >
>> > Alessandra
>> > _______________________________________________
>> > R-sig-mixed-models using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>
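
The intercept and slope example in Phillip's quoted reply can be sketched as
follows. This is only an illustration under assumptions: the data are
simulated, the variable names are invented, and lm() is used rather than
lmer(). It shows that the two estimates are strongly correlated when the
predictor sits far from zero, and that centering the predictor removes that
correlation without changing the fitted line.

```r
# Simulated data with a predictor far from zero (all values are invented).
set.seed(2)
x <- runif(50, min = 10, max = 20)
y <- 1 + 0.5 * x + rnorm(50)

# Raw predictor: shifting the intercept forces a compensating change in slope.
fit_raw <- lm(y ~ x)
cor_raw <- cov2cor(vcov(fit_raw))["(Intercept)", "x"]

# Centered predictor: the intercept is now the expected response at mean(x).
x_centered <- x - mean(x)
fit_centered <- lm(y ~ x_centered)
cor_centered <- cov2cor(vcov(fit_centered))["(Intercept)", "x_centered"]

print(c(raw = cor_raw, centered = cor_centered))
```

Centering changes what the intercept means but leaves the slope estimate and
the fitted values the same, so a high intercept/slope correlation of this
kind is a feature of the parameterisation rather than a sign of a bad model.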

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplot01.pdf
Type: application/pdf
Size: 11255 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20200525/5e845415/attachment-0001.pdf>

