[R-sig-ME] correlation of fixed effects coefficients all close to +/-1
bielli.alessandra using gmail.com
Mon May 25 19:12:15 CEST 2020
Dear Phillip and list
As you can see from the graph attached, one of the categories of the
predictor variable ("madera") only has one observation.
I decided to remove this observation and ran the model again. This is the
correlation matrix I get:
Correlation of Fixed Effects:
            (Intr) Tp_rsdOr Tp_rsdOt Tp_Pyc Tp_rsP Tp_rsR
Tip_rsdOtrs -0.747  0.593
Tp_rsdPplyc -0.575  0.458    0.470
Tp_rsdPlstc -0.659  0.526    0.542    0.419
Tipo_resdRd -0.445  0.356    0.367    0.282  0.328
Tipo_rsdVdr -0.747  0.593    0.612    0.470  0.542  0.367
I am aware that modifying a dataset is unacceptable, but I think it showed
that the source of the problem was a lack of observations. Am I correct?
Is there a better way to deal with this? I would rather not delete a line
of my dataset, even though it is a very uncommon observation for which I do
not aim to get predictions.
Thank you again for your advice
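
The effect described above can be reproduced with a minimal sketch in base R. It uses simulated data and an ordinary lm() fit rather than the lmer() model from this thread, so the random effect is ignored and the numbers will not match the real output. The level names and group sizes are made up. It assumes the default treatment contrasts and that the single-observation category is the reference level, which seems plausible here because R orders factor levels alphabetically and no row for that category appears in the original matrix. For a one-way layout like this, the correlation between the intercept and the coefficient of level j works out to -sqrt(n_j / (n_j + n_ref)), so a reference level with one observation pushes every correlation towards +/-1.

```r
# Simulated stand-in data: one rare level and three well-populated ones.
# Names and sizes are hypothetical, not taken from the real dataset.
set.seed(1)
sizes <- c(a_rare = 1, b = 10, c = 10, d = 10)
g <- factor(rep(names(sizes), times = sizes))
y <- rnorm(length(g))

# Correlation matrix of the coefficient estimates. For a mixed model,
# the lower triangle of this matrix is what summary() prints as
# "Correlation of Fixed Effects".
coef_cor <- function(fit) cov2cor(vcov(fit))

# Case 1: the single-observation level is the reference level
# (it sorts first alphabetically).
fit_rare <- lm(y ~ g)
cor_rare <- coef_cor(fit_rare)

# Case 2: same data, nothing deleted, but a well-populated level
# is chosen as the reference.
g2 <- relevel(g, ref = "b")
fit_b <- lm(y ~ g2)
cor_b <- coef_cor(fit_b)

print(round(cor_rare, 3))
print(round(cor_b, 3))
```

In this sketch the two fits describe the same data and give the same fitted values; only the parameterisation changes. Whether choosing a different reference level with relevel() suits the real analysis is a judgment call that depends on which comparisons are of interest.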
On Mon, May 25, 2020 at 10:52 AM Alessandra Bielli <
bielli.alessandra using gmail.com> wrote:
> Hi Phillip
> Thank you so much for your explanation.
> I have a couple more questions
> 1. In my model, the regression coefficients of each one of the categories
> of my predictor are correlated, but I just have one categorical predictor.
> In case of collinearity I would usually drop one predictor, but here I only
> have one and my goal is to use the model to predict the dependent variable.
> What's the procedure here?
> 2. Is there a test or visual way to determine if I have enough data to get
> good estimates?
> 3. A couple of days ago I came across this post on Cross Validated that
> states that the correlation of fixed effects part of the output is only
> useful in special cases,
> The post references the book
> page 268,
> "The summary concludes with a table listing the correlations of the fixed
> effects. The numbers listed here can be used to construct confidence
> ellipses for pairs of fixed-effects parameters, and should not be confused
> with the normal correlation obtained by applying cor() to pairs of
> predictor vectors in the input data. Since constructing confidence ellipses
> is beyond the scope of the book we will often suppress this table".
> What I understand is that the correlation matrix is useful for prediction
> of future values, which is also my case, but I am not entirely sure I am
> interpreting this correctly.
> I really appreciate your advice!
> On Sun, May 24, 2020 at 3:15 PM Phillip Alday <phillip.alday using mpi.nl> wrote:
>> Very high correlations of the fixed-effects estimates can indicate two
>> problems (which are actually just different manifestations of the same
>> deeper problem):
>> 1. Multicollinearity -- this is the same as multicollinearity in
>> classical/standard/non-mixed-effects regression. Basically this means
>> that some of your variables are expressing the same thing and so you
>> have some redundancies that could be eliminated. Perfect
>> multicollinearity leads to a rank-deficient model matrix, which R will
>> catch and correct, but near multicollinearity may not be caught.
>> 2. You don't have enough data to get good estimates of all your
>> The bigger problem for your inference is that both of these problems
>> will inflate your standard errors. In both cases, there isn't enough
>> information to fully tease apart the contribution from the different
>> variables, which means that you have a lot of variability in your
>> estimates and thus large standard errors.
>> Note that some correlation between estimates is expected. If you think
>> of a very simple case with the intercept and one slope/predictor then
>> you'll see that if you change the intercept, then you have to change the
>> slope a bit to get the line to stay close to the observed data.
>> (Once again, I worry that I've oversimplified and said something
>> horribly infelicitous, but I'm always happy to be corrected and learn
>> something myself!)
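
The intercept/slope case mentioned in the quoted message can be illustrated with a small base R sketch on simulated data (all values are made up for illustration). For a simple regression the correlation between the two estimates equals -mean(x) / sqrt(mean(x^2)), so it is strongly negative when the predictor sits far from zero and vanishes once the predictor is centered.

```r
# Simulated data with a predictor that lies far from zero.
set.seed(2)
x <- runif(50, min = 10, max = 20)
y <- 1 + 0.5 * x + rnorm(50)

# Fit once with the raw predictor and once with a centered copy.
fit_raw <- lm(y ~ x)
x_c <- x - mean(x)
fit_centered <- lm(y ~ x_c)

# Correlation between the intercept and slope estimates in each fit.
cor_raw <- cov2cor(vcov(fit_raw))["(Intercept)", "x"]
cor_centered <- cov2cor(vcov(fit_centered))["(Intercept)", "x_c"]

print(cor_raw)       # strongly negative
print(cor_centered)  # zero up to rounding error
```

The slope estimate and its standard error are the same in both fits; only the meaning of the intercept changes, which is why some correlation between estimates is not in itself a sign of a problem.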
>> On 11/5/20 11:42 pm, Alessandra Bielli wrote:
>> > Dear list,
>> > I am fitting the mixed effect model:
>> > > lmer(log(percapita_day) ~ Type_residuo + (1|boatID), data=all)
>> > where percapita_day is a non-negative continuous response variable (on
>> > log scale to have residuals normally distributed), Type_residuo is a
>> > categorical explanatory variable and boatID is a random effect with 4
>> > levels.
>> > I have found values very close to +/-1 in the correlation of fixed
>> > effects matrix below, and after some research I learnt that the
>> > coefficients are not about the correlation of the variables but the
>> > expected correlation of the regression coefficients.
>> > Correlation of Fixed Effects:
>> >             (Intr) Tp_rsM Tp_rsdOr Tp_rsdOt Tp_Pyc Tp_rsP Tp_rsR
>> > Type_rsdMtl -0.944
>> > Tp_rsdOrgnc -0.951  0.945
>> > Typ_rsdOtrs -0.959  0.953  0.959
>> > Tp_rsdPplyc -0.926  0.919  0.925    0.933
>> > Tp_rsdPlstc -0.951  0.945  0.951    0.958    0.925
>> > Type_resdRd -0.870  0.867  0.873    0.878    0.850  0.872
>> > Type_rsdVdr -0.954  0.949  0.955    0.962    0.928  0.954  0.876
>> > However, I still can't explain why all coefficients are so close to +/-1.
>> > I was wondering if these are indicators that something is wrong with my
>> > model?
>> > Is that due to the presence of outliers in the response variable (see
>> > attached)?
>> > Thanks,
>> > Alessandra
>> > _______________________________________________
>> > R-sig-mixed-models using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[Non-text attachment scrubbed from the archive; size: 11255 bytes]