[R-sig-ME] Large mixed & crossed-effect model looking at educational spending on crime rates with error messages
Phillip Alday
phillip.alday at mpi.nl
Sun Sep 29 12:06:36 CEST 2019
The default optimizer in lme4 is the default for a reason. :) While
there's no free lunch or single best optimizer for every situation, the
default was chosen based on our experience of which optimizer
performs well across a wide range of models and datasets.
Multicollinearity in mixed-effects models works pretty much exactly the
same way as it does in fixed-effects (i.e. regular/not mixed) regression,
and so does the way it's addressed (converting to a PC basis,
residualization, etc.). In your case, you could omit one race and then the
remaining races will be linearly independent, albeit still correlated with
one another. This correlation isn't great and will inflate your standard
errors, but then at least your design matrix won't be rank deficient.
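
To make the rank argument concrete, here is a minimal sketch in plain
Python (standard library only). It assumes the race variables are
percentages that sum to 100 within each place; the numbers are made up
for illustration and are not from your data, and matrix_rank is a
hypothetical helper, not an lme4 function.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical race shares (percent) for six places; each row sums to 100.
shares = [
    [60, 20, 10, 5, 5],
    [40, 30, 15, 10, 5],
    [70, 10, 10, 5, 5],
    [25, 35, 20, 15, 5],
    [50, 25, 5, 10, 10],
    [30, 20, 30, 5, 15],
]

full = [[1] + row for row in shares]          # intercept + all five shares
reduced = [[1] + row[:-1] for row in shares]  # intercept + four shares

rank_full = matrix_rank(full)
rank_reduced = matrix_rank(reduced)

print("columns:", len(full[0]), "rank:", rank_full)
print("columns:", len(reduced[0]), "rank:", rank_reduced)
```

With all five shares plus an intercept there are six columns but the
rank is lower, because the fifth share is 100 times the intercept column
minus the other four. Dropping one share leaves as many independent
columns as there are columns.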
Regarding year-spending: Are you using 'correlated' in a strict sense,
e.g. that spending tends to go up year-by-year? Or do you just mean that
including spending in the model changes the effect of year? (I think the
latter weakly implies the former, but it's a different perspective.)
Either way, the changing coefficient isn't terribly surprising. In
'human' terms: if you don't have the option of attributing something to
the actual source of variation, but you do have something that is
vaguely related to it, then you will attribute it to that. However, if
you're ever given the chance to attribute it to the actual source, you
will do that and your attribution to the vaguely-related thing will change.
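
The same point as a toy calculation, again in plain Python. This is a
sketch under invented assumptions, not a model of your data: spending
rises with year, crime falls with year, and spending itself has a small
positive effect. The ols helper is hypothetical and fits ordinary least
squares with no random effects.

```python
from fractions import Fraction


def ols(columns, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved exactly with Gauss-Jordan elimination on fractions."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [
        [Fraction(sum(p * q for p, q in zip(columns[i], columns[j])))
         for j in range(k)]
        + [Fraction(sum(p * q for p, q in zip(columns[i], y)))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = next(r for r in range(col, k) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[-1] for row in a]


# Invented data: spending trends upward over time with a small wobble,
# crime trends downward over time and rises by 1 per unit of spending.
year = list(range(6))
wobble = [1, -1, 0, 1, -1, 0]
spending = [10 + 2 * t + w for t, w in zip(year, wobble)]
crime = [200 - 10 * t + s for t, s in zip(year, spending)]
ones = [1] * len(year)

without_year = ols([ones, spending], crime)    # [intercept, spending]
with_year = ols([ones, year, spending], crime)  # [intercept, year, spending]

print("spending coefficient without year:", without_year[1])
print("spending coefficient with year:   ", with_year[2])
```

Without year in the model, spending absorbs the downward time trend and
its coefficient comes out negative; once year is available, the trend is
attributed to year and the spending coefficient turns positive.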
Best,
Phillip
On 29/09/2019 03:20, Ades, James wrote:
> Thanks, Ben and Phillip!
>
> So I think I was conflating having a continuous dependent variable,
> which could then be broken up into different categories with dummy
> variables (for instance, if I wanted to look at how wealth affects the
> distribution of race in an area, I could create a model like lmer(total
> people ~ race + per capita income + …) with creating something similar
> with a fixed factor (which I guess can’t be done).
>
> I did try running the variables independently, which worked, I just
> thought there was a way to combine races, and then per that logic,
> thought that since race variables repeated within place (city/town), I
> could nest it within PLACE_ID. But realized that the percent race as a
> fixed effect (as an output) didn’t really make sense…hence my confusion.
> So I guess somewhere in there my logic went awry.
>
> Regarding Nelder-Mead: that’s odd...I recall reading somewhere that it
> was actually quicker and more likely to converge. Good to know. I read
> through the lme4 package details here:
> https://cran.r-project.org/web/packages/lme4/lme4.pdf Would you
> recommend then optimx? Or Nloptr/bobyqa? (which I think is the default).
>
> Regarding multicollinearity: is there an article you could send me on
> dealing with multicollinearity in mixed-effect models? I’ve perused the
> internet, but haven’t been able to find a great how-to on dealing with
> it, such that you can better parse the effects of different variables (I
> know that one can use PCA, but that fundamentally alters the process,
> and isn’t there a way of averaging variables such that you minimize
> collinearity?).
>
> One thing I’m currently dealing with in my model is that year as a fixed
> effect is correlated with a district’s spending, such that if I remove
> year, district spending has a negative effect on crime, but including
> year as a fixed effect alters the spending regression coefficient to be
> positive (just north of zero). Though here, specifically, I’m not sure
> if this is technically collinearity, or if time as a fixed factor is
> merely controlling, here, for crime change over time, where a model
> without year as a fixed factor would be looking at the effect of
> district spending on crime (similar to a model where years are averaged
> together). Does that make sense? Is that interpretation accurate?
>
> Thanks much!
>
> James
>
>
>> On Sep 28, 2019, at 8:09 AM, Phillip Alday <phillip.alday using mpi.nl> wrote:
>>
>>> ink the answer to your proximal question about per_race is that
>>> you would need five *different* numerical varia
>