[R-sig-ME] Large mixed & crossed-effect model looking at educational spending on crime rates with error messages

Fri Oct 11 02:40:27 CEST 2019

On 09/10/2019 08:19, Ades, James wrote:
> Thanks, Philip.
> 
> I took some time to read more about covariance/multicollinearity. I
> found two papers pretty informative/easy to digest, for anyone
> interested in or struggling with this topic. The second paper concerns
> mostly hierarchical models.
> 
> https://www.ncbi.nlm.nih.gov/pubmed/23017962
> https://www.sciencedirect.com/science/article/pii/S0049089X15000885?via%3Dihub
> 
> I’ve had some experience with brms, so maybe that is something to try in
> order to implement priors. I also looked into linear growth curve analysis.
> 
> Re my last question, I think you understand most of what I’m saying.
> Referring to my dataset, crime counts are the dv. If a police department
> reports crime counts for some years but not for others, would that be
> imputable? As it is now, I’ve filtered out all rows for which there is
> no crime count, under the impression that there was little to be done
> for predictors/explanatory variables with no dv. Am I mistaken?

Imputation really isn't my area of expertise, but I think you predict
unseen DVs, not impute them.

> 
> I know that lme4 drops incomplete cases (winnowing the sample), but is
> the information for some of these predictors imputable, such that I
> maintain more rows that do have a dependent variable—let’s say I’m
> missing one value for median income for one city for one year…lme4 would
> remove this entire row; but is that information imputable, such that
> lme4 doesn’t remove that row?

lme4 won't do the imputation for you. Check out the mice package. brms
also supports imputation as part of model fitting via the mi()
function. See ?brmsformula.
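
To make the mice suggestion concrete, here is a minimal sketch on
simulated data. The data frame and the names place_id, year, income and
crime are made up for illustration and are not your dataset. It assumes
mice, lme4 and broom.mixed are installed (pool() relies on broom.mixed to
tidy glmer fits). Note that the default imputation method used here
ignores the grouping by place, so treat this as a starting point rather
than a recommended multilevel imputation strategy.

```r
library(mice)
library(lme4)

set.seed(1)
# Hypothetical panel: 30 places observed over 8 years
d <- data.frame(
  place_id = factor(rep(1:30, each = 8)),
  year     = rep(0:7, times = 30)
)
d$income <- rnorm(nrow(d), mean = 5, sd = 1)   # e.g. median income in $10k
place_effect <- rnorm(30, sd = 0.3)            # random intercept per place
d$crime <- rpois(
  nrow(d),
  exp(1 + 0.05 * d$year - 0.1 * d$income + place_effect[as.integer(d$place_id)])
)

# Knock out some predictor values; the response stays fully observed
d$income[sample(nrow(d), 20)] <- NA

# Create 5 completed datasets (default method for numeric columns is pmm)
imp <- mice(d, m = 5, printFlag = FALSE)

# Fit the same mixed model to each completed dataset, then pool the results
fits   <- with(imp, glmer(crime ~ year + income + (1 | place_id),
                          family = poisson))
pooled <- pool(fits)
summary(pooled)
```

The rows with a missing income value are kept in every completed
dataset, so the fits use all 240 rows instead of the 220 complete cases.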

> Re the unreliability of Nelder Mead…what’s weird is that whereas the
> lme4 default optimizer fails to converge, Nelder Mead does. I know that
> that doesn’t necessarily imply accuracy, but in such situations would
> the results of Nelder Mead be questionable? 

No, there's simply no free lunch when it comes to optimization. Some
optimizers will work better in some situations. See ?convergence and
?allFit and make sure to check how well the converged model actually
fits your data.
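
As a rough sketch of what I mean by comparing optimizers, using the
sleepstudy data that ships with lme4 rather than your model (allFit will
skip optimizers whose packages, e.g. optimx or dfoptim, aren't installed):

```r
library(lme4)

# Example model on lme4's built-in sleepstudy data
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Refit the same model with every optimizer that is available
all_fits <- allFit(fit)
ss <- summary(all_fits)

ss$which.OK   # which optimizers ran without error
ss$llik       # log-likelihood reached by each optimizer
ss$fixef      # fixed-effect estimates from each optimizer
```

If the log-likelihoods and estimates agree to a few decimal places across
optimizers, a convergence warning from any single one of them is much
less worrying.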

> Would it be better to opt
> for a simpler model, perhaps with a random slope without intercepts that
> works with the default--something like ( 0 + year | place_id )?

I would tend to keep random intercepts in the model. Check out ?rePCA
and the following articles for some ideas about how to simplify your model:

https://arxiv.org/abs/1506.04967

https://nextjournal.com/dmbates/complexity-in-fitting-linear-mixed-models/
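
A small sketch of the rePCA workflow, again on lme4's sleepstudy data
rather than your model, so the formula here is only a stand-in:

```r
library(lme4)

# Maximal model: correlated random intercept and slope per subject
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# PCA of the estimated random-effects covariance matrix, per grouping factor.
# Components with (near-)zero variance suggest the structure is overparameterized.
pca <- rePCA(fit)
summary(pca)

# One possible simplification: keep the random intercept and slope,
# but drop the correlation between them
fit_zc <- lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)

# Likelihood-ratio comparison (models are refit with ML for the test)
anova(fit_zc, fit)
```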

Phillip

> 
> As always, thanks much!
> 
> James
> 
> 
> 
>> On Oct 1, 2019, at 12:15 AM, Phillip Alday <phillip.alday using mpi.nl> wrote:
>>
>>
>>
>> On 01/10/2019 08:25, Ades, James wrote:
>>> I see what you’re saying
>>> with regard to the actual source of variation, but can’t it be the case
>>> that one thing isn’t vaguely related to another, and that the actual
>>> source of variation is the two variables. In such a case, aren’t there
>>> ways to parse that covariance, such that you gain a better understanding
>>> of each variable’s effect on variance? 
>>
>> This is nontrivial in the general case. If you know something about the
>> latent structure, then things like structural equation models may help,
>> see e.g.
>>
>> https://www.johnmyleswhite.com/notebook/2016/02/25/a-variant-on-statistically-controlling-for-confounding-constructs-is-harder-than-you-think/
>>
>> which provides an alternative presentation of
>>
>> Westfall, J. & Yarkoni, T. (2016): Statistically Controlling for
>> Confounding Constructs Is Harder than You Think. PLoS ONE, 11, 1-22
>>
>> Remember, linear regression -- fixed or mixed effect -- isn't sufficient
>> to make causal conclusions without additional assumptions. The issue
>> with collinearity (as long as it's not perfect / leads to rank
>> deficiency) is not so much in the estimates as in the standard errors,
>> which get inflated by the covariance. There are several classical
>> approaches to dealing with this (such as residualization), but they all
>> have pros and cons. Oversimplifying a bit, residualization, for example,
>> attributes only the residual variance from the first predictor to the
>> second predictor -- i.e. all of the shared variance is attributed to the
>> first predictor. Regularized regression (e.g. LASSO, ridge, elastic net)
>> may help, especially with prediction. Equivalently, in a Bayesian
>> framework, appropriate choice of priors may help to pull the estimates
>> apart.
>>
>> But all of these comments aren't specific to the mixed-model case, so
>> that opens up the set of resources you can turn to. ;)
>>
>>
>>> Also, just want to make sure: if you don’t have a dependent observation
>>> for a given condition, you would have to remove that entire row,
>>> correct? The mixed-model wouldn’t be able to work around that? This is
>>> what i learned in stats class, but if I’m doing this wrong, I think this
>>> might also be affecting correlation.
>>
>> If I understand you correctly, you're asking what happens when your
>> response variable (y) is missing for a given combination of predictors
>> (x's)? Depending on the exact structure of the missing data, multiple
>> imputation might help you there, but generally if a particular case
>> never occurs (say "12 hours of sunlight but with winter temperatures"
>> for a model predicting plant growth derived from observations taken
>> outside but which you want to use to predict in a greenhouse), it's hard
>> to make inferences about that complete interaction. lme4 by default
>> drops incomplete cases (i.e. any rows in the dataframe where there is an
>> NA *for variables used in the model*).
>>
>> Phillip
>>
>>>
>>> Thanks, Philip!
>>>
>>> James
>>>
>>>
>>>
>>>> On Sep 29, 2019, at 3:06 AM, Phillip Alday <phillip.alday using mpi.nl> wrote:
>>>>
>>>> The default optimizer in lme4 is the default for a reason. :) While
>>>> there's no free lunch or single best optimizer for every situation, the
>>>> default was chosen based on our experience about which optimizer
>>>> performs well across a wide range of models and datasets.
>>>>
>>>> Multicollinearity in mixed-effects models works pretty much exactly the
>>>> same way as it does in fixed-effects (i.e. regular/not mixed) regression,
>>>> and so does the way it's addressed (converting to PC basis,
>>>> residualization, etc.). In your case, you could omit one race and then
>>>> the remaining races will be linearly independent, albeit still
>>>> correlated with one another. This correlation isn't great and will
>>>> inflate your standard errors, but then at least your design matrix
>>>> won't be rank deficient.
>>>>
>>>> Regarding year-spending: Are you using 'correlated' in a strict sense,
>>>> e.g. that spending tends to go up year-by-year? Or do you just mean that
>>>> including spending in the model changes the effect of year? (I think the
>>>> latter weakly implies the former, but it's a different perspective.)
>>>> Either way, the changing coefficient isn't terribly surprising. In
>>>> 'human' terms: if you don't have the option of attributing something to
>>>> the actual source of variation, but you do have something that is
>>>> vaguely related to it, then you will attribute it to that. However, if
>>>> you're ever given the chance to attribute it to the actual source, you
>>>> will do that and your attribution to the vaguely-related thing will
>>>> change.
>>>>
>>>> Best,
>>>> Phillip
>>>>
>>>> On 29/09/2019 03:20, Ades, James wrote:
>>>>> Thanks, Ben and Philip!
>>>>>
>>>>> So I think I was conflating having a continuous dependent variable,
>>>>> which could then be broken up into different categories with dummy
>>>>> variables (for instance, if I wanted to look at how wealth affects the
>>>>> distribution of race in an area, I could create a model like lmer(total
>>>>> people ~ race + per capita income + …) with creating something similar
>>>>> with a fixed factor (which I guess can’t be done).
>>>>>
>>>>> I did try running the variables independently, which worked, I just
>>>>> thought there was a way to combine races, and then per that logic,
>>>>> thought that since race variables repeated within place (city/town), I
>>>>> could nest it within PLACE_ID. But realized that the percent race as a
>>>>> fixed effect (as an output) didn’t really make sense…hence my
>>>>> confusion.
>>>>> So I guess somewhere in there my logic was afoul.
>>>>>
>>>>> Regarding Nelder-Mead: that’s odd...I recall reading somewhere that it
>>>>> was actually quicker and more likely to converge. Good to know. I read
>>>>> through the lme4 package details here:
>>>>> https://cran.r-project.org/web/packages/lme4/lme4.pdf Would you
>>>>> recommend then optimx? Or Nloptr/bobyqa? (which I think is the
>>>>> default).
>>>>>
>>>>> Regarding multicollinearity: is there an article you could send me on
>>>>> dealing with multicollinearity in mixed-effect models? I’ve perused the
>>>>> internet, but haven’t been able to find a great how-to on dealing with
>>>>> it, such that you can better parse the effects of different
>>>>> variables (I know that one can use PCA, but that fundamentally alters
>>>>> the process, and isn’t there a way of averaging variables such that
>>>>> you minimize collinearity?).
>>>>>
>>>>> One thing I’m currently dealing with in my model is that year as a
>>>>> fixed effect is correlated with a district’s spending, such that if I
>>>>> remove
>>>>> year, district spending has a negative effect on crime, but including
>>>>> year as a fixed effect alters the spending regression coefficient to be
>>>>> positive (just north of zero). Though here, specifically, I’m not sure
>>>>> if this is technically collinearity, or if time as a fixed factor is
>>>>> merely controlling, here, for crime change over time, where a model
>>>>> without year as a fixed factor would be looking at the effect of
>>>>> district spending on crime (similar to a model where years are averaged
>>>>> together). Does that make sense? Is that interpretation accurate?
>>>>>
>>>>> Thanks much!
>>>>>
>>>>> James
>>>>>
>>>>>
>>>>>> On Sep 28, 2019, at 8:09 AM, Phillip Alday <phillip.alday using mpi.nl> wrote:
>>>>>>
>>>>>>> ink the answer to your proximal question about per_race is that
>>>>>>> you would need five *different* numerical varia
>