[R-sig-ME] keeping both numerically and factor coded factors

Fri Aug 2 13:47:27 CEST 2019

John Kruschke has a recent blog post highlighting how removing
correlations can drastically change your model.

http://doingbayesiandataanalysis.blogspot.com/2019/07/shrinkage-in-hierarchical-models-random.html

Again, I'm with everybody else here: I'm not opposed to "removing"
correlations per se, but you should be aware of what it means in terms
of your model, estimates, and interpretation.
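
To make that concrete, here is a minimal sketch using the sleepstudy data
that ships with lme4 (not the data discussed in this thread; the object
names m_corr and m_zero are made up for illustration). It fits the same
model with and without the intercept-slope correlation so that the variance
components and the fit can be compared side by side:

```r
library(lme4)

# Correlated random intercept and slope: 2 variances + 1 covariance
m_corr <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Double-bar syntax drops the covariance (works here because Days is numeric)
m_zero <- lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)

# Compare the estimated random-effect structure of the two fits
print(VarCorr(m_corr))
print(VarCorr(m_zero))

# Likelihood-ratio comparison; anova() refits both models with ML first
print(anova(m_zero, m_corr))
```

Whether dropping the correlation is harmless depends on the data, so a
comparison like this is only a starting point, not a decision rule.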

Phillip

On 1/8/19 7:38 pm, Ben Bolker wrote:
> 
>   I generally agree with Robert's point of view - I don't *necessarily*
> object to removing correlations, but you have to think carefully about
> what it means.
> 
>   As to the question of "how should I put more than one factor into a
> compound symmetric model?": suppose you want to make (L*V*D|subjects)
> compound symmetric.  You (unfortunately) have a variety of choices.  If
> you really want all CS interactions represented, I think you need the
> equivalent of (1|subjects/(L+V+D)^2) (which probably won't work as
> written), i.e.
> 
>  (1|subjects) +
>  (1|subjects:L) + (1|subjects:V) + (1|subjects:D) +
>  (1|subjects:L:V) + (1|subjects:V:D) + (1|subjects:D:L)
> 
> (if you included the (1|subjects:L:V:D) term it would be redundant with
> the residual error term).  This is getting complex again -- 7 parameters
> (still much better than (L*V*D|subjects), which gives you (16*17)/2 =
> 136 parameters to estimate) ...
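
The parameter counts quoted above can be checked with a few lines of base R.
This is only a sketch: the design data frame and the variable names are
assumptions built from the 2 x 2 x 4 design described further down the
thread, and no model is fitted.

```r
# Within-subject design from the thread: 2 x 2 x 4 levels
design <- expand.grid(
  L = factor(1:2),
  V = factor(1:2),
  D = factor(1:4)
)

# Number of random-effect columns implied by (L*V*D | subjects)
n_cols <- ncol(model.matrix(~ L * V * D, data = design))

# An unstructured covariance matrix has n variances and n*(n-1)/2 covariances
n_unstructured <- n_cols * (n_cols + 1) / 2

# Compound-symmetric alternative: one variance per (1 | ...) term
cs_terms <- c("subjects", "subjects:L", "subjects:V", "subjects:D",
              "subjects:L:V", "subjects:V:D", "subjects:D:L")
n_cs <- length(cs_terms)

cat("columns:", n_cols, " unstructured:", n_unstructured, " CS:", n_cs, "\n")
```

This should give 16 columns, 136 covariance parameters for the unstructured
model and 7 variance parameters for the compound-symmetric version.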
> 
>   I'm not sure rstanarm will solve your problems.  That is, I don't see
> how the convergence diagnostics that rstanarm gives you are going to be
> much more useful than lme4's in deciding how to simplify the problem.
> On the other hand, rstanarm offers a big advantage in allowing you to
> set priors to keep the solutions to the fitted problem more realistic -
> it also integrates over the uncertainty in a useful way.
> 
>   [Robert: sorry if I missed or misconstrued something in your answer.
> Could you be a little more specific in how you would use rstanarm's
> output & diagnostics to help solve this kind of problem?]
> 
>  Ben Bolker
> 
> 
> 
> On 2019-08-01 10:02 a.m., Robert Long wrote:
>> Dear Elisa,
>>
>> Yes, one of the possible steps is to force correlations to zero, but then
>> you are imposing (possibly unreasonable) constraints at the cost of trying
>> to make the model converge. It is a highly questionable procedure to remove
>> something or impose constraints purely to cause a model to converge. Random
>> variables that arise in nature as part of the same data generating process
>> are rarely uncorrelated. It may be that the correlations are small and
>> /can/ reasonably be set to zero, but you should investigate whether this is
>> reasonable first.
>>
>> Removing random slopes is usually a good way to proceed.
>>
>> If you can't make progress this way you could try the rstanarm package
>> which provides a drop-in replacement for lmer and will fit the model using
>> a Bayesian approach. Then, the convergence diagnostics should provide a
>> better way to solve the problem. It may be that one or more of the variance
>> components and/or correlations between them are close to zero, in which
>> case you can remove them from the random structure.
>>
>>
>> On Wed, 31 Jul 2019, 09:14 MONACO Elisa, <elisa.monaco using unifr.ch> wrote:
>>
>>> Thank you,
>>>
>>> Robert Long, I think we are claiming the same idea: the maximal model is
>>> too complex (overparameterized and with a degenerate/singular solution) and
>>> I want to reduce the random structure, following the steps suggested by
>>> Bates et al.. Am I correct?
>>> However, one of these steps is indeed "forcing to zero the correlation
>>> parameters" and checking the fit of the resulting model. Hence my
>>> question about how to arrange my D factor in the random structure.
>>>
>>> I still don't know how to handle the CS model suggested by Bolker ((1|g/f))
>>> or how to integrate more factors into that structure ((f1*f2|g/f3)?). Any
>>> suggestions would be much appreciated!
>>>
>>> Elisa Monaco
>>>
>>>
>>> -----Original Message-----
>>> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On
>>> Behalf Of Robert Long
>>> Sent: Wednesday, 24 July 2019 10:33
>>> To: R-mixed models mailing list <r-sig-mixed-models using r-project.org>
>>> Subject: Re: [R-sig-ME] keeping both numerically and factor coded factors
>>>
>>> It is quite possible that such a complex random structure will not be
>>> supported by the data.
>>>
>>> In your initial email you mentioned correlations between random effects.
>>> However, since the model did not converge, there is no point in
>>> interpreting them. Moreover, forcing them to be uncorrelated possibly
>>> imposes unrealistic constraints on the model.
>>>
>>> Why do you seek such a complex random structure? If you are following the
>>> advice by Barr et al (2013) to "keep it maximal", this is often very poor
>>> advice, as noted by Bates et al (2015), Bates being the primary author of
>>> the lme4 package:
>>>
>>> https://arxiv.org/pdf/1506.04967
>>>
>>>
>>> On Wed, 24 Jul 2019, 09:01 MONACO Elisa, <elisa.monaco using unifr.ch> wrote:
>>>
>>>> Dear all,
>>>> many thanks for your answers and sorry for not providing the details.
>>>>
>>>> My experiment is a 2x2x4 within-subject design, with all three factors
>>>> being categorical: L=Language of the stimuli (2 levels), V= type of
>>>> the stimuli (2 levels), D= delay of brain stimulation (4 levels). My
>>>> dependent variable is the amplitude of a physiological measure.
>>>>
>>>> I thought to build my maximal mixed model in which all the factors are
>>>> crossed within subjects and only D is crossed within items (items are
>>>> the same, repeated at different delays of stimulation):
>>>>
>>>> lmer(MEPzed ~ L * V * D  + (D|items), data=mydata,
>>>> control=lmerControl(optCtrl=list(maxfun=1e6)))
>>>>
>>>> So, to answer @Robert Long: the factor D I was referring to is a random
>>>> slope, with 4 levels.
>>>>
>>>> To answer @Ben Bolker:
>>>> indeed, I don't think that my factor D falls into either of the 2 cases
>>>> you mentioned, because:
>>>>  a) the differences between successive levels are not equal
>>>> (150ms-75ms-75ms-150ms) and we don't expect an effect ordered in time;
>>>> we expect the effect to be present at one or more latencies depending
>>>> on L;
>>>>  b) the factor has more than two levels.
>>>>
>>>> According to all of this, I should go for a CS model, right?
>>>> I'm a newbie in this field, so can you please give me some pointers
>>>> on what I can read about it, or some guidance on how to handle this
>>>> (especially if I want to gradually reduce the random structure of the
>>>> subjects part, see modelreduced2)?
>>>>
>>>> modelreduced1: lmer(MEPzed ~ L * V * D + (L*V*D|subjects) +
>>>> (1|items/D), data=mydata,
>>>> control=lmerControl(optCtrl=list(maxfun=1e6)))
>>>>
>>>> modelreduced2: lmer(MEPzed ~ L * V * D + (L*V|subjects/D) +
>>>> (1|items/D), data=mydata,
>>>> control=lmerControl(optCtrl=list(maxfun=1e6)))
>>>>
>>>>
>>>> Another point: is this simplification independent of which type of
>>>> contrast I set for D (I'll set sum contrasts for V and L, but I'm still
>>>> thinking about what is best for D)?
>>>>
>>>> Thank you in advance for this big help and please tell me if you need
>>>> further clarifications or code.
>>>>
>>>>  Elisa Monaco | PhD student
>>>> ________________________________________
>>>> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org>
>>>> on behalf of Ben Bolker <bbolker using gmail.com>
>>>> Sent: Monday, 22 July 2019 17:56
>>>> To: r-sig-mixed-models using r-project.org
>>>> Subject: Re: [R-sig-ME] keeping both numerically and factor coded factors
>>>>
>>>>   Elisa,
>>>>
>>>>   Can you say a little more about what your factor represents?
>>>>
>>>>   It probably *doesn't* make sense to collapse your factor to an
>>>> integer for the purpose of allowing a diagonal covariance matrix, unless:
>>>>
>>>>  * it's reasonable to treat the factor levels as sequential values
>>>> with equal differences between each successive pair (e.g., time), OR
>>>>  * the factor only has two levels anyway
>>>>
>>>>   Another simplifying strategy is to use a compound-symmetric model
>>>> (equal correlations among all pairs of levels): if your original model
>>>> is (f|g) (where f is a factor and g is your grouping variable), then
>>>> (1|g/f) will generate a CS model.
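
A small base-R sketch of why (1|g/f) is compound symmetric: lme4 expands it
to (1|g) + (1|g:f), so within one group every level of f shares the g effect
and adds its own independent g:f effect. The standard deviations below are
hypothetical values picked purely for illustration, not estimates from any
model in this thread.

```r
# Hypothetical standard deviations, chosen only for illustration
sd_g  <- 2.0   # (1 | g): shared by all levels of f within a group
sd_gf <- 1.0   # (1 | g:f): separate draw for each level of f within a group
k <- 4         # number of levels of f

# Covariance among the k level-specific effects within one group
Sigma <- sd_g^2 * matrix(1, k, k) + sd_gf^2 * diag(k)

# Equal variances on the diagonal, one common correlation off the diagonal
R <- cov2cor(Sigma)
print(Sigma)
print(R)
```

The common correlation is sd_g^2 / (sd_g^2 + sd_gf^2), so this structure
uses two parameters regardless of k and cannot represent negative
correlations between levels.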
>>>>
>>>>   cheers
>>>>     Ben Bolker
>>>>
>>>>
>>>> On 2019-07-22 10:24 a.m., Robert Long wrote:
>>>>> Dear Elisa
>>>>>
>>>>> Is this factor a grouping variable (for random intercepts) or a
>>>>> random slope? How many levels does it have? And please can you give
>>>>> us the full model formula.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, 22 Jul 2019, 12:17 MONACO Elisa via R-sig-mixed-models, <
>>>>> r-sig-mixed-models using r-project.org> wrote:
>>>>>
>>>>>> Dear list,
>>>>>> looking at the correlation values of my random effects, as well as
>>>>>> the fact that my model fails to converge, it makes sense to me to
>>>>>> simplify its random structure (while keeping the fixed structure
>>>>>> maximal and in line with our hypotheses).
>>>>>> One way is to remove correlations, and I know that the || notation
>>>>>> works only with numerically coded factors.
>>>>>> As far as I understood, I have two options:
>>>>>> 1) use the package afex, putting my model as object of mixed and
>>>>>> adding "expand_re = TRUE"
>>>>>> 2) use the original factor, by default read as "int"
>>>>>>
>>>>>> I want to use option 2) because with mixed I can't apply the
>>>>>> PCA function for random effects to check if my model is
>>>>>> overparameterized.
>>>>>>
>>>>>> My questions are:
>>>>>> a)    is it true that I can use my factor as it is when read by R,
>>>>>> i.e. "int"?
>>>>>> b)    if yes, does it make sense to keep in the model both the factor
>>>>>> in the nominal form as fixed effect and the factor in the numerical
>>>>>> form as random effect?
>>>>>>
>>>>>> Many thanks for your help,
>>>>>>
>>>>>> Elisa Monaco | PhD student
>>>>>>
>>>>>>         [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-mixed-models using r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>