[R-sig-ME] convergence issues on lme4 and incoherent error messages

René bimonosom using gmail.com
Sat Jun 15 14:22:31 CEST 2019


Hi Cristiano,

"Woudn't this be only true if for some cells there is no data for any
subject?"
Counterquestion: Wouldn't this mean that the -cell- does not exist at all?
:)
Maybe for clarification, your statement on stats reads:
"However there are missing data, meaning that for some subject, I have no
data at all for certain combinations of mPair and spd_des."
Which I will continue to refer to in my best knowledge.

Maybe I should mention a related issue that is often discussed: whether the
missing cell values are missing at random or not (if not, one
recommendation is to predict the missing values in the model, or to drop
them, which seems too costly in your case; otherwise there can be
estimation biases). As far as I know, single NA observations are dropped by
default in most mixed-effects functions I know, anyway. Which also -could-
cause the zero values you speak of in the last stats post (I don't know).
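
For illustration, a minimal sketch of this default dropping, run on lme4's
built-in sleepstudy data rather than your data:

library(lme4)
d <- sleepstudy                      # built-in example data, 180 rows
d$Reaction[c(5, 42)] <- NA           # pretend two observations are missing
m <- lmer(Reaction ~ Days + (Days | Subject), data = d)
nobs(m)                              # 178: the NA rows were dropped (default na.omit)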

But estimating random slopes of factor interactions -should- simply mean
that you have some design-coding formula like the following (in a 2x * 2y
design defined similarly to yours), written in dummy coding, since dummy
coding is often the default (but maybe you used contrast coding?):
cellX1_Y1_estimate = intercept + randomslopedeviationX1_Y1
cellX2_Y1_estimate = intercept + x2_est + randomslopedeviationX2_Y1
cellX1_Y2_estimate = intercept + y2_est + randomslopedeviationX1_Y2
cellX2_Y2_estimate = intercept + x2_est + y2_est + x2_est:y2_est +
 randomslopedeviationX2_Y2
(with a random intercept there should be another term on the right-hand
side, but I admit I have never looked into the mechanics of e.g. lme4)
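
To make the coding concrete, a small illustration with two hypothetical
factors x and y (not your variables), showing R's default dummy (treatment)
coding:

d <- expand.grid(x = factor(c("X1", "X2")), y = factor(c("Y1", "Y2")))
model.matrix(~ x * y, data = d)
#   (Intercept) xX2 yY2 xX2:yY2
# 1           1   0   0       0    <- cell X1_Y1: intercept only
# 2           1   1   0       0    <- cell X2_Y1: intercept + x2_est
# 3           1   0   1       0    <- cell X1_Y2: intercept + y2_est
# 4           1   1   1       1    <- cell X2_Y2: all terms incl. the interaction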

But importantly, by only dropping single cell observations (e.g.
cellX2_Y2) you cannot estimate x2_est:y2_est anymore. And as you see,
x2_est:y2_est is only modeled as an on-top deviation from the other
cell estimates. Without this on-top deviation, the other cell estimates
surely change. Thus, a deviation from (e.g.) intercept + x2_est just means
something different when x2_est:y2_est is present than when it is
absent... And further, because y2_est is supposed to reflect an estimate
for all participants, so is x2_est:y2_est, which means that the random
slope deviations in x1y2 for subjects with no data in x2y2 follow a
different distribution.
A maybe more striking example: just assume that
there is no observation for a subject in cell x1y1, the cell that defines
the intercept... what then? So in my opinion (I do not know who might share
it, and maybe this applies only to specific model/design codings) I
would rather drop the whole subject from the random slope (and random
intercept) estimation. (Arguments against this are appreciated; maybe I am
wrong. I seldom see these discussions, actually.)
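
If one goes that route, here is a sketch of how such a complete-cells flag
(the 'cell_exists' variable I use below) could be computed; I am assuming a
data frame dat with the factors mPair and spd_des and the subject
identifier ratID from your posts, so treat this as untested:

n_cells <- with(dat, nlevels(mPair) * nlevels(spd_des))
cells_seen <- with(dat, tapply(interaction(mPair, spd_des, drop = TRUE),
                               ratID, function(z) length(unique(z))))
# 1 if the subject has observations in every mPair x spd_des cell, else 0
dat$cell_exists <- as.numeric(cells_seen[as.character(dat$ratID)] == n_cells)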

Best, René

PS.
Just for the record: I just ran a similar model over one of my own data
sets (which never produced error messages like yours; the design has
categorical factors only), but I removed the observations of -one- subject
(out of about 200) in -one- cell of a 2x2 within design (otherwise the data
are complete), and I suddenly got a never-before-seen error message:
" - Rescale variables?  "

To me this looks familiar :))
Then a next model: I "flagged" this one subject with 'cell_exists=0'
(otherwise 1) as described above, and voila, the 'rescale' message
disappeared :)
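
For completeness, a sketch of how the flag enters the model, written with
the variable names from your model rather than mine; I am assuming glmmTMB
here, whose diag() covariance syntax matches the formulas quoted below, so
take this as a sketch rather than the exact call I ran:

library(glmmTMB)
# random slopes only for subjects flagged cell_exists = 1; flagged-0
# subjects contribute no subject-level slope deviation
m2 <- glmmTMB(cc_marg ~ mPair * spd_des +
                diag(0 + cell_exists:mPair:spd_des | ratID),
              data = dat)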

On Fri, 14 Jun 2019 at 17:23, Cristiano Alessandro <
cri.alessandro using gmail.com> wrote:

> Hi all,
>
> thanks a lot for all your help!
>
> @René. I am not sure I can follow all you said.
> "If, as you say, there are measurements missing in some design-cells for
> some subjects, then, actually, estimating the variance of fixed effects
> between subjects (just another word for by-subject random slopes) becomes
> partially the same as measuring the fixed effect itself"
> Wouldn't this be only true if for some cells there is no data for any
> subject? In that case, yes, there is no way of estimating the variance of
> the corresponding random effects, and therefore it would be equivalent to
> estimating the fixed effect only. But here there are data for some of the
> subjects; the only random effects estimated as zero are those corresponding
> to the subjects with no data. Also, it is important to note that I get zero
> random effects only if I use a diagonal var-cov matrix. If I used, for
> example, compound symmetry, that is not the case.
>
> @David. That is right; the problem arises only when I introduce random
> slopes on mPair (and I should do that), which is a factor with 6 levels as
> you said. I am not interested in the 'cycle' variable, and therefore I am
> not using it for either fixed or random effects.
>
> Best
> Cristiano
>
> On Fri, Jun 14, 2019 at 4:09 AM René <bimonosom using gmail.com> wrote:
>
>> Ah now I think of the following:
>> Estimating by-subject random slopes necessarily requires that the random
>> slope (i.e. in all within-subject design cells) is measured on each
>> subject. If, as you say, there are measurements missing in some
>> design-cells for some subjects, then, actually, estimating the variance of
>> fixed effects between subjects (just another word for by-subject random
>> slopes) becomes partially the same as measuring the fixed effect itself,
>> which is 'bad'.  Furthermore, this might be similarly troubling when
>> estimating by-subject intercepts, but for a slightly different reason,
>> namely (let's make it extreme): if half of the subjects have measures in all
>> design cells, while the other half has measures in only some design cells,
>> then how would you expect the intercepts to be distributed (i.e. the
>> subjects' average response deviations from the grand mean), if there are
>> systematic differences between the means in the design cells? The a priori
>> answer is "probably not Gaussian", which is again 'bad'  :))
>>
>> I would suggest adjusting the model definition to reflect the fact that
>> there are cell-measurements missing for some subjects (regardless of
>> whether a model converges or not, but just because, this would be the only
>> way to meaningfully interpret the model).
>> I think this should work:
>> Let's take the model from the last link you posted
>>
>> cc_marg ~ mPair*spd_des + diag(mPair:spd_des|ratID)
>>
>> Define a (-numeric-) variable (say "cell_exists") in the data frame which
>> codes whether a subject (for all observations by that subject) has
>> measurements in all cells (coded as 1) or not (coded as 0), such that all
>> subjects you speak of that have missing data in some cells are coded 0.
>> Then:
>>
>> cc_marg ~ mPair*spd_des + diag(0+cell_exists:mPair:spd_des|ratID)
>>
>> will estimate (no intercepts and) only random slopes for subjects with
>> cell_exists=1.
>> And to achieve the same for the intercept, let's have a second variable,
>> coded identically to cell_exists, to be as clear as possible:
>> cell_exists_intercept:
>>
>> cc_marg ~ mPair*spd_des + diag(0+ cell_exists_intercept +cell_exists:mPair:spd_des|ratID)
>>
>> And the intercept then would be the "cell_exists_intercept".
>> This should deal with the missing stuff :)
>> But don't ask me what to call the random effects in the end :)) (random
>> slopes for a sub-sample of subjects, maybe), or the residuals (a mixture
>> of individual-level model errors and random intercept and slope
>> variance for those subjects with incomplete data).
>>
>> Hope this helps (I guess there will be a solution eventually; there is
>> not much left to do, except going Bayesian :))
>> Best, René
>>
>>
>>
>> On Fri, 14 Jun 2019 at 03:36, David Duffy <
>> David.Duffy using qimrberghofer.edu.au> wrote:
>>
>>> FWIW, on my machine,
>>>
>>> lmer(cc_marg ~  mPair*spd_des + (1|cycle) + (1|ratID), data=dat)
>>>
>>> runs without complaint. It's only when I add in mPair as fixed and
>>> random that I get problems. I notice that cycle has a *lot* of levels, and
>>> the distribution of cc_marg is pretty skewed. I always have trouble
>>> understanding measurement models in an lmer formula - mPair are six
>>> different measures, is that right? If that is the case, you might
>>> cross-check your results by running an explicit multivariate model in
>>> MCMCglmm and seeing whether you get the same answers.
>>>
>>> Cheers, David Duffy.
>>
