[R-sig-ME] Help: Interpreting unusual model fit results from generalized linear mixed model (glmmTMB/sjPlot) (Andre Syvertsen)
Andre Syvertsen
Andre.Syvertsen at uib.no
Fri Feb 26 12:35:11 CET 2021
Text version of table:
Active Gambling Days Monthly
Predictors: Incidence Rate Ratios, CI, p
(Intercept): 1.18, 1.16 – 1.20, <0.001
Time: 0.99, 0.99 – 0.99, <0.001
Age Category 30-39: 1.29, 1.26 – 1.32, <0.001
Age Category 40-49: 1.81, 1.78 – 1.85, <0.001
Age Category 50-59: 2.47, 2.41 – 2.53, <0.001
Age Category 60-69: 3.08, 2.99 – 3.17, <0.001
Age Category 70+: 3.42, 3.29 – 3.56, <0.001
Gender (Women): 1.69, 1.65 – 1.74, <0.001
Age Category 30-39 × Women: 0.90, 0.86 – 0.94, <0.001
Age Category 40-49 × Women: 0.67, 0.65 – 0.70, <0.001
Age Category 50-59 × Women: 0.53, 0.50 – 0.55, <0.001
Age Category 60-69 × Women: 0.46, 0.44 – 0.48, <0.001
Age Category 70+ × Women: 0.45, 0.43 – 0.48, <0.001
Random Effects
σ2 0.00
τ00 id 1.63
τ11 id.time 0.00
ρ01 id -0.38
ICC 1.00
N id 184113
Observations 3,231,544
Marginal R2 / Conditional R2 0.087 / 1.000
Note: Intercept = Men, age 18-29 years, first time point (month 0 of 69).
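[Editorial aside, not part of the original message: a quick sanity check on the reported ICC, using the simplified random-intercept form of sjPlot's adjusted ICC. With the residual variance displayed as 0.00, the ratio is forced to 1 regardless of the intercept variance, so the "perfect" ICC mostly reflects a near-zero residual variance on the latent scale.]

```r
# Simplified random-intercept ICC: tau00 / (tau00 + sigma2).
# Values taken from the table above; the fitted model also has a random
# slope, so this is an approximation of what tab_model reports.
tau00  <- 1.63  # between-id (random intercept) variance
sigma2 <- 0.00  # residual variance as displayed
icc <- tau00 / (tau00 + sigma2)
icc  # 1
```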
________________________________
From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> on behalf of r-sig-mixed-models-request using r-project.org <r-sig-mixed-models-request using r-project.org>
Sent: Friday, 26 February 2021 11:21
To: r-sig-mixed-models using r-project.org <r-sig-mixed-models using r-project.org>
Subject: R-sig-mixed-models Digest, Vol 170, Issue 20
Send R-sig-mixed-models mailing list submissions to
r-sig-mixed-models using r-project.org
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
or, via email, send a message with subject or body 'help' to
r-sig-mixed-models-request using r-project.org
You can reach the person managing the list at
r-sig-mixed-models-owner using r-project.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of R-sig-mixed-models digest..."
Today's Topics:
1. Residual Variance-Covariance matrix (Yashree Mehta)
2. Re: lmer code for multiple random slopes (Phillip Alday)
3. question regarding time as a continuous factor in a linear
mixed effects model (Laura Coco)
4. Help: Interpreting unusual model fit results from generalized
linear mixed model (glmmTMB/sjPlot) (Andre Syvertsen)
----------------------------------------------------------------------
Message: 1
Date: Thu, 25 Feb 2021 16:16:08 +0100
From: Yashree Mehta <yashree19 using gmail.com>
To: r-sig-mixed-models using r-project.org
Subject: [R-sig-ME] Residual Variance-Covariance matrix
Message-ID:
<CAOE=hqKTTHxSChkLvxuO1USoCtk+ZmBbKtr6kDhxu=Xa438pZA using mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
I am using the following code to extract the residual
variance-covariance matrix cov(Yi | Xi):
I first fit the model with the name lmemod_lme4. I have an unbalanced
cluster dataset.
Then, I extracted components with the following:
var.d <- crossprod(getME(lmemod_lme4,"Lambdat"))
Zt <- getME(lmemod_lme4,"Zt")
vr <- sigma(lmemod_lme4)^2
Then, I combine them with the following:
var.b <- t(Zt) %*% var.d %*% Zt
sI <- vr * Diagonal(nrow(Nameofdataset))
var.y <- var.b + sI
I have 2799 observations in my dataset. My var.y matrix has dimension
2799 × 2799. Is there now a way to extract cov(Yi | Xi) for each
observation? Also, I get a very large value for the determinant of var.y.
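[Editorial sketch, not from the thread, with made-up toy numbers: the per-observation conditional variance cov(Yi | Xi) is the i-th diagonal element of var.y, so diag(var.y) extracts all of them at once. A hypothetical 4-observation, 2-group random-intercept design:]

```r
library(Matrix)

# Toy Zt (random effects x observations), random-effect variance matrix,
# and residual variance, mirroring the getME()-based construction above.
Zt    <- t(Matrix(c(1, 1, 0, 0,
                    0, 0, 1, 1), nrow = 4))  # 2 groups x 4 observations
var.d <- Diagonal(2, x = 1.5)                # random-intercept variance 1.5
vr    <- 0.5                                 # residual variance

var.b <- t(Zt) %*% var.d %*% Zt
var.y <- var.b + vr * Diagonal(ncol(Zt))

diag(var.y)  # per-observation conditional variances: 1.5 + 0.5 = 2 each
```

The off-diagonal entries of var.y give the covariances between pairs of observations (nonzero within a cluster, zero across clusters here).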
I would be grateful to get further guidance on this.
Thank you very much.
Regards
[[alternative HTML version deleted]]
------------------------------
Message: 2
Date: Thu, 25 Feb 2021 16:26:25 +0100
From: Phillip Alday <me using phillipalday.com>
To: Peter R Law <prldb using protonmail.com>,
"r-sig-mixed-models using r-project.org" <r-sig-mixed-models using r-project.org>,
Ben Bolker <bbolker using gmail.com>
Subject: Re: [R-sig-ME] lmer code for multiple random slopes
Message-ID: <f286a096-f420-c246-61ad-542eaa5dad44 using phillipalday.com>
Content-Type: text/plain; charset="utf-8"
On 25/2/21 5:16 am, Peter R Law via R-sig-mixed-models wrote:
> Alas I am still puzzled. I have extracted some data in trial.txt (no categorical variables) and attached some code in trial.R. I am only using this data to test code, which I hope to apply to a larger data set obtained from multiple populations, so with more structure. So the results themselves are not important, only whether the code does what I think it should. It occurred to me to test each model with lme and lmer.
>
> For a model with only a random intercept plus fixed effects, lme and lmer returned the same results, except that lme returns estimates with more decimal places (so here lmer returned zero variance for the random intercept and lme a very small number).
This is to be expected: lme can't handle singular models (e.g. models
with an exactly zero variance component), but lmer can. So lme just gets
as close as it can.
>
> Adding a random slope, the main difference in the results is that lme and lmer returned very different estimates of the slope variance. Is that surprising?
It depends on what's going wrong. Generally these should return
identical fits for converged models (with the fine print that there are
certain models that you can specify in one but not the other without a
lot of effort).
>
> For a model with two random slopes, lme returned results as expected but lmer still claims there are 78 variance components, which is the number one would get from a 12 x 12 covariance matrix. Where the number 12 comes from is a mystery to me, especially as lme does what I expected (I ran this model with both raw data and normalized predictor values to see if something fishy was happening there but no):
The list strips most attachment types, so your script didn't make it
through. But just your data was enough for me to find a few problems.
First, you're fitting and displaying two different models here:
>
>> M5n <- lme(Response~normP+normA, random=~normP+normA|Group,data=Trial, method="ML")
>> summary(M2n)
Second, when I try to fit the model that's given by the summary here, I
get a numerical error in lme:
> lme(Response~P+A, random=~P+A|Group,data=Trial, method="ML")
Error in lme.formula(Response ~ P + A, random = ~P + A | Group, data =
Trial, :
nlminb problem, convergence error code = 1
message = false convergence (8)
This makes me think that something is wrong in both bits of
software, but maybe the versions on your local machine are catching it.
> Linear mixed-effects model fit by maximum likelihood
> Data: Trial
> AIC BIC logLik
> 536.4562 559.4969 -258.2281
>
> Random effects:
> Formula: ~P + A | Group
> Structure: General positive-definite, Log-Cholesky parametrization
> StdDev Corr
> (Intercept) 3.061144e-04 (Intr) P
> P 1.535752e-09 0
> A 8.517774e-13 0 0
> Residual 7.929821e+00
>
> Fixed effects: Response ~ P + A
> Value Std.Error DF t-value p-value
> (Intercept) 25.453398 10.828841 46 2.3505192 0.0231
> P -0.029307 0.035939 46 -0.8154787 0.4190
> A 0.140819 0.282018 46 0.4993255 0.6199
> Correlation:
> (Intr) P
> P -0.033
> A -0.975 -0.173
>
> Standardized Within-Group Residuals:
> Min Q1 Med Q3 Max
> -1.8447131 -0.5563605 -0.2613520 0.2755547 5.5643551
>
> Number of Observations: 74
> Number of Groups: 26
>> M5 <- lmer(Response~normP+normA+(normP+normA|Group), REML="False", data=Trial)
> Error: number of observations (=74) <= number of random effects (=78) for term (normP + normA | Group); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable.
This screams that your normP and normA variables aren't being handled as
continuous but rather categorical/factors. This means that you have a
lot more slopes being estimated, which the model survives, but not the
extra correlation parameters (hence the success when you suppress them
with the two syntax variants you mention below).
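[Editorial sketch of the arithmetic, with hypothetical factors rather than Peter's actual data: a k-column random-effects term implies a k × k covariance matrix with k(k+1)/2 parameters, and 12 model-matrix columns give exactly 78.]

```r
# Hypothetical: normP with 6 levels and normA with 7 levels. If these were
# numeric, (normP + normA | Group) would have 3 columns (intercept + two
# slopes) and only 3*4/2 = 6 covariance parameters.
normP <- factor(rep(letters[1:6], length.out = 42))
normA <- factor(rep(letters[1:7], length.out = 42))

k <- ncol(model.matrix(~ normP + normA))  # 1 + (6-1) + (7-1) = 12 columns
k * (k + 1) / 2                           # 78 covariance parameters
```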
> It didn't matter which pair of the three predictors I used as far as the error message lmer returned, but lme only returned results for P+A, not any pair involving PR, which apparently created computational issues distinct from the interpretation of the code.
>
> In lmer, replacing the code (x+y|Group) by either (x+y||Group) or (X|Group) + (Y|Group) returned the expected results (in that lme4 interpreted the code as expected). Is there lme code for either of these models?
>
> Many thanks for any help.
>
> Peter
>
> P. S. I just noticed that responding to Ben's email sends my email to Ben, not to r-sig. I hope that sending my email to r-sig is the right thing to do and doesn't break the chain to my previous email.
Yep! Frequently, we have the reverse problem: people forget to keep the
list in CC. So thanks! The matching in the threading is handled mostly
by the subject line.
>
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, February 17, 2021 10:40 PM, Peter R Law <prldb using protonmail.com> wrote:
>
>> Thanks for your quick response. I take it that the code should mean what I thought it would and that somehow lmer is not interpreting what I wrote in my actual example as intended. None of the variables are categorical but I'll give it some more thought and see if I can figure it out. If not, I'll provide more details.
>>
>> Peter
>>
>> Sent with ProtonMail Secure Email.
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Tuesday, February 16, 2021 10:04 AM, Ben Bolker bbolker using gmail.com wrote:
>>
>>> I second Phillip's point. The example below works as expected (gets
>>> a singular fit, but there are 6 covariance parameters as expected).
>>> Based on what you've told us so far, the most plausible explanation is
>>> that one or both of your covariates (x and/or z) are factors
>>> (categorical) rather than numeric.
>>> Ben Bolker
>>> ============================================================
>>> set.seed(101)
>>> dd <- data.frame(x=rnorm(500),z=rnorm(500),
>>> g=factor(sample(1:6,size=500,replace=TRUE)))
>>> form <- y ~ x + z + (x+z|g)
>>> dd$y <- simulate(form[-2],
>>> newdata=dd,
>>> newparams=list(beta=rep(0,3),
>>> theta=rep(1,6),
>>> sigma=1))[[1]]
>>> library(lme4)
>>> m1 <- lmer(form, data=dd)
>>> VarCorr(m1)
>>> On 2/16/21 8:18 AM, Phillip Alday wrote:
>>>
>>>> I suspect we'll need to know a bit more about your data to answer this
>>>> question. Can you share it in any form (e.g. variables renamed and
>>>> levels of factors changed to something opaque) ?
>>>> Best,
>>>> Phillip
>>>> On 16/2/21 4:02 am, Peter R Law via R-sig-mixed-models wrote:
>>>>
>>>>> I am trying to fit a model with two covariates, x and z say, for response y, with a random factor g and want each of x and z to have a random slope. I expected
>>>>> lmer(y ~ x + z + (x+z|g),...)
>>>>> to fit a model with 6 random variance components, the intercept, two slopes and three correlations. But I got an error message saying there were 74 random variance components and my data was insufficient to fit the model. Yet
>>>>> lmer(y ~ x + z + (x+z||g),...)
>>>>> returned what I expected, a model with the random intercept and two slopes but no correlations. How is lmer interpreting the first line of code above, and how would I code for what I want? I have not been able to find any examples in the literature or online that help me, but I may have easily missed something, so if anyone knows of a useful link that'd be great. The only examples of multiple random slopes I've seen take the form
>>>>> lmer(y~x + z +(x|g) + (z|g),...)
>>>>> specifically excluding correlations between the random slopes and intercept of the two predictors. Even if the latter is a more sensible approach I'd like to understand the coding issue.
>>>>> Thanks.
>>>>> Peter
>>>>> Sent with ProtonMail Secure Email.
>>>>> R-sig-mixed-models using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
------------------------------
Message: 3
Date: Thu, 25 Feb 2021 10:36:28 -0700
From: Laura Coco <lauracoco using gmail.com>
To: r-sig-mixed-models using r-project.org
Subject: [R-sig-ME] question regarding time as a continuous factor in
a linear mixed effects model
Message-ID:
<CAKgKnP6Nx9Ee=vhPd6JtbUrzw4iyeXjh-=FZo9zCTo5RtcTjkQ using mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hello,
I am interested in investigating the main effects of group, time, and group
by time interaction on survey outcomes using linear mixed effects models.
Time is considered as continuous (number of days since baseline), but isn't
it also categorical, since I want to compare Session 1 vs Session 4 (for
example)? How is that handled in the model? As of now, time (days since
baseline) is being treated as one unit, rather than four separate sessions.
Here is an example of my code: mdl.outcome <- lmer(outcome ~ time*Group +
(1 | PID), data = dta)
Thank you!!
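[Editorial sketch of the usual answer, with variable names taken from the post and made-up data: numeric time buys one linear slope; wrapping it in factor() buys one mean per session, which is what session-vs-session comparisons need.]

```r
# Four sessions at 0, 7, 14, 21 days since baseline, 3 participants each.
dta <- data.frame(time = rep(c(0, 7, 14, 21), times = 3))

# Continuous coding: intercept + a single linear slope -> 2 columns.
ncol(model.matrix(~ time, dta))          # 2
# Categorical coding: one mean per session -> 4 columns.
ncol(model.matrix(~ factor(time), dta))  # 4

# So, to compare Session 1 vs Session 4 directly, one could fit e.g.
#   lmer(outcome ~ factor(time) * Group + (1 | PID), data = dta)
# and then test the relevant pairwise contrast (e.g. with emmeans).
```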
------------------------------
Message: 4
Date: Fri, 26 Feb 2021 10:20:18 +0000
From: Andre Syvertsen <Andre.Syvertsen using uib.no>
To: "r-sig-mixed-models using r-project.org"
<r-sig-mixed-models using r-project.org>
Subject: [R-sig-ME] Help: Interpreting unusual model fit results from
generalized linear mixed model (glmmTMB/sjPlot)
Message-ID:
<AM6PR0102MB31748BFF6EAF4FEA67D042809E9D9 using AM6PR0102MB3174.eurprd01.prod.exchangelabs.com>
Content-Type: text/plain; charset="utf-8"
Hi,
I am working with a large dataset that contains longitudinal data on the gambling behavior of 184,113 participants. The data is based on complete tracking of electronic gambling behavior within a gambling operator. Gambling behavior is aggregated at the monthly level, for a total of 70 months. I have an ID variable separating participants, a time variable (months), as well as numerous gambling behavior variables such as active days played in a given month, bets placed in a given month, total losses in a given month, etc. I am investigating the role of age and gender in predicting active days gambled per month.
I have fitted a model with glmmTMB (see below for model code) and output the resulting statistics with sjPlot's tab_model function, and I am having trouble interpreting them. The full results can be found below. Notably, I appear to have gotten a perfect intra-class correlation. While I am sure variance in the outcome (active days gambling) is likely to be heavily associated with subject and time, this seems excessive. Furthermore, the pseudo-R2 suggests that 8.7% of the variance is attributable to fixed effects, which I would have thought would lower the variance attributable to individual/time. Are the results affected by the high number of observations, individuals and/or time points? Or have I specified my model in an odd manner?
glmmTMB code for the model:
DaysPlayedConditionalAgeGenderTruncated <- glmmTMB(daysPlayed ~ 1 + time + ageCategory * gender + (time | id), dfLong, family = truncated_nbinom2)
Model summary:
Active Gambling Days Monthly
Predictors: Incidence Rate Ratios, CI, p
(Intercept): 1.18, 1.16 – 1.20, <0.001
Time: 0.99, 0.99 – 0.99, <0.001
Age Category 30-39: 1.29, 1.26 – 1.32, <0.001
Age Category 40-49: 1.81, 1.78 – 1.85, <0.001
Age Category 50-59: 2.47, 2.41 – 2.53, <0.001
Age Category 60-69: 3.08, 2.99 – 3.17, <0.001
Age Category 70+: 3.42, 3.29 – 3.56, <0.001
Gender (Women): 1.69, 1.65 – 1.74, <0.001
Age Category 30-39 × Women: 0.90, 0.86 – 0.94, <0.001
Age Category 40-49 × Women: 0.67, 0.65 – 0.70, <0.001
Age Category 50-59 × Women: 0.53, 0.50 – 0.55, <0.001
Age Category 60-69 × Women: 0.46, 0.44 – 0.48, <0.001
Age Category 70+ × Women: 0.45, 0.43 – 0.48, <0.001
Random Effects
σ2: 0.00
τ00 id: 1.63
τ11 id.time: 0.00
ρ01 id: -0.38
ICC: 1.00
N id: 184113
Observations: 3231544
Marginal R2 / Conditional R2: 0.087 / 1.000
Note. Intercept = Men, age 18-29 years, first time point (month 0 of 69).
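[Editorial aside, formulas only, assuming the dfLong data frame from the post: with the slope variance τ11 at ~0 and a correlation still estimated, a common diagnostic is to refit simpler nested random-effects structures and compare them; rescaling time can also help when a slope variance is squeezed toward zero by the units.]

```r
# Candidate random-effects structures, simplest to the original posted model.
# Fitting them requires the glmmTMB package and the dfLong data.
f_intercept <- daysPlayed ~ 1 + time + ageCategory * gender + (1 | id)
f_uncorr    <- daysPlayed ~ 1 + time + ageCategory * gender +
               (1 | id) + (0 + time | id)
f_full      <- daysPlayed ~ 1 + time + ageCategory * gender + (time | id)

# e.g.:
# m0 <- glmmTMB(f_intercept, dfLong, family = truncated_nbinom2)
# m1 <- glmmTMB(f_full,      dfLong, family = truncated_nbinom2)
# anova(m0, m1)  # does the time slope variance earn its keep?
# Also worth trying: scale(time) in place of time.
```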
Kind regards,
André
------------------------------
Subject: Digest Footer
_______________________________________________
R-sig-mixed-models mailing list
R-sig-mixed-models using r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
------------------------------
End of R-sig-mixed-models Digest, Vol 170, Issue 20
***************************************************