[R-sig-ME] Questions about design and convergence warnings
Dube, Umber
udube at wustl.edu
Wed Oct 4 19:12:06 CEST 2017
Thanks.
---
David,
I only have 3 species after dropping the one for which I had incomplete data. Adding the term species:tissue doesn't seem to impact my results .
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: NP.1 ~ RIN_scale + SEX + AOD_scale + PMI_scale + Species + Tissue +
(1 | batch) + (1 | Tissue:Organ) + Gene_scale
Data: regress_MSBB
Control: glmerControl(optCtrl = list(maxfun = 1e+09))
AIC BIC logLik deviance df.resid
741.9 800.4 -357.9 715.9 653
Scaled residuals:
Min 1Q Median 3Q Max
-3.2476 -0.7030 0.3109 0.6412 3.5852
Random effects:
Groups Name Variance Std.Dev.
Tissue:Organ (Intercept) 0.0001335 0.01155
batch (Intercept) 0.3555065 0.59624
Number of obs: 666, groups: Tissue:Organ, 666; batch, 28
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3087 0.3000 1.029 0.30348
RIN_scale -0.3812 0.1290 -2.956 0.00312 **
SEXF 0.4868 0.2030 2.398 0.01650 *
AOD_scale -0.1094 0.1039 -1.053 0.29217
PMI_scale -0.5463 0.1081 -5.055 4.31e-07 ***
Species1 -0.1406 0.2891 -0.486 0.62680
Species2 -1.5740 0.3873 -4.064 4.83e-05 ***
Tissue1 -0.2696 0.3788 -0.712 0.47660
Tissue2 -0.1634 0.3661 -0.446 0.65528
Tissue3 0.6631 0.4109 1.614 0.10661
Gene_scale -0.8184 0.1250 -6.548 5.85e-11 ***
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: NP.1 ~ RIN_scale + SEX + AOD_scale + PMI_scale + Species + Tissue +
(1 | batch) + (1 | Tissue:Organ) + Species:Tissue + Gene_scale
Data: regress_MSBB
Control: glmerControl(optCtrl = list(maxfun = 1e+09))
AIC BIC logLik deviance df.resid
751.7 837.2 -356.8 713.7 647
Scaled residuals:
Min 1Q Median 3Q Max
-3.2554 -0.7252 0.3088 0.6355 3.6163
Random effects:
Groups Name Variance Std.Dev.
Tissue:Organ (Intercept) 0.0005321 0.02307
batch (Intercept) 0.3613201 0.60110
Number of obs: 666, groups: Tissue:Organ, 666; batch, 28
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.29375 0.31057 0.946 0.34423
RIN_scale -0.38955 0.13092 -2.975 0.00293 **
SEXF 0.48576 0.20366 2.385 0.01707 *
AOD_scale -0.10533 0.10418 -1.011 0.31199
PMI_scale -0.54720 0.10877 -5.031 4.88e-07 ***
Species1 -0.09427 0.55332 -0.170 0.86472
Species2 -1.35744 0.68746 -1.975 0.04832 *
Tissue1 -0.32254 0.40243 -0.801 0.42286
Tissue2 -0.14617 0.39294 -0.372 0.70989
Tissue3 0.76621 0.43232 1.772 0.07634 .
Gene_scale -0.81542 0.12527 -6.509 7.56e-11 ***
Species1:Tissue1 0.17224 0.80081 0.215 0.82971
Species2:Tissue1 0.42202 1.00304 0.421 0.67395
Species1:Tissue2 0.02979 0.78221 0.038 0.96962
Species2:Tissue2 -0.49815 1.08503 -0.459 0.64616
Species1:Tissue3 -0.39867 0.80706 -0.494 0.62132
Species2:Tissue3 -1.03753 1.16070 -0.894 0.37139
---
All,
1) I have been trying to use a very similar model in a lmer for a continuous variable, weight of the organ as a proxy for disease. However, I get an error when I try and run the following model:
lmer(Weight ~ RIN_scale + SEX + AOD_scale + PMI_scale + Species+ Tissue + (1|batch) + (1|Tissue:Organ) + Gene_scale, data=regress_MSBB, family=gaussian(), control=lmerControl(optCtrl=list(maxfun=1e9) ) )
Error: number of levels of each grouping factor must be < number of observations
This error goes away when I drop the (1|Tissue:Organ) term, but I'm concerned that I am no longer faithfully modeling the random effects in doing so.
Does anyone have advice with how to deal with this error?
2) Should I be including Gene expression in front of the | in my random effect terms if I believe that the change in gene expression between disease and healthy are different in the different tissues? If so, should I be including a term for (GeneExpression|Tissue) in addition to Tissue + (1|Tissue:Organ)? I know I can't use (GeneExpression|Tissue:Organ) as this produces the following error: Error: number of observations (=666) < number of random effects (=1332).
Thanks,
Umber
________________________________
From: Farrar, David <Farrar.David at epa.gov>
Sent: Tuesday, October 3, 2017 5:04:42 PM
To: Ben Bolker; Dube, Umber
Cc: r-sig-mixed-models at r-project.org; thierry.onkelinx at inbo.be
Subject: RE: [R-sig-ME] Questions about design and convergence warnings
Hi,
I would like to check my intuition on the number of levels needed for a random effect. I think of this basically in terms of the split plot design, with fixed effects that vary either within whole plots or among whole plots. In the real world, I wouldn't expect that clean a breakdown. You need a bunch of levels if the variance is inherently interesting. If you are trying to evaluate some fixed effect, it seems like you are concerned with whether that effect varies primarily among levels of the random effect, or primarily within levels. In the 2nd case, it seems to me that you wouldn't need many levels for the random effect. This is just my intuition. I am interested in how good it is and whether it can be related to mathematical results.
Regards,
David
-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com]
Sent: Tuesday, October 03, 2017 5:34 PM
To: Dube, Umber <udube at wustl.edu>
Cc: Farrar, David <Farrar.David at epa.gov>; r-sig-mixed-models at r-project.org; thierry.onkelinx at inbo.be
Subject: Re: [R-sig-ME] Questions about design and convergence warnings
On Tue, Oct 3, 2017 at 4:50 PM, Dube, Umber <udube at wustl.edu> wrote:
> Thanks for your responses.
>
> Thierry,
>
> The first case is correct. For example, the skin is an organ and the different layers of the skin (epidermis, dermis, and hypodermis) can be considered tissues. So in my experiment I have collected the same tissues originating from different organs which are either healthy or diseased.
>
> Good call on the species effect. I've dropped one of the species for whom I had only healthy data. This has resulted in fewer convergence warnings as I perform the glmer with each gene. I've also noticed that even those genes for which I do produce convergence warnings will converge if I try using other optimizers. I understand this provides additional evidence to support the warnings being false positives.
>
> ---
>
> David,
>
> I hadn't considered that. If I'm understanding correctly, you would recommend including species:tissue in my model?
>
> ---
>
> All,
>
> Is it appropriate to model batch as a random effect, or should I include it as a fixed effect?
Haven't followed the thread carefully, but: typically an exchangeable factor such as batch (i.e., relabeling the batches wouldn't change anything about their meaning) is _philosophically_ a random effect, but if you have too few levels (batches) to reliably estimate among-batch variance (e.g. <5) you might choose to treat it as a fixed effect instead (or use a Bayesian method that allows you to specify a prior on the among-batch variance ...)
>
>
> Thanks,
>
> Umber
>
>
> ________________________________
> From: Farrar, David <Farrar.David at epa.gov>
> Sent: Monday, October 2, 2017 1:21:45 PM
> To: Dube, Umber
> Subject: RE: [R-sig-ME] Questions about design and convergence
> warnings
>
> Dear Umber,
> It seems your model would not support a conclusion that the effect on a certain tissue would depend on the species, an interaction.
> Is that intentional?
> David
>
> -----Original Message-----
> From: R-sig-mixed-models
> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Thierry
> Onkelinx
> Sent: Friday, September 29, 2017 7:35 AM
> To: Dube, Umber <udube at wustl.edu>
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Questions about design and convergence
> warnings
>
> Dear Umber,
>
> Can you clarify what a tissue is? Distinct parts of an organ (tissue 1 for organ 1 refers to the same part as tissue 1 for organ 2)? Or merely different samples for the same organ (no link between tissue 1 between organs). Tissue as random effect is only relevant in the first case. In the latter case is depends on the number of replicates per tissue:organ. In case of one replication go for (1|Organ), in case of multiple replications go for (1|Organ/Tissue).
>
> It looks like you have very strong species effects. That is an indication for quasi-complete separation, which can trigger convergence warnings.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Statisticus/ Statistician
>
> Vlaamse Overheid / Government of Flanders INSTITUUT VOOR NATUUR- EN
> BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST Team Biometrie
> & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx at inbo.be Kliniekstraat 25, B-1070 Brussel
> www.inbo.be<http://www.inbo.be>
>
> //////////////////////////////////////////////////////////////////////
> ///////////////////// To call in the statistician after the experiment
> is done may be no more than asking him to perform a post-mortem
> examination: he may be able to say what the experiment died of. ~ Sir
> Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger
> Brinner The combination of some data and an aching desire for an
> answer does not ensure that a reasonable answer can be extracted from
> a given body of data. ~ John Tukey
> //////////////////////////////////////////////////////////////////////
> /////////////////////
>
>
> Van 14 tot en met 19 december 2017 verhuizen we uit onze vestiging in Brussel naar het Herman Teirlinckgebouw op de site Thurn & Taxis.
> Vanaf dan ben je welkom op het nieuwe adres: Havenlaan 88 bus 73, 1000 Brussel.
>
> //////////////////////////////////////////////////////////////////////
> /////////////////////
>
>
>
> 2017-09-28 16:58 GMT+02:00 Dube, Umber <udube at wustl.edu>:
>> Thanks for your continued development of lme4 and all the support you've provided.
>>
>> This is my first mixed model analysis. I've done my best to read over the past messages and think I've found a proper method for performing it, but I would like to verify that is correct.
>>
>> I'm interested in performing a generalized linear mixed model analysis on RNA-sequencing data from different tissues derived from the same organ (I have 4 different tissues from ~160 diseased organs, ~60 healthy organs).
>>
>> I have been modelling the following as fixed effects:
>> RNA Integrity Number (RIN) - quality of the total RNA extracted from
>> each tissues (continuous) Post-mortem Interval (PMI) - how much time
>> elapsed following death until the tissue was frozen (continuous) Sex
>> - genetic sex of the organ (categorical) Age at death (AOD) - age of
>> organ at death (continuous) Species - species of organ (categorical)
>> Gene - normalized count data of gene expression (continuous)
>>
>> I have been modelling the following as random effects:
>> Batch (categorical)
>>
>> I understand that I have a nested model Organ/Tissue, but after reading (https://stackoverflow.com/questions/19414336/using-glmer-for-nested-data), I modeled tissue as a fixed effect due to the small numbers (4 tissues).
>>
>> Altogether, my model is:
>>
>> glmer(Disease ~ RIN + SEX + AOD + PMI + Species + (1|batch) + Tissue
>> +
>> (1|Tissue:Organ) + Gene, data=NormDEGenes, family=binomial(),
>> control=glmerControl(optCtrl=list(maxfun=1e9) ) )
>>
>> I unfortunately get convergence warnings with these models, but after going through the ?convergence documentation I hope they are false positives.
>>
>> Warning messages:
>> 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
>> unable to evaluate scaled gradient
>> 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
>> Model failed to converge: degenerate Hessian with 1 negative
>> eigenvalues
>>
>> To address this:
>>
>> 1) I've centered and scaled my continous predictors
>>
>> 2) I've checked for singularity #False
>>
>> 3) I've printed and compared with internal calculations
>>
>> 4) I've tried all available optimizers. I believe all of them have failed to converge, but they all end up with approximately the same log-liklihoods.
>>
>>
>>> ss$ fixef ## extract fixed effects
>> (Intercept) RIN_scale SEXF AOD_scale PMI_scale Species1 Species2 Species3 Tissue1 Tissue2 Tissue3 Gene
>> bobyqa 0.11463 -0.41005 0.409846 -0.07834 -0.65198 14.79972 13.30085 14.88501 -0.35383 -0.31039 0.666657 -2.42662
>> Nelder_Mead -0.10894 -0.41005 0.40988 -0.07834 -0.652 15.02298 13.52411 15.10837 -0.35371 -0.31033 0.666808 -2.42659
>> nlminbw 0.067 -0.41005 0.409846 -0.07834 -0.65198 14.84728 13.34841 14.93257 -0.35383 -0.3104 0.666653 -2.4266
>> optimx.L-BFGS-B 0.066515 -0.40992 0.40951 -0.07822 -0.65189 14.84686 13.34846 14.93227 -0.35369 -0.3102 0.666332 -2.42644
>> nloptwrap.NLOPT_LN_NELDERMEAD -0.19423 -0.40082 0.401142 -0.07745 -0.63582 14.74898 13.28662 14.83122 -0.3464 -0.30534 0.646181 -2.36802
>> nloptwrap.NLOPT_LN_BOBYQA -0.19423 -0.40082 0.401142 -0.07745 -0.63582 14.74898 13.28662 14.83122 -0.3464 -0.30534 0.646181 -2.36802
>>
>>
>>> ss$ llik ## log-likelihoods
>> bobyqa Nelder_Mead nlminbw optimx.L-BFGS-B nloptwrap.NLOPT_LN_NELDERMEAD nloptwrap.NLOPT_LN_BOBYQA
>> -327.896 -327.896 -327.896 -327.896 -327.933 -327.933
>>
>>> ss$ sdcor ## SDs and correlations
>> Organ:Tissue.(Intercept) batch.(Intercept)
>> bobyqa 3.68E-05 0.520826
>> Nelder_Mead 5.56E-03 0.52084
>> nlminbw 3.10E-08 0.520826
>> optimx.L-BFGS-B 0.00E+00 0.52084
>> nloptwrap.NLOPT_LN_NELDERMEAD 4.56E-08 0.517555
>> nloptwrap.NLOPT_LN_BOBYQA 4.56E-08 0.517555
>>
>>
>>
>>> ss$ theta ## Cholesky factors
>> Organ:Tissue.(Intercept) batch.(Intercept)
>> bobyqa 3.68E-05 0.520826
>> Nelder_Mead 5.56E-03 0.52084
>> nlminbw 3.10E-08 0.520826
>> optimx.L-BFGS-B 0.00E+00 0.52084
>> nloptwrap.NLOPT_LN_NELDERMEAD 4.56E-08 0.517555
>> nloptwrap.NLOPT_LN_BOBYQA 4.56E-08 0.517555
>>
>>
>>> ss$ which.OK ## which fits worked
>> bobyqa Nelder_Mead nlminbw nmkbw optimx.L-BFGS-B nloptwrap.NLOPT_LN_NELDERMEAD nloptwrap.NLOPT_LN_BOBYQA
>> TRUE TRUE TRUE FALSE TRUE TRUE TRUE
>>
>>
>> I would appreciate comment on my design, convergence warnings, and troubleshooting results.
>>
>> Thanks,
>>
>> Umber
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list