[R-sig-ME] Questions about design and convergence warnings

Dube, Umber udube at wustl.edu
Wed Oct 4 19:12:06 CEST 2017


Thanks.


---


David,


I only have 3 species after dropping the one for which I had incomplete data. Adding the term species:tissue doesn't seem to impact my results .



Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: NP.1 ~ RIN_scale + SEX + AOD_scale + PMI_scale + Species + Tissue +
    (1 | batch) + (1 | Tissue:Organ) + Gene_scale
   Data: regress_MSBB
Control: glmerControl(optCtrl = list(maxfun = 1e+09))

     AIC      BIC   logLik deviance df.resid
   741.9    800.4   -357.9    715.9      653

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.2476 -0.7030  0.3109  0.6412  3.5852

Random effects:
 Groups                            Name        Variance  Std.Dev.
 Tissue:Organ (Intercept) 0.0001335 0.01155
 batch                             (Intercept) 0.3555065 0.59624
Number of obs: 666, groups:  Tissue:Organ, 666; batch, 28

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)        0.3087     0.3000   1.029  0.30348
RIN_scale         -0.3812     0.1290  -2.956  0.00312 **
SEXF               0.4868     0.2030   2.398  0.01650 *
AOD_scale         -0.1094     0.1039  -1.053  0.29217
PMI_scale         -0.5463     0.1081  -5.055 4.31e-07 ***
Species1             -0.1406     0.2891  -0.486  0.62680
Species2             -1.5740     0.3873  -4.064 4.83e-05 ***
Tissue1  -0.2696     0.3788  -0.712  0.47660
Tissue2  -0.1634     0.3661  -0.446  0.65528
Tissue3   0.6631     0.4109   1.614  0.10661
Gene_scale      -0.8184     0.1250  -6.548 5.85e-11 ***




Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: NP.1 ~ RIN_scale + SEX + AOD_scale + PMI_scale + Species + Tissue +
    (1 | batch) + (1 | Tissue:Organ) + Species:Tissue +      Gene_scale
   Data: regress_MSBB
Control: glmerControl(optCtrl = list(maxfun = 1e+09))

     AIC      BIC   logLik deviance df.resid
   751.7    837.2   -356.8    713.7      647

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.2554 -0.7252  0.3088  0.6355  3.6163

Random effects:
 Groups                            Name        Variance  Std.Dev.
 Tissue:Organ (Intercept) 0.0005321 0.02307
 batch                             (Intercept) 0.3613201 0.60110
Number of obs: 666, groups:  Tissue:Organ, 666; batch, 28

Fixed effects:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)             0.29375    0.31057   0.946  0.34423
RIN_scale              -0.38955    0.13092  -2.975  0.00293 **
SEXF                    0.48576    0.20366   2.385  0.01707 *
AOD_scale              -0.10533    0.10418  -1.011  0.31199
PMI_scale              -0.54720    0.10877  -5.031 4.88e-07 ***
Species1                  -0.09427    0.55332  -0.170  0.86472
Species2                  -1.35744    0.68746  -1.975  0.04832 *
Tissue1       -0.32254    0.40243  -0.801  0.42286
Tissue2       -0.14617    0.39294  -0.372  0.70989
Tissue3        0.76621    0.43232   1.772  0.07634 .
Gene_scale           -0.81542    0.12527  -6.509 7.56e-11 ***
Species1:Tissue1  0.17224    0.80081   0.215  0.82971
Species2:Tissue1  0.42202    1.00304   0.421  0.67395
Species1:Tissue2  0.02979    0.78221   0.038  0.96962
Species2:Tissue2 -0.49815    1.08503  -0.459  0.64616
Species1:Tissue3 -0.39867    0.80706  -0.494  0.62132
Species2:Tissue3 -1.03753    1.16070  -0.894  0.37139



---


All,


1) I have been trying to use a very similar model in a lmer for a continuous variable, weight of the organ as a proxy for disease. However, I get an error when I try and run the following model:
lmer(Weight ~ RIN_scale + SEX + AOD_scale + PMI_scale + Species+ Tissue + (1|batch) + (1|Tissue:Organ) + Gene_scale, data=regress_MSBB, family=gaussian(), control=lmerControl(optCtrl=list(maxfun=1e9) ) )


Error: number of levels of each grouping factor must be < number of observations


This error goes away when I drop the (1|Tissue:Organ) term, but I'm concerned that I am no longer faithfully modeling the random effects in doing so.


Does anyone have advice with how to deal with this error?


2) Should I be including Gene expression in front of the | in my random effect terms if I believe that the change in gene expression between disease and healthy are different in the different tissues? If so, should I be including a term for (GeneExpression|Tissue) in addition to Tissue + (1|Tissue:Organ)? I know I can't use (GeneExpression|Tissue:Organ) as this produces the following error: Error: number of observations (=666) < number of random effects (=1332).



Thanks,


Umber

________________________________
From: Farrar, David <Farrar.David at epa.gov>
Sent: Tuesday, October 3, 2017 5:04:42 PM
To: Ben Bolker; Dube, Umber
Cc: r-sig-mixed-models at r-project.org; thierry.onkelinx at inbo.be
Subject: RE: [R-sig-ME] Questions about design and convergence warnings

Hi,
I would like to check my intuition on the number of levels needed for a random effect.  I think of this basically in terms of the split plot design, with fixed effects that vary either within whole plots or among whole plots.  In the real world, I wouldn't expect that clean a breakdown.  You need a bunch of levels if the variance is inherently interesting.   If you are trying to evaluate some fixed effect, it seems like you are concerned with whether that effect varies primarily among levels of the random effect, or primarily within levels.  In the 2nd case, it seems to me that you wouldn't need many levels for the random effect.   This is just my intuition.   I am interested in how good it is and whether it can be related to mathematical results.
Regards,
David


-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com]
Sent: Tuesday, October 03, 2017 5:34 PM
To: Dube, Umber <udube at wustl.edu>
Cc: Farrar, David <Farrar.David at epa.gov>; r-sig-mixed-models at r-project.org; thierry.onkelinx at inbo.be
Subject: Re: [R-sig-ME] Questions about design and convergence warnings

On Tue, Oct 3, 2017 at 4:50 PM, Dube, Umber <udube at wustl.edu> wrote:
> Thanks for your responses.
>
> Thierry,
>
> The first case is correct. For example, the skin is an organ and the different layers of the skin (epidermis, dermis, and hypodermis) can be considered tissues. So in my experiment I have collected the same tissues originating from different organs which are either healthy or diseased.
>
> Good call on the species effect. I've dropped one of the species for whom I had only healthy data. This has resulted in fewer convergence warnings as I perform the glmer with each gene. I've also noticed that even those genes for which I do produce convergence warnings will converge if I try using other optimizers. I understand this provides additional evidence to support the warnings being false positives.
>
> ---
>
> David,
>
> I hadn't considered that. If I'm understanding correctly, you would recommend including species:tissue in my model?
>
> ---
>
> All,
>
> Is it appropriate to model batch as a random effect, or should I include it as a fixed effect?


  Haven't followed the thread carefully, but: typically an exchangeable factor such as batch (i.e., relabeling the batches wouldn't change anything about their meaning) is _philosophically_ a random effect, but if you have too few levels (batches) to reliably estimate among-batch variance (e.g. <5) you might choose to treat it as a fixed effect instead (or use a Bayesian method that allows you to specify a prior on the among-batch variance ...)

>
>
> Thanks,
>
> Umber
>
>
> ________________________________
> From: Farrar, David <Farrar.David at epa.gov>
> Sent: Monday, October 2, 2017 1:21:45 PM
> To: Dube, Umber
> Subject: RE: [R-sig-ME] Questions about design and convergence
> warnings
>
> Dear Umber,
> It seems your model would not support a conclusion that the effect on a certain tissue would depend on the species, an interaction.
> Is that intentional?
> David
>
> -----Original Message-----
> From: R-sig-mixed-models
> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Thierry
> Onkelinx
> Sent: Friday, September 29, 2017 7:35 AM
> To: Dube, Umber <udube at wustl.edu>
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Questions about design and convergence
> warnings
>
> Dear Umber,
>
> Can you clarify what a tissue is? Distinct parts of an organ (tissue 1 for organ 1 refers to the same part as tissue 1 for organ 2)? Or merely different samples for the same organ (no link between tissue 1 between organs). Tissue as random effect is only relevant in the first case. In the latter case is depends on the number of replicates per tissue:organ. In case of one replication go for (1|Organ), in case of multiple replications go for (1|Organ/Tissue).
>
> It looks like you have very strong species effects. That is an indication for quasi-complete separation, which can trigger convergence warnings.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Statisticus/ Statistician
>
> Vlaamse Overheid / Government of Flanders INSTITUUT VOOR NATUUR- EN
> BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST Team Biometrie
> & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx at inbo.be Kliniekstraat 25, B-1070 Brussel
> www.inbo.be<http://www.inbo.be>
>
> //////////////////////////////////////////////////////////////////////
> ///////////////////// To call in the statistician after the experiment
> is done may be no more than asking him to perform a post-mortem
> examination: he may be able to say what the experiment died of. ~ Sir
> Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger
> Brinner The combination of some data and an aching desire for an
> answer does not ensure that a reasonable answer can be extracted from
> a given body of data. ~ John Tukey
> //////////////////////////////////////////////////////////////////////
> /////////////////////
>
>
> Van 14 tot en met 19 december 2017 verhuizen we uit onze vestiging in Brussel naar het Herman Teirlinckgebouw op de site Thurn & Taxis.
> Vanaf dan ben je welkom op het nieuwe adres: Havenlaan 88 bus 73, 1000 Brussel.
>
> //////////////////////////////////////////////////////////////////////
> /////////////////////
>
>
>
> 2017-09-28 16:58 GMT+02:00 Dube, Umber <udube at wustl.edu>:
>> Thanks for your continued development of lme4 and all the support you've provided.
>>
>> This is my first mixed model analysis. I've done my best to read over the past messages and think I've found a proper method for performing it, but I would like to verify that is correct.
>>
>> I'm interested in performing a generalized linear mixed model analysis on RNA-sequencing data from different tissues derived from the same organ (I have 4 different tissues from ~160 diseased organs, ~60 healthy organs).
>>
>> I have been modelling the following as fixed effects:
>> RNA Integrity Number (RIN) - quality of the total RNA extracted from
>> each tissues (continuous) Post-mortem Interval (PMI) - how much time
>> elapsed following death until the tissue was frozen (continuous) Sex
>> - genetic sex of the organ (categorical) Age at death (AOD) - age of
>> organ at death (continuous) Species - species of organ (categorical)
>> Gene -  normalized count data of gene expression (continuous)
>>
>> I have been modelling the following as random effects:
>> Batch (categorical)
>>
>> I understand that I have a nested model Organ/Tissue, but after reading (https://stackoverflow.com/questions/19414336/using-glmer-for-nested-data), I modeled tissue as a fixed effect due to the small numbers (4 tissues).
>>
>> Altogether, my model is:
>>
>> glmer(Disease ~ RIN + SEX + AOD + PMI + Species + (1|batch) + Tissue
>> +
>> (1|Tissue:Organ) + Gene, data=NormDEGenes, family=binomial(),
>> control=glmerControl(optCtrl=list(maxfun=1e9) ) )
>>
>> I unfortunately get convergence warnings with these models, but after going through the ?convergence documentation I hope they are false positives.
>>
>> Warning messages:
>> 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>   unable to evaluate scaled gradient
>> 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>   Model failed to converge: degenerate  Hessian with 1 negative
>> eigenvalues
>>
>> To address this:
>>
>> 1) I've centered and scaled my continous predictors
>>
>> 2) I've checked for singularity #False
>>
>> 3) I've printed and compared with internal calculations
>>
>> 4) I've tried all available optimizers. I believe all of them have failed to converge, but they all end up with approximately the same log-liklihoods.
>>
>>
>>> ss$ fixef ## extract fixed effects
>>         (Intercept)     RIN_scale       SEXF    AOD_scale       PMI_scale       Species1        Species2        Species3        Tissue1 Tissue2 Tissue3 Gene
>> bobyqa  0.11463   -0.41005      0.409846        -0.07834        -0.65198        14.79972        13.30085        14.88501        -0.35383        -0.31039        0.666657        -2.42662
>> Nelder_Mead     -0.10894        -0.41005        0.40988 -0.07834        -0.652  15.02298        13.52411        15.10837        -0.35371        -0.31033        0.666808        -2.42659
>> nlminbw 0.067   -0.41005        0.409846        -0.07834        -0.65198        14.84728        13.34841        14.93257        -0.35383        -0.3104 0.666653        -2.4266
>> optimx.L-BFGS-B 0.066515        -0.40992        0.40951 -0.07822        -0.65189        14.84686        13.34846        14.93227        -0.35369        -0.3102 0.666332        -2.42644
>> nloptwrap.NLOPT_LN_NELDERMEAD   -0.19423        -0.40082        0.401142        -0.07745        -0.63582        14.74898        13.28662        14.83122        -0.3464 -0.30534        0.646181        -2.36802
>> nloptwrap.NLOPT_LN_BOBYQA       -0.19423        -0.40082         0.401142       -0.07745        -0.63582        14.74898        13.28662        14.83122        -0.3464 -0.30534        0.646181        -2.36802
>>
>>
>>> ss$ llik ## log-likelihoods
>>         bobyqa  Nelder_Mead     nlminbw optimx.L-BFGS-B nloptwrap.NLOPT_LN_NELDERMEAD   nloptwrap.NLOPT_LN_BOBYQA
>>         -327.896        -327.896        -327.896        -327.896        -327.933        -327.933
>>
>>> ss$ sdcor ## SDs and correlations
>>                 Organ:Tissue.(Intercept)        batch.(Intercept)
>>         bobyqa  3.68E-05        0.520826
>>         Nelder_Mead     5.56E-03        0.52084
>>         nlminbw 3.10E-08        0.520826
>>         optimx.L-BFGS-B 0.00E+00        0.52084
>>         nloptwrap.NLOPT_LN_NELDERMEAD   4.56E-08         0.517555
>>         nloptwrap.NLOPT_LN_BOBYQA       4.56E-08        0.517555
>>
>>
>>
>>> ss$ theta ## Cholesky factors
>>                 Organ:Tissue.(Intercept)        batch.(Intercept)
>>         bobyqa  3.68E-05        0.520826
>>         Nelder_Mead     5.56E-03        0.52084
>>         nlminbw 3.10E-08        0.520826
>>         optimx.L-BFGS-B 0.00E+00        0.52084
>>         nloptwrap.NLOPT_LN_NELDERMEAD   4.56E-08        0.517555
>>         nloptwrap.NLOPT_LN_BOBYQA       4.56E-08        0.517555
>>
>>
>>> ss$ which.OK ## which fits worked
>>         bobyqa   Nelder_Mead     nlminbw        nmkbw   optimx.L-BFGS-B  nloptwrap.NLOPT_LN_NELDERMEAD  nloptwrap.NLOPT_LN_BOBYQA
>>         TRUE    TRUE    TRUE    FALSE   TRUE    TRUE    TRUE
>>
>>
>> I would appreciate comment on my design, convergence warnings, and troubleshooting results.
>>
>> Thanks,
>>
>> Umber
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list