[R-sig-ME] Anova II table, df, drop1 and very complex regression models!

Ben Bolker bbolker at gmail.com
Tue Aug 30 20:15:21 CEST 2016


  Can you recommend a convenient full-information ML estimation method
that handles a wide range of cases (hopefully nested+crossed, GLMM +
LMM, etc, hopefully implemented in R) ?

  Ben Bolker


On 16-08-30 08:42 AM, Steven J. Pierce wrote:
> I completely agree that analyzing a stable dataset is essential, but
> I'd argue that selecting a subset containing only cases with no
> missing data is actually changing the nature of the population to
> which your inferences can be generalized. In effect it tacks "and who
> have no missing data" onto the definition of the target population.
> That is unlikely to be the population you really want to study. It
> may also dramatically reduce your sample size, thereby reducing power
> and precision of the estimates.
> 
> To preserve inference to the original intended population, solve the
> missing data problem with either some form of imputation or by using
> a full-information maximum likelihood estimation method that doesn’t
> throw out cases with missing data.
> 
> 
> Steven J. Pierce, Ph.D. Associate Director Center for Statistical
> Training & Consulting (CSTAT) Michigan State University
> 
> -----Original Message----- From: Thierry Onkelinx
> [mailto:thierry.onkelinx at inbo.be] Sent: Tuesday, August 30, 2016 3:30
> AM To: Shadiya Al Hashmi <saah500 at york.ac.uk> Cc:
> r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Anova II
> table, df, drop1 and very complex regression models!
> 
> Dear Shadiya,
> 
> You need to do the model selection on a stable dataset. Therefore
> you should create a subset which doesn't contain missing values in
> the covariates. Use this subset for model selection. Then you can
> refit the final model on the total dataset.
> 
> Best regards,
> 
> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
> Research Institute for Nature and Forest team Biometrie &
> Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat
> 25 1070 Anderlecht Belgium
> 
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher 
> The plural of anecdote is not data. ~ Roger Brinner The combination
> of some data and an aching desire for an answer does not ensure that
> a reasonable answer can be extracted from a given body of data. ~
> John Tukey
> 
> 2016-08-30 7:23 GMT+02:00 Shadiya Al Hashmi <saah500 at york.ac.uk>:
> 
>> Good morning,
>> 
>> 
>> I have complex data of 7 variables (6 treatment + 1 control [“age”
>> in the model below]) plus 18 interactions in a dataset of 2448
>> observations which have some missing values (NAs).  The maximal
>> model which I have simplified as per y hypothesis is as follows.
>> 
>> 
>> 
>> modelAAW<-glmer(match~Listgp + vowel.quality +
>> stimulus.presentation + context +length + age + freq.+
>> Listgp:context+ Listgp:length+ Listgp:freq.+ 
>> Listgp:stimulus.presentation+ Listgp:age+ context:length+
>> context:freq.+ context:stimulus.presentation+ context:age+
>> length:freq.+ length:stimulus.presentation+ length:age+
>> freq.:stimulus.presentation+ freq.:age+ stimulus.presentation:age+
>> Listgp:stimulus.presentation + Listgp:vowel.quality +
>> stimulus.presentation:vowel.quality + (Listgp|stimulus) +
>> (stimulus.presentation+vowel.quality|listener) , data = SBAAW,
>> family = "binomial", control=glmerControl(optCtrl= 
>> list(maxfun=2e5)), nAGQ =1)
>> 
>> 
>> 
>> I ran a binomial logistic regression analysis on the data and did
>> the stepwise regression manually since the drop1(modelAAW, test =
>> "Chisq") command yielded no results in a span of more than 16
>> hours. The resulting regression models are nested in the maximal
>> model (modelAAW).
>> 
>> 
>> 
>> Then, I reached the model selection step where I have to interpret
>> the anova table below which has degrees of freedom of zero for some
>> models.
>> 
>> 
>> 
>> Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
>> 
>> AAWXI    49 2454.5 2738.8 -1178.2   2356.5
>> 
>> AAWXII   49 2456.4 2740.8 -1179.2   2358.4 0.0000      0
>> 1.0000
>> 
>> AAWXIII  49 2456.4 2740.8 -1179.2   2358.4 0.0000      0
>> 1.0000
>> 
>> AAWIX    51 2457.6 2753.5 -1177.8   2355.6 2.8749      2
>> 0.2375
>> 
>> AAWX     51 2457.6 2753.5 -1177.8   2355.6 0.0000      0
>> 1.0000
>> 
>> AAWVI    52 2458.6 2760.4 -1177.3   2354.6 0.9410      1
>> 0.3320
>> 
>> AAWVII   52 2458.6 2760.4 -1177.3   2354.6 0.0075      0     <2e-16
>> ***
>> 
>> AAWVIII  52 2458.6 2760.3 -1177.3   2354.6 0.0094      0     <2e-16
>> ***
>> 
>> AAWV     54 2458.4 2771.7 -1175.2   2350.4 4.2412      2
>> 0.1200
>> 
>> AAWIII   56 2461.9 2786.9 -1175.0   2349.9 0.4275      2
>> 0.8076
>> 
>> AAWIV    56 2461.9 2786.9 -1175.0   2349.9 0.0000      0
>> 1.0000
>> 
>> AAWII    57 2463.9 2794.7 -1175.0   2349.9 0.0215      1
>> 0.8835
>> 
>> AAWI     60 2467.9 2816.1 -1174.0   2347.9 1.9763      3
>> 0.5773
>> 
>> modelAAW 66 2474.4 2857.3 -1171.2   2342.4 5.5778      6
>> 0.4721
>> 
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> 
>> 
>> 
>> 
>> 
>> So many people that I have consulted tell me that I shouldn’t trust
>> a model of df=zero and advise that I should re-run the models using
>> the drop1 command or simplify the maximal model but in my case I
>> don’t know if I will ever get results especially that it took more
>> than 16 hours straight with no luck and I simplified the maximal
>> model to the best I could.
>> 
>> 
>> 
>> When I checked R documentation, I read that “when given a sequence
>> of objects, anova tests the models against one another in the order
>> specified” based on the AIC value from the smallest to the largest.
>> However, there is  a warning that “the comparison between two or
>> more models will only be valid if they are fitted to the same
>> dataset. This may be a problem if there are missing values and R's
>> default of na.action = na.omit is used”, so I’m assuming this is
>> the case with my models.
>> 
>> 
>> 
>> Now, should I select model AAWVIII since it has the least p-value
>> (and the least BIC value compared to AAWVII)?
>> 
>> 
>> 
>> The formulas of the two models in addition to AAWXI model are as
>> follows. The first two models are similar to each other except that
>> in AAWVIII the variable stimulus.presentation is deleted and in
>> model AAWXI the variable Listgp is deleted.
>> 
>> 
>> AAWVI<-glmer(match~Listgp + vowel.quality + stimulus.presentation +
>> context + age + freq.+ Listgp:freq.+ Listgp:stimulus.presentation+
>> context:length+ context:freq.+
>> context:stimulus.presentation+length:stimulus.presentation+ 
>> length:age+ freq.:stimulus.presentation+ freq.:age+ 
>> stimulus.presentation:age+ + Listgp:stimulus.presentation + 
>> Listgp:vowel.quality + stimulus.presentation:vowel.quality + 
>> (Listgp|stimulus) + (stimulus.presentation+vowel.quality|listener)
>> , data = SBAAW, family = "binomial", control=glmerControl(optCtrl= 
>> list(maxfun=2e5)), nAGQ =1)
>> 
>> 
>> 
>> AAWVIII<-glmer(match~Listgp + vowel.quality + context + age +
>> Listgp:freq.+ Listgp:stimulus.presentation+ context:length+
>> context:freq.+ 
>> context:stimulus.presentation+length:stimulus.presentation+
>> length:age+ freq.:stimulus.presentation+ freq.:age+
>> stimulus.presentation:age+ + Listgp:stimulus.presentation +
>> Listgp:vowel.quality + stimulus.presentation:vowel.quality +
>> (Listgp|stimulus) + (stimulus.presentation+vowel.quality|listener)
>> , data = SBAAW, family = "binomial",
>> control=glmerControl(optCtrl=list(maxfun=2e5)), nAGQ =1)
>> 
>> 
>> 
>> AAWXI<-glmer(match~vowel.quality + context + age + Listgp:freq.+ 
>> Listgp:stimulus.presentation+ context:length+ context:freq.+ 
>> context:stimulus.presentation+length:stimulus.presentation+ 
>> freq.:stimulus.presentation+ freq.:age+ stimulus.presentation:age+
>> + Listgp:stimulus.presentation + Listgp:vowel.quality + 
>> stimulus.presentation:vowel.quality + (Listgp|stimulus) + 
>> (stimulus.presentation+vowel.quality|listener) , data = SBAAW,
>> family = "binomial",
>> control=glmerControl(optCtrl=list(maxfun=2e5)), nAGQ =1)
>> 
>> 
>> 
>> I would appreciate your help with this.
>> 
>> 
>> 
>> -- Shadiya
>> 
>> [[alternative HTML version deleted]]
>> 
>> _______________________________________________ 
>> R-sig-mixed-models at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> [[alternative HTML version deleted]]
> 
> _______________________________________________ 
> R-sig-mixed-models at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>



More information about the R-sig-mixed-models mailing list