[R-sig-ME] Anova II table, df, drop1 and very complex regression models!
Shadiya Al Hashmi
saah500 at york.ac.uk
Fri Sep 2 18:58:07 CEST 2016
Thanks Steven for pointing that out. If I read Thierry correctly, he
suggested that after selecting the model using the subset with no NA
values, then to refit the final model on the original dataset with NAs
(Thierry, kindly correct if I got you wrong).
So, I actually ran two parallel R sessions, one with the NAs and one
without and the results were the same. However, this is probably the case
here since I have only one cell with an NA value, so this wouldn't affect
the results anyway.
I also found out that the dropterm command in the MASS package worked well
with my data, so I used it in selecting my stepwise models.
Thanks again Steven and thanks to you too Thierry!
Best wishes,
Shadiya
On 30 August 2016 at 15:42, Steven J. Pierce <pierces1 at msu.edu> wrote:
> I completely agree that analyzing a stable dataset is essential, but I'd
> argue that selecting a subset containing only cases with no missing data is
> actually changing the nature of the population to which your inferences can
> be generalized. In effect it tacks "and who have no missing data" onto the
> definition of the target population. That is unlikely to be the population
> you really want to study. It may also dramatically reduce your sample size,
> thereby reducing power and precision of the estimates.
>
> To preserve inference to the original intended population, solve the
> missing data problem with either some form of imputation or by using a
> full-information maximum likelihood estimation method that doesn’t throw
> out cases with missing data.
>
>
> Steven J. Pierce, Ph.D.
> Associate Director
> Center for Statistical Training & Consulting (CSTAT)
> Michigan State University
>
> -----Original Message-----
> From: Thierry Onkelinx [mailto:thierry.onkelinx at inbo.be]
> Sent: Tuesday, August 30, 2016 3:30 AM
> To: Shadiya Al Hashmi <saah500 at york.ac.uk>
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Anova II table, df, drop1 and very complex
> regression models!
>
> Dear Shadiya,
>
> You need to do the model selection on a stable dataset. Therefore you
> should create a subset which doesn't contain missing values in the
> covariates. Use this subset for model selection. Then you can refit the
> final model on the total dataset.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2016-08-30 7:23 GMT+02:00 Shadiya Al Hashmi <saah500 at york.ac.uk>:
>
> > Good morning,
> >
> >
> > I have complex data of 7 variables (6 treatment + 1 control [“age” in the
> > model below]) plus 18 interactions in a dataset of 2448 observations
> which
> > have some missing values (NAs). The maximal model which I have
> simplified
> > as per y hypothesis is as follows.
> >
> >
> >
> > modelAAW<-glmer(match~Listgp + vowel.quality + stimulus.presentation +
> > context +length + age + freq.+ Listgp:context+ Listgp:length+
> Listgp:freq.+
> > Listgp:stimulus.presentation+ Listgp:age+ context:length+ context:freq.+
> > context:stimulus.presentation+ context:age+ length:freq.+
> > length:stimulus.presentation+ length:age+ freq.:stimulus.presentation+
> > freq.:age+ stimulus.presentation:age+ Listgp:stimulus.presentation +
> > Listgp:vowel.quality + stimulus.presentation:vowel.quality +
> > (Listgp|stimulus) + (stimulus.presentation+vowel.quality|listener) ,
> data
> > =
> > SBAAW, family = "binomial", control=glmerControl(optCtrl=
> > list(maxfun=2e5)),
> > nAGQ =1)
> >
> >
> >
> > I ran a binomial logistic regression analysis on the data and did the
> > stepwise regression manually since the drop1(modelAAW, test = "Chisq")
> > command yielded no results in a span of more than 16 hours. The resulting
> > regression models are nested in the maximal model (modelAAW).
> >
> >
> >
> > Then, I reached the model selection step where I have to interpret the
> > anova table below which has degrees of freedom of zero for some models.
> >
> >
> >
> > Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
> >
> > AAWXI 49 2454.5 2738.8 -1178.2 2356.5
> >
> > AAWXII 49 2456.4 2740.8 -1179.2 2358.4 0.0000 0 1.0000
> >
> > AAWXIII 49 2456.4 2740.8 -1179.2 2358.4 0.0000 0 1.0000
> >
> > AAWIX 51 2457.6 2753.5 -1177.8 2355.6 2.8749 2 0.2375
> >
> > AAWX 51 2457.6 2753.5 -1177.8 2355.6 0.0000 0 1.0000
> >
> > AAWVI 52 2458.6 2760.4 -1177.3 2354.6 0.9410 1 0.3320
> >
> > AAWVII 52 2458.6 2760.4 -1177.3 2354.6 0.0075 0 <2e-16 ***
> >
> > AAWVIII 52 2458.6 2760.3 -1177.3 2354.6 0.0094 0 <2e-16 ***
> >
> > AAWV 54 2458.4 2771.7 -1175.2 2350.4 4.2412 2 0.1200
> >
> > AAWIII 56 2461.9 2786.9 -1175.0 2349.9 0.4275 2 0.8076
> >
> > AAWIV 56 2461.9 2786.9 -1175.0 2349.9 0.0000 0 1.0000
> >
> > AAWII 57 2463.9 2794.7 -1175.0 2349.9 0.0215 1 0.8835
> >
> > AAWI 60 2467.9 2816.1 -1174.0 2347.9 1.9763 3 0.5773
> >
> > modelAAW 66 2474.4 2857.3 -1171.2 2342.4 5.5778 6 0.4721
> >
> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> >
> >
> >
> >
> > So many people that I have consulted tell me that I shouldn’t trust a
> model
> > of df=zero and advise that I should re-run the models using the drop1
> > command or simplify the maximal model but in my case I don’t know if I
> will
> > ever get results especially that it took more than 16 hours straight with
> > no luck and I simplified the maximal model to the best I could.
> >
> >
> >
> > When I checked R documentation, I read that “when given a sequence of
> > objects, anova tests the models against one another in the order
> specified”
> > based on the AIC value from the smallest to the largest. However, there
> > is a warning that “the comparison between two or more models will only
> be
> > valid if they are fitted to the same dataset. This may be a problem if
> > there are missing values and R's default of na.action = na.omit is used”,
> > so I’m assuming this is the case with my models.
> >
> >
> >
> > Now, should I select model AAWVIII since it has the least p-value (and
> the
> > least BIC value compared to AAWVII)?
> >
> >
> >
> > The formulas of the two models in addition to AAWXI model are as follows.
> > The first two models are similar to each other except that in AAWVIII the
> > variable stimulus.presentation is deleted and in model AAWXI the variable
> > Listgp is deleted.
> >
> >
> > AAWVI<-glmer(match~Listgp + vowel.quality + stimulus.presentation +
> context
> > + age + freq.+ Listgp:freq.+ Listgp:stimulus.presentation+
> context:length+
> > context:freq.+ context:stimulus.presentation+
> length:stimulus.presentation+
> > length:age+ freq.:stimulus.presentation+ freq.:age+
> > stimulus.presentation:age+ + Listgp:stimulus.presentation +
> > Listgp:vowel.quality + stimulus.presentation:vowel.quality +
> > (Listgp|stimulus) + (stimulus.presentation+vowel.quality|listener) ,
> data
> > =
> > SBAAW, family = "binomial", control=glmerControl(optCtrl=
> > list(maxfun=2e5)),
> > nAGQ =1)
> >
> >
> >
> > AAWVIII<-glmer(match~Listgp + vowel.quality + context + age +
> Listgp:freq.+
> > Listgp:stimulus.presentation+ context:length+ context:freq.+
> > context:stimulus.presentation+length:stimulus.presentation+ length:age+
> > freq.:stimulus.presentation+ freq.:age+ stimulus.presentation:age+ +
> > Listgp:stimulus.presentation + Listgp:vowel.quality +
> > stimulus.presentation:vowel.quality + (Listgp|stimulus) +
> > (stimulus.presentation+vowel.quality|listener) , data = SBAAW, family =
> > "binomial", control=glmerControl(optCtrl=list(maxfun=2e5)), nAGQ =1)
> >
> >
> >
> > AAWXI<-glmer(match~vowel.quality + context + age + Listgp:freq.+
> > Listgp:stimulus.presentation+ context:length+ context:freq.+
> > context:stimulus.presentation+length:stimulus.presentation+
> > freq.:stimulus.presentation+ freq.:age+ stimulus.presentation:age+ +
> > Listgp:stimulus.presentation + Listgp:vowel.quality +
> > stimulus.presentation:vowel.quality + (Listgp|stimulus) +
> > (stimulus.presentation+vowel.quality|listener) , data = SBAAW, family =
> > "binomial", control=glmerControl(optCtrl=list(maxfun=2e5)), nAGQ =1)
> >
> >
> >
> > I would appreciate your help with this.
> >
> >
> >
> > --
> > Shadiya
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> [[alternative HTML version deleted]]
>
>
>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list