[R-sig-ME] Anova II table, df, drop1 and very complex regression models!

Steven J. Pierce pierces1 at msu.edu
Thu Sep 1 14:41:31 CEST 2016

True, we don't know how much missing data there is in this particular case,
and of course analyzing multiple imputed datasets will take more
computational time than analyzing a single dataset (either complete or
incomplete). My point was only that using listwise deletion to reduce an
incomplete dataset to a complete one, by discarding observations with missing
data on the variables involved in the analyses, can change the nature of the
population to which one can legitimately generalize the results.

If there is a trivial amount of missing data, that may not matter much
because it won't change the answer much (negligible bias). But with larger
amounts of missing data the bias may well be important, because you're now
likely working from a more biased sample (or, put differently, a sample that
represents a different, more narrowly defined population).
Drawing conclusions about the original target population from a biased
sample is poor statistical and scientific practice. The number of people who
regularly do so despite the fact that solutions to such a problem exist
concerns me. 

I just wanted the original poster to consider the issues involved and assess
the amount of missing data before blindly using listwise deletion. 
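
As a rough illustration of that assessment (a minimal sketch in Python rather
than R, using made-up toy data and hypothetical function names, not anything
from the original poster's analysis), one can compare the fraction missing on
each variable with the fraction of observations that listwise deletion would
discard:

```python
# Hypothetical toy data for illustration only; None marks a missing value.
rows = [
    {"y": 2.1, "x1": 0.5, "x2": 1.0},
    {"y": 3.4, "x1": None, "x2": 0.0},
    {"y": None, "x1": 1.2, "x2": 1.0},
    {"y": 4.0, "x1": 0.9, "x2": None},
    {"y": 2.8, "x1": 0.7, "x2": 0.0},
]
variables = ["y", "x1", "x2"]


def missing_fraction_by_variable(rows, variables):
    """Fraction of observations with a missing value, per variable."""
    n = len(rows)
    return {v: sum(r[v] is None for r in rows) / n for v in variables}


def listwise_complete(rows, variables):
    """Keep only observations with no missing value on any analysis variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


per_var = missing_fraction_by_variable(rows, variables)
complete = listwise_complete(rows, variables)
fraction_dropped = (len(rows) - len(complete)) / len(rows)

for v in variables:
    print(f"{v}: {per_var[v]:.0%} missing")
print(f"listwise deletion drops {fraction_dropped:.0%} of observations")
```

In this toy example each variable is missing on only one observation in five,
yet listwise deletion discards three of the five observations, because the
missing values fall on different rows. Per-variable missingness rates alone
can therefore understate how much of the sample is lost.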

Steven J. Pierce, Ph.D.
Associate Director
Center for Statistical Training & Consulting (CSTAT)
Michigan State University

-----Original Message-----
From: David Duffy [mailto:David.Duffy at qimrberghofer.edu.au] 
Sent: Wednesday, August 31, 2016 11:45 PM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Anova II table, df, drop1 and very complex
regression models!

 Steven J. Pierce [pierces1 at msu.edu] wrote:

> I'd probably use Mplus (www.statmodel.com) for that, [...]
> Fortunately, the lavaan package in R replicates some of what Mplus can do.

Every couple of years, I mention OpenMx on this list:


I don't know exactly how much of Mplus's functionality it provides, but
suspect it would be close to 100%.

As to the OP's question, we don't know how much data is actually missing, or
what pattern that takes. We also don't know why some of the fitted models
differ by 0 d.f.  If we did impute the data 5 times, wouldn't that give us
an 80 h run-time?

Cheers, David.
