[R-sig-ME] lme capable of running with missing data?

Douglas Bates bates at stat.wisc.edu
Tue Feb 7 00:39:01 CET 2012


On Fri, Feb 3, 2012 at 8:20 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
> On 04/02/12 14:45, Kenneth Frost wrote:
>>
>> On 02/03/12, Charles Determan Jr   wrote:
>>>
>>> Kevin,
>>>
>>> I understand that but then how is SAS accomplishing the interactions?
>>
>>
>> I have been following this conversation a little bit and this seems to be
>> the right question to ask. I would also like to know the answer. However,
>> this could be the wrong venue to get an answer to this question.
>
> <SNIP>
>
> It may be the case that fortune(203) is relevant here! :-)

Mathematical impossibility, no (fortune(203) refers to obtaining
negative estimates of variance components, IIRC).  The problem here is
determining a full-rank model matrix for a model with interactions and
missing cells.  Because SAS uses the sweep operator in solving least
squares problems it does not encounter problems with rank deficiency.
(I am sorely tempted to make remarks about "sweeping them under the
carpet".)  In fact, SAS expects to handle rank deficiencies because it
generates a redundant set of indicators for each factor variable then
prunes them on the fly.
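
To make the missing-cell problem concrete, here is a minimal sketch in
Python (standard library only, chosen just to keep the example
self-contained; it is not how R or SAS builds the matrix).  It assumes a
made-up 2 x 3 layout with treatment-style indicator columns, and the
helper names treatment_model_matrix and matrix_rank are hypothetical.
One row per observed cell is enough, because replicates only repeat
rows and cannot change the rank.

```python
from fractions import Fraction
from itertools import product

def treatment_model_matrix(cells, a_levels, b_levels):
    """One row per (a, b) cell: intercept, main effects, then interactions."""
    rows = []
    for a, b in cells:
        row = [1]                                            # intercept
        row += [int(a == lev) for lev in a_levels[1:]]       # A main effects
        row += [int(b == lev) for lev in b_levels[1:]]       # B main effects
        row += [int(a == la and b == lb)                     # A:B interactions
                for la, lb in product(a_levels[1:], b_levels[1:])]
        rows.append(row)
    return rows

def matrix_rank(rows):
    """Rank by exact Gaussian elimination on rational numbers."""
    m = [[Fraction(x) for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot available in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            f = m[i][col] / m[rank][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank

a_levels = ["a1", "a2"]
b_levels = ["b1", "b2", "b3"]
all_cells = list(product(a_levels, b_levels))
observed = [c for c in all_cells if c != ("a2", "b3")]   # one empty cell

X_full = treatment_model_matrix(all_cells, a_levels, b_levels)
X_miss = treatment_model_matrix(observed, a_levels, b_levels)

print(len(X_full[0]), matrix_rank(X_full))   # 6 columns, rank 6
print(len(X_miss[0]), matrix_rank(X_miss))   # 6 columns, rank 5
```

With the (a2, b3) cell empty, the a2:b3 interaction column is all
zeros, so the six-column matrix has rank five and one coefficient
cannot be estimated.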

The approach in R is to generate a model matrix that should be of
full-rank except in circumstances like this and to check for rank
deficiency.  There is special code in the version of the QR
decomposition used with R to detect rank deficiency and pivot the
offending columns out but keep the others in their original order.

Dirk Eddelbuettel and I explored several approaches to handling such
rank deficiency in the vignette accompanying the RcppEigen package
(http://cran.us.r-project.org/web/packages/RcppEigen/vignettes/RcppEigen-intro-nojss.pdf).
 The development version of lme4 (called lme4Eigen on the R-forge
project site) detects rank deficiency earlier in the calculation but
does not yet repair the rank deficiency.  Using the column-pivoted QR
decomposition is probably the best approach but even then it would be
necessary to find the columns that are linearly dependent on columns to
their left then drop only those columns.  It is not impossible by any
means, it just requires some work and is not high on the priority list
right now.
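
The step of finding the columns that are linearly dependent on columns
to their left can be sketched as below.  This is only an illustration
under stated assumptions: it is not the pivoted QR used in R or in
Eigen, but a plain left-to-right elimination in exact rational
arithmetic, written in Python with the standard library.  The function
name independent_columns is hypothetical, and the example matrix uses a
redundant set of indicators (intercept, a1, a2, b1, b2) for a made-up
2 x 2 layout.

```python
from fractions import Fraction

def independent_columns(rows):
    """Indices of columns that are not linear combinations of columns to their left."""
    ncol = len(rows[0])
    cols = [[Fraction(r[j]) for r in rows] for j in range(ncol)]
    basis = []   # (leading position, reduced column) for each kept column
    keep = []
    for j, col in enumerate(cols):
        v = col[:]
        # Remove the component along every column kept so far.
        for lead, b in basis:
            if v[lead] != 0:
                f = v[lead] / b[lead]
                v = [x - f * y for x, y in zip(v, b)]
        lead = next((i for i, x in enumerate(v) if x != 0), None)
        if lead is not None:          # something is left: the column is new
            basis.append((lead, v))
            keep.append(j)
        # otherwise the column depends on columns to its left and is dropped
    return keep

# Columns: intercept, a1, a2, b1, b2 (redundant indicator coding)
X = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
]
keep = independent_columns(X)
print(keep)   # [0, 1, 3]
```

The a2 and b2 columns are dropped because each equals the intercept
minus an earlier indicator, while the surviving columns stay in their
original order.  In floating point the exact comparison with zero would
have to be replaced by a tolerance, which is where a pivoted QR
decomposition earns its keep.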

Regarding type III tests, I have forgotten which ones they are.  Are
they the sequential sums of squares or the ones where you drop the
main effect but keep the interactions thereby rendering your null
model nonsensical in most cases?
All the silliness about Types I, II, III and IV sums of squares and
tests was formulated when fitting any model was difficult (see
fortune("JCL")).  So doing a hypothesis test by fitting the null model
and fitting the alternative model and comparing the results would take
much much longer than doing a lot of linear algebra gymnastics on the
fit of the full or alternative model.  That is no longer the case.  If
you really want to perform a hypothesis test then formulate it in
terms of models, fit them and compare them.  It's not difficult and
has the undeniable advantage of forcing you to think about the model
and whether it makes sense.  Read Bill Venables' famous unpublished
paper "Exegeses on Linear Models" (just put the name in a search
engine).  (By the way, Bill is going to be at the useR conference in
Nashville in July so maybe if a bunch of us ganged up on him he could
be convinced to submit a version of that paper for publication.)
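
As a minimal sketch of "formulate it in terms of models, fit them and
compare them", the following fits an additive model and a model with
interaction to a made-up balanced 2 x 2 data set and forms the usual F
statistic from the two residual sums of squares.  The data, the helper
names solve and rss, and the use of Python with exact rational
arithmetic are all assumptions for illustration.  In R the same
comparison is what anova() does when given two nested fits.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [row[n] for row in M]

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Made-up data: (a2 indicator, b2 indicator, response), two replicates per cell
data = [(0, 0, 10), (0, 0, 12), (0, 1, 14), (0, 1, 16),
        (1, 0, 11), (1, 0, 13), (1, 1, 21), (1, 1, 23)]
y = [obs for _, _, obs in data]

X_null = [[1, a, b] for a, b, _ in data]           # additive model: A + B
X_alt = [[1, a, b, a * b] for a, b, _ in data]     # alternative: A + B + A:B

rss_null, rss_alt = rss(X_null, y), rss(X_alt, y)
df_null, df_alt = len(y) - len(X_null[0]), len(y) - len(X_alt[0])

F = ((rss_null - rss_alt) / (df_null - df_alt)) / (rss_alt / df_alt)
print(rss_null, rss_alt, F)   # 26 8 9
```

The statistic is then referred to an F distribution with
df_null - df_alt and df_alt degrees of freedom (here 1 and 4).  Both
models are written down explicitly, so it is clear what the null
hypothesis actually says.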



