[R-sig-ME] No data for 1 interaction combination: problem in R not in Genstat

Tue Apr 12 16:28:02 CEST 2011

On Tue, Apr 12, 2011 at 8:47 AM, Ben Bolker <bbolker at gmail.com> wrote:
> Roger Humphry <roger.w.humphry at ...> writes:
>
>>
>> I've found that when I do a linear mixed model either with lme or lmer
>> with interactions amongst some fixed effects that if there are any
>> interaction levels unoccupied with data that the model fails.
>> "Error in MEEM(object, conLin, control$niterEM) :
>>    Singularity in backsolve at level 0, block 1" for lme and
>>
>> "Error in mer_finalize(ans) : Downdated X'X is not positive definite,
>> 6." in lmer.
>>
>> Strangely in Genstat the model does produce output.
>> My solutions will be either to use Genstat or to create a single new
>> factor variable which has the levels of the interaction that *are*
>> represented in the interaction.
>>
>> I haven't found anything about this when searching. I guess that this
>> may be deliberate (e.g. Genstat uses a fudge considered inappropriate)
>> but please could anybody advise me?
>> I can provide a simple fictitious example that I invented in which the
>> interaction would be of interest but can't be modelled. (I haven't
>> posted it here because I'm unsure about the etiquette of posting large
>> e-mails containing data onto the discussion list).
>
>  I don't know that the fudge is inappropriate, it may just not
> have been implemented.  For example, lm() in base R is sensible
> about setting empty factor levels to NA (if Genstat gives you an
> *estimate* for the interaction term with the missing levels,
> that would be weird).

The way that lm and glm handle the missing cells in a model with
interactions is through pivoting on columns of the model matrix, but
in a way that numerical analysts didn't anticipate and hence do not
provide for in standard numerical linear algebra software.  Most
linear algebra operations in R are done using Lapack code except for
the most important one - solving least squares problems - which uses
modified Linpack code.  (This, by the way, is why accelerated BLAS
don't improve calculation speed for lm, glm, etc.  The Linpack code
only uses level-1 BLAS and there is not much you can do with clever
programming to make level-1 BLAS run faster.)

Most of the time rank deficiency in least squares problems is avoided
by using contrasts for terms involving factors.  However, missing
cells in a model with an interaction term are one case where rank
deficiency cannot (easily) be caught during the symbolic analysis.
The methods used in lm perform a QR decomposition with the "pivot on
apparent singularity" scheme that produces a set of columnst that are
full rank.  The methods used in lmer don't include a check for rank
deficiency.  Martin and I discussed the possibility of doing that
immediately after the model matrix for the fixed effects was generated
and performing the pivoting and rank reduction at that time but that
modification remains on the To Do list.

The fact that lm inserts NA for the estimates of the coefficients
doesn't really relate to the computational method.  That is done, by
reversing the column permutation, after the other parameter estimates
have been calculated.

>  One way of posting compact examples is to make them from
> synthesized data, for example:
>
> d <- expand.grid(f1=LETTERS[1:3],f2=letters[1:3],rep=1:5)
> d2 <- subset(d, !(f1=="A" & f2=="a"))
>
>
> with(d2,table(f1,f2))
>
>   f2
> f1  a b c
>  A 0 5 5
>  B 5 5 5
>  C 5 5 5
>
> d3 <- data.frame(d2,y=runif(nrow(d2)))
>
> lm(y~f1*f2,data=d3)
>
> Call:
> lm(formula = y ~ f1 * f2, data = d3)
>
> Coefficients:
> (Intercept)          f1B          f1C          f2b          f2c      f1B:f2b
>    0.60964     -0.32352     -0.15239      0.13997     -0.11326      0.10615
>    f1C:f2b      f1B:f2c      f1C:f2c
>   -0.03831      0.47481           NA
>
>  showing in this case that lm() does a sensible thing, setting
> one confounded parameter to NA.
>
>  If you have long examples you can also try posting the data
> somewhere and supplying a URL.
>
>  Your solutions sound sensible.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>