[R-sig-ME] No data for 1 interaction combination: problem in R not in Genstat

Tue Apr 19 20:42:35 CEST 2011

On Mon, Apr 18, 2011 at 4:58 AM, Roger Humphry
<roger.w.humphry at googlemail.com> wrote:
> Continuing an earlier thread in case this is of interest to anyone.
> I earlier proposed 2 solutions to the problem of including an interaction of
> fixed effects where there was one or more levels of the interaction lacking
> any data. One was just to move to Genstat for these tasks (a bit
> unsatisfactory) and the other was to create a dummy factor variable
> indicating those levels within the interaction that are represented.
> It turns out that for both lme and lmer, this second solution does not work
> and gives the same error message as if one codes the interaction directly -
> I don't know why (DouglasB probably knows).
> Roger
>
> Code for a hypothetical example in which the interaction is of interest but
> for which all levels of the interaction are not represented.
>
> #To demonstrate lme's difficulty in dealing with fixed interactions where
> there is an interaction level with no data
> #Y=speed of needle threading ; each individual measured 3 times, some
> individuals given alcohol, some not. No children
> #could be given alcohol.
> #    N   man   woman child
> #no alc  8      7     10
> #alc     8      9     0
> baseman<-10; #base time taken by men without alcohol
> alcman<-12; #time taken by men with alcohol
> basewoman<-8; # base time for women
> alcwoman<-14; # time taken by women with alcohol
> basechild<-10; # base time for child
> alcchild<-20; # (theoretical) time taken by child with alcohol.
>
> create<-function(overallmu, n, sdval)
>  {
>  mulist<-rnorm(n, mean=overallmu, sd=sdval); vals1<-rnorm(n, mean=mulist,
> sd=0.5); vals2<-rnorm(n, mean=mulist, sd=0.5);
>  vals<-c(vals1, vals2);
>  return(vals);
>  }
> set.seed(45);
> vals<-create(baseman, 8, 1);humantype<-rep("M",16); alc<-rep("N",16);
> indiv<-rep(seq(1,8),2); bm<-data.frame(humantype,alc,indiv,vals);
> vals<-create(alcman,8,1); humantype<-rep("M",16); alc<-rep("Y",16);
> indiv<-rep(seq(9,16),2);am<-data.frame(humantype,alc, indiv,vals);
>
> vals<-create(basewoman, 7, 1);humantype<-rep("F",14); alc<-rep("N",14);
> indiv<-rep(seq(11,17),2); bw<-data.frame(humantype,alc,indiv,vals);
> vals<-create(basewoman,9,1); humantype<-rep("F",18); alc<-rep("Y",18);
> indiv<-rep(seq(18,26),2);aw<-data.frame(humantype,alc, indiv,vals);
>
> vals<-create(basechild, 10, 1);humantype<-rep("C",20); alc<-rep("N",20);
> indiv<-rep(seq(31,40),2); bc<-data.frame(humantype,alc,indiv,vals);
>
> fulldat<-rbind(bm,am, bw,aw,bc);
> names(fulldat)<-c("F_humantype","F_alc","F_ind", "vals")
> setwd("C:\\Roger\\R_docs")
> #write.csv(fulldat, "RHthreadspeeds.csv", row.names=FALSE)
> #fulldat<-read.csv("RHthreadspeeds.csv");
> fulldat$F_ind<-as.factor(fulldat$F_ind)
> library(nlme)
> model1.a<-lme(vals~ F_humantype+F_alc,random= ~ 1|F_ind, data=fulldat,
> method="REML")
> model1.b<-lme(vals~F_humantype*F_alc, random= ~1|F_ind, data=fulldat,
> method="REML")
> #So an alternative is to make a new factor with 5 levels for each of the 5
> levels that are
> #represented
> newfactor<-as.factor(paste(fulldat$F_humantype,fulldat$F_alc,sep=""))
> fulldat2<-data.frame(fulldat,newfactor)
> model1.c<-lme(vals~F_humantype+F_alc+newfactor, random= ~1|F_ind,
> data=fulldat, method="REML")
> model1.d<-lm(vals~F_humantype*F_alc,  data=fulldat)
> setwd("C:\\Roger\\R_docs")
> library(lme4);
> fm1 <- lmer(vals ~ F_humantype+F_alc+ (1|F_ind), fulldat)
> fm2 <- lmer(vals ~ F_humantype*F_alc+ (1|F_ind), fulldat)
> fm2 <- lmer(vals ~ F_humantype+F_alc+newfactor+ (1|F_ind), fulldat)

When you have five non-empty cells in the table you get 5 degrees of
freedom.  In your model fm2 you are trying to fit a total of 8
coefficients to information from the five non-empty cells and it
doesn't work so well.  You can fit the model

> fulldat <- within(fulldat, newfactor <- factor(F_humantype:F_alc))
> str(fulldat)
'data.frame':	84 obs. of  5 variables:
 $ F_humantype: Factor w/ 3 levels "M","F","C": 1 1 1 1 1 1 1 1 1 1 ...
 $ F_alc      : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
 $ F_ind      : int  1 2 3 4 5 6 7 8 1 2 ...
 $ vals       : num  11.25 9.18 9.06 9.36 9.72 ...
 $ newfactor  : Factor w/ 5 levels "M:N","M:Y","F:N",..: 1 1 1 1 1 1 1 1 1 1 ...
> print(fm2 <- lmer(vals ~ newfactor+ (1|F_ind), fulldat), corr=FALSE)
Linear mixed model fit by REML
Formula: vals ~ newfactor + (1 | F_ind)
   Data: fulldat
   AIC   BIC logLik deviance REMLdev
 256.3 273.3 -121.1    240.7   242.3
Random effects:
 Groups   Name        Variance Std.Dev.
 F_ind    (Intercept) 0.88040  0.93829
 Residual             0.57579  0.75881
Number of obs: 84, groups: F_ind, 36

Fixed effects:
             Estimate Std. Error t value
(Intercept)    9.7023     0.3821  25.390
newfactorM:Y   2.2876     0.5320   4.300
newfactorF:N  -1.4652     0.5405  -2.711
newfactorF:Y  -1.3636     0.5252  -2.596
newfactorC:N   0.4733     0.5127   0.923

>> Message: 6
>> Date: Tue, 12 Apr 2011 09:28:02 -0500
>> From: Douglas Bates<bates at stat.wisc.edu>
>> To: Ben Bolker<bbolker at gmail.com>
>> Cc: r-sig-mixed-models at r-project.org
>> Subject: Re: [R-sig-ME] No data for 1 interaction combination: problem
>>        in R not in Genstat
>> Message-ID:<BANLkTimEBYgyZqbCyo9AFi37kx_KNZEyiw at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> On Tue, Apr 12, 2011 at 8:47 AM, Ben Bolker<bbolker at gmail.com>  wrote:
>>>
>>> Roger Humphry<roger.w.humphry at ...>  writes:
>>>
>>>> I've found that when I do a linear mixed model either with lme or lmer
>>>> with interactions amongst some fixed effects that if there are any
>>>> interaction levels unoccupied with data that the model fails.
>>>> "Error in MEEM(object, conLin, control$niterEM) :
>>>> ? ?Singularity in backsolve at level 0, block 1" for lme and
>>>>
>>>> "Error in mer_finalize(ans) : Downdated X'X is not positive definite,
>>>> 6." in lmer.
>>>>
>>>> Strangely in Genstat the model does produce output.
>>>> My solutions will be either to use Genstat or to create a single new
>>>> factor variable which has the levels of the interaction that *are*
>>>> represented in the interaction.
>>>>
>>>> I haven't found anything about this when searching. I guess that this
>>>> may be deliberate (e.g. Genstat uses a fudge considered inappropriate)
>>>> but please could anybody advise me?
>>>> I can provide a simple fictitious example that I invented in which the
>>>> interaction would be of interest but can't be modelled. (I haven't
>>>> posted it here because I'm unsure about the etiquette of posting large
>>>> e-mails containing data onto the discussion list).
>>>
>>> ?I don't know that the fudge is inappropriate, it may just not
>>> have been implemented. ?For example, lm() in base R is sensible
>>> about setting empty factor levels to NA (if Genstat gives you an
>>> *estimate* for the interaction term with the missing levels,
>>> that would be weird).
>>
>> The way that lm and glm handle the missing cells in a model with
>> interactions is through pivoting on columns of the model matrix, but
>> in a way that numerical analysts didn't anticipate and hence do not
>> provide for in standard numerical linear algebra software.  Most
>> linear algebra operations in R are done using Lapack code except for
>> the most important one - solving least squares problems - which uses
>> modified Linpack code.  (This, by the way, is why accelerated BLAS
>> don't improve calculation speed for lm, glm, etc.  The Linpack code
>> only uses level-1 BLAS and there is not much you can do with clever
>> programming to make level-1 BLAS run faster.)
>>
>> Most of the time rank deficiency in least squares problems is avoided
>> by using contrasts for terms involving factors.  However, missing
>> cells in a model with an interaction term are one case where rank
>> deficiency cannot (easily) be caught during the symbolic analysis.
>> The methods used in lm perform a QR decomposition with the "pivot on
>> apparent singularity" scheme that produces a set of columnst that are
>> full rank.  The methods used in lmer don't include a check for rank
>> deficiency.  Martin and I discussed the possibility of doing that
>> immediately after the model matrix for the fixed effects was generated
>> and performing the pivoting and rank reduction at that time but that
>> modification remains on the To Do list.
>>
>> The fact that lm inserts NA for the estimates of the coefficients
>> doesn't really relate to the computational method.  That is done, by
>> reversing the column permutation, after the other parameter estimates
>> have been calculated.
>>
>>> ?One way of posting compact examples is to make them from
>>> synthesized data, for example:
>>>
>>> d<- expand.grid(f1=LETTERS[1:3],f2=letters[1:3],rep=1:5)
>>> d2<- subset(d, !(f1=="A"&  f2=="a"))
>>>
>>>
>>> with(d2,table(f1,f2))
>>>
>>> ? f2
>>> f1 ?a b c
>>> ?A 0 5 5
>>> ?B 5 5 5
>>> ?C 5 5 5
>>>
>>> d3<- data.frame(d2,y=runif(nrow(d2)))
>>>
>>> lm(y~f1*f2,data=d3)
>>>
>>> Call:
>>> lm(formula = y ~ f1 * f2, data = d3)
>>>
>>> Coefficients:
>>> (Intercept) ? ? ? ? ?f1B ? ? ? ? ?f1C ? ? ? ? ?f2b ? ? ? ? ?f2c ? ?
>>> ?f1B:f2b
>>> ? ?0.60964 ? ? -0.32352 ? ? -0.15239 ? ? ?0.13997 ? ? -0.11326 ? ?
>>> ?0.10615
>>> ? ?f1C:f2b ? ? ?f1B:f2c ? ? ?f1C:f2c
>>> ? -0.03831 ? ? ?0.47481 ? ? ? ? ? NA
>>>
>>> ?showing in this case that lm() does a sensible thing, setting
>>> one confounded parameter to NA.
>>>
>>> ?If you have long examples you can also try posting the data
>>> somewhere and supplying a URL.
>>>
>>> ?Your solutions sound sensible.
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> R-sig-mixed-models mailing list
>> R-sig-mixed-models at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>
>> End of R-sig-mixed-models Digest, Vol 52, Issue 21
>> **************************************************
>>
>
>
> --
> Dr Roger Humphry, Applied Statistician,
> Epidemiology Research Unit, SAC, Inverness
> 01463 246059
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>