[R-sig-ME] No data for 1 interaction combination: problem in R not in Genstat

Mon Apr 18 11:58:55 CEST 2011

Continuing an earlier thread in case this is of interest to anyone.
I earlier proposed 2 solutions to the problem of including an 
interaction of fixed effects where there was one or more levels of the 
interaction lacking any data. One was just to move to Genstat for these 
tasks (a bit unsatisfactory) and the other was to create a dummy factor 
variable indicating those levels within the interaction that are 
represented.
It turns out that for both lme and lmer, this second solution does not 
work and gives the same error message as if one codes the interaction 
directly - I don't know why (DouglasB probably knows).
Roger

Code for a hypothetical example in which the interaction is of interest 
but for which all levels of the interaction are not represented.

#To demonstrate lme's difficulty in dealing with fixed interactions 
where there is an interaction level with no data
#Y=speed of needle threading ; each individual measured 3 times, some 
individuals given alcohol, some not. No children
#could be given alcohol.
#    N   man   woman child
#no alc  8      7     10
#alc     8      9     0
baseman<-10; #base time taken by men without alcohol
alcman<-12; #time taken by men with alcohol
basewoman<-8; # base time for women
alcwoman<-14; # time taken by women with alcohol
basechild<-10; # base time for child
alcchild<-20; # (theoretical) time taken by child with alcohol.

create<-function(overallmu, n, sdval)
  {
  mulist<-rnorm(n, mean=overallmu, sd=sdval); vals1<-rnorm(n, 
mean=mulist, sd=0.5); vals2<-rnorm(n, mean=mulist, sd=0.5);
  vals<-c(vals1, vals2);
  return(vals);
  }
set.seed(45);
vals<-create(baseman, 8, 1);humantype<-rep("M",16); alc<-rep("N",16); 
indiv<-rep(seq(1,8),2); bm<-data.frame(humantype,alc,indiv,vals);
vals<-create(alcman,8,1); humantype<-rep("M",16); alc<-rep("Y",16); 
indiv<-rep(seq(9,16),2);am<-data.frame(humantype,alc, indiv,vals);

vals<-create(basewoman, 7, 1);humantype<-rep("F",14); alc<-rep("N",14); 
indiv<-rep(seq(11,17),2); bw<-data.frame(humantype,alc,indiv,vals);
vals<-create(basewoman,9,1); humantype<-rep("F",18); alc<-rep("Y",18); 
indiv<-rep(seq(18,26),2);aw<-data.frame(humantype,alc, indiv,vals);

vals<-create(basechild, 10, 1);humantype<-rep("C",20); alc<-rep("N",20); 
indiv<-rep(seq(31,40),2); bc<-data.frame(humantype,alc,indiv,vals);

fulldat<-rbind(bm,am, bw,aw,bc);
names(fulldat)<-c("F_humantype","F_alc","F_ind", "vals")
setwd("C:\\Roger\\R_docs")
#write.csv(fulldat, "RHthreadspeeds.csv", row.names=FALSE)
#fulldat<-read.csv("RHthreadspeeds.csv");
fulldat$F_ind<-as.factor(fulldat$F_ind)
library(nlme)
model1.a<-lme(vals~ F_humantype+F_alc,random= ~ 1|F_ind, data=fulldat, 
method="REML")
model1.b<-lme(vals~F_humantype*F_alc, random= ~1|F_ind, data=fulldat, 
method="REML")
#So an alternative is to make a new factor with 5 levels for each of the 
5 levels that are
#represented
newfactor<-as.factor(paste(fulldat$F_humantype,fulldat$F_alc,sep=""))
fulldat2<-data.frame(fulldat,newfactor)
model1.c<-lme(vals~F_humantype+F_alc+newfactor, random= ~1|F_ind, 
data=fulldat, method="REML")
model1.d<-lm(vals~F_humantype*F_alc,  data=fulldat)
setwd("C:\\Roger\\R_docs")
library(lme4);
fm1 <- lmer(vals ~ F_humantype+F_alc+ (1|F_ind), fulldat)
fm2 <- lmer(vals ~ F_humantype*F_alc+ (1|F_ind), fulldat)
fm2 <- lmer(vals ~ F_humantype+F_alc+newfactor+ (1|F_ind), fulldat)

>
> ------------------------------
>
> Message: 6
> Date: Tue, 12 Apr 2011 09:28:02 -0500
> From: Douglas Bates<bates at stat.wisc.edu>
> To: Ben Bolker<bbolker at gmail.com>
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] No data for 1 interaction combination: problem
> 	in R not in Genstat
> Message-ID:<BANLkTimEBYgyZqbCyo9AFi37kx_KNZEyiw at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Tue, Apr 12, 2011 at 8:47 AM, Ben Bolker<bbolker at gmail.com>  wrote:
>> Roger Humphry<roger.w.humphry at ...>  writes:
>>
>>> I've found that when I do a linear mixed model either with lme or lmer
>>> with interactions amongst some fixed effects that if there are any
>>> interaction levels unoccupied with data that the model fails.
>>> "Error in MEEM(object, conLin, control$niterEM) :
>>> ? ?Singularity in backsolve at level 0, block 1" for lme and
>>>
>>> "Error in mer_finalize(ans) : Downdated X'X is not positive definite,
>>> 6." in lmer.
>>>
>>> Strangely in Genstat the model does produce output.
>>> My solutions will be either to use Genstat or to create a single new
>>> factor variable which has the levels of the interaction that *are*
>>> represented in the interaction.
>>>
>>> I haven't found anything about this when searching. I guess that this
>>> may be deliberate (e.g. Genstat uses a fudge considered inappropriate)
>>> but please could anybody advise me?
>>> I can provide a simple fictitious example that I invented in which the
>>> interaction would be of interest but can't be modelled. (I haven't
>>> posted it here because I'm unsure about the etiquette of posting large
>>> e-mails containing data onto the discussion list).
>> ?I don't know that the fudge is inappropriate, it may just not
>> have been implemented. ?For example, lm() in base R is sensible
>> about setting empty factor levels to NA (if Genstat gives you an
>> *estimate* for the interaction term with the missing levels,
>> that would be weird).
> The way that lm and glm handle the missing cells in a model with
> interactions is through pivoting on columns of the model matrix, but
> in a way that numerical analysts didn't anticipate and hence do not
> provide for in standard numerical linear algebra software.  Most
> linear algebra operations in R are done using Lapack code except for
> the most important one - solving least squares problems - which uses
> modified Linpack code.  (This, by the way, is why accelerated BLAS
> don't improve calculation speed for lm, glm, etc.  The Linpack code
> only uses level-1 BLAS and there is not much you can do with clever
> programming to make level-1 BLAS run faster.)
>
> Most of the time rank deficiency in least squares problems is avoided
> by using contrasts for terms involving factors.  However, missing
> cells in a model with an interaction term are one case where rank
> deficiency cannot (easily) be caught during the symbolic analysis.
> The methods used in lm perform a QR decomposition with the "pivot on
> apparent singularity" scheme that produces a set of columnst that are
> full rank.  The methods used in lmer don't include a check for rank
> deficiency.  Martin and I discussed the possibility of doing that
> immediately after the model matrix for the fixed effects was generated
> and performing the pivoting and rank reduction at that time but that
> modification remains on the To Do list.
>
> The fact that lm inserts NA for the estimates of the coefficients
> doesn't really relate to the computational method.  That is done, by
> reversing the column permutation, after the other parameter estimates
> have been calculated.
>
>> ?One way of posting compact examples is to make them from
>> synthesized data, for example:
>>
>> d<- expand.grid(f1=LETTERS[1:3],f2=letters[1:3],rep=1:5)
>> d2<- subset(d, !(f1=="A"&  f2=="a"))
>>
>>
>> with(d2,table(f1,f2))
>>
>> ? f2
>> f1 ?a b c
>> ?A 0 5 5
>> ?B 5 5 5
>> ?C 5 5 5
>>
>> d3<- data.frame(d2,y=runif(nrow(d2)))
>>
>> lm(y~f1*f2,data=d3)
>>
>> Call:
>> lm(formula = y ~ f1 * f2, data = d3)
>>
>> Coefficients:
>> (Intercept) ? ? ? ? ?f1B ? ? ? ? ?f1C ? ? ? ? ?f2b ? ? ? ? ?f2c ? ? ?f1B:f2b
>> ? ?0.60964 ? ? -0.32352 ? ? -0.15239 ? ? ?0.13997 ? ? -0.11326 ? ? ?0.10615
>> ? ?f1C:f2b ? ? ?f1B:f2c ? ? ?f1C:f2c
>> ? -0.03831 ? ? ?0.47481 ? ? ? ? ? NA
>>
>> ?showing in this case that lm() does a sensible thing, setting
>> one confounded parameter to NA.
>>
>> ?If you have long examples you can also try posting the data
>> somewhere and supplying a URL.
>>
>> ?Your solutions sound sensible.
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
> ------------------------------
>
> _______________________________________________
> R-sig-mixed-models mailing list
> R-sig-mixed-models at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
> End of R-sig-mixed-models Digest, Vol 52, Issue 21
> **************************************************
>

-- 
Dr Roger Humphry, Applied Statistician,
Epidemiology Research Unit, SAC, Inverness
01463 246059