[R-sig-ME] Error in lme4 rank of X = 28 < ncol(X) = 29

Mon Jan 13 00:59:34 CET 2014

On 14-01-12 06:47 PM, Michael Williamson wrote:
> Good Morning,
> 
> I spoke to you recently concerning a problem with a rank deficient error
> when running a glmer model (see below)
> 
> My boss wants me to try and run a model working out rates using a
> glmmadmb model and I'm getting the same rank deficient problem -
> SABmod<-glmmadmb(SABRate~Socialbeh+Compcat+DistancePodCat+
>   DistanceSingerCat+Pods15Cat+SingersCat+Windspeed.km.h.+BoatPhs,
>   random=~1|FocalID,family="nbinom1",
>   zeroInflation=TRUE, data=Behav)
> 
> Error in glmmadmb(SABRate ~ Socialbeh + Compcat + DistancePodCat +
> DistanceSingerCat +  : rank of X = 28 < ncol(X) = 29
> 
> An updated version of lme4 seemed to fix the previous model, as it
> now takes rank deficiencies into account, and I was wondering if there
> is a version of the glmmadmb package that does this too. I've tried
> reinstalling the newest version but it doesn't appear to work.

  No, there isn't -- glmmADMB hasn't implemented this feature (yet).
You're welcome to look at the chkRank.drop.cols() function in
https://github.com/lme4/lme4/blob/master/R/modular.R to see how lme4
does it ... one way or the other, though, you're going to have to drop a
column.  You can use qr() or svd() to figure out which columns are
multicollinear ...
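
As a rough sketch of the qr()/svd() idea (this is not lme4's or glmmADMB's own code), one can build the fixed-effect model matrix with model.matrix() and look at which columns the pivoted QR decomposition sets aside. The toy data frame `d` and its variables `y`, `f1` and `f2` are made up for illustration, and the 1e-8 tolerance is an arbitrary choice; with real data you would substitute the fixed-effect part of your own formula and your own data frame.

```r
## Toy data (hypothetical): f2 is an exact relabelling of f1, so the two
## factors carry the same information and X is rank-deficient.
d <- data.frame(
  y  = c(1, 3, 2, 5, 4, 6),
  f1 = factor(c("a", "a", "b", "b", "c", "c")),
  f2 = factor(c("x", "x", "y", "y", "z", "z"))
)

## Fixed-effect model matrix for the formula of interest
X <- model.matrix(~ f1 + f2, data = d)

## Pivoted QR: columns pivoted past the rank are linear combinations
## of the columns kept before them
qrX <- qr(X)
rank_X <- qrX$rank
aliased <- colnames(X)[qrX$pivot[seq_len(ncol(X)) > rank_X]]

cat("rank of X =", rank_X, "; ncol(X) =", ncol(X), "\n")
print(aliased)

## Reduced model matrix with the aliased columns dropped
X_reduced <- X[, qrX$pivot[seq_len(rank_X)], drop = FALSE]

## svd() view: (near-)zero singular values flag collinearity, and the
## matching right singular vectors show which columns are involved
sv <- svd(X)
tol <- 1e-8                       # assumed tolerance, adjust to taste
small <- sv$d < tol * max(sv$d)
n_small <- sum(small)
null_vecs <- sv$v[, small, drop = FALSE]
rownames(null_vecs) <- colnames(X)
print(round(null_vecs, 2))
```

Columns with non-zero entries in `null_vecs` are the ones tangled up with each other; dropping the aliased columns (or the model term that generates them) should leave a full-rank X.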

  I'm taking the liberty of cc'ing this back to r-sig-mixed-models.

 cheers
   Ben Bolker

> 
> Thanks for your time
> 
> Mike Williamson
> 
> -----Original Message-----
> From: lme4 maintainer [mailto:bbolker at gmail.com]
> Sent: Sunday, 5 January 2014 3:50 PM
> To: Michael Williamson;
> "lme4-authors at lists.r-forge.r-project.org"@newmailhub.uq.edu.au
> Subject: Re: Error in lme4 rank of X = 28 < ncol(X) = 29
> 
> On 14-01-02 05:32 PM, Michael Williamson wrote:
>> Good Morning,
>> 
>> My name is Mike Williamson and I am a research assistant for the 
>> University of Queensland. I'm in the process of running some
>> analyses with R on a dataset I have been working on in
>> collaboration with a PhD student. Just before Christmas she sent me
>> some code for me to use on my dataset. I started upon it yesterday
>> but was consistently coming up with the error below
>> 
>>> SABMod<-
>>> glmer(SABinom~Socialbeh+Compcat+DistancePodCat+DistanceSingerCat+Pods15+
>>> SingersCat+Windspeed.km.h.+BoatPhs+(1|FocalID),data=BehavTr,family="binomial")
>> 
>>> 
>> Error in lme4::glFormula(formula = SABinom ~ Socialbeh + Compcat +
>>  DistancePodCat +  :
>> 
>> rank of X = 29 < ncol(X) = 31
>> 
>> Sadly she is completely out of contact on field work for the next 
>> 2-3 weeks and I'm pretty stuck on this, so apologies for the email,
>> but I was wondering if you might be able to help.
>> 
>> I'm working on whether boat phase (before, approach, attempt, after)
>> affects the rates of surface active behaviours of whales, recorded as
>> binomial data (SABinom).  The predictor variables are something she
>> has been working on and told me to include in my model.
>> 
>> From the bit of research I've done it looks like the error says my
>> data is rank deficient. Now I think this is most likely because of
>> my low sample sizes for approach and attempt, but I can't find
>> anything online or in papers (such as Zuur 2009) to confirm this or
>> to suggest an option for low sample sizes. I've attached the sample
>> sizes for presence/absence of surface active behaviours by boat phase
>> below. As you can see, approach and attempt only have 5 and 15
>> presences respectively. This is due to the approach and attempt
>> phases being shorter periods than before and after. Is low sample
>> size the problem here? And if so, do you have any suggestions for how
>> I could go about this, or any books to read? I've been trawling the
>> libraries online and don't seem to be able to find anything.
> 
> Sorry to take so long to get back to you.  Your data are indeed
> rank-deficient; the error is telling you that you are trying to
> estimate 31 fixed-effect parameters (columns of the X matrix), but
> that there are only 29 linearly independent combinations of predictor
> variables in your data set.  Small sample size is not an inherent
> problem, but with a small, unbalanced data set you are generally more
> likely to end up with a rank-deficient problem.
> 
> I was hoping I had written more somewhere else in the past about how
> to use model.matrix() and svd() to diagnose multicollinearity
> problems --
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q4/019499.html
> isn't very complete (I should add it to the FAQ).
> 
> The fact that this table is badly unbalanced doesn't automatically
> mean you have a problem, but in combination with some of your other
> variables it is more likely to make the problem unidentifiable.
> 
>>              0    1
>> After     7445 1015
>> Approach   351    5
>> Attempt    991   15
>> Before    2079  344
> 
> If you can install a recent development version of lme4 (e.g.
>
> install.packages("lme4",repos="http://lme4.r-forge.r-project.org/repos")),
>
> that may help -- the current development version automatically tries to
> adjust the fixed-effect model matrix to get rid of rank deficiency.
> 
> Ben Bolker
>