[R-sig-ME] Error in lme4 rank of X = 28 < ncol(X) = 29
Ben Bolker
bbolker at gmail.com
Mon Jan 13 00:59:34 CET 2014
On 14-01-12 06:47 PM, Michael Williamson wrote:
> Good Morning,
>
> I spoke to you recently concerning a problem with a rank deficient error
> when running a glmer model (see below).
>
> My boss wants me to try and run a model working out rates using a
> glmmadmb model and I'm getting the same rank deficient problem -
> SABmod<-glmmadmb(SABRate~Socialbeh+Compcat+DistancePodCat+
>   DistanceSingerCat+Pods15Cat+SingersCat+Windspeed.km.h.+BoatPhs,
>   random=~1|FocalID,family="nbinom1",
>   zeroInflation=TRUE, data=Behav)
>
> Error in glmmadmb(SABRate ~ Socialbeh + Compcat + DistancePodCat +
> DistanceSingerCat + : rank of X = 28 < ncol(X) = 29
>
> An updated version of lme4 seemed to fix the previous model, as it
> now takes rank deficiencies into account, and I was wondering if there
> is a version of the glmmADMB package that does this too. I've tried
> reinstalling the newest version, but that doesn't appear to work.
No, there isn't -- glmmADMB hasn't implemented this feature (yet).
You're welcome to look at the chkRank.drop.cols() function in
https://github.com/lme4/lme4/blob/master/R/modular.R to see how lme4
does it ... one way or the other, though, you're going to have to drop a
column. You can use qr() or svd() to figure out which columns are
multicollinear ...
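
As a rough sketch of the qr() route (toy data with made-up variable
names, not the data set discussed here; f2 is deliberately built as a
recoding of f1 so that one dummy column is redundant):

```r
# Hypothetical toy data: f2 is an exact recoding of f1 ("a" -> "lo",
# everything else -> "hi"), so its dummy column is a linear
# combination of the intercept and the f1 dummies.
set.seed(1)
d <- data.frame(
  f1 = factor(rep(c("a", "b", "c"), each = 4)),
  x  = rnorm(12)
)
d$f2 <- factor(ifelse(d$f1 == "a", "lo", "hi"))

# Fixed-effect model matrix, as a mixed-model fit would build it
X <- model.matrix(~ f1 + f2 + x, data = d)

qrX <- qr(X)
qrX$rank   # numerical rank of X
ncol(X)    # number of fixed-effect columns requested

# qr() pivots columns it judges linearly dependent to the end, so the
# pivot entries beyond the rank point at candidate columns to drop.
keep    <- sort(qrX$pivot[seq_len(qrX$rank)])
dropped <- colnames(X)[qrX$pivot[-seq_len(qrX$rank)]]
dropped

# Reduced, full-rank model matrix
X_reduced <- X[, keep, drop = FALSE]
```

Which column gets flagged depends on column order; any one column of
a collinear set could be dropped instead, so in practice you would
pick the predictor (or factor level) that makes most sense to remove
from the formula.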
I'm taking the liberty of cc'ing this back to r-sig-mixed-models.
cheers
Ben Bolker
>
> Thanks for your time
>
> Mike Williamson
>
> -----Original Message-----
> From: lme4 maintainer [mailto:bbolker at gmail.com]
> Sent: Sunday, 5 January 2014 3:50 PM
> To: Michael Williamson;
>   "lme4-authors at lists.r-forge.r-project.org"@newmailhub.uq.edu.au
> Subject: Re: Error in lme4 rank of X = 28 < ncol(X) = 29
>
> On 14-01-02 05:32 PM, Michael Williamson wrote:
>> Good Morning,
>>
>> My name is Mike Williamson and I am a research assistant for the
>> University of Queensland. I'm in the process of running some
>> analyses with R on a dataset I have been working on in
>> collaboration with a PhD student. Just before Christmas she sent me
>> some code for me to use on my dataset. I started upon it yesterday
>> but was consistently coming up with the error below
>>
>>> SABMod<-
>>> glmer(SABinom~Socialbeh+Compcat+DistancePodCat+DistanceSingerCat+Pods15+
>>>   SingersCat+Windspeed.km.h.+BoatPhs+(1|FocalID),data=BehavTr,family="binomial")
>>
>> Error in lme4::glFormula(formula = SABinom ~ Socialbeh + Compcat +
>> DistancePodCat + :
>>
>> rank of X = 29 < ncol(X) = 31
>>
>> Sadly she is completely out of contact on field work for the next
>> 2-3 weeks and I'm pretty stuck on this, so apologies for the email,
>> but I was wondering if you might be able to help.
>>
>> I'm working on whether boat phase (before, approach, attempt,
>> after) affects the rates of surface active behaviours of whales,
>> recorded as binomial data (SABinom). The predictor
>> variables are something she has been working on and told me to
>> include in my model.
>>
>> From the bit of research I've done it looks like the error says my
>> data is rank deficient. Now I think this is most likely because of
>> my low samples for my approach and attempt but I can't find
>> anywhere online or in papers (such as Zuur 2009) to confirm this or
>> find an option for low sample sizes. I've attached the sample sizes
>> for presence absence of surface active behaviours for boat phase
>> below. As you can see, approach and attempt only have 5 and 15
>> presences respectively. This is due to the approach and attempt phases being
>> shorter periods than before and after. Is low sample size the
>> problem here? And if so do you have any suggestions for how I could
>> go about this, or any books to read. I've been trawling the
>> libraries online and don't seem to be able to find anything.
>
> Sorry to take so long to get back to you. Your data are indeed
> rank-deficient; the error is telling you that you are trying to
> estimate 31 fixed-effect parameters (columns of the X matrix), but
> that there are only 29 linearly independent combinations of predictor
> variables in your data set. Small sample size is not inherently a
> problem, but with a small, unbalanced data set you are more likely
> to end up with a rank-deficient model.
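
To illustrate how sparse, unbalanced categories can produce this, here
is a small sketch with invented data (the names phase and dist are
hypothetical, only loosely modelled on the variables in this thread).
The rare "Approach" phase occurs only together with the rare "far"
distance class, so the two factors carry overlapping information:

```r
# Invented data: "Approach" rows and "far" rows are exactly the same rows
d <- data.frame(
  phase = factor(c(rep("Before", 6), rep("After", 6), rep("Approach", 2))),
  dist  = factor(c(rep(c("near", "mid"), 6), rep("far", 2)))
)

# Cross-tabulation shows the confounding between the rare levels
table(d$phase, d$dist)

# Main-effects-only model matrix, as in the models discussed here
X <- model.matrix(~ phase + dist, data = d)

# Compare numerical rank with the number of columns; a rank smaller
# than ncol(X) is what the error message is reporting.
rk <- c(rank = qr(X)$rank, ncol = ncol(X))
rk
```

Here distmid + distnear + phaseApproach adds up to the intercept
column, so one of the five parameters cannot be estimated no matter
how many observations the common levels have.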
>
> I was hoping I had written more somewhere else in the past about how
> to use model.matrix() and svd() to diagnose multicollinearity
> problems --
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q4/019499.html
> isn't very complete (I should add it to the FAQ).
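
A minimal sketch of the model.matrix()/svd() idea, on made-up numeric
predictors (x3 is constructed as x1 + x2; none of this is taken from
the linked post, and the tolerance is just one common choice): the
right singular vectors belonging to near-zero singular values show
which columns take part in a linear dependence.

```r
# Made-up predictors; x3 is an exact linear combination of x1 and x2
set.seed(101)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x4 = rnorm(20))
d$x3 <- d$x1 + d$x2

X <- model.matrix(~ x1 + x2 + x3 + x4, data = d)
s <- svd(X)

# Treat singular values below this tolerance as zero (an assumption;
# other tolerances are in use)
tol <- max(dim(X)) * max(s$d) * .Machine$double.eps
null_idx <- which(s$d < tol)

# Each column of null_vecs is a combination of the columns of X that
# is (numerically) zero; non-zero entries mark the columns involved.
null_vecs <- s$v[, null_idx, drop = FALSE]
rownames(null_vecs) <- colnames(X)
round(null_vecs, 3)

involved <- rownames(null_vecs)[abs(null_vecs[, 1]) > 1e-8]
involved
```

Unlike the pivoting in qr(), this does not nominate a single column
to drop; it shows the whole set of columns tied together, which can
make it easier to see which predictors are confounded.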
>
> The fact that this table is badly unbalanced doesn't automatically
> mean you have a problem, but in combination with some of your other
> variables it is more likely to make the problem unidentifiable.
>
>>              0     1
>>
>> After     7445  1015
>> Approach   351     5
>> Attempt    991    15
>> Before    2079   344
>
> If you can install a recent development version of lme4, e.g.
>
> install.packages("lme4",repos="http://lme4.r-forge.r-project.org/repos")
>
> that may help -- the current development version automatically tries to
> adjust the fixed-effect model matrix to get rid of rank deficiency.
>
> Ben Bolker
>