[R-sig-ME] [R] errors with lme4
Ben Bolker
bbolker at gmail.com
Sat Nov 26 22:35:43 CET 2011
* It looks like you have 156 observations **after** dropping the 352
observations
with missing predictors ... The 3 covariates are indeed (I very
strongly suspect) due
to collinearity.
I'm trying to work out a clever piece of linear algebra that will
tell you exactly which
columns of the design matrix (i.e. predictors) are multicollinear, but
haven't figured it
out yet. More sensibly, you should just stare at your data/use common
sense to figure
out which ones are completely confounded (because there definitely are
some). If two (or more)
effects are completely confounded (another way of saying "perfectly
multicollinear"), then it's
simply impossible to get well-defined estimates of all of their
effects in the same model.
You have to decide which one(s) to take out. (If you like you can run
separate models
with each term, but you have to recognize that you will have no way to
choose among
them on the basis of the data -- they should all fit equally well.)
On Fri, Nov 25, 2011 at 4:19 AM, Alessio Unisi <franceschi6 at unisi.it> wrote:
> ...I always try not to work in "a fairly dangerous way": ))
>
> Thanks again for help and advices.
>
> I run the the GLM without random effect as you suggest, attached the
> results.
> Briefly ...3 covariates completely NA...352 observations deleted due to
> missingness...this is due to collinearity as you suggested?
> In which way could i solve this problem?
>
> i attached a part of the dataset...sorry but can't send the original set of
> data...sorry..
>
> kind regards
> alessio franceschi
>
> Ben Bolker ha scritto:
>>
>> [cc'ing to r-sig-mixed-models list]
>>
>> On 11-11-24 03:25 PM, Alessio Unisi wrote:
>>
>>>
>>> Hi Ben,
>>> thanks for answer!
>>>
>>> sorry but i'm a new R-user and i'm not so skilled...also in statistic! :
>>> ))
>>> just few answer and question to yours...
>>>
>>
>> Knowing *neither* R *nor* statistics can be a fairly dangerous
>> combination. If you ask politely, people on the R help lists will often
>> help with statistical questions, but they are technically off-topic.
>> There are other places (such as http://stats.stackexchange.com/ ) for
>> asking statistics questions ... and it would really be very best to see
>> if you can get some local help (classes or helpful colleagues/fellow
>> students/professors/consultants).
>>
>>
>>>>
>>>> I can't prove it, but I strongly suspect that some of your
>>>> coefficients are perfectly multicollinear.
>>>
>>> some they are...yes..but not all...hatch and lay surely they are...what
>>> does it change i have to multiplicate instead of sum?
>>>
>>
>> By "perfectly multicollinear" I don't mean that they are strongly
>> collinear (which ecologists often worry about, correctly, but sometimes
>> more than they need to) but rather "perfectly".
>>
>> For example, suppose you ran a 2x2 factorial design (e.g. effects of
>> temperature and light) but ended up with a missing "corner" (e.g. no
>> samples in the high-light/high-temperature combination). You would then
>> be unable to estimate an interaction term, because you would be trying
>> to estimate 4 parameters (intercept/grand mean, light effect,
>> temperature effect, interaction) from only three independent sets of
>> data. This is the same idea: some of your predictors probably line up
>> *perfectly* with combinations of other predictors.
>>
>>>
>>> the idea of using glmer() or lmer() is because i had to deal with random
>>> factor (1|territory)...glm can't handle this? i don't think...
>>>
>>
>> You're correct that you will need to get back to glmer() eventually,
>> but I wanted you try out glm() because the presence of NAs in your
>> coefficient vector will confirm that collinearity is the problem, not
>> some other issue with glmer ...
>>
>>
>>>>
>>>> How many observations are left after na.omit(fledge)?
>>>>
>>>>
>>>
>>> sorry..i don't understand...
>>>
>>
>> When you run the analysis, R will drop rows from your data set that
>> have NAs in any of the predictors. It looks like you have a total of
>> 152 observations, but I wonder how many there are with complete records.
>> nrow(na.omit(fledge)) will tell you this.
>>
>>
>>>>
>>>> What is the difference between your 'S1' and 'S2' temperature
>>>> records?
>>>>
>>>
>>> those are temperature recorded in different time....S1 is during egg
>>> laying and incubation and S2 is during hatching and rearing of the chicks
>>>
>>
>> Can we please see the results of summary(fledge)?
>>
>> It would be good if you were willing to post your whole data set
>> somewhere for download (or at a pinch e-mail it to me).
>>
>> Ben Bolker
>>
>>
>>>
>>> thank you
>>> alessio
>>>
>>>
>>> Ben Bolker ha scritto:
>>>
>>>>
>>>> Alessio Unisi <franceschi6 <at> unisi.it> writes:
>>>>
>>>>
>>>>>
>>>>> Dear R-users,
>>>>> i need help for this topic!
>>>>>
>>>>> I'm trying to determine if the reproductive success (0=fail,
>>>>> 1=success) of a species of bird is related to a list of covariates.
>>>>>
>>>>> These are the covariates:
>>>>> § elev: elevation of nest (meters)
>>>>> § seadist: distance from the sea (meters)
>>>>> § meanterranova: records of temperature
>>>>> § minpengS1: records of temperature
>>>>> § wchillpengS1: records of temperature
>>>>> § minpengS2: records of temperature
>>>>> § wchillpengS2: records of temperature
>>>>> § nnd: nearest neighbour distance
>>>>> § npd: nearest penguin distance
>>>>> § eggs: numbers of eggs
>>>>> § lay: laying date (julian calendar)
>>>>> § hatch: hatching date (julian calendar)
>>>>> I have some NAs in the data.
>>>>>
>>>>> I want to test the model with all the variable then i want to remove
>>>>> some, but the ideal model:
>>>>> GLM.1 <-lmer(fledgesucc ~ +lay +hatch +elev +seadist +nnd +npd
>>>>> +meanterranova +minpengS1 +minpengS2 +wchillpengS1 +wchillpengS2
>>>>> +(1|territory), family=binomial(logit), data=fledge)
>>>>>
>>>>> doesn't work because of these errors:
>>>>> 'Warning message: In mer_finalize(ans) : gr cannot be computed at
>>>>> initial par (65)'.
>>>>> "matrix is not symmetric [1,2]"
>>>>>
>>>>> If i delete one or more of the T records (i.e. minpengS2
>>>>> +wchillpengS2) the model works...below and example:
>>>>>
>>>>> GLM.16 <-lmer(fledgesucc ~ lay +hatch +elev +seadist +nnd +npd
>>>>> +meanterranova +minpengS1 +(1|territory), family=binomial(logit),
>>>>> data=fledge)
>>>>>
>>>>> > summary(GLM.16)
>>>>> Generalized linear mixed model fit by the Laplace approximation
>>>>> Formula: fledgesucc ~ lay + hatch + elev + seadist + nnd + npd +
>>>>> meanterranova + minpengS1 + (1 | territory)
>>>>> Data: fledge
>>>>> AIC BIC logLik deviance
>>>>> 174 204.2 -77 154
>>>>> Random effects:
>>>>> Groups Name Variance Std.Dev.
>>>>> territory (Intercept) 0.54308 0.73694
>>>>> Number of obs: 152, groups: territory, 96
>>>>>
>>>>>
>>>>
>>>> I can't prove it, but I strongly suspect that some of your
>>>> coefficients are perfectly multicollinear. Try running your
>>>> model as a regular GLM:
>>>>
>>>> g1 <- glm(fledgesucc ~ +lay +hatch +elev +seadist +nnd +npd
>>>> +meanterranova +minpengS1 +minpengS2 +wchillpengS1 +wchillpengS2
>>>> and see if some of the coefficients are NA.
>>>>
>>>> coef(g1)
>>>>
>>>> lm() and glm() can handle this sort of "rank-deficient" or
>>>> multicollinear input, (g)lmer can't, as of now.
>>>>
>>>> I suspect that you may be overfitting your model anyway:
>>>> you should aim for not more than 10 observations per parameter
>>>> (in your case, since all your predictors appear to be continuous,
>>>> How many observations are left after na.omit(fledge)?
>>>>
>>>> What is the difference between your 'S1' and 'S2' temperature
>>>> records?
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>> -----
>>>> Nessun virus nel messaggio.
>>>> Controllato da AVG - www.avg.com
>>>> Versione: 2012.0.1873 / Database dei virus: 2101/4636 - Data di
>>>> rilascio: 24/11/2011
>>>>
>>
>>
>>
>> -----
>> Nessun virus nel messaggio.
>> Controllato da AVG - www.avg.com
>> Versione: 2012.0.1873 / Database dei virus: 2101/4637 - Data di rilascio:
>> 24/11/2011
>>
>>
>>
>
> --
> Alessio Franceschi
> Phd student
> Dipartimento di Scienze Ambientali "G. Sarfatti"
> Università di Siena
> Via P.A. Mattioli, 8 - 53100 Siena (Italy)
> Cell. +393384431806
> email: franceschi6 at unisi.it; alfranceschi at alice.it
>
>
>> GLM.1 <- glm(fledgesucc ~ eggs + elev + hatch + lay + meanterranova +
> + minpengS1 + minpengS2 + nnd + npd + seadist + wchillpengS1 +
> wchillpengS2,
> + family=binomial(logit), data=flege)
>
>> summary(GLM.1)
>
> Call:
> glm(formula = fledgesucc ~ eggs + elev + hatch + lay + meanterranova +
> minpengS1 + minpengS2 + nnd + npd + seadist + wchillpengS1 +
> wchillpengS2, family = binomial(logit), data = flege)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -0.9636 -0.7540 -0.6421 -0.4146 2.2963
>
> Coefficients: (3 not defined because of singularities)
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 16.634984 13.927833 1.194 0.232
> eggs -0.421601 0.575456 -0.733 0.464
> elev 0.009926 0.024849 0.399 0.690
> hatch -0.023411 0.245780 -0.095 0.924
> lay -0.013704 0.245635 -0.056 0.956
> meanterranova 1.438584 1.377566 1.044 0.296
> minpengS1 -0.427313 0.398560 -1.072 0.284
> minpengS2 NA NA NA NA
> nnd -0.034370 0.023913 -1.437 0.151
> npd 0.003597 0.005014 0.717 0.473
> seadist -0.003850 0.004092 -0.941 0.347
> wchillpengS1 NA NA NA NA
> wchillpengS2 NA NA NA NA
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 159.06 on 151 degrees of freedom
> Residual deviance: 153.85 on 142 degrees of freedom
> (352 observations deleted due to missingness)
> AIC: 173.85
>
> Number of Fisher Scoring iterations: 5
>
>
>
More information about the R-sig-mixed-models
mailing list