[R-sig-ME] Multi-level qualitative (fixed-effects) factors

Tue Aug 3 10:53:20 CEST 2010

Sorry to everyone,

I guess i should rephrase this completely as i think i am not getting across what i want to ( my fault, i apologise for the spam!).

I am trying to discover which traits correlate with threatened or not (binary response- i do have different levele of threat but am unsure how to model continuos response variables in GLMM - quasi poisson?)

I shall work through a quick example and would appreciate it if you could tell me if my conclusions are justified.

Lets just say for ease i am have two models -  i can send str(traits) etc if required - 

> model1 <-lmer(threatornot~1+(1|a/b) + habit, family=binomial)
> 
> Generalized linear mixed model fit by the Laplace approximation 
> Formula: threatornot ~ 1 + (1 | a/b) + habit 
>  AIC  BIC logLik deviance
> 1406 1436 -696.9     1394
> Random effects:
> Groups       Name        Variance   Std.Dev.  
> family:order (Intercept) 6.9892e-01 8.3602e-01
> order        (Intercept) 4.2292e-14 2.0565e-07
> Number of obs: 1116, groups: family:order, 43; order, 9
> 
> Fixed effects:
>            Estimate Std. Error z value Pr(>|z|)   
> (Intercept) -0.04803    0.19174  -0.250  0.80219   
> habit2       1.10627    0.41607   2.659  0.00784 **
> habit3       0.92578    0.78141   1.185  0.23611   
> habit4       0.14383    0.38477   0.374  0.70856   
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

This shows that with habit the model has a AIC value of 1406 and habit2 has the significant effect on the threat status of a species?

Now model 2
> model2 <-lmer(threatornot~1+(1|a/b) + habit + breedingsystem, family=binomial)
> Generalized linear mixed model fit by the Laplace approximation 
> Formula: threatornot ~ 1 + (1 | a/b) + habit + breedingsystem 
>  AIC  BIC logLik deviance
> 1408 1449 -696.2     1392
> Random effects:
> Groups       Name        Variance Std.Dev.
> family:order (Intercept) 0.67836  0.82363 
> order        (Intercept) 0.00000  0.00000 
> Number of obs: 1116, groups: family:order, 43; order, 9
> 
> Fixed effects:
>                Estimate Std. Error z value Pr(>|z|)   
> (Intercept)      0.17398    0.39138   0.444  0.65665   
> habit2           1.15687    0.41943   2.758  0.00581 **
> habit3           0.92017    0.78158   1.177  0.23907   
> habit4          -0.01673    0.43150  -0.039  0.96907   
> breedingsystem2 -0.18615    0.37849  -0.492  0.62285   
> breedingsystem3 -0.42513    0.40652  -1.046  0.29567   
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

This shows that adding breeding system to the model doesn't fit the data any better (the AIC is higher), therefore breedingsystem has no significant effect on the threat status of a species

Under this simple example i can conclude that the habit of a species is significant in determining how threatened the species is and in particular species from habit2 are more threatened than those not from habit2.

Are these inferences justified or am i missing something?

Thanks

On 3 Aug 2010, at 08:50, Andrew Dolman wrote:

I don't get it. How can you fit the model with just 1 of three levels
of factor "habitat" and have the same number of observations as when
you run the model with all three? (It must have at least 2 levels to
fit anyway) Also, in the first example you have 4 levels of habitat.

Are they different levels of habitat resolution? e.g.

Aquatic - non aquatic
Aquatic - Terrestrial - Epiphytic
Aquatic - Terrestrial - Epiphytic - Up elephant's noses

Please read the posting guide and include proper examples of what you
are doing and what the data look like.

andydolman at gmail.com

On 3 August 2010 08:48, Peter Francis <peterfrancis at me.com> wrote:
> Hi David and Ben, thanks for your help -
> 
> I was worried this would make little sense!
> 
> I have set out my candidate models
> 
> A+B+C+D
> B+C+D
> A+C+D
> etc etc
> 
> And am running through them in lmer. Factor A for instance is Habit, which takes 3 forms - aquatic, terrestrial or epiphyte.
> 
> When i run the model with A as a factor i get the breakdown of the individual levels habitat 1, habitat 2 and habitat 3 and a corresponding AIC score. However if i just run it with habitat 3 - aquatic - i get a lower AIC score, therefore the model fits the data better?
> 
> I am unsure how to, without splitting my factors into their constituent levels at the beginning - A1+A2+A3 + B1 + B2 etc, arrive at the model with the lowest AIC?
> 
> Thanks
> 
> Peter
> 
> On 3 Aug 2010, at 00:16, David Duffy wrote:
> 
> On Mon, 2 Aug 2010, Peter Francis wrote:
> 
>> I have many multi level factors i.e habit - aquatic, terrestrial, epiphyte etc
>> 
>> I ran the model with habit as a factor
>> 
>>> model111 <-lmer(threatornot~1+(1|a/b) + habit, family=binomial)
>> 
>>> Generalized linear mixed model fit by the Laplace approximation
>>> Formula: threatornot ~ 1 + (1 | order/family) + habit
>>> AIC  BIC logLik deviance
>>> 1406 1436 -696.9     1394
>>> Random effects:
>>> Groups       Name        Variance   Std.Dev.
>>> family:order (Intercept) 6.9892e-01 8.3602e-01
>>> order        (Intercept) 4.2292e-14 2.0565e-07
>>> Number of obs: 1116, groups: family:order, 43; order, 9
>>> 
>>> Fixed effects:
>>>           Estimate Std. Error z value Pr(>|z|)
>>> (Intercept) -0.04803    0.19174  -0.250  0.80219
>>> habit2       1.10627    0.41607   2.659  0.00784 **
>>> habit3       0.92578    0.78141   1.185  0.23611
>>> habit4       0.14383    0.38477   0.374  0.70856
>> 
>> ---
>> Which had a AIC of 1406
>> 
>> I then re-ran the model with only aquatic and got a lower AIC value - which i guess is to be expected as aquatic is highly significant and aquatic species are more prone to threat ( my response).
>> 
>> 
>>>> model112 <-lmer(threatornot~1+(1|a/b) + aquatic, family=binomial)
>>>> model112
>>> Generalized linear mixed model fit by the Laplace approximation
>>> Formula: threatornot ~ 1 + (1 | order/family) + aquatic
>>> AIC  BIC logLik deviance
>>> 1395 1415 -693.4     1387
>>> Random effects:
>>> Groups       Name        Variance Std.Dev.
>>> family:order (Intercept) 0.60007  0.77464
>>> order        (Intercept) 0.00000  0.00000
>>> Number of obs: 1116, groups: family:order, 43; order, 9
>>> 
>>> Fixed effects:
>>>           Estimate Std. Error z value Pr(>|z|)
>>> (Intercept)   0.1572     0.1827   0.860 0.389613
>>> aquatic      -0.6683     0.1737  -3.847 0.000119 ***
>> 
>> My question is - when i developed the candidate models i thought using multilevel factors would be OK and i would be able to tease out the individual levels. If i split the factors into levels from the beginning then i am left with a huge amount of candidate models? This would not be a problem in stepwise regression as i could just remove the habit with the least significant P Value.
>> 
>> If i remove habits i "feel" are unimportant from the beginning i feel i would be limiting the model too much.
>> 
>> I hope this makes sense!
> 
> I don't understand at all, I'm afraid.  Is aquatic the same as habit=2, or something?  If so, there is something funny about the model fits.
> 
> If family and order are "nuisance" variables, then a stepwise approach is quite reasonable (if you are someone who thinks stepwise regression is reasonable, of course ;)).
> 
> Just 2c, David Duffy.
> 
> --
> | David Duffy (MBBS PhD)                                         ,-_|\
> | email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
> | Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
> | 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>