[R-sig-ME] Multi-level qualitative (fixed-effects) factors

Peter Francis peterfrancis at me.com
Tue Aug 3 12:28:20 CEST 2010


Thanks for this,

I need a random effect, to control for the phylogenetic relationships of the species, they are part of a hierarchical structure. Thats i why i went down the glmm line.

> So how would you separate threatened species from
> non-threatened species based on the model?

I was under the impression that if i modelled threat or not - against traits, the significant traits would relate to those species that are threatened?

The challenge from here is using the "best"  model generated by my training data to test the effectiveness of the model on my test data. I.E how can the model be used in a predictive sense - is it this you feel would be difficult?

Thanks

Peter 
On 3 Aug 2010, at 11:19, ONKELINX, Thierry wrote:

Dear Peter,

You could use glht() from the multcomp package to do a Tukey test or
test several prespecified hypotheses on your model. 

I don't known is fitting a complex model in order to find a 'profile'
for treatened species is such a good idea. The model will return the
probability of a species being threatened depending on the fixed and
random effects. So how would you separate threatened species from
non-threatened species based on the model?

You might want to do some reading upto the difference between GEE and
GLMM. The first provides marginal ('population') estimates. In your case
it is what is happening in the average family. The latter gives
estimates conditional on the random effects. So that is what is
happening after you that the effect that a specific family into account.
I'm not sure which of the two is the most appropriate for your question.

Best regards,

Thierry

PS Please keep the mailing list in cc when replying.

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey


> -----Oorspronkelijk bericht-----
> Van: Peter Francis [mailto:peterfrancis at me.com] 
> Verzonden: dinsdag 3 augustus 2010 11:27
> Aan: ONKELINX, Thierry
> Onderwerp: Re: [R-sig-ME] Multi-level qualitative 
> (fixed-effects) factors
> 
> Dear Thierry,
> 
> I thought that maybe the case! Thank you very much.
> 
> I am relatively new too this field, can a Tukey test be 
> implemented in R and at what point should i do it? Is there a 
> sample workflow?
> 
> i want to be able conclude for example that - species from 
> habitat 1 are more threatened than those from habitat 2 etc. 
> I have multiple factors with multiple levels so by the end i 
> would like a profile of what a threatened species "looks" 
> like compared to a non threatened.
> 
> I am sorry for the questions and appreciate any advice given.
> 
> Peter
> On 3 Aug 2010, at 10:18, ONKELINX, Thierry wrote:
> 
> Dear Peter,
> 
> Your inference in not correct. The significance you get from 
> habitat2 is only the comparison between habitat2 and the 
> reference level (habitat1?). You can conclude that only 
> habitat2 is significantly different from habitat1 and habitat 
> 3 vs 1 and habitat 4 vs 1 is not significant. You cannot 
> compare 2 vs 3, 2 vs 4 and 3 vs 4. That would require 
> multiple comparisons (cfr Tukey).
> 
> Best regards,
> 
> Thierry
> 
> 
> --------------------------------------------------------------
> ----------
> ----
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek
> team Biometrie & Kwaliteitszorg
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
> 
> Research Institute for Nature and Forest team Biometrics & 
> Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium
> 
> tel. + 32 54/436 185
> Thierry.Onkelinx at inbo.be
> www.inbo.be
> 
> To call in the statistician after the experiment is done may 
> be no more than asking him to perform a post-mortem 
> examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
> 
> The plural of anecdote is not data.
> ~ Roger Brinner
> 
> The combination of some data and an aching desire for an 
> answer does not ensure that a reasonable answer can be 
> extracted from a given body of data.
> ~ John Tukey
> 
> 
>> -----Oorspronkelijk bericht-----
>> Van: r-sig-mixed-models-bounces at r-project.org
>> [mailto:r-sig-mixed-models-bounces at r-project.org] Namens 
> Peter Francis
>> Verzonden: dinsdag 3 augustus 2010 10:53
>> Aan: R-mixed models mailing list
>> Onderwerp: [R-sig-ME] Multi-level qualitative 
> (fixed-effects) factors
>> 
>> Sorry to everyone,
>> 
>> I guess i should rephrase this completely as i think i am 
> not getting 
>> across what i want to ( my fault, i apologise for the spam!).
>> 
>> I am trying to discover which traits correlate with 
> threatened or not 
>> (binary response- i do have different levele of threat but 
> am unsure 
>> how to model continuos response variables in GLMM - quasi poisson?)
>> 
>> I shall work through a quick example and would appreciate it if you 
>> could tell me if my conclusions are justified.
>> 
>> Lets just say for ease i am have two models -  i can send
>> str(traits) etc if required -
>> 
>>> model1 <-lmer(threatornot~1+(1|a/b) + habit, family=binomial)
>>> 
>>> Generalized linear mixed model fit by the Laplace approximation
>>> Formula: threatornot ~ 1 + (1 | a/b) + habit  AIC  BIC
>> logLik deviance
>>> 1406 1436 -696.9     1394
>>> Random effects:
>>> Groups       Name        Variance   Std.Dev.  
>>> family:order (Intercept) 6.9892e-01 8.3602e-01
>>> order        (Intercept) 4.2292e-14 2.0565e-07
>>> Number of obs: 1116, groups: family:order, 43; order, 9
>>> 
>>> Fixed effects:
>>>          Estimate Std. Error z value Pr(>|z|)   
>>> (Intercept) -0.04803    0.19174  -0.250  0.80219   
>>> habit2       1.10627    0.41607   2.659  0.00784 **
>>> habit3       0.92578    0.78141   1.185  0.23611   
>>> habit4       0.14383    0.38477   0.374  0.70856   
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> 
>> 
>> This shows that with habit the model has a AIC value of 1406 and 
>> habit2 has the significant effect on the threat status of a species?
>> 
>> Now model 2
>>> model2 <-lmer(threatornot~1+(1|a/b) + habit +
>> breedingsystem, family=binomial)
>>> Generalized linear mixed model fit by the Laplace approximation
>>> Formula: threatornot ~ 1 + (1 | a/b) + habit + breedingsystem AIC  
>>> BIC logLik deviance
>>> 1408 1449 -696.2     1392
>>> Random effects:
>>> Groups       Name        Variance Std.Dev.
>>> family:order (Intercept) 0.67836  0.82363 
>>> order        (Intercept) 0.00000  0.00000
>>> Number of obs: 1116, groups: family:order, 43; order, 9
>>> 
>>> Fixed effects:
>>>              Estimate Std. Error z value Pr(>|z|)   
>>> (Intercept)      0.17398    0.39138   0.444  0.65665   
>>> habit2           1.15687    0.41943   2.758  0.00581 **
>>> habit3           0.92017    0.78158   1.177  0.23907   
>>> habit4          -0.01673    0.43150  -0.039  0.96907   
>>> breedingsystem2 -0.18615    0.37849  -0.492  0.62285   
>>> breedingsystem3 -0.42513    0.40652  -1.046  0.29567   
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> 
>> 
>> This shows that adding breeding system to the model doesn't fit the 
>> data any better (the AIC is higher), therefore 
> breedingsystem has no 
>> significant effect on the threat status of a species
>> 
>> Under this simple example i can conclude that the habit of 
> a species 
>> is significant in determining how threatened the species is and in 
>> particular species from habit2 are more threatened than 
> those not from 
>> habit2.
>> 
>> Are these inferences justified or am i missing something?
>> 
>> Thanks
>> 
>> 
>> 
>> 
>> On 3 Aug 2010, at 08:50, Andrew Dolman wrote:
>> 
>> I don't get it. How can you fit the model with just 1 of 
> three levels 
>> of factor "habitat" and have the same number of 
> observations as when 
>> you run the model with all three? (It must have at least 2 
> levels to 
>> fit anyway) Also, in the first example you have 4 levels of habitat.
>> 
>> Are they different levels of habitat resolution? e.g.
>> 
>> Aquatic - non aquatic
>> Aquatic - Terrestrial - Epiphytic
>> Aquatic - Terrestrial - Epiphytic - Up elephant's noses
>> 
>> 
>> Please read the posting guide and include proper examples 
> of what you 
>> are doing and what the data look like.
>> 
>> 
>> andydolman at gmail.com
>> 
>> 
>> 
>> On 3 August 2010 08:48, Peter Francis <peterfrancis at me.com> wrote:
>>> Hi David and Ben, thanks for your help -
>>> 
>>> I was worried this would make little sense!
>>> 
>>> I have set out my candidate models
>>> 
>>> A+B+C+D
>>> B+C+D
>>> A+C+D
>>> etc etc
>>> 
>>> And am running through them in lmer. Factor A for instance
>> is Habit, which takes 3 forms - aquatic, terrestrial or epiphyte.
>>> 
>>> When i run the model with A as a factor i get the breakdown
>> of the individual levels habitat 1, habitat 2 and habitat 3 and a 
>> corresponding AIC score. However if i just run it with habitat 3 - 
>> aquatic - i get a lower AIC score, therefore the model fits 
> the data 
>> better?
>>> 
>>> I am unsure how to, without splitting my factors into their
>> constituent levels at the beginning - A1+A2+A3 + B1 + B2 
> etc, arrive 
>> at the model with the lowest AIC?
>>> 
>>> Thanks
>>> 
>>> Peter
>>> 
>>> On 3 Aug 2010, at 00:16, David Duffy wrote:
>>> 
>>> On Mon, 2 Aug 2010, Peter Francis wrote:
>>> 
>>>> I have many multi level factors i.e habit - aquatic,
>> terrestrial, epiphyte etc
>>>> 
>>>> I ran the model with habit as a factor
>>>> 
>>>>> model111 <-lmer(threatornot~1+(1|a/b) + habit, family=binomial)
>>>> 
>>>>> Generalized linear mixed model fit by the Laplace approximation
>>>>> Formula: threatornot ~ 1 + (1 | order/family) + habit AIC  BIC 
>>>>> logLik deviance
>>>>> 1406 1436 -696.9     1394
>>>>> Random effects:
>>>>> Groups       Name        Variance   Std.Dev.
>>>>> family:order (Intercept) 6.9892e-01 8.3602e-01
>>>>> order        (Intercept) 4.2292e-14 2.0565e-07
>>>>> Number of obs: 1116, groups: family:order, 43; order, 9
>>>>> 
>>>>> Fixed effects:
>>>>>         Estimate Std. Error z value Pr(>|z|)
>>>>> (Intercept) -0.04803    0.19174  -0.250  0.80219
>>>>> habit2       1.10627    0.41607   2.659  0.00784 **
>>>>> habit3       0.92578    0.78141   1.185  0.23611
>>>>> habit4       0.14383    0.38477   0.374  0.70856
>>>> 
>>>> ---
>>>> Which had a AIC of 1406
>>>> 
>>>> I then re-ran the model with only aquatic and got a lower
>> AIC value - which i guess is to be expected as aquatic is highly 
>> significant and aquatic species are more prone to threat ( my 
>> response).
>>>> 
>>>> 
>>>>>> model112 <-lmer(threatornot~1+(1|a/b) + aquatic, 
> family=binomial)
>>>>>> model112
>>>>> Generalized linear mixed model fit by the Laplace approximation
>>>>> Formula: threatornot ~ 1 + (1 | order/family) + aquatic AIC  BIC 
>>>>> logLik deviance
>>>>> 1395 1415 -693.4     1387
>>>>> Random effects:
>>>>> Groups       Name        Variance Std.Dev.
>>>>> family:order (Intercept) 0.60007  0.77464
>>>>> order        (Intercept) 0.00000  0.00000
>>>>> Number of obs: 1116, groups: family:order, 43; order, 9
>>>>> 
>>>>> Fixed effects:
>>>>>         Estimate Std. Error z value Pr(>|z|)
>>>>> (Intercept)   0.1572     0.1827   0.860 0.389613
>>>>> aquatic      -0.6683     0.1737  -3.847 0.000119 ***
>>>> 
>>>> My question is - when i developed the candidate models i
>> thought using multilevel factors would be OK and i would be able to 
>> tease out the individual levels. If i split the factors into levels 
>> from the beginning then i am left with a huge amount of candidate 
>> models? This would not be a problem in stepwise regression 
> as i could 
>> just remove the habit with the least significant P Value.
>>>> 
>>>> If i remove habits i "feel" are unimportant from the
>> beginning i feel i would be limiting the model too much.
>>>> 
>>>> I hope this makes sense!
>>> 
>>> I don't understand at all, I'm afraid.  Is aquatic the same
>> as habit=2, or something?  If so, there is something funny 
> about the 
>> model fits.
>>> 
>>> If family and order are "nuisance" variables, then a
>> stepwise approach is quite reasonable (if you are someone 
> who thinks 
>> stepwise regression is reasonable, of course ;)).
>>> 
>>> Just 2c, David Duffy.
>>> 
>>> --
>>> | David Duffy (MBBS PhD)                                   
>>    ,-_|\
>>> | email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: 
>> -0101  /     *
>>> | Epidemiology Unit, Queensland Institute of Medical
>> Research   \_,-._/
>>> | 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG
>> 4D0B994A v
>>> 
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list 
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> 
>> 
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> 
> 
> Druk dit bericht a.u.b. niet onnodig af.
> Please do not print this message unnecessarily.
> 
> Dit bericht en eventuele bijlagen geven enkel de visie van de 
> schrijver weer en binden het INBO onder geen enkel beding, 
> zolang dit bericht niet bevestigd is door een geldig 
> ondertekend document. The views expressed in  this message 
> and any annex are purely those of the writer and may not be 
> regarded as stating an official position of INBO, as long as 
> the message is not confirmed by a duly signed document.
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 

Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer 
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message 
and any annex are purely those of the writer and may not be regarded as stating 
an official position of INBO, as long as the message is not confirmed by a duly 
signed document.




More information about the R-sig-mixed-models mailing list