[R-sig-ME] Binary response ordering

Douglas Bates bates at stat.wisc.edu
Wed Aug 4 15:44:24 CEST 2010


On Wed, Aug 4, 2010 at 9:30 AM, John Haart <another83 at me.com> wrote:
> Dear Douglas,
>
> Thanks very much for this,
>
>
>> You can
>> disambiguate the process if you convert the response to a factor with
>> the levels specified explicitly.
>
> I am a little unsure what this means.

I just meant that when you create a factor you can either accept the
default ordering of the factor levels or impose an ordering yourself.
The default is the sorted order of the distinct values. For a numeric
variable that is numeric order, but if the values are stored as
character strings (as can happen when data are imported) the ordering
is lexicographic. If all the values are single-digit integers this
corresponds to numeric ordering, but as soon as you get numbers like 10
you have to be careful because "10" sorts before "2" in lexicographic
ordering.
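The difference between default and explicit level ordering can be checked directly in base R. This is a minimal sketch with made-up vectors (`x_chr`, `x_num` and `f` are hypothetical names); it shows the default levels for character and numeric input, then the same character data with the ordering imposed via `levels =`.

```r
# Values stored as character strings (e.g. after importing from a
# spreadsheet): default levels are sorted lexicographically
x_chr <- c("2", "10", "1", "2")
levels(factor(x_chr))    # "1" "10" "2"

# A numeric vector: default levels are sorted numerically
x_num <- c(2, 10, 1, 2)
levels(factor(x_num))    # "1" "2" "10"

# Imposing the ordering explicitly removes the ambiguity
f <- factor(x_chr, levels = c("1", "2", "10"))
levels(f)                # "1" "2" "10"
```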

A glm or glmer model fit with family = binomial and a two-level factor
as the response uses the second level as "success" and the first level
as "failure".
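A minimal sketch of that convention, using only `glm()` from base R on hypothetical toy data (`y01` and `g` are made up; the same convention is expected to carry over to `glmer()`, which is not run here). Coding the response as numeric 0/1 and as a factor with explicit levels `c("FALSE", "TRUE")` should give identical coefficients, because in both cases the model is for P(TRUE).

```r
# Hypothetical toy data: 0/1 response (1 = TRUE) and a two-level covariate
y01 <- c(0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1)
g   <- factor(rep(c("a", "b"), each = 6))

# Numeric 0/1 response: the model is for P(y01 == 1)
fit_num <- glm(y01 ~ g, family = binomial)

# Two-level factor response with explicit levels: the first level
# ("FALSE") is "failure" and the second level ("TRUE") is "success"
y_fac   <- factor(y01, levels = c(0, 1), labels = c("FALSE", "TRUE"))
fit_fac <- glm(y_fac ~ g, family = binomial)

levels(y_fac)
all.equal(coef(fit_num), coef(fit_fac))   # both model P(TRUE)
```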

> My response is TRUE or FALSE. However, there are different levels of TRUE/FALSE. At this point I have not discriminated between them, as I am unsure how. My thought was to convert them to a continuous factor and use this as the response, instead of having a multi-level categorical response, which I don't think is possible in lmer?
>
> Whilst on the subject of p-values,
>
> I am using AIC model selection rather than p-value-based stepwise regression, as I feel it is more robust (Burnham & Anderson, 2002). However, there seems to be a huge difference in my results.

I'll leave it to others to comment on p-values, AIC, etc.

> The factors with the smallest p-values, and therefore retained in the minimal adequate model (MAM) when I did an explanatory stepwise regression, do not appear in the model with the lowest AIC value. Do the two approaches generally not match?
>
> Thanks
>
>
>
>
> On 4 Aug 2010, at 14:15, Douglas Bates wrote:
>
> On Wed, Aug 4, 2010 at 4:54 AM, John Haart <another83 at me.com> wrote:
>> Dear List,
>>
>> I have a quick question regarding the setup of my data for analysis with a GLMM. I hope this is the appropriate list; I apologise if it is not.
>>
>> I have a response variable, TRUE or FALSE. I have coded this as 0 = FALSE and 1 = TRUE in Excel.
>>
>> I have 3 categorical factors: C, D and E.
>>
>> I then read in the data frame and run the model as follows:
>>
>> lmer(trueorfalse~1+(1|A/B) + C + D+ E ,family=binomial)
>>
>> And this is the output
>>
>> Generalized linear mixed model fit by the Laplace approximation
>> Formula: threatornot ~ 1 + (1 | A/B) + C + D+  E ,family=binomial)
>>  AIC  BIC logLik deviance
>>  1410 1450 -696.8     1394
>> Random effects:
>>  Groups       Name        Variance   Std.Dev.
>>  family:order (Intercept) 6.7869e-01 8.2382e-01
>>  order        (Intercept) 7.8204e-11 8.8433e-06
>> Number of obs: 1116, groups: A:B, 43; B, 9
>
> Apparently you altered the output at some point because the factors
> that were named A and B ended up as order and family in the random
> effects description.
>
>> Fixed effects:
>>             Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  0.11281    0.42232   0.267   0.7894
>> C1          -0.02414    0.19964  -0.121   0.9038
>> D2          -0.16482    0.38602  -0.427   0.6694
>> E2           0.95381    0.54316   1.756   0.0791 .
>> E3           0.75733    0.87275   0.868   0.3855
>> E4           0.03044    0.47328   0.064   0.9487
>>
>> What I am unsure about is the inference: if a term is significant, does this relate to TRUE or FALSE?
>
> In this case it would be related to the probability of a TRUE response
> but, as this is simply 1 - P(FALSE), the only change if you
> reversed the order would be to change the signs of the coefficients.
> The simple way to verify this is to fit
>
> glm(threatornot ~ 1, family = binomial)
>
> and check the value of the coefficient.  It should be
> log(pHat/(1-pHat)) where pHat is the proportion of TRUE responses.
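Both points in the paragraph above can be checked with base R `glm()` on hypothetical toy data (`y01`, `g` and the `fit_*` names are made up for illustration). The intercept-only fit needs `family = binomial`, since `glm()` defaults to the gaussian family, which would return the proportion itself rather than its log-odds. Reversing which level counts as "success" should negate every coefficient while leaving the standard errors unchanged.

```r
# Hypothetical 0/1 response (1 = TRUE) and a two-level covariate
y01 <- c(0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1)
g   <- factor(rep(c("a", "b"), each = 6))

# Intercept-only check: the coefficient should equal log(pHat/(1-pHat))
fit0  <- glm(y01 ~ 1, family = binomial)
p_hat <- mean(y01)                      # proportion of TRUE responses
c(glm = unname(coef(fit0)), by_hand = log(p_hat / (1 - p_hat)))

# Same data, with "success" = TRUE and then "success" = FALSE
y_true  <- factor(y01, levels = c(0, 1), labels = c("FALSE", "TRUE"))
y_false <- factor(y01, levels = c(1, 0), labels = c("TRUE", "FALSE"))
fit_true  <- glm(y_true  ~ g, family = binomial)
fit_false <- glm(y_false ~ g, family = binomial)

# Coefficients change sign; standard errors are the same
cbind(success_TRUE = coef(fit_true), success_FALSE = coef(fit_false))
all.equal(coef(fit_false), -coef(fit_true))
all.equal(summary(fit_true)$coefficients[, "Std. Error"],
          summary(fit_false)$coefficients[, "Std. Error"])
```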
>
>> I.e. E2 has a p-value of 0.079; does this 0.079 relate to the probability of it resulting in a TRUE or FALSE response? Does it matter how I code the input, e.g. FALSE = 1, TRUE = 2?
>
> If there are two levels in the response then the model is fit
> according to the probability of the second versus the first.  You can
> disambiguate the process if you convert the response to a factor with
> the levels specified explicitly.
>
> The bigger issue is that you shouldn't pay too much attention to a
> particular coefficient related to the levels of a factor like E
> because the coefficients are defined with respect to the contrasts in
> effect at the time the model was fit.  Without knowing the contrasts
> being used and without prior knowledge that a particular contrast was
> important, those coefficients are not important by themselves.  It is
> the cumulative effect of the variability amongst the levels of the
> factor that is important.
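A minimal base-R sketch of the contrasts point, on hypothetical data (`y`, `E` and the `fit_*` names are made up, and R's default `options(contrasts)` is assumed). The same `glm()` model is fit under treatment and sum-to-zero contrasts: the individual coefficients for `E` differ, but the fitted model does not. The overall effect of the factor is then assessed with a likelihood ratio test against the model without `E`.

```r
# Hypothetical data: binary response y and a four-level factor E
y <- c(0, 1, 1, 0,  1, 1, 0, 1,  0, 0, 1, 0,  1, 1, 1, 0)
E <- factor(rep(1:4, each = 4))

# Default treatment contrasts: E2, E3, E4 are log-odds differences from level 1
fit_trt <- glm(y ~ E, family = binomial)
# Sum-to-zero contrasts: coefficients are deviations of levels 1-3
# from the unweighted average of the four level log-odds
fit_sum <- glm(y ~ E, family = binomial, contrasts = list(E = "contr.sum"))

coef(fit_trt)
coef(fit_sum)   # different numbers ...

# ... but the same fitted probabilities and deviance
all.equal(fitted(fit_trt), fitted(fit_sum))
all.equal(deviance(fit_trt), deviance(fit_sum))

# Overall effect of E: likelihood ratio test against the model without E
fit_null <- glm(y ~ 1, family = binomial)
anova(fit_null, fit_trt, test = "Chisq")
```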
>
>> Maybe i am reading the output wrong?
>>
>> Thanks
>>
>> John
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>