[R-sig-ME] Dummy variables in Factors with more than 2 levels

Wed May 21 16:35:07 CEST 2008

On Wed, May 21, 2008 at 5:16 AM, Martin Henry H. Stevens
<HStevens at muohio.edu> wrote:
> By default, R uses the 'opposite' approach: the intercept is the mean of the
> first level, and the other parameters are the differences between each
> remaining level and the first level. See ?contrasts
> Hank
> On May 21, 2008, at 5:59 AM, carlos ramirez wrote:
>>
>> Hi All,
>>
>> Sorry to bother with a basic question.
>>
>> I was wondering how R manages dummy variables when computing factors with
>> more than 2 levels. For instance, in my study I have the variable 'stress'
>> with 3 levels ('pre-tonic', 'tonic', and 'post-tonic', coded '1', '2' and
>> '3' respectively).
>>
>> Programs such as SPSS transform nominal and ordinal categories into sets
>> of dichotomies (dummy variables) in such a way that a computed dummy
>> variable 1 (dummy pre-tonic) will assign '1' to all pre-tonic stress and
>> '0' to all the others. Dummy variable 2 (dummy tonic) assigns '1' to all
>> tonic data and '0' to the rest. By default SPSS leaves the last level as
>> the 'reference category' (in this case post-tonic) for comparison, using
>> what is called the 'indicator contrast'. Thus, the coding ends up being
>> something like the example below.
>>
>> Dummy variables
>> ------------------------------------------------
>>            Value      Coding
>>                     (1)      (2)
>> Stress     1       1.000    .000
>>            2        .000   1.000
>>            3        .000    .000
>>
>> Thus, in the output, Beta (B) and Exp(B) do not present the odds ratio of
>> the dependent variable in relation to the independent variable, but the
>> odds ratio of each dummy variable with respect to the reference category
>> (post-tonic in this case).
>>
>> When I run the mixed logit model in R I get output like the following.
>>
>> Generalized linear mixed model fit using Laplace
>> Formula: Identif ~ (1 | Subj) + (1 | Item) + Place + Stress + Voicing
>>    Data: idcrg1
>>  Family: binomial(logit link)
>>   AIC  BIC logLik deviance
>>  1163 1211 -572.6     1145
>> Random effects:
>>  Groups Name        Variance Std.Dev.
>>  Subj   (Intercept) 0.63178  0.79485
>>  Item   (Intercept) 0.88192  0.93910
>> number of obs: 1476, groups: Subj, 41; Item, 36
>>
>> Estimated scale (compare to  1 )  0.888108
>>
>> Fixed effects:
>>             Estimate Std. Error z value Pr(>|z|)
>> (Intercept)   1.9920     0.4948   4.026 5.67e-05 ***
>> Place2       -0.7253     0.4376  -1.658   0.0974 .
>> Place3       -0.1389     0.4478  -0.310   0.7565
>> Stress2       0.8765     0.4493   1.951   0.0511 .
>> Stress3      -0.2386     0.4298  -0.555   0.5788
>> Voicing2      0.6937     0.3601   1.927   0.0540 .
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Correlation of Fixed Effects:
>>          (Intr) Place2 Place3 Strss2 Strss3
>> Place2   -0.466
>> Place3   -0.447  0.511
>> Stress2  -0.426 -0.035 -0.002  0.026
>> Stress3  -0.451  0.004 -0.006  0.017  0.485
>> Voicing2 -0.356  0.004 -0.023  0.008  0.020  0.011
>>
>> Based on the index that appears on Stress in the Fixed effects output
>> (Stress2 and Stress3; same for Place2 and Place3), am I correct to assume
>> that the reference category in this case was the first level and not the
>> last, as is done in SPSS?
>>
>> Does R create dummy variables to calculate the regression?
>>
>> Thanks for your time. I'd appreciate any help you could provide.
>>
>> Sincerely,
>> Carlos

In R the terminology is that variables expressed as factors
(categorical data) or ordered factors (ordered categorical data) are
converted to a set of contrasts when incorporated in a linear or
generalized linear model.  The default behavior for unordered factors
is to use the "treatment" contrasts (contr.treatment), in which the
first level is the reference level.  You can set an option to use the
"SAS" contrasts (contr.SAS), where the last level is the reference
level.  Try

options(contrasts = c(unordered = "contr.SAS", ordered = "contr.poly"))

then refit your model.
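
To see what the two codings look like, here is a minimal sketch using
only base R.  The factor 'stress' below is a made-up six-observation
stand-in for the variable in the question (the real data set idcrg1 is
not available here), and the names ct, cs, mm, stress_sas, mm_sas,
stress_ref3 and or_stress2 are just illustrative.  It prints the
treatment and SAS contrast matrices, shows the dummy columns that
model.matrix() builds, and shows two ways to change the reference level
for a single factor without touching the global option.

```r
# Hypothetical 3-level factor standing in for 'stress' from the question
stress <- factor(c(1, 2, 3, 1, 2, 3),
                 levels = 1:3,
                 labels = c("pre-tonic", "tonic", "post-tonic"))

# Default treatment contrasts: the first level has the all-zero row,
# so it acts as the reference category
ct <- contr.treatment(levels(stress))
print(ct)

# SAS-style contrasts: the last level has the all-zero row,
# which matches the SPSS indicator coding described in the question
cs <- contr.SAS(levels(stress))
print(cs)

# The dummy (indicator) columns R builds for a model formula
mm <- model.matrix(~ stress)
print(colnames(mm))

# Per-factor alternative to options(contrasts = ...):
# attach SAS contrasts to a copy of the factor
stress_sas <- stress
contrasts(stress_sas) <- contr.SAS(nlevels(stress_sas))
mm_sas <- model.matrix(~ stress_sas)
print(mm_sas)

# Another alternative: keep treatment contrasts but move the
# last level to the front so it becomes the reference
stress_ref3 <- relevel(stress, ref = "post-tonic")
print(levels(stress_ref3))

# With treatment contrasts, exp() of a coefficient is the odds ratio of
# that level relative to the FIRST level; 0.8765 is the Stress2 estimate
# copied from the output quoted above
or_stress2 <- exp(0.8765)
print(or_stress2)
```

Whichever route is used, the fitted model is the same; only the
parameterization, and hence the meaning of the individual coefficients,
changes.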