[R] fitting a logistic regression with mixed type of variables
(Ted Harding)
Ted.Harding at manchester.ac.uk
Mon Nov 16 21:14:55 CET 2009
On 16-Nov-09 19:22:10, Jack Luo wrote:
> Hi,
> I am trying to fit a logistic regression using glm, but my
> explanatory variables are of mixed type: some are numeric,
> some are ordinal, some are categorical, say
>
> If x1 is numeric, x2 is ordinal, x3 is categorical, is the
> following formula OK?
>
> model <- glm(y~x1+x2+x3, family=binomial(link="logit"),
> na.action=na.pass)
>
> Thanks,
> -Jack
Speaking rather generally (the details will depend on the nature
of your variables, and of what you want to find out), the formula
itself is OK.
What *is* important is to define your variables so as to respect
their nature, so that the regression can handle them appropriately.
For the quantitative variable x1, there should be no problem;
you can leave it as it is (though in some applications a transform
of it, such as log(x1) or sqrt(x1) may be better, of course).
For the categorical variable x3, this should be treated as a factor
whose levels are the categories. If the categories are represented
alphabetically in the data (e.g. the values of x3 are "A","B","C")
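then x3 will be converted into a factor when the data are read in
(this is the default in R versions before 4.0.0; from 4.0.0 onwards
you need stringsAsFactors=TRUE in read.table() and the like).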
Then it is only a matter of specifying what system of contrasts
you want (see below).
However, if the values of x3 are represented numerically (e.g. 1,2,3)
then x3 should be explicitly converted into a factor:
  x3 <- factor(x3)
with possible additional argument depending on whether you want to
consider the levels as ordered. You should use ordered=TRUE if you
want x3 to be treated as an ordered factor, ordered=FALSE if unordered:
  x3 <- factor(x3,ordered=TRUE)
  x3 <- factor(x3,ordered=FALSE)
In the case that x3 was read in as a factor in the first place, you
may still want to apply the above in order to force ordering or
non-ordering. Read ?factor for more detail.
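As a small illustration of the conversion just described, something
along the following lines should work. The data are made up, and the
names x3_raw, x3_unordered and x3_ordered are only for this sketch:

```r
# Hypothetical data: a categorical variable coded numerically as 1, 2, 3
x3_raw <- c(2, 1, 3, 3, 1, 2)

# Unordered factor: the levels are just labels
x3_unordered <- factor(x3_raw, ordered = FALSE)

# Ordered factor: the levels carry the ordering 1 < 2 < 3
x3_ordered <- factor(x3_raw, ordered = TRUE)

# Inspect the results
is.ordered(x3_unordered)
is.ordered(x3_ordered)
levels(x3_ordered)
```

Printing x3_ordered shows the levels joined by "<", which is a quick
way to check which kind of factor you have.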
Then there is the question of contrasts for x3. For unordered factors,
"treatment contrasts" (which compare each level of the factor with a
reference level) are probably most appropriate. For ordered factors, you
may want to use either Helmert contrasts or "successive difference"
contrasts.
For "treatment contrasts" use
  contrasts(x3) <- contr.treatment(n)
where n is the number of levels of x3, i.e. nlevels(x3) (see ?contrasts).
For Helmert contrasts, similarly
  contrasts(x3) <- contr.helmert(n)
For "successive difference" contrasts, no function is available in the
standard packages. However, there is a contr.diff() in package Epi, and
an implementation is also developed in the MASS book (the accompanying
MASS package provides it as contr.sdif()). With Epi loaded,
  contrasts(x3) <- contr.diff(n)
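To see what these contrast settings actually do, here is a minimal
sketch on a made-up three-level factor. The names ct and ch are only
for this illustration, and the last step assumes the MASS package is
installed (it is skipped otherwise):

```r
# Hypothetical three-level factor
x3 <- factor(c("A", "B", "C", "A", "C", "B"))
n <- nlevels(x3)   # number of levels, here 3

# Treatment contrasts: each level compared with the first (reference) level
contrasts(x3) <- contr.treatment(n)
ct <- contrasts(x3)

# Helmert contrasts: each level compared with the mean of the preceding levels
contrasts(x3) <- contr.helmert(n)
ch <- contrasts(x3)

print(ct)
print(ch)

# Successive-difference contrasts via MASS, if it is installed
if (requireNamespace("MASS", quietly = TRUE)) {
  contrasts(x3) <- MASS::contr.sdif(n)
  print(contrasts(x3))
}
```

Whatever contrasts are attached to x3 at the time of fitting are the
ones glm() will use for it, so set them before calling glm().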
Now for the ordinal variable, x2. Attitudes differ, in different
applications, as to whether to use such a variable as if it were a
numerical variable or as an ordered factor. If it can be considered
meaningful to treat the ordered values as if they were a numerical
measure (i.e. the difference between x2=1 and x2=2 can be considered
as effectively equivalent to the difference between x2=2 and x2=3,
etc.) then it can be meaningful to simply treat x2 on the same
footing as x1.
On the other hand, you may only want to go as far as treating x2
as if it were an ordered factor, in which case you can treat it
on the same lines as x3 above.
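The two treatments of x2 can be compared directly. The following is
only a sketch on simulated data: the coefficients used to generate y
and the names n_obs, x2_ord, fit_num and fit_ord are my own
assumptions, not anything from your data.

```r
set.seed(1)   # simulated data, for illustration only
n_obs <- 200
x1 <- rnorm(n_obs)                                  # numeric
x2 <- sample(1:4, n_obs, replace = TRUE)            # ordinal, coded 1..4
x3 <- factor(sample(c("A", "B", "C"), n_obs, replace = TRUE))  # categorical
y  <- rbinom(n_obs, 1, plogis(-1 + 0.5 * x1 + 0.4 * x2))

# (a) x2 on the same footing as x1: one slope, equal spacing assumed
fit_num <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"))

# (b) x2 as an ordered factor: one coefficient per contrast (3 here)
x2_ord <- factor(x2, ordered = TRUE)
fit_ord <- glm(y ~ x1 + x2_ord + x3, family = binomial(link = "logit"))

length(coef(fit_num))   # intercept + x1 + x2 + 2 for x3
length(coef(fit_ord))   # intercept + x1 + 3 for x2_ord + 2 for x3

# The numeric model is nested in the factor model, so a likelihood-ratio
# test gives a rough check on the equal-spacing assumption
anova(fit_num, fit_ord, test = "Chisq")
```

If the test shows little improvement from (b) over (a), that is some
support for treating x2 as numeric, though with a real data set the
subject-matter interpretation should weigh at least as much.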
However, an ordinal variable is often treated as if it were the
index of a subdivision of a latent continuum. For example, a
question might ask the respondent whether they are "Strongly against",
"Somewhat against", "Indifferent", "Somewhat in favour" or "Strongly
in favour" of some proposal. This forces the respondent to decide
which of these categories "best" represents their inner intensity
of attitude towards the issue, which is the latent continuum.
Such things can be treated by methods which fit latent variables
to ordered responses, but this goes beyond what can be represented
in a simple linear model such as you give above.
I prefer to leave others, who really know about such things, to
advise on how to proceed in such a case!
Hoping this helps,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Nov-09 Time: 20:14:52
------------------------------ XFMail ------------------------------