[R] Odds Ratio and Logistic Regression

Michael Dewey info at aghmed.fsnet.co.uk
Mon Dec 31 15:24:37 CET 2012


At 18:14 30/12/2012, Lorenzo Isella wrote:
>Dear All,
>I am learning the ropes about logistic regression in R.
>I found some interesting examples
>
>http://bit.ly/Vq4GgX
>http://bit.ly/W9fUTg
>http://bit.ly/UfK73e
>
>but I am a bit lost.
>I have several questions.
>1) For instance, what is the difference between
>
>glm.out = glm(response ~ poverty + gender, family=binomial(logit),
>   data=mydata)
>
>and
>
>glm.out = glm(response ~ poverty * gender, family=binomial(logit),
>   data=mydata)
>? Which begs the question when I should use the "*" or "+" sign when doing
>a logistic regression on several explanatory variables. I think that in
>the former case I am allowing for an interaction between poverty and
>gender, but I would like to be sure about it.

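I think you need to (re)read an introductory text on R, in particular
the part about model formulae. The asterisk implies an interaction:
poverty * gender is shorthand for poverty + gender + poverty:gender,
so it is the second of your two calls that allows for an interaction
between poverty and gender. This also answers your second question, I think.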
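
A minimal sketch of the difference, assuming simulated toy data (the
data frame below is made up purely for illustration; only the variable
names come from your post). It prints the terms R builds from each
formula and then compares the two fits with a likelihood-ratio test:

```r
# Hypothetical toy data: the values are simulated, not the poster's data.
set.seed(1)
n <- 200
mydata <- data.frame(
  response = rbinom(n, 1, 0.6),
  poverty  = factor(sample(c("Above poverty line", "Below poverty line"),
                           n, replace = TRUE)),
  gender   = factor(sample(c("MALE", "FEMALE"), n, replace = TRUE),
                    levels = c("MALE", "FEMALE"))
)

# Term labels show how R expands each formula.
additive_terms    <- attr(terms(response ~ poverty + gender), "term.labels")
interaction_terms <- attr(terms(response ~ poverty * gender), "term.labels")
print(additive_terms)     # main effects only
print(interaction_terms)  # main effects plus the poverty:gender term

# Fit both models; the second has one extra coefficient for the interaction.
fit_add <- glm(response ~ poverty + gender, family = binomial(logit),
               data = mydata)
fit_int <- glm(response ~ poverty * gender, family = binomial(logit),
               data = mydata)

# Likelihood-ratio test of whether the interaction term improves the fit.
print(anova(fit_add, fit_int, test = "Chisq"))
```

If the interaction term adds little, the simpler "+" model is usually
the easier one to interpret.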


>2) Consider the following snippet
>
>
>glm.out = glm(response ~ poverty + gender, family=binomial(logit),
>   data=mydata)
>
>where "response" is a dichotomous variable, poverty assumes only two
>values (Above poverty line and Below poverty line) and gender assumes only
>the Male or Female values.
>The command above leads to the following output
>#######################################
>print(summary(glm.out))
>Call:
>glm(formula = response ~ poverty + gender, family = binomial(logit),
>     data = mydata)
>
>Deviance Residuals:
>     Min       1Q   Median       3Q      Max
>-2.2094   0.4269   0.4269   0.8033   1.1911
>
>Coefficients:
>                           Estimate Std. Error z value Pr(>|z|)
>(Intercept)                 0.9656     0.1477   6.538 6.25e-11 ***
>povertyBelow poverty line  -0.9978     0.3246  -3.074  0.00211 **
>genderFEMALE                1.3840     0.2549   5.429 5.68e-08 ***
>---
>Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>(Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 494.81  on 499  degrees of freedom
>Residual deviance: 457.13  on 497  degrees of freedom
>AIC: 463.13
>
>Number of Fisher Scoring iterations: 4
>##############################################
>
>To calculate the odds ratios, I should do the following
>
>exp(coef(glm.out))
>               (Intercept) povertyBelow poverty line              genderFEMALE
>                 2.6263831                 0.3687033                 3.9909627
>
>but here I am lost about the interpretation. For instance, what are the
>odds of a positive response for those above versus below the poverty line
>in males? In females?
>
>I think that everything is there, but I cannot extract/interpret the info
>provided by R correctly.
>Any help is appreciated.
>Cheers
>
>Lorenzo
>
>
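
As a rough sketch of how those numbers can be read, assuming (from the
coefficient names) that "Above poverty line" and MALE are the reference
levels, the snippet below rebuilds the odds for each of the four groups
from the coefficients printed above. The names b, groups, or_male and
or_female are just illustrative. Because the model has no interaction
term, the odds ratio for poverty comes out the same in males and
females.

```r
# Coefficients copied from the summary() output quoted above.
b <- c("(Intercept)"               =  0.9656,
       "povertyBelow poverty line" = -0.9978,
       "genderFEMALE"              =  1.3840)

# One row per group; 0 = reference level (assumed: above the line, male).
groups <- expand.grid(below = c(0, 1), female = c(0, 1))

# Linear predictor (log-odds) of a positive response for each group.
log_odds <- b[["(Intercept)"]] +
  b[["povertyBelow poverty line"]] * groups$below +
  b[["genderFEMALE"]] * groups$female

groups$odds <- exp(log_odds)     # odds of a positive response
groups$prob <- plogis(log_odds)  # corresponding probability
print(groups)

# Odds ratio, above versus below the poverty line, within each gender.
or_male   <- groups$odds[groups$below == 0 & groups$female == 0] /
             groups$odds[groups$below == 1 & groups$female == 0]
or_female <- groups$odds[groups$below == 0 & groups$female == 1] /
             groups$odds[groups$below == 1 & groups$female == 1]
print(c(male = or_male, female = or_female))
```

Both ratios equal exp(0.9978), the reciprocal of the 0.3687 reported
by exp(coef(glm.out)) for "Below poverty line". The exponentiated
intercept is the odds (not an odds ratio) for the reference group.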

Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html



