[R] what is the difference between the two logistic models?

Thu Aug 13 08:07:15 CEST 2009

As I wrote in my previous email, you should pick up a methods book that
introduces regression analysis.

Entering a variable as a factor in R means using dummy variable coding (by
default, each level is compared with the first level).

The factor coefficients estimated in your first model indicate the effect
of teaching.method = 2 relative to teaching.method = 1, and the effect of
teaching.method = 3 relative to teaching.method = 1.
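A minimal sketch of what this dummy coding looks like, assuming R's default contrasts option (treatment contrasts for unordered factors). The six teaching.method values are made up for illustration. model.matrix() shows the design columns that glm() builds from a factor: level 1 is absorbed into the intercept, and each remaining level gets a 0/1 indicator.

```r
# Hypothetical teaching.method codes, for illustration only
teaching.method <- factor(c(1, 2, 3, 1, 2, 3))

# Default treatment contrasts: level 1 is the reference (all zeros)
print(contrasts(teaching.method))

# Design matrix: intercept plus indicators for levels 2 and 3
X <- model.matrix(~ teaching.method)
print(X)
```

The columns teaching.method2 and teaching.method3 correspond to the factor(teaching.method)2 and factor(teaching.method)3 rows in the first summary, which is why each of those coefficients is a comparison with level 1.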

Using the linear term, as you do in your second model, is definitely wrong
for teaching.method, unless the teaching methods differ only in the hours
taught. The model says that as teaching.method increases by 1, the linear
predictor (the log-odds of passing) increases by 0.28. This is obviously bogus
if the differences between the values assigned to the levels of
teaching.method carry no information about quantitative differences between
the teaching methods. To give a concrete example: say a table is green, red,
or brown, and you assign the values 1, 2, and 3 to the colors. What do the
numbers tell you? Nothing! The differences between green, red, and brown
tables are qualitative, so the numeric differences in the coding of the color
variable are meaningless. You cannot use such variables as linear terms
in a regression model.
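A sketch of both points. Part 1 uses only the coefficients printed in the two quoted summaries: with teaching.method as a numeric term, the method 3 versus method 1 effect is forced to be exactly twice the method 2 versus method 1 effect (0.28 and 0.56 on the log-odds scale), whereas the factor model estimates the two contrasts freely (-0.07 and 0.50). Part 2 simulates hypothetical table-color data (the pass rates and sample size are invented, not taken from the thread) to show that two arbitrary numeric codings of the same colors give different fits, while the factor fit does not change when its reference level is switched.

```r
## Part 1: what the linear term imposes, using the estimates quoted below
b_linear <- 0.28017                          # model2: numeric teaching.method
b_factor <- c(m2 = -0.07125, m3 = 0.50058)   # model1: dummies vs method 1
# Under the linear term, method 3 vs 1 must equal twice method 2 vs 1
implied_linear <- b_linear * c(m2 = 1, m3 = 2)
print(rbind(linear = implied_linear, factor = b_factor))

## Part 2: arbitrary numeric codes change the fit; a factor does not
set.seed(1)
n <- 300
color <- factor(sample(c("green", "red", "brown"), n, replace = TRUE))
p_true <- c(green = 0.3, red = 0.7, brown = 0.4)   # hypothetical pass rates
y <- rbinom(n, 1, p_true[as.character(color)])

# Two equally arbitrary numeric codings of the same colors
code_a <- c(green = 1, red = 2, brown = 3)[as.character(color)]
code_b <- c(green = 1, red = 3, brown = 2)[as.character(color)]

dev_a  <- deviance(glm(y ~ code_a, family = binomial))
dev_b  <- deviance(glm(y ~ code_b, family = binomial))
dev_f  <- deviance(glm(y ~ color, family = binomial))
dev_f2 <- deviance(glm(y ~ relevel(color, ref = "red"), family = binomial))
print(c(linear_a = dev_a, linear_b = dev_b,
        factor = dev_f, factor_relevel = dev_f2))
```

The two linear fits disagree only because of how the labels were numbered; the factor fit depends on the colors alone, not on any numbering.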

In your previous post, it seemed that teaching method is perfectly collinear
with teaching hours. If that is the case, you may want to consider coding
teaching.method with orthogonal polynomial contrasts. But do so only if
(a) there is no qualitative difference between the teaching methods and the
only difference is the quantitative difference in the hours taught, and
(b) you are actually able to interpret your model.
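A minimal sketch of orthogonal polynomial contrasts for a three-level factor, using base R's contr.poly(); the factor values are hypothetical. The two columns are the linear (.L) and quadratic (.Q) components, they are orthonormal, and model.matrix() names the resulting terms teaching.method.L and teaching.method.Q. With only three levels this still uses two parameters, so it re-expresses the dummy model rather than constraining it; whether the .L and .Q coefficients mean anything depends on the two conditions above.

```r
# Hypothetical teaching.method codes, for illustration only
teaching.method <- factor(c(1, 2, 3, 1, 2, 3))

# Replace the default treatment contrasts with orthogonal polynomials
contrasts(teaching.method) <- contr.poly(3)
cm <- contrasts(teaching.method)
print(round(cm, 3))

# Design columns are now a linear and a quadratic trend over the levels
X <- model.matrix(~ teaching.method)
print(colnames(X))
```

Declaring the variable with ordered() instead of factor() should give the same polynomial contrasts under R's default contrasts option.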

However, I get the impression that your understanding of regression is quite
limited. Therefore, your initial goal should be to build models that you can
understand and interpret.

Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of SNN
Sent: Wednesday, August 12, 2009 6:05 PM
To: r-help at r-project.org
Subject: [R] what is the difference between the two logistic models?

Hi All,

I have data on 400 individuals with the following information:
Grade: pass or fail, coded as 1 for pass and 0 for fail
Sex: male or female (coded as 1 for male and 2 for female)
Age
Teaching.method: can be 1, 2, or 3

I want to fit a logistic regression where the outcome is Grade (1 = pass,
0 = fail) and the rest of the variables are the regressors.
My question is: I am not sure when to use “factor” for a variable.

In my example, Grade, sex, and teaching method are categorical variables
coded as stated above.
Age is a continuous variable.
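One way to handle this, sketched under assumptions: declare the categorical codes as factors once in the data frame, so the model formula needs no factor() calls. The data frame d and its six rows are hypothetical, not the poster's data. Grade is left numeric because glm() with family = binomial accepts a 0/1 response and models the probability of a 1 (pass).

```r
# Hypothetical toy rows, NOT the poster's data
d <- data.frame(
  Grade = c(1, 0, 1, 1, 0, 1),
  sex = c(1, 2, 2, 1, 1, 2),
  age = c(21, 25, 30, 19, 40, 33),
  teaching.method = c(1, 2, 3, 3, 1, 2)
)

# Declare the categorical codes as factors once, with readable labels
d$sex <- factor(d$sex, levels = c(1, 2), labels = c("male", "female"))
d$teaching.method <- factor(d$teaching.method, levels = c(1, 2, 3))

# Grade stays numeric 0/1; a formula such as
# Grade ~ sex + age + teaching.method now treats sex and teaching.method as factors
str(d)
```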

I have tried the model both ways: in the first model I put the word
“factor” in front of the categorical variables, but in that case I do not
know how to interpret the output.

Can someone shed some light on the difference between model1 and model2 and
how to interpret them?

Below is my output

Thanks for your help

Call:
glm(formula = factor(Grade) ~ factor(sex) + age + factor(teaching.method), 
    family = binomial, data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8649  -1.1926   0.7494   1.0091   1.6659  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
(Intercept)              -2.77217    0.82182  -3.373 0.000743 ***
factor(sex)2             -0.34751    0.22960  -1.514 0.130140
age                       0.04544    0.01074   4.230 2.34e-05 ***
factor(teaching.method)2 -0.07125    0.30123  -0.237 0.813023
factor(teaching.method)3  0.50058    0.33087   1.513 0.130303
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 465.18  on 344  degrees of freedom
Residual deviance: 438.91  on 340  degrees of freedom
AIC: 448.91

Number of Fisher Scoring iterations: 4

> model2 <- glm(Grade ~ sex + age + teaching.method,
+     family = binomial, data = ndata)
> summary(model2)

Call:
glm(formula = Grade ~ sex + age + teaching.method, family = binomial,
    data = ndata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7959  -1.2122   0.7547   1.0043   1.5791  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.83988    0.94749  -2.997  0.00272 **
sex             -0.33361    0.22867  -1.459  0.14458
age              0.04432    0.01065   4.160 3.18e-05 ***
teaching.method  0.28017    0.16181   1.731  0.08336 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 465.18  on 344  degrees of freedom
Residual deviance: 440.85  on 341  degrees of freedom
AIC: 448.85

Number of Fisher Scoring iterations: 4
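The deviances in the two summaries also allow a rough likelihood-ratio comparison, sketched here under the assumption that data and ndata contain the same observations (the identical null deviance, 465.18 on 344 degrees of freedom, suggests this but does not prove it). model2 is nested in model1: a linear term in the codes 1, 2, 3 is the factor model with the 3-vs-1 effect forced to be twice the 2-vs-1 effect, and a numeric 1/2 sex code spans the same column as factor(sex). The inputs are the rounded values printed above, so the result is approximate, and a non-significant difference does not make the numeric codes meaningful.

```r
# Deviances and residual df copied from the two summaries (rounded to 2 dp)
dev_model2 <- 440.85; df_model2 <- 341   # teaching.method as a linear term
dev_model1 <- 438.91; df_model1 <- 340   # teaching.method as a factor

# Likelihood-ratio statistic: deviance lost by imposing the linear constraint
lr_stat <- dev_model2 - dev_model1
lr_df   <- df_model2 - df_model1
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(LR = lr_stat, df = lr_df, p = p_value))
```

If both models are refit on the same data frame with the same response (anova() drops models whose response differs from the first), anova(model2, model1, test = "Chisq") should perform this test on the fitted objects directly.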

--
View this message in context:
http://www.nabble.com/what-is-the-difference-between-the-two-logistic-models--tp24943440p24943440.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.