[R] logistic regression with nominal predictors

Tue Sep 20 06:59:23 CEST 2005

This sounds to me like a great research project that could be
answered relatively easily with a Monte Carlo study.  An excellent
mathematician might be able to produce simple theoretical limits on the
error from using ranks or normal scores, but such limits would likely be
much wider than what one would see in a typical case using Monte Carlo.
And Monte Carlo in R is normally fairly easy, even for quite complicated
situations.
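
A minimal sketch of such a Monte Carlo study, under assumed settings: it
simulates a three-level ordinal outcome from an additive proportional-odds
model (no SEG:STR interaction), fits the two-way ANOVA on the integer
scores, and estimates how often the interaction test rejects at the 5%
level.  The number of segments, the effect sizes, the cutpoints, and the
helper name simulate_once are all hypothetical and not taken from the
data discussed here.

```r
# Monte Carlo sketch: how often does a two-way ANOVA on integer-coded
# ordinal scores flag a SEG:STR interaction when the true model has none?
# All settings below (levels, effect sizes, cutpoints) are assumed values.
set.seed(1)

simulate_once <- function(n_per_cell = 30, n_seg = 4) {
  d <- expand.grid(rep = seq_len(n_per_cell),
                   SEG = factor(seq_len(n_seg)),
                   STR = factor(0:1))
  # Additive linear predictor on the logit scale (no interaction).
  seg_eff <- seq(-1, 1, length.out = n_seg)
  eta <- seg_eff[as.integer(d$SEG)] + 0.5 * (d$STR == "1")
  # Proportional-odds model: latent logistic variable cut at two thresholds.
  latent <- eta + rlogis(nrow(d))
  d$QUA <- cut(latent, breaks = c(-Inf, -0.5, 1, Inf), labels = FALSE)
  fit <- aov(QUA ~ SEG * STR, data = d)
  tab <- summary(fit)[[1]]
  # Row names in the aov summary are space-padded, so trim before matching.
  tab[trimws(rownames(tab)) == "SEG:STR", "Pr(>F)"]
}

p_values <- replicate(500, simulate_once())
rejection_rate <- mean(p_values < 0.05)
rejection_rate
```

A rejection rate near 0.05 would suggest that, for settings like these,
treating the scores as continuous does little harm to the interaction
test; repeating the run with settings closer to the real design would be
the natural next step.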

	  spencer graves

Ramón Casero Cañas wrote:

> (Sorry for any obvious mistakes, as I am quite a newbie with no
> Statistics background).
> 
> My question is: what is the gain of logistic regression over odds
> ratios when none of the input variables is continuous?
> 
> 
> My experiment:
>  Outcome: ordinal scale, ``quality'' (QUA=1,2,3)
>  Predictors: ``segment'' (SEG) and ``stress'' (STR). SEG is
>              nominal scale with 24 levels, and STR is dichotomous (0,1).
> 
> 
> 
> Treating the outcome as continuous, two-way ANOVA with
> 
> aov(as.integer(QUA) ~ SEG * STR)
> 
> doesn't find evidence of interaction between SEG and STR, and they are
> significant on their own. This is the result that we would expect from
> clinical knowledge.
> 
> 
> 
> I use
> 
> xtabs(~QUA+SEG, data=data2.df, subset=STR==0)
> xtabs(~QUA+SEG, data=data2.df, subset=STR==1)
> 
> for the contingency tables. There are zero cells, and for some values of
> SEG, there is only one nonzero cell, i.e. some values of SEG determine
> the outcome with certainty.
> 
> So initially I was thinking of a proportional odds logistic regression
> model, but following Hosmer and Lemeshow [1], zero cells are
> problematic. So I remove the deterministic values of SEG from the data
> table, and I pool QUA=2 and QUA=3, and now I have a dichotomous outcome
> (QUA = Good/Bad) and no zero cells.
> 
> The following model doesn't find evidence of interaction
> 
> glm(QUA ~ STR * SEG, data=data3.df, family=binomial)
> 
> so I go for
> 
> glm(QUA ~ STR + SEG, data=data3.df, family=binomial)
> 
> 
> (I suppose that what glm does is to create design variables for SEG,
> where 0 0 ... 0 is for the first value of SEG, 1 0 ... 0 for the second
> value, 0 1 0 ... 0 for the third, etc).
> 
> Coefficients:
>               Estimate Std. Error   z value Pr(>|z|)
> (Intercept) -1.085e+00  1.933e-01    -5.614 1.98e-08 ***
> STR.L        2.112e-01  6.373e-02     3.314 0.000921 ***
> SEGP2C.MI   -9.869e-01  3.286e-01    -3.004 0.002669 **
> SEGP2C.AI   -1.306e+00  3.585e-01    -3.644 0.000269 ***
> SEGP2C.AA   -1.743e+00  4.123e-01    -4.227 2.37e-05 ***
> [shortened]
> SEGP4C.ML   -5.657e-01  2.990e-01    -1.892 0.058485 .
> SEGP4C.BL   -2.908e-16  2.734e-01 -1.06e-15 1.000000
> SEGSAX.MS    1.092e-01  2.700e-01     0.405 0.685772
> SEGSAX.MAS  -5.441e-16  2.734e-01 -1.99e-15 1.000000
> SEGSAX.MA    7.130e-01  2.582e-01     2.761 0.005758 **
> SEGSAX.ML    1.199e+00  2.565e-01     4.674 2.96e-06 ***
> SEGSAX.MP    1.313e+00  2.570e-01     5.108 3.26e-07 ***
> SEGSAX.MI    8.865e-01  2.569e-01     3.451 0.000558 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 3462.0  on 3123  degrees of freedom
> Residual deviance: 3012.6  on 3101  degrees of freedom
> AIC: 3058.6
> 
> Number of Fisher Scoring iterations: 6
> 
> 
> Even though some coefficients are not statistically significant, the
> model requires them from a clinical point of view.
> 
> At this point, the question would be how to interpret these results, and
> what advantage they offer over odds ratios. From [1] I can understand
> that in the case of a dichotomous and a continuous predictor, you can
> adjust for the continuous variable.
> 
> But when all predictors are dichotomous (due to the design variables), I
> don't quite see the effect of adjustment. Wouldn't it be better just to
> split the data into two groups (STR=0 and STR=1), and instead of using
> logistic regression, use odds ratios for each value of SEG?
> 
> Cheers,
> 
> Ramón.
> 
> [1] D.W. Hosmer and S. Lemeshow. ``Applied Logistic Regression''.
> John Wiley & Sons. 2000.
> 
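On the design variables mentioned in the quoted message: the sketch
below shows, on toy factors with hypothetical level names, how R codes a
nominal factor by default (treatment contrasts, first level as the all-zero
reference) and how it codes an ordered factor (polynomial contrasts).  The
coefficient label STR.L in the quoted output suggests, though the post does
not say so, that STR was stored as an ordered factor; if so, its
coefficient is not directly the log odds ratio between STR=1 and STR=0.

```r
# Toy factor with three levels (hypothetical names) to show default coding.
seg <- factor(c("a", "b", "c"))
X_seg <- model.matrix(~ seg)  # treatment contrasts: level "a" is the reference
X_seg

# A two-level ordered factor gets polynomial contrasts by default,
# which is where a coefficient name ending in ".L" comes from.
str_ord <- factor(c(0, 1), ordered = TRUE)
X_str <- model.matrix(~ str_ord)
X_str

# The two levels differ by sqrt(2) in the ".L" column, so the log odds
# ratio between them is the ".L" coefficient times sqrt(2).
step <- diff(X_str[, "str_ord.L"])
step
```

Under that assumption, the quoted STR.L estimate of 0.2112 would
correspond to roughly 0.30 on the log-odds scale for STR=1 versus STR=0.
Refitting with STR as an unordered factor (or a 0/1 numeric) would give
that contrast directly.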

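On the closing question of separate odds ratios versus logistic
regression, a small sketch with made-up counts may help.  It computes the
STR odds ratio within each segment from the 2x2 tables, shows that a
logistic model with a separate STR effect per segment reproduces exactly
those ratios, and then fits the additive model, which replaces them with a
single STR odds ratio adjusted for SEG.  The data frame, the level names
s1 to s3, and all counts are hypothetical.

```r
# Hypothetical counts: Bad/Good outcomes by stress (STR) within each segment.
d <- data.frame(
  SEG  = factor(rep(c("s1", "s2", "s3"), each = 2)),
  STR  = factor(rep(c(0, 1), times = 3)),
  bad  = c(20, 30, 10, 18, 40, 45),
  good = c(80, 70, 90, 82, 60, 55)
)

# Stratum-specific odds ratios (STR = 1 vs STR = 0) from the 2x2 tables.
odds <- d$bad / d$good
or_by_seg <- odds[d$STR == "1"] / odds[d$STR == "0"]
names(or_by_seg) <- levels(d$SEG)

# Saturated model: one STR effect per segment, reproduces the ORs above.
fit_int <- glm(cbind(bad, good) ~ SEG / STR, family = binomial, data = d)
or_from_glm <- exp(coef(fit_int)[grep(":STR1$", names(coef(fit_int)))])

# Additive model: a single STR odds ratio adjusted for SEG, with a Wald CI.
fit_add <- glm(cbind(bad, good) ~ SEG + STR, family = binomial, data = d)
or_common <- exp(coef(fit_add)["STR1"])
ci_common <- exp(confint.default(fit_add)["STR1", ])

or_by_seg
or_from_glm
or_common
ci_common
```

So the per-segment odds ratios are what the interaction model already
gives; the gain from the additive model, when the interaction is
negligible, is one pooled STR estimate that uses all segments at once,
plus a formal test (for example anova(fit_add, fit_int, test = "Chisq"))
of whether the segment-specific ratios really differ.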
-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915