[R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)

Sun Sep 5 00:53:13 CEST 2010

Hi, I know asking which test to use is frowned upon on this list... so 
please do read on for at least a couple of sentences...

I have some multivariate data split as follows:

Tumour Site (one of 5 categories) #
Chemo Schedule (one of 3 categories) ##
Cycle (one of 3 categories*) ##
Dose (one of 3 categories*) #

*These are actually integers, but for all our other analysis so far we 
have grouped them into logical bands.
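If it helps to make the banding reproducible, base R's `cut()` does it in one step. A minimal sketch, assuming hypothetical cycle numbers and illustrative band edges (the real cut-points are whatever the study used):

```r
# Hypothetical raw cycle numbers (integers), not real study data.
cycle_raw <- c(1, 2, 3, 4, 5, 6, 8, 10)

# cut() maps integers into bands: (0,2], (2,5], (5,Inf].
# The break points and labels here are illustrative assumptions.
cycle_band <- cut(cycle_raw,
                  breaks = c(0, 2, 5, Inf),
                  labels = c("1-2", "3-5", "6+"))

table(cycle_band)  # counts per band
```

Keeping the raw integer as a numeric predictor is the other option; banding is simpler to tabulate but throws away some information.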

The dependent variable is "Reaction" or "No Reaction".

I have individually analysed each of the independent variables against 
Reaction/No Reaction using chi-squared and Fisher tests. Those marked ## 
produced p values less than 0.05, and those marked # produced p values 
close to 0.05.
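For reference, one-variable tests like these look roughly as follows in R. The counts below are made up purely to show the calls; `chisq.test()` and `fisher.test()` are base R (package stats). Each test looks at one variable at a time, so it cannot separate a variable's own effect from its association with cycle.

```r
# Hypothetical 3x2 table of counts: cycle band vs reaction (made-up numbers).
tab <- matrix(c(20,  5,
                15, 10,
                 8, 17),
              nrow = 3, byrow = TRUE,
              dimnames = list(cycle    = c("1-2", "3-5", "6+"),
                              reaction = c("No", "Yes")))

chisq.test(tab)   # Pearson chi-squared test of independence, df = (3-1)*(2-1)
fisher.test(tab)  # exact test; safer when some expected counts are small
```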

We believe that Cycle is the crucial variable; the others only appear 
to matter because some groups contain more early cycles than others.
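One quick way to check this belief is to cross-tabulate site against cycle band and see whether early cycles are unevenly spread across sites. A sketch on simulated stand-in data: the column names `site` and `cycle` are assumptions, and in this fake data the two are independent by construction, so it only shows the mechanics.

```r
set.seed(1)
# Hypothetical stand-in data; 'site' and 'cycle' are assumed column names.
mydata <- data.frame(
  site  = factor(sample(c("A", "B", "C", "D", "E"), 200, replace = TRUE)),
  cycle = factor(sample(c("1-2", "3-5", "6+"), 200, replace = TRUE))
)

site_by_cycle <- table(mydata$site, mydata$cycle)

# Share of each cycle band within each site (each row sums to 1).
round(prop.table(site_by_cycle, margin = 1), 2)

# Chi-squared test of association between site and cycle band.
chisq.test(site_by_cycle)
```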

So: I believe what I need to do is a linear logistic regression on the 
4 independent variables, and I'm expecting it to show that tumour 
site, schedule and dose don't matter; only cycle matters. I've done a lot 
of reading and I'm clueless!!

I think I want to do something like:

glm(reaction ~ site + sched + cycle + dose, data = mydata, family = binomial)
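Because the outcome is binary (reaction yes/no), the usual choice is a logistic regression, i.e. `glm()` with `family = binomial`; the `poisson` family is meant for counts. A sketch on simulated data in which, by construction, only cycle affects the reaction probability. The data frame and its column names are hypothetical. `drop1(..., test = "Chisq")` gives a likelihood-ratio test for removing each variable (all of its levels at once) while keeping the others in the model, which is closer to "does this variable matter, given the others?" than the per-level p-values in `summary()`.

```r
set.seed(42)
n <- 300
# Hypothetical data: variable names follow the post, values are simulated.
mydata <- data.frame(
  site  = factor(sample(LETTERS[1:5], n, replace = TRUE)),
  sched = factor(sample(c("S1", "S2", "S3"), n, replace = TRUE)),
  cycle = factor(sample(c("1-2", "3-5", "6+"), n, replace = TRUE)),
  dose  = factor(sample(c("low", "mid", "high"), n, replace = TRUE))
)

# Simulated 0/1 outcome whose probability depends on cycle only.
p <- c("1-2" = 0.15, "3-5" = 0.35, "6+" = 0.60)[as.character(mydata$cycle)]
mydata$reaction <- rbinom(n, size = 1, prob = p)

# Binary outcome -> binomial family (logit link by default).
fit <- glm(reaction ~ site + sched + cycle + dose,
           family = binomial, data = mydata)

summary(fit)               # per-level coefficients (log-odds vs reference level)
drop1(fit, test = "Chisq") # likelihood-ratio test for dropping each whole variable
```

A large p-value for site or dose means the data are compatible with no effect after adjusting for cycle; it is not proof of no effect, especially with few patients per category.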

I am then expecting to see some very long output with lots of numbers... 
...my question is twofold:

1. Is glm the right thing to use, before I waste my time?

2. How do I interpret the result? (I kind of expect a lecture here, 
as I'm really looking for a nice snappy 'p<0.05 means this variable is 
the one having the influence' type of answer, and I suspect I'm going to be 
told that's not possible...!)
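On interpretation: in a logistic model each coefficient is a change in the log-odds of a reaction relative to the reference level of that factor, so exponentiating gives odds ratios. A minimal sketch on simulated data (all values hypothetical), using Wald intervals from `confint.default()`:

```r
set.seed(7)
n <- 300
# Hypothetical cycle bands and a simulated 0/1 reaction that depends on them.
cycle <- factor(sample(c("1-2", "3-5", "6+"), n, replace = TRUE))
p <- c("1-2" = 0.15, "3-5" = 0.35, "6+" = 0.60)[as.character(cycle)]
reaction <- rbinom(n, size = 1, prob = p)

fit <- glm(reaction ~ cycle, family = binomial)

# Row "(Intercept)": odds of reaction in the reference band "1-2".
# Rows "cycle3-5", "cycle6+": odds ratios versus band "1-2", with 95% Wald CIs.
exp(cbind(OR = coef(fit), confint.default(fit)))
```

An odds ratio whose interval sits well above 1 suggests higher odds of reaction than in the reference band.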

To be clear, the example given in the docs is:

> library(MASS)
> data(anorexia)
> anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia)
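The `offset(Prewt)` term fixes the coefficient of `Prewt` at 1, so this model is effectively about weight change, `Postwt - Prewt`. The sketch below fits the same model with `lm()` on the change to show that the estimates agree:

```r
library(MASS)  # provides the anorexia data set
data(anorexia)

anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

# Same model as an ordinary linear regression on the weight change.
anorex.lm <- lm(I(Postwt - Prewt) ~ Prewt + Treat, data = anorexia)

all.equal(coef(anorex.1), coef(anorex.lm))  # should be TRUE
```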

The output of anorex.1 is:

Call:  glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian,
    data = anorexia)

Coefficients:
(Intercept)        Prewt    TreatCont      TreatFT
    49.7711      -0.5655      -4.0971       4.5631

Degrees of Freedom: 71 Total (i.e. Null);  68 Residual
Null Deviance:        4525
Residual Deviance: 3311     AIC: 490
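The deviance figures in this printout can be reproduced by hand. For the gaussian family the residual deviance is the residual sum of squares, and the null deviance is the same quantity for an intercept-only model (which still carries the offset, so here it is the spread of the weight change). The degrees of freedom follow from 72 patients: 71 = 72 - 1 for the null model, and 68 = 72 - 4 estimated coefficients. A sketch:

```r
library(MASS)
data(anorexia)
anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

change <- anorexia$Postwt - anorexia$Prewt

# Residual deviance: residual sum of squares of the fitted model.
rss <- sum(residuals(anorex.1, type = "response")^2)

# Null deviance: squared spread of the weight change around its mean.
tss <- sum((change - mean(change))^2)

c(residual = rss, null = tss)  # compare with deviance(anorex.1), anorex.1$null.deviance
```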

and the output of summary(anorex.1) is:

Call:
glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian,
     data = anorexia)

Deviance Residuals:
      Min        1Q    Median        3Q       Max
-14.1083   -4.2773   -0.5484    5.4838   15.2922

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  49.7711    13.3910   3.717 0.000410 ***
Prewt        -0.5655     0.1612  -3.509 0.000803 ***
TreatCont    -4.0971     1.8935  -2.164 0.033999 *
TreatFT       4.5631     2.1333   2.139 0.036035 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 48.69504)

     Null deviance: 4525.4  on 71  degrees of freedom
Residual deviance: 3311.3  on 68  degrees of freedom
AIC: 489.97

Number of Fisher Scoring iterations: 2
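Each row of the coefficient table can be reproduced too: the t value is the estimate divided by its standard error, the p-value comes from a t distribution on the 68 residual degrees of freedom, and the dispersion is the residual deviance divided by those degrees of freedom (3311.3 / 68 ≈ 48.7). A sketch that recomputes them from the fitted object:

```r
library(MASS)
data(anorexia)
anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

s  <- summary(anorex.1)
co <- coef(s)  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t value = Estimate / Std. Error
t_manual <- co[, "Estimate"] / co[, "Std. Error"]

# Two-sided p-value from a t distribution on the residual df (68 here).
p_manual <- 2 * pt(-abs(t_manual), df = df.residual(anorex.1))

# Dispersion (estimated residual variance) = residual deviance / residual df.
disp_manual <- deviance(anorex.1) / df.residual(anorex.1)

cbind(t_manual, p_manual)
disp_manual
```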

---
Can someone either point me to a decent place that explains what 
this means, or give me some pointers? i.e. which of the variables has 
the influence on the outcome in the anorexia data?
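In the anorexia example, `TreatCont` and `TreatFT` are each compared with the reference level `CBT`, so their p-values answer "does this treatment differ from CBT?", not "does treatment matter at all?". To test treatment as a whole, one option is to compare nested models with and without `Treat` using an F test. A sketch:

```r
library(MASS)
data(anorexia)

full    <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
               family = gaussian, data = anorexia)
reduced <- glm(Postwt ~ Prewt + offset(Prewt),
               family = gaussian, data = anorexia)

# F test for Treat as a whole (both of its coefficients at once, 2 df).
anova(reduced, full, test = "F")
```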

Please don't shout!! I'm happy to be pointed to a reference, but would prefer 
one in plain English rather than stats mumbo jumbo!

Calum