[R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
stats at wittongilbert.free-online.co.uk
Sun Sep 5 00:53:13 CEST 2010
Hi, I know asking which test to use is frowned upon on this list, so
please do read on for at least a couple of sentences...
I have some multivariate data split as follows:
Tumour Site (one of 5 categories) #
Chemo Schedule (one of 3 cats) ##
Cycle (one of 3 cats*) ##
Dose (one of 3 cats*) #
*These are actually integers, but for all our other analyses so far we
have grouped them into logical bands.
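For reference, banding integers like this is usually done in R with cut(). In this sketch the raw cycle numbers and the cut-points are made up for illustration; the actual bands used were not stated.

```r
# Hypothetical raw cycle numbers; the real data and band limits are not known
cycle_num <- c(1, 2, 3, 4, 6, 1, 5, 2)

# Intervals are right-closed by default: (0,1] -> "1", (1,3] -> "2-3", (3,Inf] -> "4+"
cycle_band <- cut(cycle_num,
                  breaks = c(0, 1, 3, Inf),
                  labels = c("1", "2-3", "4+"))

table(cycle_band)   # how many observations fall in each band
```

If the integer values are kept instead, cycle can also enter a model as a numeric (trend) term rather than as a factor.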
The dependent variable is "Reaction" or "No Reaction".
I have individually analysed each of the independent variables against
Reaction/No Reaction using chi-squared and Fisher's exact tests. Those marked ##
produced p-values less than 0.05, and those marked # produced p-values
close to 0.05.
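The one-variable-at-a-time tests described above look roughly like this in R. The `cycle` and `reaction` vectors here are simulated and hypothetical. Each test looks at a single variable, so it cannot say whether an apparent site or dose difference is really a cycle difference; that needs a model containing all the variables together.

```r
# Hypothetical data: 120 patients, a 3-band cycle factor and a binary outcome
set.seed(1)
cycle <- factor(sample(c("1", "2-3", "4+"), 120, replace = TRUE),
                levels = c("1", "2-3", "4+"))
reaction <- factor(ifelse(runif(120) < 0.3, "Reaction", "No Reaction"),
                   levels = c("No Reaction", "Reaction"))

tab <- table(cycle, reaction)            # 3 x 2 contingency table
tab

chi_p    <- chisq.test(tab)$p.value      # Pearson chi-squared test of independence
fisher_p <- fisher.test(tab)$p.value     # Fisher's exact test; also handles 3 x 2 tables
c(chisq = chi_p, fisher = fisher_p)
```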
We believe that Cycle is the crucial variable; the others only appear
to differ because some groups contain more early cycles than others.
So I believe what I need to do is a linear logistic regression on the
4 independent variables, and I'm expecting it to show that tumour
site, schedule and dose don't matter and only cycle matters. I've done a lot
of reading and I'm clueless!!
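One way to check that expectation directly is to fit a cycle-only model and a full model, then compare them with a likelihood-ratio (deviance) test. This is a sketch on simulated data: `mydata`, its column levels and the reaction probabilities are all hypothetical, and the outcome is coded 1 = Reaction, 0 = No Reaction.

```r
# Simulated stand-in for 'mydata'; every name and level here is hypothetical
set.seed(42)
n <- 300
mydata <- data.frame(
  site  = factor(sample(paste0("S", 1:5), n, replace = TRUE)),
  sched = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  cycle = factor(sample(c("1", "2-3", "4+"), n, replace = TRUE),
                 levels = c("1", "2-3", "4+")),
  dose  = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                 levels = c("low", "mid", "high"))
)
# In this simulation only cycle affects the chance of a reaction (1 = Reaction)
p_react <- c("1" = 0.5, "2-3" = 0.3, "4+" = 0.1)[as.character(mydata$cycle)]
mydata$reaction <- rbinom(n, size = 1, prob = p_react)

# Nested models: cycle alone versus cycle plus the other three variables
fit_cycle <- glm(reaction ~ cycle, data = mydata, family = binomial)
fit_full  <- glm(reaction ~ cycle + site + sched + dose, data = mydata, family = binomial)

# Likelihood-ratio (deviance) test of site, sched and dose taken together
cmp <- anova(fit_cycle, fit_full, test = "Chisq")
cmp
```

A large p-value here is consistent with site, schedule and dose adding nothing once cycle is known, but it is not proof that they have no effect; with few patients the test may simply lack power.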
I think I want to do something like:
glm(reaction ~ site + sched + cycle + dose, data = mydata, family = binomial)
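Because the outcome is binary (Reaction / No Reaction), logistic regression with glm uses family = binomial, whose default link is the logit; family = poisson is intended for count outcomes. A minimal runnable sketch, assuming a simulated stand-in for `mydata` (all levels and probabilities hypothetical) in which only cycle affects the outcome:

```r
# Simulated stand-in for 'mydata'; column names follow the formula in the question
set.seed(42)
n <- 300
mydata <- data.frame(
  site  = factor(sample(paste0("S", 1:5), n, replace = TRUE)),
  sched = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  cycle = factor(sample(c("1", "2-3", "4+"), n, replace = TRUE),
                 levels = c("1", "2-3", "4+")),
  dose  = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                 levels = c("low", "mid", "high"))
)
# Outcome coded 1 = Reaction, 0 = No Reaction
p_react <- c("1" = 0.5, "2-3" = 0.3, "4+" = 0.1)[as.character(mydata$cycle)]
mydata$reaction <- rbinom(n, size = 1, prob = p_react)

# family = binomial (logit link by default) gives logistic regression
fit <- glm(reaction ~ site + sched + cycle + dose, data = mydata, family = binomial)
summary(fit)
```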
I am then expecting to see some very long output with lots of numbers...
...so my question is twofold:
1. Is glm the right thing to use, before I waste my time?
2. How do I interpret the result? (I kind of expect a lecture here,
as I'm really looking for a nice snappy 'p<0.05 means this variable is
the one having the influence' type of answer, and I suspect I'm going to be
told that's not possible...!)
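For the interpretation question: summary() tests each coefficient, i.e. each level of a factor against that factor's reference level, not each variable as a whole. drop1() with test = "Chisq" gives one likelihood-ratio test per variable, which is closer to the snappy answer, and exponentiated logistic coefficients are odds ratios. This sketch again uses simulated, hypothetical data.

```r
# Simulated stand-in for 'mydata'; every name and level here is hypothetical
set.seed(42)
n <- 300
mydata <- data.frame(
  site  = factor(sample(paste0("S", 1:5), n, replace = TRUE)),
  sched = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  cycle = factor(sample(c("1", "2-3", "4+"), n, replace = TRUE),
                 levels = c("1", "2-3", "4+")),
  dose  = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                 levels = c("low", "mid", "high"))
)
p_react <- c("1" = 0.5, "2-3" = 0.3, "4+" = 0.1)[as.character(mydata$cycle)]
mydata$reaction <- rbinom(n, size = 1, prob = p_react)

fit <- glm(reaction ~ site + sched + cycle + dose, data = mydata, family = binomial)

# One likelihood-ratio test per variable: how much worse is the fit
# when all levels of that factor are dropped at once?
tests <- drop1(fit, test = "Chisq")
tests

# Odds ratios (each level versus its factor's reference level) with Wald 95% CIs
or <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(or, 2)
```

A small p-value for a variable here means it adds information given the other variables already in the model; it does not by itself make that variable "the one" having the influence.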
To be clear, the example given in the docs is:
> library(MASS)
> data(anorexia)
> anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia)
The output of anorex.1 is:
Call: glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia)
Coefficients:
(Intercept) Prewt TreatCont TreatFT
49.7711 -0.5655 -4.0971 4.5631
Degrees of Freedom: 71 Total (i.e. Null); 68 Residual
Null Deviance: 4525
Residual Deviance: 3311 AIC: 490
and the output of summary(anorex.1) is:
Call:
glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian,
data = anorexia)
Deviance Residuals:
Min 1Q Median 3Q Max
-14.1083 -4.2773 -0.5484 5.4838 15.2922
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.7711 13.3910 3.717 0.000410 ***
Prewt -0.5655 0.1612 -3.509 0.000803 ***
TreatCont -4.0971 1.8935 -2.164 0.033999 *
TreatFT 4.5631 2.1333 2.139 0.036035 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 48.69504)
Null deviance: 4525.4 on 71 degrees of freedom
Residual deviance: 3311.3 on 68 degrees of freedom
AIC: 489.97
Number of Fisher Scoring iterations: 2
---
Can someone either point me to a decent place that explains what
this means, or give me some pointers? That is, which of the variables
influences the outcome in the anorexia data?
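For the anorexia example, a sketch reusing the same call. Treat's reference level is CBT, so TreatCont (-4.10) and TreatFT (4.56) are estimated differences in mean post-treatment weight from the CBT group at the same pre-treatment weight. Because offset(Prewt) adds Prewt with its coefficient fixed at 1, the effective slope on Prewt is 1 - 0.5655, roughly 0.43. drop1() then tests each term as a whole; an F test is used because the gaussian dispersion is estimated from the data.

```r
library(MASS)   # provides the anorexia data set
data(anorexia)

anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

levels(anorexia$Treat)   # the first level (CBT) is the reference for TreatCont/TreatFT

# One F test per term: Prewt (1 df) and Treat as a whole (2 df);
# the offset is not a fitted term, so it is not tested
d <- drop1(anorex.1, test = "F")
d
```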
Please don't shout!! I'm happy to be pointed to a reference, but would prefer
one in plain English rather than stats mumbo jumbo!
Calum