[R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)

Peng, C cpeng.usm at gmail.com
Sun Sep 5 15:32:55 CEST 2010



Calum-4 wrote:
> 
> Hi I know asking which test to use is frowned upon on this list... so
> please do read on for at least a couple of sentences...
> 
> I have some multivariate data split as follows
> 
> Tumour Site (one of 5 categories) #
> Chemo Schedule (one of 3 cats) ##
> Cycle (one of 3 cats*) ##
> Dose (one of 3 cats*) #
> 
> *These are actually integers but for all our other analyses so far we
> have grouped them into logical bands of categories.
> 
> The dependent variable is "Reaction" or "No Reaction"
> 
> I have individually analysed each of the independent variables against
> Reaction/No Reaction using ChiSq and Fisher Tests. Those marked ##
> produced p values less than 0.05, and those marked # produced p values
> close to 0.05.
> 
> We believe that Cycle is the crucial piece of data - the others just 
> appear to be different because there are more early cycles in certain 
> groups than others.
> 
> SO - I believe what I need to do is a Linear Logistic Regression on the
> 4 independent variables. And I'm expecting it to show that the tumour
> site, schedule and dose don't matter, only the cycle matters. I've done
> a lot of reading and I'm clueless!!
> 
> I think I want to do something like:
> 
> glm (reaction ~ site + sched + cycle + dose, data=mydata, family=poisson)
> =========================
> Comment 1: If you stick to Linear Logistic Regression, the family should
> be "binomial", assuming that reaction has only two values (Yes/No).
> "family=poisson" should be used when the response is a frequency count,
> such as the number of tumors.
> =========================
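
A minimal sketch of the call with the binomial family, as suggested in Comment 1. The real data are not shown in the post, so 'mydata' below is simulated purely as a stand-in: the column names follow the post, but the level labels and the reaction probabilities are assumptions made for illustration.

```r
# Simulated stand-in for 'mydata' (NOT the poster's data).
set.seed(1)
n <- 200
mydata <- data.frame(
  site  = factor(sample(paste0("site", 1:5), n, replace = TRUE)),
  sched = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  cycle = factor(sample(c("early", "mid", "late"), n, replace = TRUE)),
  dose  = factor(sample(c("low", "medium", "high"), n, replace = TRUE))
)

# Assumed for illustration only: cycle alone changes the reaction probability.
p <- ifelse(mydata$cycle == "early", 0.5, 0.2)

# With a two-level factor response, glm() treats the first level as
# "failure", so the model below describes the probability of "Reaction".
mydata$reaction <- factor(ifelse(runif(n) < p, "Reaction", "No Reaction"),
                          levels = c("No Reaction", "Reaction"))

fit <- glm(reaction ~ site + sched + cycle + dose,
           data = mydata, family = binomial)
print(summary(fit))
```

Each factor contributes one coefficient per non-reference level, so the summary lists several rows per variable rather than a single p-value per variable.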
> 
> I am then expecting to see some very long output with lots of numbers... 
> ...my question is TWO fold -
> 
> 1. is glm the right thing to use before I waste my time
> 
> and 2. how do I interpret the result! (I kind of expect a lecture here
> as I'm really looking for a nice snappy 'p<0.05 means this variable is
> the one having the influence' type answer and I suspect I'm going to be
> told that's not possible...!)
> ================================================================
> Comment 2: The regression coefficients in a binary logistic regression
> model are log odds ratios; exponentiating a coefficient gives the odds
> ratio. Interpreting odds ratios can be tricky, but the p-value is
> interpreted in the usual way.
> ================================================================
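
To illustrate Comment 2, here is a small sketch of moving from log odds ratios to odds ratios. It uses the 'infert' data set that ships with R only because it has a 0/1 response; it has nothing to do with the tumour data, and the choice of predictors is just for illustration.

```r
# 'infert' is a built-in data set with a binary response 'case'.
fit <- glm(case ~ spontaneous + induced, data = infert, family = binomial)

# Coefficients are on the log-odds scale; exp() converts them to odds ratios.
or <- exp(coef(fit))

# Wald 95% intervals, also moved to the odds-ratio scale.
or_ci <- exp(confint.default(fit))
print(cbind(OR = or, or_ci))

# Likelihood-ratio test for dropping each term as a whole.
print(drop1(fit, test = "Chisq"))
```

For a predictor with more than two categories, summary() reports one p-value per level relative to the reference level, whereas drop1() gives a single test per variable, which is closer to the "does this variable matter" question.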
> To be clear the example given in the docs is:
> 
>>  library(MASS)
>>  data(anorexia)
>>  anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
>>                  family = gaussian, data = anorexia)
> 
> ===================================
> Comment 3: Here Postwt is a continuous variable. The specification "family
> = gaussian" assumes that Postwt is a normal variable; therefore, the
> fitted model is the ordinary normal linear regression model.
> ===================================
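
A quick way to check the point in Comment 3 is to fit the same formula with lm() and compare the coefficients. This is only a sketch; it assumes the MASS package (which supplies the anorexia data) is installed.

```r
library(MASS)  # provides the anorexia data used in the quoted example

# Gaussian GLM, as in the documentation example
glm_fit <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
               family = gaussian, data = anorexia)

# Ordinary linear regression with the same formula
lm_fit <- lm(Postwt ~ Prewt + Treat + offset(Prewt), data = anorexia)

# The two sets of coefficients should agree up to numerical tolerance
print(all.equal(coef(glm_fit), coef(lm_fit)))
```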
> 
> The output of anorex.1 is:
> 
> Call:  glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian,
>      data = anorexia)
> 
> Coefficients:
> (Intercept)        Prewt    TreatCont      TreatFT
>      49.7711      -0.5655      -4.0971       4.5631
> 
> Degrees of Freedom: 71 Total (i.e. Null);  68 Residual
> Null Deviance:        4525
> Residual Deviance: 3311     AIC: 490
> 
> 
> 
> and the output of summary(anorex.1) is:
> 
> Call:
> glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian,
>      data = anorexia)
> 
> Deviance Residuals:
>       Min        1Q    Median        3Q       Max
> -14.1083   -4.2773   -0.5484    5.4838   15.2922
> 
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  49.7711    13.3910   3.717 0.000410 ***
> Prewt        -0.5655     0.1612  -3.509 0.000803 ***
> TreatCont    -4.0971     1.8935  -2.164 0.033999 *
> TreatFT       4.5631     2.1333   2.139 0.036035 *
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> (Dispersion parameter for gaussian family taken to be 48.69504)
> 
>      Null deviance: 4525.4  on 71  degrees of freedom
> Residual deviance: 3311.3  on 68  degrees of freedom
> AIC: 489.97
> 
> Number of Fisher Scoring iterations: 2
> 
> 
> 
> ---
> Can someone either point me to a decent place that would explain what
> this means or provide me with some pointers? i.e. which of the variables
> has the influence on the outcome in the anorexia data?
> 
> Please don't shout!! Happy to be pointed to a reference but would prefer
> one in common English, not some stats mumbo jumbo!
> 
> Calum
> 
