[R] Categorical Response Query

Tue Oct 21 19:21:10 CEST 2008

The second case also needs the argument weights=n (the number of trials behind each proportion r/n).
With that change, all 3 models should give the same general fit (same coefficients, same predicted values).
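
A minimal sketch of that equivalence on simulated data. The data frames `ind` and `grp`, the additive formula, and the simulated effect sizes are all made up for illustration; they are not the poster's data.

```r
# Simulated individual-level data (hypothetical, for illustration only)
set.seed(1)
n_obs <- 400
ind <- data.frame(
  age = sample(c(20, 30, 40, 50), n_obs, replace = TRUE),
  sex = factor(sample(c("F", "M"), n_obs, replace = TRUE))
)
p <- plogis(-1.5 + 0.03 * ind$age + 0.5 * (ind$sex == "M"))
ind$success <- rbinom(n_obs, 1, p)

# Grouped version: one row per observed (age, sex) combination,
# with r = number of successes and n = number of trials
succ <- aggregate(success ~ age + sex, data = ind, FUN = sum)
tot  <- aggregate(success ~ age + sex, data = ind, FUN = length)
grp  <- data.frame(succ[, c("age", "sex")], r = succ$success, n = tot$success)

# Case 1: individual 0/1 responses
m1 <- glm(success ~ age + sex, family = binomial, data = ind)
# Case 2: proportions, with the number of trials as weights
m2 <- glm(r / n ~ age + sex, family = binomial, weights = n, data = grp)
# Case 3: two-column (successes, failures) response
m3 <- glm(cbind(r, n - r) ~ age + sex, family = binomial, data = grp)

# Coefficients side by side; they should agree up to numerical tolerance
print(cbind(m1 = coef(m1), m2 = coef(m2), m3 = coef(m3)))
```

Leaving weights=n out of the second fit would treat every proportion as if it came from a single trial, which is why that case needs the extra argument.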

The differences are subtle and may not be of interest.  Conceptually think about:  did you run 10 trials under a set of conditions (age=x, sex=y, class=z) and 9 of them were successes? This is model 2/3.  Or did you run a bunch of individual trials and just by chance 10 of them happened to have the same conditions (age=x, sex=y, class=z) and 9 of those 10 were successes? This is model 1.

The biggest visible difference is in the deviance calculations.  In model 1 the saturated model can fit every point exactly (since the responses are all 0 or 1).  In models 2 and 3 the saturated model instead reproduces the observed proportion for each combination of predictors, and those proportions are generally not 0 or 1.
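
A small sketch of the deviance point, again on simulated data (the names `ind`, `grp`, and the reduced models `m1_0` and `m3_0` are hypothetical). The residual deviance and its degrees of freedom differ between the individual and grouped fits, while the change in deviance between two nested models should agree up to numerical tolerance, because the saturated-model term cancels in the difference.

```r
# Simulated individual-level data (hypothetical, for illustration only)
set.seed(2)
n_obs <- 400
ind <- data.frame(
  age = sample(c(20, 30, 40, 50), n_obs, replace = TRUE),
  sex = factor(sample(c("F", "M"), n_obs, replace = TRUE))
)
ind$success <- rbinom(n_obs, 1,
                      plogis(-1.5 + 0.03 * ind$age + 0.5 * (ind$sex == "M")))

# Grouped version: r successes out of n trials per (age, sex) combination
succ <- aggregate(success ~ age + sex, data = ind, FUN = sum)
tot  <- aggregate(success ~ age + sex, data = ind, FUN = length)
grp  <- data.frame(succ[, c("age", "sex")], r = succ$success, n = tot$success)

m1 <- glm(success ~ age + sex, family = binomial, data = ind)
m3 <- glm(cbind(r, n - r) ~ age + sex, family = binomial, data = grp)

# Residual deviance and residual df are on different scales
print(c(individual = deviance(m1), grouped = deviance(m3)))
print(c(individual = df.residual(m1), grouped = df.residual(m3)))

# Drop sex from both fits and compare the change in deviance
m1_0 <- update(m1, . ~ age)
m3_0 <- update(m3, . ~ age)
print(c(individual = deviance(m1_0) - deviance(m1),
        grouped    = deviance(m3_0) - deviance(m3)))
```

So a likelihood-ratio comparison of nested models should not depend on which form is used, whereas reading the residual deviance as a goodness-of-fit measure does.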

The most important difference comes when you decide to extend the model (mixed effects, bootstrapping), because the observational unit differs between model 1 and models 2 & 3.  (I don't know of any differences between 2 & 3 other than looks/convenience.)
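
One way to see the observational-unit issue is a naive nonparametric bootstrap, sketched here on simulated data. The helper `boot_age_coef`, the data frames, and the choice of 200 replicates are assumptions made for illustration. Resampling rows of the individual data treats each trial as the unit; resampling rows of the grouped data treats each covariate combination as the unit.

```r
# Simulated individual-level data (hypothetical, for illustration only)
set.seed(3)
n_obs <- 400
ind <- data.frame(
  age = sample(20:59, n_obs, replace = TRUE),
  sex = factor(sample(c("F", "M"), n_obs, replace = TRUE))
)
ind$success <- rbinom(n_obs, 1,
                      plogis(-1.5 + 0.03 * ind$age + 0.5 * (ind$sex == "M")))

# Grouped version: r successes out of n trials per (age, sex) combination
succ <- aggregate(success ~ age + sex, data = ind, FUN = sum)
tot  <- aggregate(success ~ age + sex, data = ind, FUN = length)
grp  <- data.frame(succ[, c("age", "sex")], r = succ$success, n = tot$success)

# Hypothetical helper: resample rows with replacement, refit,
# and keep the age coefficient from each replicate
boot_age_coef <- function(data, formula, B = 200) {
  replicate(B, {
    idx <- sample(nrow(data), replace = TRUE)
    coef(glm(formula, family = binomial, data = data[idx, ]))["age"]
  })
}

# Unit = individual trial (model 1)
se_ind <- sd(boot_age_coef(ind, success ~ age + sex))
# Unit = covariate combination (models 2 and 3)
se_grp <- sd(boot_age_coef(grp, cbind(r, n - r) ~ age + sex))

print(c(individual = se_ind, grouped = se_grp))
```

The two standard errors need not agree, since the schemes resample different units. Which one is appropriate depends on how the data were actually collected.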

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of andyer weng
> Sent: Monday, October 20, 2008 4:39 PM
> To: r-help at r-project.org
> Subject: Re: [R] Categorical Response Query
>
> Hi all,
>
> I have a question about a categorical response.
>
> I have a data frame containing age, sex, class, success (1=success,
> 0=non-success).
> age, sex, class are the explanatory variables, and success is the
> response variable.  I can also get n (the number of times each age
> occurs) and r (the number of successes at that age).
>
> When I try to create the regression relationship for these variables, I
> have seen many different cases; I just wonder which one fits me the
> best for this situation.
>
> 1st case,
> xxx.glm<-glm(success~age*sex*class,family=binomial, data=xxx.data)
>
> 2nd case
>
> xxx.glm<-glm(r/n~age*sex*class,family=binomial, data=xxx.data)
>
> 3rd case
>
> xxx.glm<-glm(cbind(r,n-r)~age*sex*class,family=binomial, data=xxx.data)
>
> What is the difference between the above 3 cases? Which one is the best to
> use?
>
> If I don't group the data, can I use the 1st case? If I group the
> data, can I use the 2nd or 3rd case?
>
> please advise.
>
> Cheers.
> Andyer