[R] Can't find the error in a Binomial GLM I am doing, please help
peter dalgaard
pdalgd at gmail.com
Mon May 7 22:49:23 CEST 2012
On May 7, 2012, at 19:39 , Bert Gunter wrote:
> 1. As this is a statistical, rather than an R issue, you would do
> better posting on a statistical help site like stats.stackexchange.com
> (although some generous soul here may respond).
>
> 2. You would also probably do better consulting with a local
> statistical resource if available, as it is difficult to explain such
> issues remotely.
>
> Cheers,
> Bert
Well, in this case it is pretty obvious that there are no positive outcomes in the 1996 cohort (log odds essentially -Inf); as this is the base level, the Wald tests are completely unreliable (Hauck-Donner effect). The LRT is 1369-1283=86 which is quite consistent with chisq.test().
-pd
>
> On Mon, May 7, 2012 at 10:05 AM, lincoln <miseno77 at hotmail.com> wrote:
>> Hi all,
>>
>> I can't find the error in the binomial GLM I have done. I want to use that
>> because there are more than one explanatory variables (all categorical) and
>> a binary response variable.
>> This is how my data set looks like:
>>> str(data)
>> 'data.frame': 1004 obs. of 5 variables:
>> $ site : int 0 0 0 0 0 0 0 0 0 0 ...
>> $ sex : Factor w/ 2 levels "0","1": NA NA NA NA 1 NA NA NA NA NA ...
>> $ age : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
>> $ cohort: Factor w/ 11 levels "1996","2000",..: 11 11 11 11 11 11 11 11 11
>> 11 ...
>> $ birth : Factor w/ 3 levels "5","6","7": 3 3 2 2 2 2 2 2 2 2 ...
>>
>> I know that, particularly for one level of variable "cohort" (2004 value),
>> it should be a strong effect of variable "cohort" on variable "site" so I do
>> a Chi square test that confirms the null hypothesis there is a difference in
>> sites on the way "cohort" is distributed:
>>
>>> (chisq.test(data$site,data$cohort))
>>
>> Pearson's Chi-squared test
>>
>> data: data$site and data$cohort
>> X-squared = 82.6016, df = 10, *p-value = 1.549e-13*
>>
>> Mensajes de aviso perdidos
>> In chisq.test(data$site, data$cohort) :
>> Chi-squared approximation may be incorrect
>>
>>
>>
>>
>> After that, I have tried to use a binomial GLM with all the explanatory
>> variables but I couldn't find any significance of any variable, neither
>> cohort, and for this reason I tried to use only cohort as predictor and I
>> get this:
>>
>>
>>> BinomialGlm <- glm(site ~ cohort, data=data,binomial)
>>> summary(BinomialGlm)
>>
>> Call:
>> glm(formula = site ~ cohort, family = binomial, data = data)
>>
>> Deviance Residuals:
>> Min 1Q Median 3Q Max
>> -1.9239 -0.9365 -0.9365 1.3584 1.6651
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -12.57 324.74 -0.039 0.969
>> cohort2000 11.47 324.75 0.035 0.972
>> cohort2001 13.82 324.74 0.043 0.966
>> cohort2002 12.97 324.74 0.040 0.968
>> cohort2003 13.66 324.74 0.042 0.966
>> *cohort2004 14.25 324.74 0.044 0.965*
>> cohort2006 12.21 324.74 0.038 0.970
>> cohort2007 11.81 324.74 0.036 0.971
>> cohort2008 12.41 324.74 0.038 0.970
>> cohort2009 12.15 324.74 0.037 0.970
>> cohort2010 11.97 324.74 0.037 0.971
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>> Null deviance: 1369.3 on 1003 degrees of freedom
>> Residual deviance: 1283.7 on 993 degrees of freedom
>> AIC: 1305.7
>>
>> Number of Fisher Scoring iterations: 11
>>
>>
>>
>>
>> I tired to use simple GLM (gaussian family) and I get results that are more
>> logicals:
>>
>>> GaussGlm <- glm(site ~ cohort, data=data)
>>> summary(GaussGlm)
>>
>> Call:
>> glm(formula = site ~ cohort, data = data)
>>
>> Deviance Residuals:
>> Min 1Q Median 3Q Max
>> -0.8429 -0.3550 -0.3550 0.6025 0.7500
>>
>> Coefficients:
>> Estimate Std. Error t value Pr(>|t|)
>> (Intercept) 5.740e-14 4.762e-01 0.000 1.0000
>> cohort2000 2.500e-01 5.324e-01 0.470 0.6388
>> cohort2001 7.778e-01 5.020e-01 1.549 0.1216
>> cohort2002 6.000e-01 4.880e-01 1.230 0.2192
>> cohort2003 7.500e-01 4.861e-01 1.543 0.1231
>> *cohort2004 8.429e-01 4.796e-01 1.757 0.0792 .*
>> cohort2006 4.118e-01 4.832e-01 0.852 0.3943
>> cohort2007 3.204e-01 4.785e-01 0.670 0.5033
>> cohort2008 4.600e-01 4.786e-01 0.961 0.3367
>> cohort2009 3.975e-01 4.772e-01 0.833 0.4051
>> cohort2010 3.550e-01 4.768e-01 0.745 0.4567
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> (Dispersion parameter for gaussian family taken to be 0.2267955)
>>
>> Null deviance: 245.40 on 1003 degrees of freedom
>> Residual deviance: 225.21 on 993 degrees of freedom
>> AIC: 1372.5
>>
>> Number of Fisher Scoring iterations: 2
>>
>>
>>
>> What is going on? Any suggestion/commentary?
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Can-t-find-the-error-in-a-Binomial-GLM-I-am-doing-please-help-tp4615340.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
More information about the R-help
mailing list