[R] Can't find the error in a Binomial GLM I am doing, please help

peter dalgaard pdalgd at gmail.com
Mon May 7 22:49:23 CEST 2012


On May 7, 2012, at 19:39 , Bert Gunter wrote:

> 1. As this is a statistical, rather than an R issue, you would do
> better posting on a statistical help site like stats.stackexchange.com
> (although some generous soul here may respond).
> 
> 2. You would also probably do better consulting with a local
> statistical resource if available, as it is difficult to explain such
> issues remotely.
> 
> Cheers,
> Bert

Well, in this case it is pretty obvious that there are no positive outcomes in the 1996 cohort (log odds essentially -Inf); as this is the base level, the Wald tests are completely unreliable (Hauck-Donner effect). The LRT is 1369-1283=86 which is quite consistent with chisq.test().

-pd


> 
> On Mon, May 7, 2012 at 10:05 AM, lincoln <miseno77 at hotmail.com> wrote:
>> Hi all,
>> 
>> I can't find the error in the binomial GLM I have done. I want to use that
>> because there are more than one explanatory variables (all categorical) and
>> a binary response variable.
>> This is how my data set looks like:
>>> str(data)
>> 'data.frame':   1004 obs. of  5 variables:
>>  $ site  : int  0 0 0 0 0 0 0 0 0 0 ...
>>  $ sex   : Factor w/ 2 levels "0","1": NA NA NA NA 1 NA NA NA NA NA ...
>>  $ age   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
>>  $ cohort: Factor w/ 11 levels "1996","2000",..: 11 11 11 11 11 11 11 11 11
>> 11 ...
>>  $ birth : Factor w/ 3 levels "5","6","7": 3 3 2 2 2 2 2 2 2 2 ...
>> 
>> I know that, particularly for one level of variable "cohort" (2004 value),
>> it should be a strong effect of variable "cohort" on variable "site" so I do
>> a Chi square test that confirms the null hypothesis there is a difference in
>> sites on the way "cohort" is distributed:
>> 
>>> (chisq.test(data$site,data$cohort))
>> 
>>        Pearson's Chi-squared test
>> 
>> data:  data$site and data$cohort
>> X-squared = 82.6016, df = 10, *p-value = 1.549e-13*
>> 
>> Mensajes de aviso perdidos
>> In chisq.test(data$site, data$cohort) :
>>  Chi-squared approximation may be incorrect
>> 
>> 
>> 
>> 
>> After that, I have tried to use a binomial GLM with all the explanatory
>> variables but I couldn't find any significance of any variable, neither
>> cohort, and for this reason I tried to use only cohort as predictor and I
>> get this:
>> 
>> 
>>> BinomialGlm <- glm(site ~  cohort, data=data,binomial)
>>> summary(BinomialGlm)
>> 
>> Call:
>> glm(formula = site ~ cohort, family = binomial, data = data)
>> 
>> Deviance Residuals:
>>    Min       1Q   Median       3Q      Max
>> -1.9239  -0.9365  -0.9365   1.3584   1.6651
>> 
>> Coefficients:
>>            Estimate Std. Error z value Pr(>|z|)
>> (Intercept)   -12.57     324.74  -0.039    0.969
>> cohort2000     11.47     324.75   0.035    0.972
>> cohort2001     13.82     324.74   0.043    0.966
>> cohort2002     12.97     324.74   0.040    0.968
>> cohort2003     13.66     324.74   0.042    0.966
>> *cohort2004     14.25     324.74   0.044    0.965*
>> cohort2006     12.21     324.74   0.038    0.970
>> cohort2007     11.81     324.74   0.036    0.971
>> cohort2008     12.41     324.74   0.038    0.970
>> cohort2009     12.15     324.74   0.037    0.970
>> cohort2010     11.97     324.74   0.037    0.971
>> 
>> (Dispersion parameter for binomial family taken to be 1)
>> 
>>    Null deviance: 1369.3  on 1003  degrees of freedom
>> Residual deviance: 1283.7  on  993  degrees of freedom
>> AIC: 1305.7
>> 
>> Number of Fisher Scoring iterations: 11
>> 
>> 
>> 
>> 
>> I tired to use simple GLM (gaussian family) and I get results that are more
>> logicals:
>> 
>>> GaussGlm <- glm(site ~  cohort, data=data)
>>> summary(GaussGlm)
>> 
>> Call:
>> glm(formula = site ~ cohort, data = data)
>> 
>> Deviance Residuals:
>>    Min       1Q   Median       3Q      Max
>> -0.8429  -0.3550  -0.3550   0.6025   0.7500
>> 
>> Coefficients:
>>             Estimate Std. Error t value Pr(>|t|)
>> (Intercept) 5.740e-14  4.762e-01   0.000   1.0000
>> cohort2000  2.500e-01  5.324e-01   0.470   0.6388
>> cohort2001  7.778e-01  5.020e-01   1.549   0.1216
>> cohort2002  6.000e-01  4.880e-01   1.230   0.2192
>> cohort2003  7.500e-01  4.861e-01   1.543   0.1231
>> *cohort2004  8.429e-01  4.796e-01   1.757   0.0792 .*
>> cohort2006  4.118e-01  4.832e-01   0.852   0.3943
>> cohort2007  3.204e-01  4.785e-01   0.670   0.5033
>> cohort2008  4.600e-01  4.786e-01   0.961   0.3367
>> cohort2009  3.975e-01  4.772e-01   0.833   0.4051
>> cohort2010  3.550e-01  4.768e-01   0.745   0.4567
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> 
>> (Dispersion parameter for gaussian family taken to be 0.2267955)
>> 
>>    Null deviance: 245.40  on 1003  degrees of freedom
>> Residual deviance: 225.21  on  993  degrees of freedom
>> AIC: 1372.5
>> 
>> Number of Fisher Scoring iterations: 2
>> 
>> 
>> 
>> What is going on? Any suggestion/commentary?
>> 
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Can-t-find-the-error-in-a-Binomial-GLM-I-am-doing-please-help-tp4615340.html
>> Sent from the R help mailing list archive at Nabble.com.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list