[R] A problem in a glm model

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri May 9 00:05:42 CEST 2003


You need to look up the Hauck-Donner phenomenon in MASS (4th, 3rd or 2nd 
edition).

In short, Wald tests of binomial or Poisson glms are highly unreliable:
a moderate p-value can indicate either no effect or a very large effect.
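
As a rough illustration of that point (a sketch on simulated data, not the
poster's data; the names x, y, fit, fit0, wald_p and lrt_p are made up
here), consider the extreme case of an indicator whose group is fitted
exactly.  The Wald test and the likelihood ratio test then disagree
completely:

```r
# Simulated data (an assumption for illustration): x is a 0/1 indicator and
# every observation with x == 1 has y == 1, so the effect of x is "infinite".
set.seed(1)
n <- 200
x <- rep(c(0, 1), each = n / 2)
y <- c(rbinom(n / 2, 1, 0.3), rep(1, n / 2))

# glm() warns about fitted probabilities of 0 or 1 here; that warning is
# itself a symptom of separation, so it is only silenced for the demo.
fit  <- suppressWarnings(glm(y ~ x, family = binomial))
fit0 <- glm(y ~ 1, family = binomial)

# Wald test: huge estimate, even larger standard error, unremarkable p-value.
wald_p <- coef(summary(fit))["x", "Pr(>|z|)"]

# Likelihood ratio test: drop in deviance referred to chi-squared on 1 df.
lrt_p <- pchisq(deviance(fit0) - deviance(fit), df = 1, lower.tail = FALSE)

print(coef(summary(fit)))
print(c(wald = wald_p, lrt = lrt_p))
```

Here the Wald p-value should be close to 1 while the likelihood ratio
p-value is essentially zero, which is the pattern to watch for.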

I suspect your model is in fact partially separable (that is, it can fit
parts of the data exactly), since those are large coefficients for
indicator variables.  Try reducing the tolerance in glm.control (add
epsilon=1e-10 to the glm call) and see if the coefficients change a lot.
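
A minimal sketch of that check, again on simulated data (the data frame d,
its columns a, b, y and the object names are assumptions, and raising maxit
is an extra precaution not mentioned above):

```r
# Simulated data (an assumption for illustration): two 0/1 indicators whose
# joint cell a == 1 & b == 1 contains only successes, so the a:b term is
# fitted exactly and its coefficient has no finite maximum likelihood value.
set.seed(2)
d <- data.frame(a = rbinom(300, 1, 0.5), b = rbinom(300, 1, 0.5))
d$y <- rbinom(300, 1, plogis(-1 + d$a))
d$y[d$a == 1 & d$b == 1] <- 1

# Default convergence tolerance (epsilon = 1e-8).
fit_default <- suppressWarnings(glm(y ~ a * b, family = binomial, data = d))

# Same model with a tighter tolerance; maxit is raised so that the tighter
# tolerance, not the iteration limit, decides when fitting stops.
fit_tight <- suppressWarnings(
  glm(y ~ a * b, family = binomial, data = d,
      control = glm.control(epsilon = 1e-10, maxit = 100))
)

# Coefficients that keep moving when the tolerance is tightened are drifting
# towards +/- infinity rather than converging.
change <- coef(fit_tight) - coef(fit_default)
print(round(cbind(default = coef(fit_default),
                  tight   = coef(fit_tight),
                  change  = change), 4))
```

Coefficients that are properly estimated should be essentially unchanged
between the two fits; only the terms involved in the separation should move.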

On Thu, 8 May 2003, Simona Avanzo wrote:

> Hello all,
> 
> I have the following glm model:
> 
> f1 <- as.formula(paste("factor(y.fondi)~",
>                   "flgsess + segmeta2 + udm + zona.geo + ultimo.prod.", 
>                   "+flg.a2 + flg.d.na2 + flg.v2 + flg.cc2",
>                   " +(flg.a1 + flg.d.na1 + flg.v1 + flg.cc1)^2",
>                   " + flg.a2:flg.d.na2 + flg.a2:flg.v2 + flg.a2:flg.cc2",
>                   " + flg.d.na2:flg.v2 + flg.v2:flg.cc2",
>                  sep=""))
> 
> g1 <- glm(f1,family=binomial,data=camp.lavoro.meno.na)
> 
> The variables are all factors:
> ·	y.fondi takes value 0 or 1; 
> ·	flgsess has 2 levels;
> ·	segmeta2 has 4 levels;
> ·	udm has 6 levels;
> ·	zona.geo has 5 levels;
> ·	ultimo.prod. has 4 levels;
> ·	flg.a1, flg.d.na1, flg.v1, flg.cc1, flg.a2, flg.d.na2,  flg.v2, flg.cc2  are 8 factors that take values 0 or 1.
> 
> The number of observations is 1390. 
> The observations with "y.fondi = 1" are 259.
> The observations with "y.fondi = 0" are 1131.
>  
> The summary of the model is:
> > summary(g1)
> Call:
> glm(formula = f1, family = binomial, data = camp.lavoro.meno.na)
> 
> Deviance Residuals: 
>     Min       1Q   Median       3Q      Max  
> -2.8955  -0.3586  -0.2692  -0.1642   2.9133  
> 
> Coefficients:
>                     Estimate Std. Error z value Pr(>|z|)
> (Intercept)          -2.7647     0.7523  -3.675 0.000238 ***
> ...                      ...        ...     ...      ...
> 
> flg.a21               0.7898     0.4948   1.596 0.110475
> flg.d.na21            0.2097     0.7336   0.286 0.774963
> flg.v21               0.3928     0.5257   0.747 0.454994
> flg.cc21             -0.8547     1.4954  -0.572 0.567625
> flg.a11               0.7051     0.4889   1.442 0.149221
> flg.d.na11            1.3582     0.5429   2.502 0.012353 *
> flg.v11               2.2596     0.5079   4.449 8.62e-06 ***
> flg.cc11             -3.3658     8.5259  -0.395 0.693014
> flg.a21:flg.d.na21   -6.9392    26.5432  -0.261 0.793760
> flg.a21:flg.v21      -1.4355     4.0963  -0.350 0.726005
> flg.a21:flg.cc21     -6.0460    72.4807  -0.083 0.933521
> flg.d.na21:flg.v21   -2.4347     2.9045  -0.838 0.401888
> flg.v21:flg.cc21     11.7232    72.4814   0.162 0.871510
> flg.a11:flg.d.na11   -8.3843    30.4660  -0.275 0.783162 !!!!
> flg.a11:flg.v11       6.5067    39.2569   0.166 0.868356
> flg.a11:flg.cc11     13.5596    19.4693   0.696 0.486140 !!!!
> flg.d.na11:flg.v11   -0.7143     1.2673  -0.564 0.573013
> flg.d.na11:flg.cc11  12.0653    15.3880   0.784 0.432997
> flg.v11:flg.cc11      6.2648     8.5808   0.730 0.465331 !!!!
> 
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 1336.79  on 1389  degrees of freedom
> Residual deviance:  576.08  on 1354  degrees of freedom
> AIC: 648.08
> 
> Number of Fisher Scoring iterations: 8
> 
> If I apply anova tests, I obtain:
> 
> > g1.1 <- update(g1,~.-flg.a1:flg.d.na1,data=camp.lavoro.meno.na)
> > anova(g1.1,g1,test="Chisq")
> Analysis of Deviance Table
>   Resid. Df Resid. Dev   Df Deviance P(>|Chi|)
> 1      1355     578.49                        
> 2      1354     576.08    1     2.41      0.12
> 
> > g1.1 <- update(g1,~.-flg.a1:flg.cc1,data=camp.lavoro.meno.na)
> > anova(g1.1,g1,test="Chisq")
> Analysis of Deviance Table
>   Resid. Df Resid. Dev   Df Deviance P(>|Chi|)
> 1      1355     580.77                        
> 2      1354     576.08    1     4.69      0.03
> 
> > g1.1 <- update(g1,~.-flg.v1:flg.cc1,data=camp.lavoro.meno.na)
> > anova(g1.1,g1,test="Chisq")
> Analysis of Deviance Table
>   Resid. Df Resid. Dev   Df Deviance P(>|Chi|)
> 1      1355     578.01                        
> 2      1354     576.08    1     1.94      0.16
> 
> Why do I obtain these differences?
> Many thanks for any help, 
> 
> Simona
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



