[R] binomial glm warnings revisited

Wed Oct 8 19:54:37 CEST 2003

      This seems to me to be a special case of the general problem of a 
parameter on a boundary.  Another example is a variance component 
that is zero.  For this latter problem, Pinheiro and Bates (2000) 
Mixed-Effects Models in S and S-PLUS (Springer, sec. 2.4.1) present 
simulation results showing that a 50-50 mixture of chi-square(0) and 
chi-square(1), for example, provides an excellent approximation to the 
actual sampling distribution of 2*log(likelihood ratio). 
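
      As a rough sketch of what that mixture means in practice:  
chi-square(0) is a point mass at zero, so for a positive statistic the 
mixture p-value is half the usual chi-square(1) tail area.  The function 
name mixture_pvalue and the value of lrt below are made up for 
illustration, and this covers the variance-component case only;  I am not 
claiming it carries over to the binomial GLM question. 

```r
# Hypothetical helper (not from Pinheiro and Bates): p-value for a
# likelihood ratio statistic under a 50-50 mixture of chi-square(0)
# and chi-square(1).
mixture_pvalue <- function(lrt) {
  # chi-square(0) puts all its mass at zero, so a statistic of zero
  # (or below, from rounding) gets a p-value of 1.
  if (lrt <= 0) return(1)
  # For lrt > 0 only the chi-square(1) half contributes to the tail.
  0.5 * pchisq(lrt, df = 1, lower.tail = FALSE)
}

lrt <- 2.5  # illustrative value only, not taken from any fitted model
p_naive   <- pchisq(lrt, df = 1, lower.tail = FALSE)  # plain chi-square(1)
p_mixture <- mixture_pvalue(lrt)                      # boundary-adjusted
print(c(naive = p_naive, mixture = p_mixture))
```

      The naive chi-square(1) p-value is twice the mixture p-value 
whenever the statistic is positive, so ignoring the boundary makes the 
test conservative. 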

      Recent discussions of this and related questions on this list and 
elsewhere produced the following list of articles that may be helpful: 

      Donald Andrews (2001) "Testing When a Parameter Is on the Boundary 
of the Maintained Hypothesis", Econometrica, 69:  683-734.

      Donald Andrews (2000) "Inconsistency of the Bootstrap When a 
Parameter Is on the Boundary of the Parameter Space", Econometrica, 68:  
388-405.

      Donald Andrews (1999) "Estimation When a Parameter Is on a 
Boundary", Econometrica, 67:  1341-1383.

      P. J. Rousseeuw and A. Christmann (2003) "Robustness Against 
Separation and Outliers in Logistic Regression", Computational 
Statistics & Data Analysis, 43:  315-332.

      Unfortunately, I have not had time to review these, so I can't 
comment further. 

      Hope this helps.  Spencer Graves

Tord Snall wrote:

>Dear all,
>
>Last autumn there was some discussion on the list of the warning
>Warning message: 
>fitted probabilities numerically 0 or 1 occurred in: (if
>(is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
>
>when fitting binomial GLMs with many 0s and few 1s.
>
>Parts of replies:
>"You should be able to tell which coefficients are infinite -- the
>coefficients and their standard errors will be large. When this happens the
>standard errors and the p-values reported by summary.glm() for those
>variables are useless."
>"My guess is that the deviances and coefficients are entirely ok. I'd
>expect that problems in the general area that Thomas mentions to reveal
>themselves as a failure to converge."
>
>I have this problem with my data. In a GLM, I have 269 zeroes and only 1 one:
>
>summary(dbh)
>Coefficients:
>            Estimate Std. Error z value Pr(>|z|)
>(Intercept)   0.1659     3.8781   0.043    0.966
>dbh          -0.5872     0.5320  -1.104    0.270
>
>
>> drop1(dbh, test = "Chisq")
>Single term deletions
>Model:
>MPext ~ dbh
>       Df Deviance     AIC     LRT Pr(Chi)  
><none>      9.9168 13.9168                  
>dbh     1  13.1931 15.1931  3.2763 0.07029 .
>
>I now wonder: is the drop1() function output 'reliable'?
>
>If so, are the estimates from MASS confint() then also 'reliable'? It gives
>the same warning.
>
>Waiting for profiling to be done...
>                2.5 %      97.5 %
>(Intercept) -6.503472 -0.77470556
>abund       -1.962549 -0.07496205
>There were 20 warnings (use warnings() to see them)
>
>
>Thanks in advance for your reply.
>
>
>Sincerely,
>Tord
>
>-----------------------------------------------------------------------
>Tord Snäll
>Avd. f växtekologi, Evolutionsbiologiskt centrum, Uppsala universitet
>Dept. of Plant Ecology, Evolutionary Biology Centre, Uppsala University
>Villavägen 14			
>SE-752 36 Uppsala, Sweden
>Tel: 018-471 28 82 (int +46 18 471 28 82) (work)
>Tel: 018-25 71 33 (int +46 18 25 71 33) (home)
>Fax: 018-55 34 19 (int +46 18 55 34 19) (work)
>E-mail: Tord.Snall at ebc.uu.se
>Check this: http://www.vaxtbio.uu.se/resfold/snall.htm!