[R] From THE R BOOK -> Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

Berwin A Turlach berwin at maths.uwa.edu.au
Tue Mar 30 18:52:38 CEST 2010

G'day all,

On Tue, 30 Mar 2010 16:19:46 +0100
Corrado <ct529 at york.ac.uk> wrote:

> David Winsemius wrote:
> > A) It is not an error, only a warning. Wouldn't it seem reasonable
> > to issue such a warning if you have data that violates the
> > distributional assumptions?
> I am not questioning the approach. I am only trying to understand why
> a (rather expensive) source of documentation and the behaviour of a 
> function are not aligned.

1) Even expensive books have typos in them.
2) glm() is from a package that is part of R, and the author of this
   book is AFAIK not a member of R core, hence has no control over
   whether his documentation and the behaviour of a function are
   aligned.
   a) If he were documenting a function that was part of a package he
      wrote as support for his book, as some authors do, there might be
      a reason to complain.  But then 1) would still apply.
   b) Even books written by members of R core occasionally contain
      misalignments between the behaviour of a function and the
      documentation contained in such books.  This can be due to them
      documenting a function over whose implementation they do not have
      control (e.g. a function in a contributed package) or the fact
      that R is improving/changing from version to version while books
      are rather static.

For these reasons it is always worthwhile to check the errata page for
a book, if one exists.

The warning arises because you do not provide all the necessary
information about your response.  If your response is binomial (with a
mean depending on some explanatory variables), then each response
consists of two numbers: the number of trials and the number of
successes.  If you calculate the observed proportion of successes from
these two numbers and feed this into glm() as the response, you are
omitting necessary information.  In this case, you should provide the
number of trials on which each proportion is based as prior weights.
For example:

R> x <- seq(from=-1,to=1,length=41)
R> px <- exp(x)/(1+exp(x))
R> nn <- sample(8:12, 41, replace=TRUE)
R> yy <- rbinom(41, size=nn, prob=px)
R> y <- yy/nn
R> glm(y~x, family=binomial, weights=nn)

Call:  glm(formula = y ~ x, family = binomial, weights = nn) 

(Intercept)            x  
      0.246        1.124  

Degrees of Freedom: 40 Total (i.e. Null);  39 Residual
Null Deviance:	    91.49 
Residual Deviance: 50.83 	AIC: 157.6 
R> glm(y~x, family=binomial)

Call:  glm(formula = y ~ x, family = binomial) 

(Intercept)            x  
     0.2143       1.1152  

Degrees of Freedom: 40 Total (i.e. Null);  39 Residual
Null Deviance:	    9.256 
Residual Deviance: 5.229 	AIC: 49.87 
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
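
As a further sketch (not part of the session above): the same information
can also be supplied as a two-column response of successes and failures,
which should give the same fit as the proportion-plus-weights form.  The
seed and the names fit_w, fit_c and m are my own choices for illustration,
and the last two lines only mimic, as far as I understand the binomial
family's initialisation, the kind of check that triggers the warning.

```r
# Simulated data as in the example above; the seed is an assumption
set.seed(1)
x  <- seq(from = -1, to = 1, length.out = 41)
px <- exp(x) / (1 + exp(x))
nn <- sample(8:12, 41, replace = TRUE)   # number of trials per observation
yy <- rbinom(41, size = nn, prob = px)   # number of successes per observation
y  <- yy / nn                            # observed proportions

# Form 1: proportions as response, number of trials as prior weights
fit_w <- glm(y ~ x, family = binomial, weights = nn)

# Form 2: two-column response of successes and failures
fit_c <- glm(cbind(yy, nn - yy) ~ x, family = binomial)

# Both forms carry the same information, so the estimates should agree
all.equal(coef(fit_w), coef(fit_c))

# Without weights each proportion is treated as a single trial, so the
# implied number of successes (weight 1 times y) is not a whole number
m <- 1 * y
any(abs(m - round(m)) > 0.001)
```

With the weights (or the two-column form) the implied numbers of
successes, nn * y, are whole numbers again, and no warning is issued.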




========================== Full address ============================
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway                   
Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
Australia                        http://www.maths.uwa.edu.au/~berwin
