[R] problems with glm

Sun Jan 15 15:57:22 CET 2006

On Sun, 15 Jan 2006, Roland R Regoes wrote:

> 	I am sorry, I should have emphasized that: I really mean 'family =
> binomial(link="log")'.
>
> 	I am fitting infection data. Failure and success correspond to 
> being infected or not. The probability of success (ie, not being 
> infected in my context) is derived from a population model describing 
> the interaction between hosts and parasites:
> 	P(not infected) = exp(-InfectionRate*ParasiteDose)

> 'ParasiteDose' corresponds to 'predictor'. I want to estimate the 
> infection rate. Hence I need 'binomial(link="log")' and must set 
> intercept=0.

Then your model is not appropriate for your data.  The only real evidence 
is coming from the last two points which have approximately equal failure 
rate yet differ by a factor of 5 in the predictor.  Also, your model 
without intercept predicts probablilty one for the first case, and the 
log link in R does not allow probability zero.  If we drop that case
(which contributes nothing to the model) we get

success <- c(12,11,14,14,11,13,11,12)
failure <- c(0,0,0,0,0,0,2,2)
predictor <- 80*5^(0:7)
glm(cbind(success,failure) ~ predictor+0,
            family = binomial(link="log"),
            control = glm.control(epsilon=1e-8,trace=TRUE,maxit=50))

which works, albeit with an infinite null deviance (as the null model is 
inappropriate).

> On Mon, Jan 16, 2006 at 12:16:09AM +1100, Bill.Venables at csiro.au wrote:
>> Most of your problems seem to come from 'link = "log"' whereas you
>> probably mean 'link = logit' (which is the default.  Hence:
>>
>> ##########################################
>>> success <- c(13,12,11,14,14,11,13,11,12)
>>> failure <- c(0,0,0,0,0,0,0,2,2)
>>> predictor <- c(0,80*5^(0:7))
>>> glm(cbind(success,failure) ~ predictor,
>> +           family = binomial, #(link="log"),
>> +           control = glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
>> Deviance = 7.621991 Iterations - 1
>> Deviance = 6.970934 Iterations - 2
>> Deviance = 6.941054 Iterations - 3
>> Deviance = 6.940945 Iterations - 4
>> Deviance = 6.940945 Iterations - 5
>>
>> Call:  glm(formula = cbind(success, failure) ~ predictor, family =
>> binomial,      control = glm.control(epsilon = 1e-08, trace = TRUE,
>> maxit = 50))
>>
>> Coefficients:
>> (Intercept)    predictor
>>   4.180e+00   -4.106e-07
>>
>> Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
>> Null Deviance:      12.08
>> Residual Deviance: 6.941        AIC: 15.85

>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Roland R Regoes
>> Sent: Sunday, 15 January 2006 11:04 PM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] problems with glm
>>
>>
>> Dear R users,
>>
>>      I am having some problems with glm. The first is an error message
>> "subscript out of bounds". The second is the fact that reasonable
>> starting values are not accepted by the function.
>>
>>      To be more specific, here is an example:
>>
>>> success <- c(13,12,11,14,14,11,13,11,12)
>>> failure <- c(0,0,0,0,0,0,0,2,2)
>>> predictor <- c(0,80*5^(0:7))
>>> glm(cbind(success,failure) ~ predictor + 0,
>> +           family=binomial(link="log"),
>> +           control=glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
>> Deviance = 3.348039 Iterations - 1
>> Error: subscript out of bounds
>> In addition: Warning message:
>> step size truncated: out of bounds
>>>
>>
>> 	The model with intercept yields:
>>
>>> glm(cbind(success,failure) ~ predictor ,
>> +           family=binomial(link="log"),
>> +           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
>>
>> Call:  glm(formula = cbind(success, failure) ~ predictor, family =
>> binomial(link = "log"),      control = glm.control(epsilon = 1e-08,
>> trace = FALSE, maxit = 50))
>>
>> Coefficients:
>> (Intercept)    predictor
>>  -5.830e-17   -4.000e-08
>>
>> Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
>> Null Deviance:	    12.08
>> Residual Deviance: 2.889 	AIC: 11.8
>> There were 33 warnings (use warnings() to see them)
>>> warnings()
>> 1: step size truncated: out of bounds
>> ...
>> 31: step size truncated: out of bounds
>> 32: algorithm stopped at boundary value in: glm.fit(x = X, y = Y,
>> weights = weights, start = start, etastart = etastart,   ...
>> 33: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X,
>> y = Y, weights = weights, start = start, etastart = etastart,   ...
>>>
>>
>> 	Since the intercept in the above fit is fairly small I thought I
>> could use -4e-8 as a reasonable starting value in a model without
>> intercept. But to no avail:
>>
>>> glm(cbind(success,failure) ~ predictor + 0, start=-4e-8,
>> +           family=binomial(link="log"),
>> +           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
>> Error in glm.fit(x = X, y = Y, weights = weights, start = start,
>> etastart = etastart,  :
>> 	cannot find valid starting values: please specify some
>>>
>>
>> 	I am stuck here. Am I doing something wrong when specifying the
>> starting value? I would appreciate any help. (I could not find anything
>> relevant in the documentation of glm and the mailing list archives, but
>> I did not read the source code of glm yet.)
>>
>> Roland
>>
>>
>> PS: I am using R Version 2.2.0 (R Cocoa GUI 1.13 (1915)) on MacOSX
>> 10.4.4
>>
>> --
>> -----------------------------------------------------------------------
>> Roland Regoes
>> Theoretical Biology
>> Universitaetsstr. 16
>> ETH Zentrum, CHN H76.1
>> CH-8092 Zurich, Switzerland
>>
>> tel: +41-44-632-6935
>> fax: +41-44-632-1271
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>
> -- 
> -----------------------------------------------------------------------
> Roland Regoes
> Theoretical Biology
> Universitaetsstr. 16
> ETH Zentrum, CHN H76.1
> CH-8092 Zurich, Switzerland
>
> tel: +41-44-632-6935
> fax: +41-44-632-1271
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595