[R] Help with GLM starting values in user defined link function
Andrew.Hoskins at csiro.au
Andrew.Hoskins at csiro.au
Fri Oct 24 05:10:30 CEST 2014
Hi Ken,
Many thanks for your advice.
Earlier this morning I stepped my way through the glm.fit function to see where things were falling over and realised that first and foremost I had my link function wrong (link and inverse were back to front). I've now fixed this and can get the model to produce coefficients following your example. However, once I try to fit the complete model (where eco is offset rather than estimated), I still end up with an error that the model was unable to fit coefficients. Although it does seem to make it though some iterations of the IRLS algorithm before this happens. I also end up with the same error when I first estimate the coefficient for eco and then try to offset the estimated eco coefficient*eco, which seems to suggest to me that there is something not right with how the link is working with offset.
See below for the example code.
I was wondering if you or anyone else has some more great advice?
Thanks in advance.
Andrew
## Example fit of negative exponential binomial GLM
## define link function
negexp <- function()
{
linkfun <- function(mu) -log(1-mu)
linkinv <- function(eta) 1-exp(-eta)
mu.eta <- function(eta) exp(-eta)
valideta <- function(eta) all(is.finite(eta))
link <- paste0("negexp")
structure(list(linkfun = linkfun, linkinv = linkinv,
mu.eta = mu.eta, valideta = valideta, name = link),
class = "link-glm")
}
## create some data
y <- c(0,0,0,0,0,1,1,1,0,1,0,1,1,0,0,1,1,1,1,0,1,0,1,0,1,0,1,1,1,0,1,0,1,0,0,1,0
,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,1,1,0,1,0,1,1,0,1,0,1,0,1,1
,0,1,0,0,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,1,0,1,0,1,0,0)
eco <- c(0.30406146,0.77127034,0.27507740,0.34660849,0.10496959,0.87483283,1.34652163,0.59570289,0.76557945
,1.07105129,0.85681582,1.15885519,0.62478718,0.82327890,0.61921331,1.40337615,1.69376337,0.96662890
,0.62756558,1.22148480,0.29509170,1.20822702,1.04490241,0.63994034,0.44537652,0.80908805,1.38793219
,0.68987695,0.65253113,0.10996619,2.18030035,0.95187860,1.91719194,0.55910638,0.42246265,0.99747093
,0.65609015,1.56408171,0.09024976,0.49430176,0.89651639,0.13943031,0.72264673,0.33212781,0.53156567
,0.24478163,0.20439708,0.26577897,0.73061755,1.41380646,0.45361391,0.53961802,0.20099582,0.16278695
,0.51188479,1.23152701,1.45180489,0.16136045,0.84696597,1.06556860,0.31352700,7.54728452,0.47765713
,1.62966928,0.51514442,0.87203787,0.33515181,0.71407043,0.84767445,0.33640927,0.70331392,0.41617675
,1.41137914,0.22586531,0.92797131,1.34627407,0.21408341,0.38903027,0.91690877,0.95946623,0.46114617
,0.62965571,7.50492235,1.96516642,0.61555184,1.24061426,0.95281453,1.02729643,1.44581350,1.63148077
,1.02291891,0.80319545,0.92136436,1.22428318,0.59172977,1.56985149,0.35790202,2.23402940,0.98565537
,0.41658919)
geog <- c(254.94615,277.78675,3.69047,47.92363,0.90241,449.44532,1795.89910,985.66843,1063.47287
,160.27883,58.72738,1093.00270,1423.51194,1232.16769,54.56121,4772.54353,1877.95322,1110.18161
,174.53805,2829.67281,551.22870,1781.67608,495.13007,44.42326,1057.72959,783.99003,3025.58558
,1855.59219,1715.27590,41.75478,3687.95693,2125.34324,751.42284,1598.04527,1625.35627,848.40949
,1484.40835,3332.15554,214.99678,136.60188,1388.07919,77.21198,2366.56327,617.31749,1421.72213
,537.38636,223.57615,256.35456,3022.63678,4783.64718,45.97153,194.79090,2.65647,69.08392
,948.66990,1480.70503,2805.30205,144.55345,1134.92666,728.04570,1421.45250,827.57959,1517.75701
,682.77014,1060.09369,448.44398,848.64842,1437.74925,2887.23135,56.28056,725.78408,91.19194
,1905.87208,749.92265,261.19075,2529.80027,371.16338,1130.14904,802.23348,1851.86105,1274.20599
,260.79728,1427.11459,3891.82373,482.58143,2011.86414,1310.10546,975.37470,1087.50127,2195.28667
,2358.70761,44.82955,1553.35558,2261.60567,1216.64486,1674.70189,165.13405,1463.93362,1542.33074
,1683.01992)
testDat <- data.frame(y,eco,geog)
## fit model
fit.logit <- glm(y~eco+geog,family=binomial(cloglog),data=testDat)
##EV <- negexp()$linkfun(fitted(fit.logit)) ## linking the fitted values doesn't work but ignoring this will
LE.lm <- lm(EV ~ eco + geog, testDat)
Ec <- coef(LE.lm)
glm(y ~ eco + geog, family = binomial(negexp()),data=testDat,start=Ec)
## fit model using offset
fit.logit <- glm(y~offset(eco)+geog,family=binomial(cloglog),data=testDat)
##EV <- negexp()$linkfun(fitted(fit.logit)) ## linking the fitted values doesn't work but ignoring this will
LE.lm <- lm(EV ~ offset(eco) + geog, testDat)
Ec <- coef(LE.lm)
glm(y ~ offset(eco) + geog, family = binomial(negexp()),data=testDat,start=Ec)
## fit model using offset with eco*coefficient * eco
fit.logit <- glm(y~offset(coef(fit2)[2]*eco)+geog,family=binomial(cloglog),data=testDat)
##EV <- negexp()$linkfun(fitted(fit.logit)) ## linking the fitted values doesn't work but ignoring this will
LE.lm <- lm(EV ~ offset(coef(fit2)[2]*eco) + geog, testDat)
Ec <- coef(LE.lm)
fit2 <- glm(y ~ offset(coef(fit2)[2]*eco) + geog, family = binomial(negexp()),data=testDat,start=Ec)
-----Original Message-----
<Andrew.Hoskins <at> csiro.au> writes:
> I'm trying to fit a binomial GLM with user defined
link
function (negative exponential), however I seem to
> be unable to find the correct starting values to
initialise such a model. I've tried taking starting
> values from a logistic and log models fit to the
same data and also tried to substitute the intercept
from
> the null model in as the starting value for this
model, however all continue to return the same error.
>
> Andrew
>
> ## Example fit of negative exponential binomial
GLM
>
> ## define link function
> negexp <- function()
> {
> linkfun <- function(mu) 1-exp(-mu)
> linkinv <- function(eta) -log(1-eta)
> mu.eta <- function(eta) 1/(1-eta)
> valideta <- function(eta) TRUE
> link <- paste0("negexp")
> structure(list(linkfun = linkfun, linkinv = linkinv,
> mu.eta = mu.eta, valideta = valideta,
... [show rest of quote]
name = link),
> class = "link-glm")
> }
>
---SNIP---
Take a look at the limits of eta for the extreme values
of mu and compare them with the linear predictor of
your link applied to say the fitted values of your logit
fit. It seems to suggest that two values fall outside
the range of valid eta, according to your linkfun:
c(62, 83). I got it to work
with these removed although there were lots of other
warnings that you might have to worry about.
Also, when choosing start values you might want
to base them on a fit with your link rather than
a different one. So, I got start values by trying
EV <- negexp()$linkfun(fitted(fit.logit))
LE.lm <- lm(EV ~ eco + geog, testDat)
Ec <- coef(LE.lm)
with these defined as in your mail (sorry I snipped
your code out).
So, I found
which(fitted(LE.lm) > (1 - exp(-1)))
62 83
62 83
and then
glm(y ~ eco + geog, family = binomial(negexp()),
data = testDat[-c(62, 83), ], start = Ec)
Coefficients:
(Intercept) eco geog
1.593e-01 2.085e-01 4.713e-06
Degrees of Freedom: 97 Total (i.e. Null); 95 Residual
Null Deviance: 134.4
Residual Deviance: 112.3 AIC: 118.3
There were 27 warnings (use warnings() to see them)
HTH
>
> Andrew Hoskins
> Postdoctoral reasearch fellow
> Ecosystem Sciences
> CSIRO
>
> E Andrew.Hoskins <at> csiro.au T +61 2 6246 5902
> Black Mountain Laboratories
> Clunies Ross Street, Acton, ACT 2601, Australia
> www.csiro.au
>
... [show rest of quote]
>
--
Kenneth Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html
More information about the R-help
mailing list