[R] second try; writing user-defined GLM link function

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Apr 18 21:58:34 CEST 2006


A couple more comments:

The null deviance is wrong here, as the code assumes that the link
function maps constant vectors to constant vectors, which it does not 
here.  You can circumvent that by setting an offset.

Even setting dispersion = 1 I get slightly different se's.

Here's a more robust way to specify the link:

logexp <- function(days = 1)
{
     linkfun <- function(mu) qlogis(mu^(1/days))
     linkinv <- function(eta) plogis(eta)^days
     mu.eta <- function(eta)
       days*.Call("logit_mu_eta", eta, PACKAGE = "stats")*plogis(eta)^(days-1)
     valideta <- function(eta) TRUE
     link <- paste("logexp(", days, ")", sep="")
     structure(list(linkfun = linkfun, linkinv = linkinv,
                    mu.eta = mu.eta, valideta = valideta, name = link),
               class = "link-glm")
}

and in 2.4.0 you will able to use binomial(logexp(nestdata$exposure)).



On Mon, 17 Apr 2006, Mark Herzog wrote:

> I was a little hesitant to post to everyone until I figured out why
> there is a discrepancy in the intercept estimates when compared to the
> same model run in SAS vs. R.  Everything else comes out correctly,
> including the other coefficient estimates... so perhaps it is just the
> numerical method used.  I think glm in R is using IWLS, and SAS is using ML.
>
> If anyone has another idea feel free to let me know.  Here is my
> modified binomial family, now called logexposure.  Since there is no
> other choice for the link for this new logexposure "family", I have just
> embedded the make.link inside the logexposure family.
>
> Good luck and I'd appreciate any comments.
>
> logexposure<-function (link="logit",ExposureDays) {
>     logexp<-make.link("logit")
>     logexp$linkfun<-function(mu,expos=ExposureDays) {
>         log((mu^(1/expos))/(1-mu^(1/expos)))
>     }
>     logexp$linkinv<-function(eta,expos=ExposureDays) {
>         (exp(eta)/(1+exp(eta)))^expos
>     }
>     logexp$mu.eta<-function(eta,expos=ExposureDays) {
>         (expos*exp(eta*expos)*(1+exp(eta))^(-expos-1))
>     }
>     variance <- function(mu) mu * (1 - mu)
>     validmu <- function(mu) all(mu > 0) && all(mu < 1)
>     dev.resids <- function(y, mu, wt) .Call("binomial_dev_resids",
>         y, mu, wt, PACKAGE = "stats")
>     aic <- function(y, n, mu, wt, dev) {
>         m <- if (any(n > 1))
>             n
>         else wt
>         -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m *
>             y), round(m), mu, log = TRUE))
>     }
>     initialize <- expression({
>         if (NCOL(y) == 1) {
>             if (is.factor(y)) y <- y != levels(y)[1]
>             n <- rep.int(1, nobs)
>             if (any(y < 0 | y > 1)) stop("y values must be 0 <= y <= 1")
>             mustart <- (weights * y + 0.5)/(weights + 1)
>             m <- weights * y
>             if (any(abs(m - round(m)) > 0.001))
>                  warning("non-integer successes in a binomial glm!")
>         } else if (NCOL(y) == 2) {
>             if (any(abs(y - round(y)) > 0.001))
>                 warning("non-integer counts in a binomial glm!")
>             n <- y[, 1] + y[, 2]
>             y <- ifelse(n == 0, 0, y[, 1]/n)
>             weights <- weights * n
>             mustart <- (n * y + 0.5)/(n + 1)
>         } else stop("for the binomial family,",
>                 " y must be a vector of 0 and 1's\n",
>             "or a 2 column matrix where col 1 is",
>             " no. successes and col 2 is no. failures")
>     })
>     structure(list(family="logexposure", link=logexp,
> linkfun=logexp$linkfun,
>         linkinv=logexp$linkinv, variance=variance, dev.resids=dev.resids,
>         aic=aic, mu.eta=logexp$mu.eta, initialize=initialize,
> validmu=validmu,
>         valideta=logexp$valideta), class = "family")
> }
>
> #Example
> nestdata<-read.table("http://www.branta.org/ShafferChatNestData/chat.txt")
> chat.glm.logexp<-glm(survive/trials~parastat+nest_ht*patsize,
> 	family=logexposure(ExposureDays=nestdata$expos),data=nestdata)
> # if you have MASS installed
> library(MASS)
> chat.step<-stepAIC(chat.glm.logexp,scope=list(upper=~parastat+nest_ht*patsize,lower=~1))
> chat.step$anova
> summary(chat.step)
>
> #Not run:
> # note this compares to following results from SAS :
> #              Criteria For Assessing Goodness Of Fit
> #
> #    Criterion                 DF           Value        Value/DF
> #    Deviance                 289        193.9987          0.6713
> #    Scaled Deviance          289        193.9987          0.6713
> #    Pearson Chi-Square       289        537.8609          1.8611
> #    Scaled Pearson X2        289        537.8609          1.8611
> #    Log Likelihood                      -96.9994
> #    Algorithm converged.
> #                       Analysis Of Parameter Estimates
> #
> #                                Standard Wald 95%   Chi-
> #Parameter   DF  Estimate Error        Limits       Square Pr > ChiSq
> ##Intercept   1  2.6973   0.2769   2.1546  3.2399   94.92  <.0001
> #parastat0    1 -1.0350   0.5201  -2.0544  -0.0155  3.96   0.0466
> #parastat1    0  0.0000   0.0000   0.0000  0.0000     .      .
> #patsizelarge 1  1.0844   0.5094   0.0861  2.0827   4.53   0.0333
> #patsizesmall 0  0.0000   0.0000   0.0000  0.0000     .       .
> #Scale        0  1.0000   0.0000   1.0000  1.0000
> #
> #NOTE: The scale parameter was held fixed.
>
>
>
> Mark Herzog, Ph.D.
> Program Leader, San Francisco Bay Research
> Wetland Division, PRBO Conservation Science
> 4990 Shoreline Highway 1
> Stinson Beach, CA 94970
> (415) 893-7677 x308
> mherzog at prbo.org
>
> Jessi Brown wrote:
>> I apologize for my earlier posting that, unbeknownst to me before,
>> apparently was not in the correct format for this list. Hopefully this
>> attempt will go through, and no-one will hold the newbie mistake
>> against me.
>>
>> I could really use some help in writing a new glm link function in
>> order to run an analysis of daily nest survival rates. I've struggled
>> with this for weeks now, and can at least display the contents of the
>> glm function, but I'm afraid I can't figure out even how to get
>> started at modifiying the appropriate section (fairly new at R,
>> complete beginner in writing functions in R!).
>>
>> Essentially, all I will be doing is running a logistic regression, but
>> with a different link function. The link function is a modification of
>> the logit link:
>> g(theta) = natural log( (theta ^(1/t)) / (1- (theta ^(1/t)) ) ;
>> where t is the length of the interval between nest checks.
>>
>> Could anyone help? I hope the answer is rather simple, since this just
>> adds the exponent (1/t) to the logit link function; but I have yet to
>> figure out how to do this.
>>
>> Thanks in advance for any help.
>>
>> cheers, Jessi Brown
>> Program in Ecology, Evolution, and Conservation Biology
>> University of Nevada-Reno
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>
>>
>>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list