[R] second try; writing user-defined GLM link function
John Fox
jfox at mcmaster.ca
Tue Apr 18 01:01:37 CEST 2006
Dear Mark,
IWLS is a method for obtaining ML estimates in GLMs.
The dummy variables for representing factors (in R terminology, CLASS
variables in SAS) are treated differently. SAS uses a full-rank
parametrization of the model matrix, but sets the coefficient for one
category in a factor to 0, so the effect is equivalent to dummy-coding --
that is the default "treatment" contrast coding in R. Note, however, that in
the SAS output the coefficient for parastat == 0 is estimated and that for
parastat == 1 is set to 0; this is the opposite of what happens by default
in R. If you look more closely at the output, you'll see that the
coefficient for parastat in R is of the same magnitude as in SAS but of the
opposite sign. The same is true for the factor patsize. Note that the
standard errors in the two outputs differ because in R you're estimating the
dispersion parameter while in SAS it's fixed to 1.
The upshot is that (except for the dispersion parameter) you have equivalent
but not identical parametrizations of the model.
I hope this helps,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mark Herzog
> Sent: Monday, April 17, 2006 11:16 AM
> To: Jessi Brown
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] second try; writing user-defined GLM link function
>
> I was a little hesitant to post to everyone until I figured
> out why there is a discrepancy in the intercept estimates
> when compared to the same model run in SAS vs. R. Everything
> else comes out correctly, including the other coefficient
> estimates... so perhaps it is just the numerical method used.
> I think glm in R is using IWLS, and SAS is using ML.
>
> If anyone has another idea feel free to let me know. Here is
> my modified binomial family, now called logexposure. Since
> there is no other choice for the link for this new
> logexposure "family", I have just embedded the make.link
> inside the logexposure family.
>
> Good luck and I'd appreciate any comments.
>
> logexposure<-function (link="logit",ExposureDays) {
> logexp<-make.link("logit")
> logexp$linkfun<-function(mu,expos=ExposureDays) {
> log((mu^(1/expos))/(1-mu^(1/expos)))
> }
> logexp$linkinv<-function(eta,expos=ExposureDays) {
> (exp(eta)/(1+exp(eta)))^expos
> }
> logexp$mu.eta<-function(eta,expos=ExposureDays) {
> (expos*exp(eta*expos)*(1+exp(eta))^(-expos-1))
> }
> variance <- function(mu) mu * (1 - mu)
> validmu <- function(mu) all(mu > 0) && all(mu < 1)
> dev.resids <- function(y, mu, wt) .Call("binomial_dev_resids",
> y, mu, wt, PACKAGE = "stats")
> aic <- function(y, n, mu, wt, dev) {
> m <- if (any(n > 1))
> n
> else wt
> -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m *
> y), round(m), mu, log = TRUE))
> }
> initialize <- expression({
> if (NCOL(y) == 1) {
> if (is.factor(y)) y <- y != levels(y)[1]
> n <- rep.int(1, nobs)
> if (any(y < 0 | y > 1)) stop("y values must be 0
> <= y <= 1")
> mustart <- (weights * y + 0.5)/(weights + 1)
> m <- weights * y
> if (any(abs(m - round(m)) > 0.001))
> warning("non-integer successes in a binomial glm!")
> } else if (NCOL(y) == 2) {
> if (any(abs(y - round(y)) > 0.001))
> warning("non-integer counts in a binomial glm!")
> n <- y[, 1] + y[, 2]
> y <- ifelse(n == 0, 0, y[, 1]/n)
> weights <- weights * n
> mustart <- (n * y + 0.5)/(n + 1)
> } else stop("for the binomial family,",
> " y must be a vector of 0 and 1's\n",
> "or a 2 column matrix where col 1 is",
> " no. successes and col 2 is no. failures")
> })
> structure(list(family="logexposure", link=logexp,
> linkfun=logexp$linkfun,
> linkinv=logexp$linkinv, variance=variance,
> dev.resids=dev.resids,
> aic=aic, mu.eta=logexp$mu.eta,
> initialize=initialize, validmu=validmu,
> valideta=logexp$valideta), class = "family") }
>
> #Example
> nestdata<-read.table("http://www.branta.org/ShafferChatNestDat
> a/chat.txt")
> chat.glm.logexp<-glm(survive/trials~parastat+nest_ht*patsize,
> family=logexposure(ExposureDays=nestdata$expos),data=nestdata)
> # if you have MASS installed
> library(MASS)
> chat.step<-stepAIC(chat.glm.logexp,scope=list(upper=~parastat+
> nest_ht*patsize,lower=~1))
> chat.step$anova
> summary(chat.step)
>
> #Not run:
> # note this compares to following results from SAS :
> # Criteria For Assessing Goodness Of Fit
> #
> # Criterion DF Value Value/DF
> # Deviance 289 193.9987 0.6713
> # Scaled Deviance 289 193.9987 0.6713
> # Pearson Chi-Square 289 537.8609 1.8611
> # Scaled Pearson X2 289 537.8609 1.8611
> # Log Likelihood -96.9994
> # Algorithm converged.
> # Analysis Of Parameter Estimates
> #
> # Standard Wald 95% Chi-
> #Parameter DF Estimate Error Limits Square Pr > ChiSq
> ##Intercept 1 2.6973 0.2769 2.1546 3.2399 94.92 <.0001
> #parastat0 1 -1.0350 0.5201 -2.0544 -0.0155 3.96 0.0466
> #parastat1 0 0.0000 0.0000 0.0000 0.0000 . .
> #patsizelarge 1 1.0844 0.5094 0.0861 2.0827 4.53 0.0333
> #patsizesmall 0 0.0000 0.0000 0.0000 0.0000 . .
> #Scale 0 1.0000 0.0000 1.0000 1.0000
> #
> #NOTE: The scale parameter was held fixed.
>
>
>
> Mark Herzog, Ph.D.
> Program Leader, San Francisco Bay Research Wetland Division,
> PRBO Conservation Science 4990 Shoreline Highway 1 Stinson
> Beach, CA 94970
> (415) 893-7677 x308
> mherzog at prbo.org
>
> Jessi Brown wrote:
> > I apologize for my earlier posting that, unbeknownst to me before,
> > apparently was not in the correct format for this list.
> Hopefully this
> > attempt will go through, and no-one will hold the newbie mistake
> > against me.
> >
> > I could really use some help in writing a new glm link function in
> > order to run an analysis of daily nest survival rates. I've
> struggled
> > with this for weeks now, and can at least display the
> contents of the
> > glm function, but I'm afraid I can't figure out even how to get
> > started at modifiying the appropriate section (fairly new at R,
> > complete beginner in writing functions in R!).
> >
> > Essentially, all I will be doing is running a logistic
> regression, but
> > with a different link function. The link function is a
> modification of
> > the logit link:
> > g(theta) = natural log( (theta ^(1/t)) / (1- (theta ^(1/t))
> ) ; where
> > t is the length of the interval between nest checks.
> >
> > Could anyone help? I hope the answer is rather simple,
> since this just
> > adds the exponent (1/t) to the logit link function; but I
> have yet to
> > figure out how to do this.
> >
> > Thanks in advance for any help.
> >
> > cheers, Jessi Brown
> > Program in Ecology, Evolution, and Conservation Biology
> University of
> > Nevada-Reno
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
> >
> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
More information about the R-help
mailing list