[R] mu^2(1-mu)^2 variance function for GLM

David Firth d.firth at warwick.ac.uk
Thu Jun 16 17:22:31 CEST 2005


Dear Henric

I do not have a ready stock of other examples, but I do have my own 
version of a family function for this, reproduced below.  It differs 
from yours (apart from being a regular family function rather than 
using a modified "quasi") in the definition of deviance residuals.  
These necessarily involve an arbitrary constant (see McCullagh and 
Nelder, 1989, p330); in my function that arbitrariness is in the choice 
eps <- 0.0005.  I don't think the deviance contributions as you 
specified in your code below will have the right derivative (with 
respect to mu) for observations where y=0 or y=1.

Anyway, this at least gives you some kind of check.  I hope it helps.  
This function will be part of a new package which Heather Turner and I 
will submit to CRAN in a few days' time.  Please do let me know if you 
find any problems with it.

Here is my "wedderburn" family function:

"wedderburn" <-
     function (link = "logit")
{
     linktemp <- substitute(link)
     if (!is.character(linktemp)) {
         linktemp <- deparse(linktemp)
         if (linktemp == "link")
             linktemp <- eval(link)
     }
     if (any(linktemp == c("logit", "probit", "cloglog")))
         stats <- make.link(linktemp)
     else stop(paste(linktemp,
                     "link not available for wedderburn quasi-family;",
                     "available links are",
                     "\"logit\", \"probit\" and \"cloglog\""))
     variance <- function(mu)  mu^2 * (1-mu)^2
     validmu <- function(mu) {
         all(mu > 0) && all(mu < 1)}
     dev.resids <- function(y, mu, wt){
         eps <-  0.0005
         2 * wt * (y/mu + (1 - y)/(1 - mu) - 2 +
                   (2 * y - 1) * log((y + eps)*(1 - mu)/((1- y + eps) * 
mu)))
     }
     aic <- function(y, n, mu, wt, dev) NA
     initialize <- expression({
         if (any(y < 0 | y > 1)) stop(paste(
                    "Values for the wedderburn family must be in [0,1]"))
         n <- rep.int(1, nobs)
         mustart <- (y + 0.1)/1.2
     })
     structure(list(family = "wedderburn",
                    link = linktemp,
                    linkfun = stats$linkfun,
                    linkinv = stats$linkinv,
                    variance = variance,
                    dev.resids = dev.resids,
                    aic = aic,
                    mu.eta = stats$mu.eta,
                    initialize = initialize,
                    validmu = validmu,
                    valideta = stats$valideta),
               class = "family")
}

Best wishes,
David
http://www.warwick.ac.uk/go/dfirth

On 16 Jun 2005, at 09:27, Henric Nilsson wrote:

> Dear list,
>
> I'm trying to mimic the analysis of Wedderburn (1974) as cited by
> McCullagh and Nelder (1989) on p.328-332. This is the leaf-blotch on
> barley example, and the data is available in the `faraway' package.
>
> Wedderburn suggested using the variance function mu^2(1-mu)^2. This
> variance function isn't readily available in R's `quasi' family object,
> but it seems to me that the following definition could be used:
>
> }, "mu^2(1-mu)^2" = {
>      variance <- function(mu) mu^2 * (1 - mu)^2
>      validmu <- function(mu) all(mu > 0) && all(mu < 1)
>      dev.resids <- function(y, mu, wt) 2 * wt * ((2 * y - 1) *
>          (log(ifelse(y == 0, 1, y/mu)) - log(ifelse(y == 1, 1,
>          (1 - y)/(1 - mu)))) - 2 + y/mu + (1 - y)/(1 - mu))
>
> I've modified the `quasi' function accordingly (into `quasi2' given
> below) and my results are very much in line with the ones cited by
> McCullagh and Nelder on p.330-331:
>
>> data(leafblotch, package = "faraway")
>> summary(fit <- glm(blotch ~ site + variety,
> +         family = quasi2(link = "logit", variance = "mu^2(1-mu)^2"),
> +         data = leafblotch))
>
> Call:
> glm(formula = blotch ~ site + variety, family = quasi2(link = "logit",
>      variance = "mu^2(1-mu)^2"), data = leafblotch)
>
> Deviance Residuals:
>       Min        1Q    Median        3Q       Max
> -3.23175  -0.65385  -0.09426   0.46946   1.97152
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -7.92253    0.44463 -17.818  < 2e-16 ***
> site2        1.38308    0.44463   3.111  0.00268 **
> site3        3.86013    0.44463   8.682 8.18e-13 ***
> site4        3.55697    0.44463   8.000 1.53e-11 ***
> site5        4.10841    0.44463   9.240 7.48e-14 ***
> site6        4.30541    0.44463   9.683 1.13e-14 ***
> site7        4.91811    0.44463  11.061  < 2e-16 ***
> site8        5.69492    0.44463  12.808  < 2e-16 ***
> site9        7.06762    0.44463  15.896  < 2e-16 ***
> variety2    -0.46728    0.46868  -0.997  0.32210
> variety3     0.07877    0.46868   0.168  0.86699
> variety4     0.95418    0.46868   2.036  0.04544 *
> variety5     1.35276    0.46868   2.886  0.00514 **
> variety6     1.32859    0.46868   2.835  0.00595 **
> variety7     2.34066    0.46868   4.994 3.99e-06 ***
> variety8     3.26268    0.46868   6.961 1.30e-09 ***
> variety9     3.13556    0.46868   6.690 4.10e-09 ***
> variety10    3.88736    0.46868   8.294 4.33e-12 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for quasi family taken to be 0.9884755)
>
>      Null deviance: 339.488  on 89  degrees of freedom
> Residual deviance:  71.961  on 72  degrees of freedom
> AIC: NA
>
> Number of Fisher Scoring iterations: 18
>
>
> Also, the plot of the Pearson residuals against the linear predictor
>
>> plot(residuals(fit, type = "pearson") ~ fit$linear.predictors)
>> abline(h = 0, lty = 2)
>
> results in a plot that, to my eyes at least, is very close to Fig. 9.2
> on p. 332.
>
> However, I can't seem to find any other published examples using this
> variance function. I'd really like to verify that my code above is
> working before applying it to real data sets. Can anybody help?
>
> Thanks,
> Henric
> - - - - -
> quasi2 <- function (link = "identity", variance = "constant")
> {
>      linktemp <- substitute(link)
>      if (is.expression(linktemp) || is.call(linktemp))
>          linktemp <- link
>      else if (!is.character(linktemp))
>          linktemp <- deparse(linktemp)
>      if (is.character(linktemp))
>          stats <- make.link(linktemp)
>      else stats <- linktemp
>      variancetemp <- substitute(variance)
>      if (!is.character(variancetemp)) {
>          variancetemp <- deparse(variancetemp)
>          if (linktemp == "variance")
>              variancetemp <- eval(variance)
>      }
>      switch(variancetemp, constant = {
>          variance <- function(mu) rep.int(1, length(mu))
>          dev.resids <- function(y, mu, wt) wt * ((y - mu)^2)
>          validmu <- function(mu) TRUE
>      }, "mu(1-mu)" = {
>          variance <- function(mu) mu * (1 - mu)
>          validmu <- function(mu) all(mu > 0) && all(mu < 1)
>          dev.resids <- function(y, mu, wt) 2 * wt * (y * log(ifelse(y 
> ==
>              0, 1, y/mu)) + (1 - y) * log(ifelse(y == 1, 1, (1 -
>              y)/(1 - mu))))
>      }, "mu^2(1-mu)^2" = {
>          variance <- function(mu) mu^2 * (1 - mu)^2
>          validmu <- function(mu) all(mu > 0) && all(mu < 1)
>          dev.resids <- function(y, mu, wt) 2 * wt * ((2 * y - 1) *
>              (log(ifelse(y == 0, 1, y/mu)) - log(ifelse(y == 1, 1,
>              (1 - y)/(1 - mu)))) - 2 + y/mu + (1 - y)/(1 - mu))
>      }, mu = {
>          variance <- function(mu) mu
>          validmu <- function(mu) all(mu > 0)
>          dev.resids <- function(y, mu, wt) 2 * wt * (y * log(ifelse(y 
> ==
>              0, 1, y/mu)) - (y - mu))
>      }, "mu^2" = {
>          variance <- function(mu) mu^2
>          validmu <- function(mu) all(mu > 0)
>          dev.resids <- function(y, mu, wt) pmax(-2 * wt * 
> (log(ifelse(y ==
>              0, 1, y)/mu) - (y - mu)/mu), 0)
>      }, "mu^3" = {
>          variance <- function(mu) mu^3
>          validmu <- function(mu) all(mu > 0)
>          dev.resids <- function(y, mu, wt) wt * ((y - mu)^2)/(y *
>              mu^2)
>      }, stop(gettextf("'variance' \"%s\" is invalid: possible values 
> are
> \"mu(1-mu)\", \"mu^2(1-mu)^2\", \"mu\", \"mu^2\", \"mu^3\" and
> \"constant\"",
>          variancetemp), domain = NA))
>      initialize <- expression({
>          n <- rep.int(1, nobs)
>          mustart <- y + 0.1 * (y == 0)
>      })
>      aic <- function(y, n, mu, wt, dev) NA
>      structure(list(family = "quasi", link = linktemp, linkfun =
> stats$linkfun,
>          linkinv = stats$linkinv, variance = variance, dev.resids =
> dev.resids,
>          aic = aic, mu.eta = stats$mu.eta, initialize = initialize,
>          validmu = validmu, valideta = stats$valideta, varfun =
> variancetemp),
>          class = "family")
> }
>
>
>
> <Header>
>




More information about the R-help mailing list