[Rd] prior.weights and weights()

Mon Feb 27 14:10:43 CET 2012

Ben Bolker <bbolker <at> gmail.com> writes:

  Bump?  (Quoting with "> " removed to make Gmane happy)

  I've since realized that it will be harder to use the
built-in $simulate methods in my application than I thought,
but I'm still curious about this issue (and might still be
able to use them with a slight hack-around if simulate()
methods internally used predict(object,type="response",...))
but I'm still curious about this issue.

  And I still think I have shown below that $simulate() doesn't work
properly with na.action=na.exclude ...

  Ben Bolker

=======================
I'm wondering whether anyone has any insight into why the 'simulate'
methods for the built-in glm() families (binomial, Poisson, Gamma ...)
extract the prior weights using object$prior.weights rather than
weights(object,"prior") ?

  At first I thought this was so that things work correctly when e.g.
subset= and na.action=na.exclude are used. However, the current versions
of the functions don't seem to work particularly well when these options
are used.  A Poisson example, along with a modified function that is
na.exclude-safe, is shown below ...

  My reason for wanting this is that I'm working on a package that uses
glm-like objects that don't have list-like structure, so can't contain
$prior.weights elements.  If weights were accessed using accessors, then
we could use the existing simulate() functions ...

  If R-core agreed that this is a shortcoming of the current
implementation, all of the simulate() methods would need to be rewritten
to account for this (as far as I can see there are four, for
binomial/Poisson/Gamma/inverse Gaussian -- the base method in
simulate.lm works OK (although it generates unnecessary random deviates
corresponding to the NA values in the fitted output).

  Actually, an even better design (in my opinion) might be to use
predict(object,type="response",...) internally rather than fitted() --
this would, for free, allow the use case

simulate(object,newdata=...)

  which would simulate values from the model for a novel set of
predictor variables ... (coincidentally, this would also allow our
package to use the existing framework more easily).

  Are there any circumstances under which
predict(object,type="response") is/could be *different* from
fitted(object) for a 'glm' object ... ?

  cheers
    Ben Bolker

===========================
d <- data.frame(y=rpois(10,1),x=rep(c(1,NA),5),f=rep(1:2,each=5))
## subset AND NA omission
g <- glm(y~x,data=d,subset=(f==2),family=poisson)

length(fitted(g)) ## 2
length(weights(g)) ## 2
length(g$prior.weights) ## 2
poisson()$simulate(g,1)  ## works

## now try na.exclude
g <- glm(y~x,data=d,subset=(f==2),na.action=na.exclude,
         family=poisson)
length(fitted(g)) ## 5
length(weights(g)) ## 5
length(g$prior.weights) ## 2
poisson()$simulate(g,1)  ## fails

## here's a new, 'safe' version of the Poisson simulate function ...
simnew <- function (object, nsim) {
  wts <- weights(object)
  if (any(na.omit(wts) != 1))
    warning("ignoring prior weights")
  ftd <- na.omit(fitted(object))
  napredict(na.action(object),
            rpois(nsim * length(ftd), ftd))
}

simnew(g,1)