[R] simulate() and glm fits

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Aug 2 19:38:14 CEST 2007

```I only see an "lm" method for simulate, so I think you are getting that by
inheritance.  It is inappropriate and you do need to add a "glm" method.

As ?simulate says fairly clearly that there is only an "lm" method in base
R, I don't think this is a bug, just user optimism.

On Thu, 2 Aug 2007, William Valdar wrote:

> Dear All,
>
> I have been trying to simulate data from a fitted glm using the simulate()
> function (version details at the bottom). This works for lm() fits and
> even for lmer() fits (in lme4). However, for glm() fits its output does
> not make sense to me -- am I missing something or is this a bug?
>
> Consider the following count data, modelled as gaussian, poisson and
> binomial responses:
>
> counts <- data.frame(
>         count  = c(8, 3, 0, 3, 0, 9, 0, 11, 4, 7, 0, 0, 0, 4, 3, 6, 3,
>                 15, 11, 9),
>         batch = c(3.1, 3.1, 3.1, 3.3, 3.3, 3.3, 3.2, 3.12, 3.8, 3.11,
>                 3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6),
>         weight = c(324.4, 372.5, 352.7, 379.6, 388.1, 431, 448.4, 377.3,
>                 376.5, 358.4, 356, 351.4, 350.8, 332.1, 334.5, 392, 370.5,
>                 409.7, 375, 318.5))
>
> fit.gaus <- lm(sqrt(count) ~ weight, data=counts)
> simulate(fit.gaus)[[1]]
> # [1]  2.3280287 -0.7097561  2.5370403  2.3935569 -0.7918554  2.8043848
> # [7]  1.7306636  4.2524854  0.3390859  1.4725342  1.1209236  0.1805066
> # [13]  3.6348710  1.9740871  2.0315499  2.9240702  3.9282456  1.2174952
> # [19]  0.1513106  0.6562989
>
> fit.poisson <- glm(count ~ weight,  family=poisson, data=counts)
> simulate(fit.poisson)[[1]]
> # [1]  2.11172412  5.68833210  4.32514109  4.93455140  8.14232584 5.05559585
> # [7]  3.15849145  7.10833484 -0.24817021  4.32475164 -0.08093695 5.06006941
> # [13] 7.38880304  6.47238760  9.53732036  2.59060450  5.10484831 3.21278865
> # [19] -0.25775316  4.40702454
>
> fit.binom <- glm(0!=count ~ weight,  family=binomial, data=counts)
> simulate(fit.binom)[[1]]
> #  [1]  0.47568732  1.26957295  0.38295877 -0.61108990 -1.03040311 0.73164463
> #  [7] -0.08117270  0.61211072  0.05559046  0.12979599 -0.29521387 1.76792413
> # [13]  0.52791230 -1.10505180 -1.61753057  0.89083075  0.70100867 0.00962979
> # [19] -0.43537313  0.38288809
>
> For simulate() to have a consistent definition, shouldn't Poisson and
> binomial produce only positive integer responses? Below is a hacked
> together function that does what I would expect (note that my binomial
> works for only binary response glms):
>
> simulate.glm.hack <- function(fit, nsim=1, seed=NULL)
> {
>     if (!is.null(seed)) set.seed(seed)
>     n <- length(fitted(fit))
>     theta <- model.matrix(fit\$terms, data=fit\$data) %*% coef(fit)
>
>     ymat <- matrix(NA, nrow=n, ncol=nsim)
>     for (s in 1:nsim)
>     {
>         if ("poisson"==fit\$family\$family)
>         {
>         }
>         else if ("binomial"==fit\$family\$family)
>         {
>             ymat[,s] <- rbinom(n, 1, fit\$family\$linkinv(theta))
>         }
>     }
>     as.data.frame(ymat)
> }
>
> # Eg:
>
> simulate.glm.hack(fit.poisson)[[1]]
> #  [1] 1 6 4 7 5 9 7 3 3 4 2 4 3 7 4 4 7 2 4 6
> simulate.glm.hack(fit.binom)[[1]]
> #  [1] 1 0 0 1 1 0 0 1 1 0 1 1 0 1 0 1 1 1 1 1
>
> If I am missing something please direct me to the relevant documentation.
>
> Cheers,
>
> Will Valdar
>
>   my version details...
>
>> sessionInfo()
> R version 2.5.1 (2007-06-27)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
> [7] "base"
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Dr William Valdar               ++44 (0)1865 287 589
> Wellcome Trust Centre           valdar at well.ox.ac.uk
> for Human Genetics, Oxford      www.well.ox.ac.uk/~valdar
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.
>

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

```