[R] simulate() and glm fits
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Aug 2 19:38:14 CEST 2007
I only see an "lm" method for simulate, so I think you are getting that by
inheritance. It is inappropriate and you do need to add a "glm" method.
As ?simulate says fairly clearly that there is only an "lm" method in base
R, I don't think this is a bug, just user optimism.
On Thu, 2 Aug 2007, William Valdar wrote:
> Dear All,
>
> I have been trying to simulate data from a fitted glm using the simulate()
> function (version details at the bottom). This works for lm() fits and
> even for lmer() fits (in lme4). However, for glm() fits its output does
> not make sense to me -- am I missing something or is this a bug?
>
> Consider the following count data, modelled as gaussian, poisson and
> binomial responses:
>
> counts <- data.frame(
> count = c(8, 3, 0, 3, 0, 9, 0, 11, 4, 7, 0, 0, 0, 4, 3, 6, 3,
> 15, 11, 9),
> batch = c(3.1, 3.1, 3.1, 3.3, 3.3, 3.3, 3.2, 3.12, 3.8, 3.11,
> 3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6),
> weight = c(324.4, 372.5, 352.7, 379.6, 388.1, 431, 448.4, 377.3,
> 376.5, 358.4, 356, 351.4, 350.8, 332.1, 334.5, 392, 370.5,
> 409.7, 375, 318.5))
>
> fit.gaus <- lm(sqrt(count) ~ weight, data=counts)
> simulate(fit.gaus)[[1]]
> # [1] 2.3280287 -0.7097561 2.5370403 2.3935569 -0.7918554 2.8043848
> # [7] 1.7306636 4.2524854 0.3390859 1.4725342 1.1209236 0.1805066
> # [13] 3.6348710 1.9740871 2.0315499 2.9240702 3.9282456 1.2174952
> # [19] 0.1513106 0.6562989
>
> fit.poisson <- glm(count ~ weight, family=poisson, data=counts)
> simulate(fit.poisson)[[1]]
> # [1] 2.11172412 5.68833210 4.32514109 4.93455140 8.14232584 5.05559585
> # [7] 3.15849145 7.10833484 -0.24817021 4.32475164 -0.08093695 5.06006941
> # [13] 7.38880304 6.47238760 9.53732036 2.59060450 5.10484831 3.21278865
> # [19] -0.25775316 4.40702454
>
> fit.binom <- glm(0!=count ~ weight, family=binomial, data=counts)
> simulate(fit.binom)[[1]]
> # [1] 0.47568732 1.26957295 0.38295877 -0.61108990 -1.03040311 0.73164463
> # [7] -0.08117270 0.61211072 0.05559046 0.12979599 -0.29521387 1.76792413
> # [13] 0.52791230 -1.10505180 -1.61753057 0.89083075 0.70100867 0.00962979
> # [19] -0.43537313 0.38288809
>
> For simulate() to have a consistent definition, shouldn't Poisson and
> binomial produce only positive integer responses? Below is a hacked
> together function that does what I would expect (note that my binomial
> works for only binary response glms):
>
> simulate.glm.hack <- function(fit, nsim=1, seed=NULL)
> {
> if (!is.null(seed)) set.seed(seed)
> n <- length(fitted(fit))
> theta <- model.matrix(fit$terms, data=fit$data) %*% coef(fit)
>
> ymat <- matrix(NA, nrow=n, ncol=nsim)
> for (s in 1:nsim)
> {
> if ("poisson"==fit$family$family)
> {
> ymat[,s] <- rpois(n, fit$family$linkinv(theta))
> }
> else if ("binomial"==fit$family$family)
> {
> ymat[,s] <- rbinom(n, 1, fit$family$linkinv(theta))
> }
> }
> as.data.frame(ymat)
> }
>
> # Eg:
>
> simulate.glm.hack(fit.poisson)[[1]]
> # [1] 1 6 4 7 5 9 7 3 3 4 2 4 3 7 4 4 7 2 4 6
> simulate.glm.hack(fit.binom)[[1]]
> # [1] 1 0 0 1 1 0 0 1 1 0 1 1 0 1 0 1 1 1 1 1
>
> If I am missing something please direct me to the relevant documentation.
>
> Cheers,
>
> Will Valdar
>
> my version details...
>
>> sessionInfo()
> R version 2.5.1 (2007-06-27)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
> [7] "base"
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Dr William Valdar ++44 (0)1865 287 589
> Wellcome Trust Centre valdar at well.ox.ac.uk
> for Human Genetics, Oxford www.well.ox.ac.uk/~valdar
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list