[R] simulate() and glm fits
William Valdar
valdar at well.ox.ac.uk
Thu Aug 2 16:59:29 CEST 2007
Dear All,
I have been trying to simulate data from a fitted glm using the simulate()
function (version details at the bottom). This works for lm() fits and
even for lmer() fits (in lme4). However, for glm() fits its output does
not make sense to me -- am I missing something or is this a bug?
Consider the following count data, modelled as gaussian, poisson and
binomial responses:
counts <- data.frame(
count = c(8, 3, 0, 3, 0, 9, 0, 11, 4, 7, 0, 0, 0, 4, 3, 6, 3,
15, 11, 9),
batch = c(3.1, 3.1, 3.1, 3.3, 3.3, 3.3, 3.2, 3.12, 3.8, 3.11,
3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6),
weight = c(324.4, 372.5, 352.7, 379.6, 388.1, 431, 448.4, 377.3,
376.5, 358.4, 356, 351.4, 350.8, 332.1, 334.5, 392, 370.5,
409.7, 375, 318.5))
fit.gaus <- lm(sqrt(count) ~ weight, data=counts)
simulate(fit.gaus)[[1]]
# [1] 2.3280287 -0.7097561 2.5370403 2.3935569 -0.7918554 2.8043848
# [7] 1.7306636 4.2524854 0.3390859 1.4725342 1.1209236 0.1805066
# [13] 3.6348710 1.9740871 2.0315499 2.9240702 3.9282456 1.2174952
# [19] 0.1513106 0.6562989
fit.poisson <- glm(count ~ weight, family=poisson, data=counts)
simulate(fit.poisson)[[1]]
# [1] 2.11172412 5.68833210 4.32514109 4.93455140 8.14232584 5.05559585
# [7] 3.15849145 7.10833484 -0.24817021 4.32475164 -0.08093695 5.06006941
# [13] 7.38880304 6.47238760 9.53732036 2.59060450 5.10484831 3.21278865
# [19] -0.25775316 4.40702454
fit.binom <- glm(0!=count ~ weight, family=binomial, data=counts)
simulate(fit.binom)[[1]]
# [1] 0.47568732 1.26957295 0.38295877 -0.61108990 -1.03040311 0.73164463
# [7] -0.08117270 0.61211072 0.05559046 0.12979599 -0.29521387 1.76792413
# [13] 0.52791230 -1.10505180 -1.61753057 0.89083075 0.70100867 0.00962979
# [19] -0.43537313 0.38288809
For simulate() to have a consistent definition, shouldn't Poisson and
binomial produce only positive integer responses? Below is a hacked
together function that does what I would expect (note that my binomial
works for only binary response glms):
simulate.glm.hack <- function(fit, nsim=1, seed=NULL)
{
if (!is.null(seed)) set.seed(seed)
n <- length(fitted(fit))
theta <- model.matrix(fit$terms, data=fit$data) %*% coef(fit)
ymat <- matrix(NA, nrow=n, ncol=nsim)
for (s in 1:nsim)
{
if ("poisson"==fit$family$family)
{
ymat[,s] <- rpois(n, fit$family$linkinv(theta))
}
else if ("binomial"==fit$family$family)
{
ymat[,s] <- rbinom(n, 1, fit$family$linkinv(theta))
}
}
as.data.frame(ymat)
}
# Eg:
simulate.glm.hack(fit.poisson)[[1]]
# [1] 1 6 4 7 5 9 7 3 3 4 2 4 3 7 4 4 7 2 4 6
simulate.glm.hack(fit.binom)[[1]]
# [1] 1 0 0 1 1 0 0 1 1 0 1 1 0 1 0 1 1 1 1 1
If I am missing something please direct me to the relevant documentation.
Cheers,
Will Valdar
my version details...
> sessionInfo()
R version 2.5.1 (2007-06-27)
i386-pc-mingw32
locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
Kingdom.1252;LC_MONETARY=English_United
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Dr William Valdar ++44 (0)1865 287 589
Wellcome Trust Centre valdar at well.ox.ac.uk
for Human Genetics, Oxford www.well.ox.ac.uk/~valdar
More information about the R-help
mailing list