[Rd] "simulate" does not include variability in parameter estimation
Spencer Graves
@pencer@gr@ve@ @end|ng |rom prod@y@e@com
Fri Dec 27 05:14:03 CET 2019
Hello, All:
The default "simulate" method for lm and glm seems to ignore the
sampling variance of the parameter estimates; see the trivial lm and
glm examples below. Both these examples estimate a mean with formula =
x~1. In both cases, the variance of the estimated mean is 1.
* In the lm example with x0 = c(-1, 1), var(x0) = 2, and
var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064. Shouldn't it be 3
= var(mean(x0)) + var(x0) = (2/2) + 2?
* In the glm example with x1=1,
var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't
it be 2 = var(glm estimate of the mean) + var(simulated Poisson
distribution) = 1 + 1?
I'm asking, because I've recently written "simulate" methods for
objects of class stats::glm and BMA::bic.glm, where my primary interest
was simulating the predicted mean with "newdata". I'm doing this, so I
can get Monte Carlo prediction intervals. My current code for
"simulate.glm" and "simulate.bic.glm" are available in the development
version of the "Ecfun" package on GitHub:
https://github.com/sbgraves237/Ecfun
Comparing my new code with "stats:::simulate.lm" raises the
following questions in my mind regarding "simulate" of a fit object:
1. Shouldn't "simulate" start by simulating the random
variability in the estimated parameters? I need that for my current
application. If a generic "simulate" function should NOT include this,
what should we call something that does include this? And how does the
current stats:::simulate.lm behavior fit with this?
2. Shouldn't "simulate" of a model fit include an option
for "newdata"? I need that for my application.
3. By comparing with "predict.glm", I felt I needed an
argument 'type = c("link", "response")'. "predict.glm" has an argument
'type = c("link", "response", "terms")'. I didn't need "terms", so I
didn't take the time to code that. However, a general "simulate"
function should probably include that.
4. My application involves assumed Poisson counts. I need
to simulate those as well. If I combined those with "simulate.glm",
what would I call them? I can't use the word "response", because that's
already used with a different meaning. Might "observations" be the
appropriate term?
What do you think?
Thanks,
Spencer Graves
> x0 <- c(-1, 1)
> var(x0)
[1] 2
> fit0 <- lm(x0~1)
> vcov(fit0)
(Intercept)
(Intercept) 1
> sim0 <- simulate(fit0, 10000, 1)
> var(unlist(sim0))
[1] 2.006408
> x1 <- 1
> fit1 <- glm(x1~1, poisson)
> coef(fit1)
(Intercept)
4.676016e-11
> exp(coef(fit1))
(Intercept)
1
> vcov(fit1)
(Intercept)
(Intercept) 0.9999903
> sim1 <- simulate(fit1, 10000, 1)
> var(unlist(sim1))
[1] 1.00617
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.2
Matrix products: default
BLAS:
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.6.2 tools_3.6.2
More information about the R-devel
mailing list