[Rd] "simulate" does not include variability in parameter estimation
Spencer Graves
@pencer@gr@ve@ @end|ng |rom prod@y@e@com
Fri Dec 27 16:31:18 CET 2019
On 2019-12-27 04:34, Duncan Murdoch wrote:
> On 26/12/2019 11:14 p.m., Spencer Graves wrote:
>> Hello, All:
>>
>>
>> The default "simulate" method for lm and glm seems to ignore the
>> sampling variance of the parameter estimates; see the trivial lm and
>> glm examples below. Both these examples estimate a mean with formula =
>> x~1. In both cases, the variance of the estimated mean is 1.
>
> That's how it's documented to operate. Nothing in the help page
> suggests it would try to simulate parameter values. Indeed, it
> doesn't have enough information on the distribution to sample from:
> the appropriate distribution to simulate from if you want to include
> uncertainty in the parameter estimates is the posterior distribution,
> but lm and glm take a classical point of view, not a Bayesian point of
> view, so they have no concept of a posterior.
Thanks for the reply. What do you suggest for someone who wants
confidence, prediction and tolerance intervals for newdata for a general
fit object?
For a glm object, one could get confidence intervals starting
with predicted mean and standard errors
from predict(glm(...), newdata, type='link', se.fit=TRUE), then linkinv
to get the confidence intervals on scale of expected values of the
random variables. From that one could compute tolerance intervals.
Is there a way to get more standard prediction intervals from a
glm object, other than the Bayesian approach coded into
Ecfun:::simulate.glm? And that still doesn't answer the question re.
confidence intervals for a more general fit object like BMA::bic.glm.
Comments?
Thanks,
Spencer Graves
>
>> * In the lm example with x0 = c(-1, 1), var(x0) = 2, and
>> var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064. Shouldn't it be 3
>> = var(mean(x0)) + var(x0) = (2/2) + 2?
>
> That calculation ignores the uncertainty in the estimation of sigma.
>
> Duncan Murdoch
>
>>
>>
>> * In the glm example with x1=1,
>> var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't
>> it be 2 = var(glm estimate of the mean) + var(simulated Poisson
>> distribution) = 1 + 1?
>>
>>
>> I'm asking, because I've recently written "simulate" methods for
>> objects of class stats::glm and BMA::bic.glm, where my primary interest
>> was simulating the predicted mean with "newdata". I'm doing this, so I
>> can get Monte Carlo prediction intervals. My current code for
>> "simulate.glm" and "simulate.bic.glm" are available in the development
>> version of the "Ecfun" package on GitHub:
>>
>>
>> https://github.com/sbgraves237/Ecfun
>>
>>
>> Comparing my new code with "stats:::simulate.lm" raises the
>> following questions in my mind regarding "simulate" of a fit object:
>>
>>
>> 1. Shouldn't "simulate" start by simulating the random
>> variability in the estimated parameters? I need that for my current
>> application. If a generic "simulate" function should NOT include this,
>> what should we call something that does include this? And how does the
>> current stats:::simulate.lm behavior fit with this?
>>
>>
>> 2. Shouldn't "simulate" of a model fit include an option
>> for "newdata"? I need that for my application.
>>
>>
>> 3. By comparing with "predict.glm", I felt I needed an
>> argument 'type = c("link", "response")'. "predict.glm" has an argument
>> 'type = c("link", "response", "terms")'. I didn't need "terms", so I
>> didn't take the time to code that. However, a general "simulate"
>> function should probably include that.
>>
>>
>> 4. My application involves assumed Poisson counts. I
>> need
>> to simulate those as well. If I combined those with "simulate.glm",
>> what would I call them? I can't use the word "response", because that's
>> already used with a different meaning. Might "observations" be the
>> appropriate term?
>>
>>
>> What do you think?
>> Thanks,
>> Spencer Graves
>>
>>
>> > x0 <- c(-1, 1)
>> > var(x0)
>> [1] 2
>> > fit0 <- lm(x0~1)
>> > vcov(fit0)
>> (Intercept)
>> (Intercept) 1
>> > sim0 <- simulate(fit0, 10000, 1)
>> > var(unlist(sim0))
>> [1] 2.006408
>> > x1 <- 1
>> > fit1 <- glm(x1~1, poisson)
>> > coef(fit1)
>> (Intercept)
>> 4.676016e-11
>> > exp(coef(fit1))
>> (Intercept)
>> 1
>> > vcov(fit1)
>> (Intercept)
>> (Intercept) 0.9999903
>> > sim1 <- simulate(fit1, 10000, 1)
>> > var(unlist(sim1))
>> [1] 1.00617
>> > sessionInfo()
>> R version 3.6.2 (2019-12-12)
>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>> Running under: macOS Catalina 10.15.2
>>
>> Matrix products: default
>> BLAS:
>> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
>>
>> LAPACK:
>> /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
>>
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.6.2 tools_3.6.2
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
More information about the R-devel
mailing list