[Rd] "simulate" does not include variability in parameter estimation
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Fri Dec 27 17:46:34 CET 2019
On 27/12/2019 10:31 a.m., Spencer Graves wrote:
>
>
> On 2019-12-27 04:34, Duncan Murdoch wrote:
>> On 26/12/2019 11:14 p.m., Spencer Graves wrote:
>>> Hello, All:
>>>
>>>
>>> The default "simulate" method for lm and glm seems to ignore the
>>> sampling variance of the parameter estimates; see the trivial lm and
>>> glm examples below. Both these examples estimate a mean with formula =
>>> x~1. In both cases, the variance of the estimated mean is 1.
>>
>> That's how it's documented to operate. Nothing in the help page
>> suggests it would try to simulate parameter values. Indeed, it
>> doesn't have enough information on the distribution to sample from:
>> the appropriate distribution to simulate from if you want to include
>> uncertainty in the parameter estimates is the posterior distribution,
>> but lm and glm take a classical point of view, not a Bayesian point of
>> view, so they have no concept of a posterior.
>
>
> Thanks for the reply. What do you suggest for someone who wants
> confidence, prediction and tolerance intervals for newdata for a general
> fit object?
I can't suggest anything other than to take a Bayesian approach.
Duncan Murdoch
>
> For a glm object, one could get confidence intervals starting
> with predicted mean and standard errors
> from predict(glm(...), newdata, type='link', se.fit=TRUE), then linkinv
> to get the confidence intervals on scale of expected values of the
> random variables. From that one could compute tolerance intervals.
>
>
> Is there a way to get more standard prediction intervals from a
> glm object, other than the Bayesian approach coded into
> Ecfun:::simulate.glm? And that still doesn't answer the question re.
> confidence intervals for a more general fit object like BMA::bic.glm.
>
>
> Comments?
> Thanks,
> Spencer Graves
>
>>
>>> * In the lm example with x0 = c(-1, 1), var(x0) = 2, and
>>> var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064. Shouldn't it be 3
>>> = var(mean(x0)) + var(x0) = (2/2) + 2?
>>
>> That calculation ignores the uncertainty in the estimation of sigma.
>>
>> Duncan Murdoch
>>
>>>
>>>
>>> * In the glm example with x1=1,
>>> var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't
>>> it be 2 = var(glm estimate of the mean) + var(simulated Poisson
>>> distribution) = 1 + 1?
>>>
>>>
>>> I'm asking, because I've recently written "simulate" methods for
>>> objects of class stats::glm and BMA::bic.glm, where my primary interest
>>> was simulating the predicted mean with "newdata". I'm doing this, so I
>>> can get Monte Carlo prediction intervals. My current code for
>>> "simulate.glm" and "simulate.bic.glm" are available in the development
>>> version of the "Ecfun" package on GitHub:
>>>
>>>
>>> https://github.com/sbgraves237/Ecfun
>>>
>>>
>>> Comparing my new code with "stats:::simulate.lm" raises the
>>> following questions in my mind regarding "simulate" of a fit object:
>>>
>>>
>>> 1. Shouldn't "simulate" start by simulating the random
>>> variability in the estimated parameters? I need that for my current
>>> application. If a generic "simulate" function should NOT include this,
>>> what should we call something that does include this? And how does the
>>> current stats:::simulate.lm behavior fit with this?
>>>
>>>
>>> 2. Shouldn't "simulate" of a model fit include an option
>>> for "newdata"? I need that for my application.
>>>
>>>
>>> 3. By comparing with "predict.glm", I felt I needed an
>>> argument 'type = c("link", "response")'. "predict.glm" has an argument
>>> 'type = c("link", "response", "terms")'. I didn't need "terms", so I
>>> didn't take the time to code that. However, a general "simulate"
>>> function should probably include that.
>>>
>>>
>>> 4. My application involves assumed Poisson counts. I
>>> need
>>> to simulate those as well. If I combined those with "simulate.glm",
>>> what would I call them? I can't use the word "response", because that's
>>> already used with a different meaning. Might "observations" be the
>>> appropriate term?
>>>
>>>
>>> What do you think?
>>> Thanks,
>>> Spencer Graves
>>>
>>>
>>> > x0 <- c(-1, 1)
>>> > var(x0)
>>> [1] 2
>>> > fit0 <- lm(x0~1)
>>> > vcov(fit0)
>>> (Intercept)
>>> (Intercept) 1
>>> > sim0 <- simulate(fit0, 10000, 1)
>>> > var(unlist(sim0))
>>> [1] 2.006408
>>> > x1 <- 1
>>> > fit1 <- glm(x1~1, poisson)
>>> > coef(fit1)
>>> (Intercept)
>>> 4.676016e-11
>>> > exp(coef(fit1))
>>> (Intercept)
>>> 1
>>> > vcov(fit1)
>>> (Intercept)
>>> (Intercept) 0.9999903
>>> > sim1 <- simulate(fit1, 10000, 1)
>>> > var(unlist(sim1))
>>> [1] 1.00617
>>> > sessionInfo()
>>> R version 3.6.2 (2019-12-12)
>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>> Running under: macOS Catalina 10.15.2
>>>
>>> Matrix products: default
>>> BLAS:
>>> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
>>>
>>> LAPACK:
>>> /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
>>>
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> loaded via a namespace (and not attached):
>>> [1] compiler_3.6.2 tools_3.6.2
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>>
>
More information about the R-devel
mailing list